ADRIA WUs *still* have serious problems

Joined: 14 Feb 20, Posts: 16, Credit: 27,395,983, RAC: 0

I sure wish BOINC allowed for inclusion of screenshots!

From the event log (I have saved screenshots to document this):

8/16/2022 11:27:13 PM | GPUGRID | Starting task 2-ADRIA_FS_RNAfmnrb_EFL6_2-1-2-RND3133_1

| Date | Time | Progress | Elapsed | Remaining |
|---|---|---|---|---|
| 8/18 | 10:01 AM | **57.799%** | 1d 10:34:05 | 1d 01:14:17 |
| 8/18 | 10:56 AM | **57.799%** | 1d 11:29:30 | 1d 01:54:45 |
| 8/18 | 11:00 AM | **57.799%** | 1d 11:32:26 | 1d 01:56:54 |
| 8/18 | 12:19 PM | **57.799%** | 23:01:10 | 1d 14:28:10 |

After the 11:00 AM reading, I suspended the WU for 1 hr 20 min, then restarted it. The BOINC 'elapsed' timer mysteriously 'shortened' the actual run time by 12 hr 31 min.

Luckily, it does seem this tactic of pausing and restarting will save the WU (https://gpugrid.net/result.php?resultid=33001055), and it is slated to finish after 1d 15:58 of run time on a GTX 1660 Ti. Yet this faulty WU wasted 12 1/2 hours of GPU time.

PLEASE fix the problem(s) with ADRIA tasks.

LLP, PhD, Prof Engr

Joined: 14 Feb 20, Posts: 16, Credit: 27,395,983, RAC: 0

What does "Detected memory leaks!" mean?? https://gpugrid.net/result.php?resultid=33001055

Stderr output:

    <core_client_version>7.16.20</core_client_version>
    <![CDATA[
    <stderr_txt>
    23:27:19 (7328): wrapper (7.9.26016): starting
    23:27:19 (7328): wrapper: running bin/acemd3.exe (--boinc --device 0)
    Detected memory leaks!
    Dumping objects ->

Joined: 21 Feb 20, Posts: 1116, Credit: 40,839,470,595, RAC: 6,423

> what does "Detected memory leaks!" mean??

Read the FAQs: https://gpugrid.net/forum_thread.php?id=5272

Joined: 14 Feb 20, Posts: 16, Credit: 27,395,983, RAC: 0

Thanks for your reply, but "Please ignore the message. ..." is a ludicrous answer by GPUGrid.

> Such messages are always present in Windows

I'm not sure what "such messages" is supposed to mean, but in over 40 years of working with computers, with a Master's and a PhD, and as a registered PE (a licensed Professional Engineer), I have never seen any other "such messages".

> It's completely harmless.... not related to successful

Well, there was no other error message in the task output, yet the task 'stalled' and wasted over 12 hours of GPU time on a not-that-bad GPU (NVIDIA GTX 1660 Ti).

If the GPUGrid project is willing to ask for and accept in-kind donations of people's GPU time, then GPUGrid has an obligation to do what it can to resolve problematic tasks and code.

Joined: 21 Feb 20, Posts: 1116, Credit: 40,839,470,595, RAC: 6,423

> Thanks for your reply, but "Please ignore the message. ..." is a ludicrous answer by GPUGrid.

I'm not sure what you're on about. You've only completed this one single task, it was completed successfully, and you received credit for it. What do you mean by "wasted over 12 hours of GPU time"? These tasks are VERY long-running, and 12 hrs seems normal for that relatively weak GPU. ACEMD3 tasks can also vary in length depending on what they're doing: there was a time when they took only 20 minutes, and a time when they took 24 hours. It just depends on the work.

So what's the problem exactly? It looks like you're complaining about a valid/successful task. I see nothing wrong with this task.

Joined: 14 Feb 20, Posts: 16, Credit: 27,395,983, RAC: 0

Please read Message 59133 - Posted: 19 Aug 2022 | 6:18:01 UTC. Quite clearly, you have not.

> you've only completed this one single task

Wow. So I have a total "Credit: 11,845,453" from completing one single task. Amazing. Indeed, I used to give GPUGrid high preference among GPU projects because I felt its scientific merits deserved it, despite this project giving far, far less credit per GPU hour than a number of other projects (e.g., PrimeGrid, SRBase).

> what do you mean by "wasted over 12 hours of GPU time"?

Please read Message 59133 - Posted: 19 Aug 2022 | 6:18:01 UTC, the first post in this thread.

LLP, PhD, Prof. Engr.

Joined: 14 Feb 20, Posts: 16, Credit: 27,395,983, RAC: 0

| Date | Time | Progress | Elapsed | Remaining |
|---|---|---|---|---|
| 8/18 | 10:01 AM | 57.799% | 1d 10:34:05 | 1d 01:14:17 |
| 8/18 | 10:56 AM | 57.799% | 1d 11:29:30 | 1d 01:54:45 |
| 8/18 | 11:00 AM | 57.799% | 1d 11:32:26 | 1d 01:56:54 |
| 8/18 | 12:19 PM | 57.799% | 23:01:10 | 1d 14:28:10 |

After the 11:00 AM reading, I suspended the WU for 1 hr 20 min, then restarted it. The BOINC 'elapsed' timer mysteriously 'shortened' the actual run time by 12 hr 31 min.

The above is from the event log (I have saved screenshots to document this). Thus, this WU was 'hung up' for who knows how long. At the very minimum, this WU wasted over 12 1/2 hours of GPU time.

GPUGrid admins:

1. PLEASE fix all ongoing problem(s) with GPUGrid tasks.
2. PLEASE use the Notices tab in BOINC Manager to communicate info (or a direct link to such info) regarding needed patches, mods, etc. for GPUGrid WUs to run properly.

Thank you
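The "12 hr 31 min" figure can be checked with a little arithmetic on the elapsed values from the 11:00 AM and 12:19 PM readings. Below is a minimal Python sketch, assuming the `[Nd ]HH:MM:SS` layout shown in the table; `parse_elapsed` is a hypothetical helper written for this illustration, not part of BOINC.

```python
from datetime import timedelta


def parse_elapsed(text: str) -> timedelta:
    """Parse a BOINC-style duration such as '1d 11:32:26' or '23:01:10'.

    Hypothetical helper: assumes the '[Nd ]HH:MM:SS' layout in the table.
    """
    days = 0
    if "d " in text:
        # Split off the optional leading day count, e.g. '1d '.
        day_part, text = text.split("d ", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(x) for x in text.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


# Elapsed time at the last reading before the suspend (11:00 AM)
# and at the first reading after the resume (12:19 PM).
before_suspend = parse_elapsed("1d 11:32:26")
after_resume = parse_elapsed("23:01:10")

# Run time that disappeared from the elapsed counter on resume.
rollback = before_suspend - after_resume
print(rollback)
```

This prints `12:31:16`, which matches the figure in the post. If a resumed task does restart from its last checkpoint, this difference is roughly the run time accumulated since that checkpoint.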

Joined: 21 Feb 20, Posts: 1116, Credit: 40,839,470,595, RAC: 6,423

It's clear that the spamming of your credentials hasn't aided your critical thinking ability. It should be very clear that I was referencing RECENTLY completed tasks, or rather, I should say, the singular "task". Not sure how you can extrapolate ONE issue on ONE system into an endemic problem with all ADRIA tasks. Wow.

Since this has not come up as a widespread issue, it's much more likely to be an issue with your system and not with the tasks. I have completed thousands of these tasks with this application and earned billions of credits from them, and not once has this happened.

ACEMD3 tasks (of various campaigns, ADRIA or otherwise) are particularly stressful for the GPU compared to many other projects, and they stress areas of the GPU that other projects might not. It's not uncommon to have driver crashes in Windows as a result of that stress. When the driver crashes in Windows and tries to recover, I could see that hanging up a GPU task in BOINC. Whenever a task is suspended and resumed in BOINC, that triggers it to restart from the last checkpoint (which is why the timer reset).

You should update your drivers (it looks like they aren't recent), update Windows, and verify your system is clean of dust and free of other issues that might cause thermal problems, like bad airflow to the GPU. Finally, it could be a faulty GPU, a power problem, or some other hardware issue with the system.

Try less condescension and outrage, and more of the big-picture thinking and problem solving that Engineers are known for.

NASA Engineer.
|
©2025 Universitat Pompeu Fabra