Error units - Noelia
| Author | Message |
|---|---|
|
Joined: 19 Jun 12 Posts: 11 Credit: 51,704,550 RAC: 0
|
Noelia, welcome aboard. Been looking for your wus, finally got some. Unfortunately, I just had 11 Noelia wu's crash after about 13 secs each with a computational error. Entries from my event log for one wu are below:

7/25/2012 9:03:49 PM | GPUGRID | Starting task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 using acemdlong version 616 (cuda42) in slot 6
7/25/2012 9:04:04 PM | GPUGRID | Computation for task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 finished
7/25/2012 9:04:04 PM | GPUGRID | Output file run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2_1 for task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 absent
7/25/2012 9:04:04 PM | GPUGRID | Output file run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2_2 for task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 absent
7/25/2012 9:04:04 PM | GPUGRID | Output file run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2_3 for task run2_replica29-NOELIA_sh2fragment_run-0-4-RND8749_2 absent

Wu's are of this variety: run9_replica7-NOELIA_sh2fragment_run-0-4-RND8072_2
Workunit 3598139

Stderr output
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
ERROR: file deven.cpp line 1106: # Energies have become nan
called boinc_finish
</stderr_txt>
]]>

Thought you'd want to know - let me know if I should have forwarded other details.

Edit: Win7 64, 2x560ti, AMD FX 6100 6 core @ 3.3 GHZ, 850 watt psu |
|
Joined: 23 Nov 10 Posts: 14 Credit: 8,017,535,732 RAC: 0
|
Hmm, I've experienced the same error with two back-to-back NOELIA WUs. My PC finished a PAOLA WU just before the NOELIA WUs with no error. For now, I'll suspend the GPUGRID project until something is posted about this... |
[PUGLIA] kidkidkid3 Joined: 23 Feb 11 Posts: 101 Credit: 1,589,749,957 RAC: 876
|
Hi Noelia, same error (twice) also for me in
http://www.gpugrid.net/result.php?resultid=5664840
http://www.gpugrid.net/result.php?resultid=5664472

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
ERROR: file deven.cpp line 1106: # Energies have become nan
called boinc_finish
</stderr_txt>
]]>

I'll stop or cancel your WU until something is posted about this error.
k.

Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing. (Martin Luther King) |
|
Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0
|
Ditto. All run2 WUs if I'm not mistaken and all failing on all other sent hosts. Definitely a problem with these. |
|
Joined: 15 Apr 10 Posts: 123 Credit: 1,004,473,861 RAC: 0
|
yep some errors, also gpu utilization is pretty low, currently at 79% on my GTX 470 task rundig8_run5-NOELIA_smd2-1-5-RND4856_0 using acemdlong version 616 (cuda42) |
|
Joined: 5 Jul 12 Posts: 35 Credit: 393,375 RAC: 0
|
Hi guys, I apologize for the inconvenience. This is the first time I have run the system after doing the equilibration phase in acemdbeta (the first step discussed in this other thread: http://www.gpugrid.org/forum_thread.php?id=3088 ), and it works quite differently from when we run it locally, which is why all the simulations were crashing within a few seconds. The procedure is now automated, so this should not be a problem in the future when running this way. Thank you for your time :) |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
Shouldn't these workunits be processed by the 6.47 beta client? |
|
Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0
|
Just had another one recently. It was sent about 5 or so hours ago. I hope these are out of the system now, because I've produced 53 errors on these tasks. Just seems like a lot of wasted bandwidth on my end and on yours. I understand things happen, but please take these out of the hopper if you have not done so already. Cheers |
|
Joined: 19 Jun 12 Posts: 11 Credit: 51,704,550 RAC: 0
|
Started processing one replacement Noelia wu. run10_replica37-NOELIA_sh2fragment_fixed-0-4-RND1582_0 It is only 10% complete after about 90 mins. At start-up the wu projected 9:36 to complete but is now on track for a bit over 14 hours, and prolly significantly more. http://www.gpugrid.net/workunit.php?wuid=3601656 This wu will never qualify for max bonus bc there just isn't enough time to process and return within 24 hours. Same problem that Nathan wus had back in Feb/March if I remember correctly. I'll let this wu run another couple of hours, see how it is tracking, then update this post. In the meanwhile, may I suggest you visit with Nathan on proc time as he has lived through this before & was able to adjust the wu's so proc returned to "8-12 hours on fastest cards." On my 560 ti's his wu's typically take about 8 hours to crunch and another 10-12 mins to upload. Thank you! |
|
Joined: 19 Jun 12 Posts: 11 Credit: 51,704,550 RAC: 0
|
Edit: after 4.5 hours, still on track to finish in a bit over 14 hours |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
I'll let this wu run another couple of hours, see how it is tracking, then update this post. After the release of the GTX 6xx series, I wouldn't consider a GTX 560 Ti as one of "the fastest cards". Unlike the GTX 560 Ti 448, it has only 256 shaders usable by the GPUGrid client, because it is a CC2.1 card (the Ti 448 'limited edition' is a CC2.0 card, so all of its shaders can be used by the GPUGrid client). At the moment the fastest cards are: GTX 690, 680, 670, 590, 580, 570, 480, 470, 560 Ti 448, 465 |
|
Joined: 19 Jun 12 Posts: 11 Credit: 51,704,550 RAC: 0
|
No argument on which cards are currently fastest out there. Nevertheless, the 560ti can comfortably finish all existing tasks in 8-12 hours with the exception of this latest group from Noelia. That's all I meant. |
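For anyone who wants to check the run-time estimates discussed above, here is a minimal Python sketch of the arithmetic. It assumes progress is roughly linear, which may not hold for every task. The function names and the `queued_hours` and `upload_hours` parameters are hypothetical and not part of BOINC or GPUGRID. The 24-hour window is simply the figure mentioned in the thread; how the project actually measures it is an assumption here.

```python
def projected_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linearly extrapolate the total run time from progress so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


def fits_window(
    elapsed_hours: float,
    fraction_done: float,
    queued_hours: float = 0.0,   # assumed: time the task waited before starting
    upload_hours: float = 0.0,   # assumed: time needed to upload the result
    window_hours: float = 24.0,  # figure quoted in the thread
) -> bool:
    """Return True if queue time + projected run time + upload fits the window."""
    total = (
        queued_hours
        + projected_total_hours(elapsed_hours, fraction_done)
        + upload_hours
    )
    return total <= window_hours


if __name__ == "__main__":
    # Numbers from the post: about 10% done after about 90 minutes,
    # and uploads of roughly 10-12 minutes (0.2 h).
    estimate = projected_total_hours(1.5, 0.10)
    print(f"projected run time: {estimate:.1f} h")
    print("fits 24 h window:", fits_window(1.5, 0.10, upload_hours=0.2))
```

With the rounded figures from the post, the linear projection comes out at roughly 15 hours, close to the "bit over 14 hours" reported. Whether that fits a return window then depends mostly on how long the task sat in the queue before it started, which is why `queued_hours` is kept as a separate input.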