New NOELIA Longruns
| Author | Message |
|---|---|
|
Joined: 3 Oct 11 Posts: 100 Credit: 5,879,292,399 RAC: 0
|
All new NOELIA longrun tasks errored out on my PCs immediately after start. Other tasks run OK. EDIT: CUDA31 tasks run OK, CUDA42 tasks do not. |
|
Joined: 27 Nov 11 Posts: 11 Credit: 1,021,749,297 RAC: 0
|
"All new NOELIA longrun tasks errored out on my pc's immediately after start." Me too, exactly the same. All NOELIAs fail so far, both CUDA31 and CUDA42. All other CUDA42 WUs work fine. GTX580, Win7 x64 |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 57
|
What is wrong with these units? They have all crashed. http://www.gpugrid.net/results.php?hostid=127986&offset=0&show_names=1&state=0&appid= |
|
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0
|
Same here, I've had 3 error out, 2 on a GTX 670 and 1 on a GTX 560 with another queued right now.
|
|
Joined: 19 Jan 11 Posts: 13 Credit: 294,225,579 RAC: 0
|
So far, 3 out of 3 have failed on a GTX 275, Windows XP Pro 32, cuda31 tasks: http://www.gpugrid.net/results.php?hostid=124381
run1_replica3-NOELIA_sh2fragment_run-0-4-RND4005_1
run4_replica1-NOELIA_sh2fragment_run-0-4-RND5679_2
run1_replica48-NOELIA_sh2fragment_run-0-4-RND0084_0 |
|
Joined: 5 Jun 09 Posts: 38 Credit: 2,880,758,878 RAC: 0
|
Error in all NOELIA's tasks on GTX 580 and GTX 570.
run3_replica47-NOELIA_sh2fragment_run-0-4-RND8455_4 3597875 117522 26 Jul 2012 | 1:11:26 UTC 26 Jul 2012 | 1:17:34 UTC Error while computing 10.10 2.07 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
run2_replica33-NOELIA_sh2fragment_run-0-4-RND3214_3 3597817 117522 26 Jul 2012 | 0:59:06 UTC 26 Jul 2012 | 1:05:16 UTC Error while computing 10.11 1.92 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
run1_replica34-NOELIA_sh2fragment_run-0-4-RND4425_3 3597770 117522 26 Jul 2012 | 0:16:07 UTC 26 Jul 2012 | 0:22:12 UTC Error while computing 10.07 1.83 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
run7_replica15-NOELIA_sh2fragment_run-0-4-RND0291_4 3598026 117522 26 Jul 2012 | 0:46:39 UTC 26 Jul 2012 | 0:52:50 UTC Error while computing 10.13 1.67 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
run6_replica15-NOELIA_sh2fragment_run-0-4-RND1911_4 3597982 117522 26 Jul 2012 | 0:09:58 UTC 26 Jul 2012 | 0:16:07 UTC Error while computing 10.08 1.84 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
run6_replica14-NOELIA_sh2fragment_run-0-4-RND6550_3 3597981 117522 26 Jul 2012 | 0:34:26 UTC 26 Jul 2012 | 0:40:31 UTC Error while computing 11.07 2.07 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
run7_replica46-NOELIA_sh2fragment_run-0-4-RND8233_1 3598055 117522 26 Jul 2012 | 0:04:58 UTC 26 Jul 2012 | 0:09:58 UTC Error while computing 10.06 2.25 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
run1_replica50-NOELIA_sh2fragment_run-0-4-RND3470_4 3597789 117522 26 Jul 2012 | 1:05:16 UTC 26 Jul 2012 | 1:11:26 UTC Error while computing 10.07 1.92 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
run6_replica6-NOELIA_sh2fragment_run-0-4-RND3986_2 3598016 117522 26 Jul 2012 | 0:22:12 UTC 26 Jul 2012 | 0:28:21 UTC Error while computing 10.32 1.78 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
run2_replica17-NOELIA_sh2fragment_run-0-4-RND4720_4 3597800 117522 26 Jul 2012 | 0:52:50 UTC 26 Jul 2012 | 0:59:06 UTC Error while computing 10.06 1.87 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
run7_replica19-NOELIA_sh2fragment_run-0-4-RND9229_1 3598030 117522 26 Jul 2012 | 0:28:21 UTC 26 Jul 2012 | 0:34:26 UTC Error while computing 10.09 1.44 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
run9_replica39-NOELIA_sh2fragment_run-0-4-RND8136_2 3598123 117522 26 Jul 2012 | 0:40:31 UTC 26 Jul 2012 | 0:46:39 UTC Error while computing 10.07 1.89 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
run5_replica8-NOELIA_sh2fragment_run-0-4-RND4219_1 3597974 117522 25 Jul 2012 | 22:11:24 UTC 26 Jul 2012 | 0:04:58 UTC Error while computing 10.30 2.00 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42)
run7_replica38-NOELIA_sh2fragment_run-0-4-RND5635_1 3597691 101457 25 Jul 2012 | 21:55:52 UTC 26 Jul 2012 | 0:27:04 UTC Error while computing 15.40 1.72 --- Long runs (8-12 hours on fastest card) v6.16 (cuda42) |
[PUGLIA] kidkidkid3 Joined: 23 Feb 11 Posts: 101 Credit: 1,589,743,957 RAC: 360
|
Hi Noelia, same error (twice) also for me in
http://www.gpugrid.net/result.php?resultid=5664840
http://www.gpugrid.net/result.php?resultid=5664472
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
ERROR: file deven.cpp line 1106: # Energies have become nan
called boinc_finish
</stderr_txt>
]]>
I'll stop or cancel your WU. k.
Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing. (Martin Luther King) |
|
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 |
Just had a look at my tasks. Looks like all the sh2_fragment long work units are failing for everybody, not just me. Obviously a bad batch of work units, seeing as everybody is failing them. I have sent her a PM, so hopefully they will sort things out soon. |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 57
|
These NOELIA_sh2fragment units are all crashing with the same error message:
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
ERROR: file deven.cpp line 1106: # Energies have become nan
called boinc_finish
</stderr_txt>
]]>
Isn't it time to cancel this batch of units already? |
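For anyone comparing many failed results, the report text above can be summarized with a few lines of Python. This is only a rough sketch: `summarize` is a hypothetical helper (not part of BOINC or ACEMD), and it assumes the report looks like the one pasted in this thread.

```python
import re

# Report text as pasted in the posts above.
REPORT = """<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
 - exit code 98 (0x62)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
ERROR: file deven.cpp line 1106: # Energies have become nan
called boinc_finish
</stderr_txt>
]]>"""


def summarize(report: str) -> dict:
    """Pull the exit code and any 'ERROR:' lines out of a task report.

    Hypothetical helper; the patterns only cover the two message
    shapes seen in this thread ("exit code N" / "exited with code N").
    """
    code = re.search(r"exit(?:ed with)? code (\d+) \((0x[0-9a-fA-F]+)", report)
    errors = re.findall(r"^ERROR: (.*)$", report, flags=re.MULTILINE)
    return {
        "exit_code": int(code.group(1)) if code else None,
        "exit_hex": code.group(2) if code else None,
        "errors": errors,
    }


if __name__ == "__main__":
    print(summarize(REPORT))
```

Grouping results by the extracted `errors` list makes it easy to see whether a whole batch fails the same way, as it does here.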
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 57
|
Just completed the first NOELIA_sh2fragment unit successfully. See link below: http://www.gpugrid.net/workunit.php?wuid=3601869 Whatever you did to fix the bug, worked. I have 2 more such units still crunching. Hopefully, they will be successful too. |
SMTB1963 Joined: 27 Jun 10 Posts: 38 Credit: 524,420,921 RAC: 0
|
"Whatever you did to fix the bug, worked." Me too! |
Stoneageman Joined: 25 May 09 Posts: 224 Credit: 34,057,374,498 RAC: 0
|
Just had a failure after 7hrs of a NOELIA run9 replica21 task on a reliable (up to now) card. The card is now working ok on a PAOLA task.
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1574.
acemd.2562.x64.cuda42: swanlibnv2.cpp:59: void swan_assert(int): Assertion `a' failed.
SIGABRT: abort called
Stack trace (15 frames):
../../projects/www.gpugrid.net/acemd.2562.x64.cuda42(boinc_catch_signal+0x4d)[0x551f6d]
/lib/x86_64-linux-gnu/libc.so.6(+0x364c0)[0x7fa96d2cf4c0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7fa96d2cf445]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7fa96d2d2bab]
/lib/x86_64-linux-gnu/libc.so.6(+0x2f10e)[0x7fa96d2c810e]
/lib/x86_64-linux-gnu/libc.so.6(+0x2f1b2)[0x7fa96d2c81b2]
../../projects/www.gpugrid.net/acemd.2562.x64.cuda42[0x482916]
../../projects/www.gpugrid.net/acemd.2562.x64.cuda42[0x4848da]
../../projects/www.gpugrid.net/acemd.2562.x64.cuda42[0x44d4bd]
../../projects/www.gpugrid.net/acemd.2562.x64.cuda42[0x44e54c]
../../projects/www.gpugrid.net/acemd.2562.x64.cuda42[0x41ec14]
../../projects/www.gpugrid.net/acemd.2562.x64.cuda42(sin+0xb6c)[0x407d6c]
../../projects/www.gpugrid.net/acemd.2562.x64.cuda42(sin+0x256)[0x407456]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fa96d2ba76d]
../../projects/www.gpugrid.net/acemd.2562.x64.cuda42(sinh+0x49)[0x4072f9]
Exiting...
</stderr_txt> |
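The three numbers in "code 193 (0xc1, -63)" look like the same exit status shown three ways: decimal, hex, and (this is an assumption, but the arithmetic fits) the same byte read as a signed 8-bit value. A small sketch to check it, with `as_signed_byte` being a made-up helper name:

```python
def as_signed_byte(code: int) -> int:
    """Reinterpret an exit status in 0..255 as a signed 8-bit integer.

    Assumption: the third number in the BOINC message is this
    two's-complement reading of the status byte.
    """
    if not 0 <= code <= 255:
        raise ValueError("exit status must fit in one byte")
    return code - 256 if code >= 128 else code


if __name__ == "__main__":
    status = 193
    print(status, hex(status), as_signed_byte(status))
```

Statuses below 128, such as the 98 (0x62) seen earlier in the thread, come out unchanged.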
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 57
|
"Just completed the first NOELIA_sh2fragment unit successfully. See link below:" Two more of these units completed successfully. See links below:
http://www.gpugrid.net/workunit.php?wuid=3601862
http://www.gpugrid.net/workunit.php?wuid=3601963
One took about 16 hours to complete, though, while the other two took about 9 to 10 hours. |
|
Joined: 3 Oct 11 Posts: 100 Credit: 5,879,292,399 RAC: 0
|
"Just had a failure after 7hrs of a NOELIA run9 replica21 task on a reliable (up to now) card. The card is now working ok on a PAOLA task." Exactly the same on my GTX580 under Linux. |
Stoneageman Joined: 25 May 09 Posts: 224 Credit: 34,057,374,498 RAC: 0
|
And I've had another fail after 7 hrs, as it did before me. Considering aborting all NOELIA tasks now :-( http://www.gpugrid.net/workunit.php?wuid=3601979 |
ritterm Joined: 31 Jul 09 Posts: 88 Credit: 244,413,897 RAC: 0
|
"Considering aborting all NOELIA tasks now :-(" Me, too... I thought maybe I was okay after this one finished successfully:
run10_replica21-NOELIA_sh2fragment_fixed-0-4-RND7749_0
But then the one I had queued up next crashed after 14-plus hours:
run9_replica1-NOELIA_sh2fragment_fixed-0-4-RND6355_1
Both had the "MDIO: cannot open file 'restart.coor'" message in the stderr output. |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
Both had the "MDIO: cannot open file 'restart.coor'" message in the stderr output. This is a false error message. It appears in every task, even in the successful ones. BTW these "fixed" NOELIA tasks are running fine on all of my hosts. Perhaps you should lower your GPU clock (calibrate your overclocking settings to the more demanding CUDA 4.2 tasks, because these extra long tasks are even more demanding). |
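Since the "restart.coor" line shows up in successful tasks too, it helps to drop it before looking for the real cause in a stderr dump. A minimal sketch, assuming that only lines starting with "ERROR:" or containing "FATAL" matter (that rule is taken from the two failures posted in this thread, not from any ACEMD documentation); `real_errors` and `BENIGN` are made-up names:

```python
# Lines known from this thread to appear even in successful tasks.
BENIGN = ('MDIO: cannot open file "restart.coor"',)


def real_errors(stderr_text: str) -> list[str]:
    """Return the stderr lines that look like genuine failures.

    Assumption: a genuine failure line starts with 'ERROR:' or
    contains 'FATAL'; everything in BENIGN is ignored.
    """
    found = []
    for raw in stderr_text.splitlines():
        line = raw.strip()
        if not line or line in BENIGN:
            continue
        if line.startswith("ERROR:") or "FATAL" in line:
            found.append(line)
    return found


if __name__ == "__main__":
    sample = (
        'MDIO: cannot open file "restart.coor"\n'
        "ERROR: file deven.cpp line 1106: # Energies have become nan\n"
        "called boinc_finish\n"
    )
    print(real_errors(sample))
```

A task whose stderr contains only the MDIO line yields an empty list, which matches the point made above that the message by itself means nothing.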
ritterm Joined: 31 Jul 09 Posts: 88 Credit: 244,413,897 RAC: 0
|
"Perhaps you should lower your GPU clock (calibrate your overclocking settings to the more demanding CUDA 4.2 tasks, because these extra long tasks are even more demanding)." Even if I'm running at stock speeds? Other than two NOELIA's, I've had few, if any, comp errors with this card that weren't attributable to "bad" tasks. |
|
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 |
So far all the sh2fragment_fixed have been working on my two GTX670's. Make sure the work units have "fixed" in their name, otherwise they are probably the bad ones we already know about. They vary a bit in size, but have been taking around 8 hours (which is what the old cuda 3.1 long wu were taking before). |
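If you have a long queue, the "fixed" check can be done mechanically on the task names. A rough sketch, assuming the names always follow the pattern seen in this thread (run/replica, then a dash, then the batch name); `batch_of` and `is_fixed` are hypothetical helpers:

```python
def batch_of(task_name: str) -> str:
    """Return the batch part of a task name.

    Assumes the layout seen in this thread, e.g.
    'run10_replica21-NOELIA_sh2fragment_fixed-0-4-RND7749_0',
    where the batch name is the second dash-separated field.
    """
    parts = task_name.split("-")
    if len(parts) < 2:
        raise ValueError(f"unexpected task name: {task_name!r}")
    return parts[1]


def is_fixed(task_name: str) -> bool:
    """True if the task belongs to a batch whose name ends in '_fixed'."""
    return batch_of(task_name).endswith("_fixed")


if __name__ == "__main__":
    queue = [
        "run10_replica21-NOELIA_sh2fragment_fixed-0-4-RND7749_0",
        "run1_replica3-NOELIA_sh2fragment_run-0-4-RND4005_1",
    ]
    for name in queue:
        print(name, "fixed" if is_fixed(name) else "old batch")
```

Tasks reported as "old batch" are the ones from the first, broken release and are the candidates for aborting.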
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"Perhaps you should lower your GPU clock (calibrate your overclocking settings to the more demanding CUDA 4.2 tasks, because these extra long tasks are even more demanding)." Yes. But if you are running your cards at stock speeds, I'd rather try increasing the GPU core voltage by 25 mV (if your GPU temperatures allow it). |