Too many errors (may have bug)
| Author | Message |
|---|---|
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
My task, g240-TONI_CAPBIND99SB-48-200-RND7652, and other TONI WUs have the same bug. If anyone gets this TONI failure, restart your computer. It looks like the error is influencing other tasks! My system: AMD 64 X2 Dual Core Processor 5200+ [Family 15 Model 67 Stepping 2] (2 processors), NVIDIA GeForce GTX 470 (1279MB), driver: 25896, Microsoft Windows XP Professional x86 Edition, Service Pack 3, (05.01.2600.00). The TONI task failed after 3 sec (similar failures from other crunchers). My next work unit behaved strangely. I checked in on it when it was about ¾ of the way through its run. The GPU card temperature was 53 deg C and the task was running at 83%. This just does not add up. On a Fermi, a task running at 83% would have the GPU over 70 deg C. I rebooted the system and the task is now running at 98% and the temp is just over 70 deg C (with the fan turned up). Ton, thanks for the warning. |
|
Joined: 4 Jun 08 Posts: 4 Credit: 5,174,815 RAC: 0
|
I aborted 2 in the last week that seemed to run forever. |
|
Joined: 16 Apr 09 Posts: 163 Credit: 921,733,849 RAC: 0
|
I have been seeing a raft of computation errors on several systems, mostly running Windows XP and 9800GT cards. This seems to be a revisit of a problem which plagued the GPUGrid applications a year ago for me. Since it has affected ALL of my systems running that combination, and I have NOT encountered similar problems with three other BOINC projects which utilize the same GPU on the same systems (SETI, Dnetc, Collatz), rather than simply 'shoulder shrug' and try, try again, I am backing off of GPUGrid for now. I would hope that there would be some responsive comment on this out here, but historically, what I have seen here is either denial (it must be my hardware or software, even though other BOINC GPU apps complete properly) or non-response. Then, perhaps in a week or two, I will try again, and some unacknowledged change will have been made and all will be well again. |
|
Joined: 13 Jul 09 Posts: 64 Credit: 2,922,790,120 RAC: 80
|
I'm getting over a 50% error rate across several quads. The majority of the cards are GTS-250s, but there is a GTX-275 and a couple of 8800GTs thrown in also. The same machines also do either DNETC or Collatz w/o issues. One of the worst is crunch30. crunch35 has both a GTS-250 and a GTX-275. Can anyone provide any insight into this? UPDATE: Worse than I thought... 7 pages of valid results, 18 pages of errors. - da shu @ HeliOS, "A child's exposure to technology should never be predicated on an ability to afford it." |
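For anyone wanting to put a number on the update above, here is a rough sketch of the arithmetic. It assumes every results page lists the same number of tasks, which the post does not confirm, so treat the figure as an estimate. The function name `error_rate` is made up for illustration.

```python
def error_rate(valid_pages: int, error_pages: int) -> float:
    """Fraction of result pages that are errors.

    Assumes each page holds the same number of results (an assumption,
    not something stated in the thread).
    """
    total = valid_pages + error_pages
    if total == 0:
        raise ValueError("no result pages to compare")
    return error_pages / total


# Page counts quoted in the post: 7 pages valid, 18 pages of errors.
rate = error_rate(valid_pages=7, error_pages=18)
print(f"Estimated error rate: {rate:.0%}")
```

Under that assumption the share of failed results is 18 out of 25 pages, well above the "over 50%" first reported.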
|
Joined: 16 Apr 09 Posts: 163 Credit: 921,733,849 RAC: 0
|
As I noted before, the normal scenario here when problems like this crop up is that there is a real limit to the amount of response we can expect... "I'm getting over a 50% error rate across several quads. The majority of the cards are GTS-250s, but there is a GTX-275 and a couple of 8800GTs thrown in also. The same machines also do either DNETC or Collatz w/o issues." |
|
Joined: 4 Jun 08 Posts: 4 Credit: 5,174,815 RAC: 0
|
I just aborted another one that ran on too long. Between the probs others are having and the ones I am experiencing, I am pulling out until the bad WUs clear the catch. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Bill, you aborted a long task about 60% through its run. All your tasks are running well on both systems. |
|
Joined: 4 Jun 08 Posts: 4 Credit: 5,174,815 RAC: 0
|
Sorry, I already moved off, be back later. Trying to run SETI since 1999 has made me gun shy (crazy). So many projects so little time. Thanks for the response, I generally get criticism on other sites so I don't post. |
Saenger Joined: 20 Jul 08 Posts: 134 Credit: 23,657,183 RAC: 0
|
I got an error on a TONI_CAPBIND as well, like many others. I can't see the original error in this thread any longer, as the WU is already purged, but here is my stderr: <core_client_version>6.10.17</core_client_version> <![CDATA[ <message> process exited with code 98 (0x62, -158) </message> <stderr_txt> # There is 1 device supporting CUDA # Device 0: "GeForce GT 240" # Clock rate: 1.34 GHz # Total amount of global memory: 536150016 bytes # Number of multiprocessors: 12 # Number of cores: 96 MDIO ERROR: read error for file "input.coor", byte number 0: expected to read number of atoms ERROR: file mdioload.cpp line 80: Unable to read bincoordfile 14:58:03 (10049): called boinc_finish </stderr_txt> ]]> It failed after 2 seconds, so no real harm done, except that now a CASHIF_HIVPR is running, so far without problems. I hope it won't be affected, as I can't restart the puter without trashing a checkpointless RNA-world WU after 24h, and I don't want to do that ;) If this is a problem with the work generator (some input file not generated properly), perhaps it should be looked into somehow. Greetings from Saenger. For questions about BOINC, look in the BOINC-Wiki |
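When comparing failures across hosts, it can help to pull the same few fields out of each report. The sketch below is only an illustration: `summarize_report` is a hypothetical helper, not part of BOINC or the GPUGrid application, and the sample text is the report quoted above with line breaks restored by assumption (the forum page shows it flattened onto one line).

```python
import re

# Report text copied from the post above. The line breaks are an assumption;
# the forum page shows the log on a single line.
SAMPLE = """\
<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
process exited with code 98 (0x62, -158)
</message>
<stderr_txt>
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.34 GHz
# Total amount of global memory: 536150016 bytes
# Number of multiprocessors: 12
# Number of cores: 96
MDIO ERROR: read error for file "input.coor", byte number 0: expected to read number of atoms
ERROR: file mdioload.cpp line 80: Unable to read bincoordfile
14:58:03 (10049): called boinc_finish
</stderr_txt>
]]>
"""


def summarize_report(report: str) -> dict:
    """Hypothetical helper: collect the exit code, GPU name and ERROR lines."""
    summary = {"exit_code": None, "device": None, "errors": []}
    for raw in report.splitlines():
        line = raw.strip()
        exit_match = re.search(r"exited with code (\d+)", line)
        if exit_match:
            summary["exit_code"] = int(exit_match.group(1))
        device_match = re.match(r'# Device \d+: "(.+)"', line)
        if device_match:
            summary["device"] = device_match.group(1)
        if "ERROR:" in line:
            summary["errors"].append(line)
    return summary


summary = summarize_report(SAMPLE)
print(summary["exit_code"], summary["device"])
for err in summary["errors"]:
    print(err)
```

For this report the first error line points at `input.coor` failing at byte number 0, which fits the guess in the post that an input file was not generated properly, though the log alone does not prove it.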
Fred J. Verster Joined: 1 Apr 09 Posts: 58 Credit: 35,833,978 RAC: 0
|
Found some errors: <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> The system cannot find the path specified. (0x3) - exit code 3 (0x3) </message> <stderr_txt> # Using device 0 # There is 1 device supporting CUDA # Device 0: "GeForce 9800 GT" # Clock rate: 1.57 GHz # Total amount of global memory: 523829248 bytes # Number of multiprocessors: 14 # Number of cores: 112 MDIO ERROR: cannot open file "restart.coor" SWAN : FATAL : Failure executing kernel [mshake_position_kernel_1] [2] [10,1,1][64,1,1] Assertion failed: 0, file swanlib_nv.cpp, line 281 This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. </stderr_txt> ]]> g199r2-TONI_KKi4-2-200-RND6081_0 . x199y2-TONI_KKi4-2-200-RND6081_0 Knight Who Says Ni N! |
©2025 Universitat Pompeu Fabra