Lots of errors
| Author | Message |
|---|---|
|
Joined: 26 Feb 12 Posts: 184 Credit: 222,376,233 RAC: 0
|
I'm starting to get a high number of short tasks that error out. Can someone explain why this is happening and how I can fix it? I haven't changed any settings. Here is the log from one of the tasks. WinXP SP3, dual 750 Ti: http://www.gpugrid.net/result.php?resultid=14202446 |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,739,145,728 RAC: 95,752
|
I am getting errors too, but mine are with GERARD_EQUI WUs. Three had errors and two finished OK. It seems to be a bad batch. https://www.gpugrid.net/result.php?resultid=14210451 895456x4-GERARD_EQUI_26Apr_CXCL-0-1-RND0321_4 Workunit 10949024 Created 26 May 2015 | 23:34:38 UTC Sent 26 May 2015 | 23:34:54 UTC Received 26 May 2015 | 23:50:42 UTC Server state Over Outcome Computation error Client state Compute error Exit status -97 (0xffffffffffffff9f) Unknown error number Computer ID 30790 Report deadline 31 May 2015 | 23:34:54 UTC Run time 87.09 CPU time 77.31 Validate state Invalid Credit 0.00 Application version Short runs (2-3 hours on fastest card) v8.47 (cuda65) Stderr output <core_client_version>7.4.42</core_client_version> <![CDATA[ <message> (unknown error) - exit code -97 (0xffffff9f) </message> <stderr_txt> # GPU [GeForce GTX 690] Platform [Windows] Rev [3212] VERSION [65] # SWAN Device 1 : # Name : GeForce GTX 690 # ECC : Disabled # Global mem : 2047MB # Capability : 3.0 # PCI ID : 0000:05:00.0 # Device clock : 1019MHz # Memory clock : 3004MHz # Memory width : 256bit # Driver version : r343_98 : 34411 # GPU [GeForce GTX 690] Platform [Windows] Rev [3212] VERSION [65] # SWAN Device 1 : # Name : GeForce GTX 690 # ECC : Disabled # Global mem : 2047MB # Capability : 3.0 # PCI ID : 0000:05:00.0 # Device clock : 1019MHz # Memory clock : 3004MHz # Memory width : 256bit # Driver version : r343_98 : 34411 # GPU [GeForce GTX 690] Platform [Windows] Rev [3212] VERSION [65] # SWAN Device 1 : # Name : GeForce GTX 690 # ECC : Disabled # Global mem : 2047MB # Capability : 3.0 # PCI ID : 0000:05:00.0 # Device clock : 1019MHz # Memory clock : 3004MHz # Memory width : 256bit # Driver version : r343_98 : 34411 # GPU [GeForce GTX 690] Platform [Windows] Rev [3212] VERSION [65] # SWAN Device 0 : # Name : GeForce GTX 690 # ECC : Disabled # Global mem : 2047MB # Capability : 3.0 # PCI ID : 0000:04:00.0 # Device clock : 1019MHz # Memory clock : 3004MHz # Memory width : 256bit # Driver version : r343_98 : 34411 # GPU 0 : 63C # GPU 1 : 73C # The simulation has become unstable. Terminating to avoid lock-up (1) |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
I'm getting some nasty errors too, with the GERARD_EQUI_26Apr_CXCL tasks. They're causing major TDRs, which in turn cause hardware acceleration problems in other tasks (like web browsing or gaming) and driver problems where the clocks never return to 3D-mode clocks. Admins: please look into which batches need to be revoked to prevent these problems. It's a major headache, for me at least. 1154144x3-GERARD_EQUI_26Apr_CXCL-0-1-RND9216_7 http://www.gpugrid.net/result.php?resultid=14210052 895456x5-GERARD_EQUI_26Apr_CXCL-0-1-RND9089_5 http://www.gpugrid.net/result.php?resultid=14210507 |
|
Joined: 26 Feb 12 Posts: 184 Credit: 222,376,233 RAC: 0
|
All my tasks are now erroring out. Suspending this project for now until this issue is resolved. |
|
Joined: 12 Apr 15 Posts: 1 Credit: 49,381,475 RAC: 0
|
I've actually been having issues with the graphics drivers themselves crashing and Windows having to recover them.
Stoneageman Joined: 25 May 09 Posts: 224 Credit: 34,057,374,498 RAC: 0
|
Same here. Now have five GERARD_EQUI_26Apr_CXCL tasks crashed. |
|
Joined: 26 Mar 14 Posts: 101 Credit: 0 RAC: 0
|
Could you please post your errors in this thread? I will cancel the batch if they persist. Thanks for your patience... |
tito Joined: 21 May 09 Posts: 22 Credit: 2,002,780,169 RAC: 0
|
https://www.gpugrid.net/result.php?resultid=14210324 Short WU errored out after 80 seconds on a 750 Ti. |
|
Joined: 26 Feb 12 Posts: 184 Credit: 222,376,233 RAC: 0
|
Quote: "Could you please post your errors in this thread? I will cancel the batch if they persist. Thanks for your patience..." https://www.gpugrid.net/result.php?resultid=14210504 |
|
Joined: 26 Mar 14 Posts: 101 Credit: 0 RAC: 0
|
We detected an unexpected parameterization error in some of the simulations and have just cancelled them. Sorry for any inconvenience caused, and thank you for reporting it to us! If you find any other errors, please do not hesitate to tell us (hopefully this particular issue is already resolved). |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Excellent. Thank you!! |
|
Joined: 26 Feb 12 Posts: 184 Credit: 222,376,233 RAC: 0
|
All short-run tasks are still failing here. Links to the last four: https://www.gpugrid.net/result.php?resultid=14213349 https://www.gpugrid.net/result.php?resultid=14211957 https://www.gpugrid.net/result.php?resultid=14211914 https://www.gpugrid.net/result.php?resultid=14211712 |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
nanoprobe: What is the exact make/model of your GPU? Do the tasks still fail when the Boost clock is set to the reference clock? My hunch is that your GPU is overclocked too much, either by the factory or by you. In my experience, "The simulation has become unstable. Terminating to avoid lock-up" generally means that you are overclocking too much or have a hardware problem. |
|
Joined: 20 Jul 14 Posts: 732 Credit: 130,089,082 RAC: 0
|
Quote: "Excellent. Thank you!!" +1 :) [CSF] Thomas H.V. Dupont, Founder of the team CRUNCHERS SANS FRONTIERES 2.0 www.crunchersansfrontieres |
|
Joined: 26 Feb 12 Posts: 184 Credit: 222,376,233 RAC: 0
|
nanoprobe: The cards are PNY 750 Tis. No factory overclock. No 6-pin PCI-E power plugs. 60 W load at 99%. They've been running at stock settings since I bought them, and I've been running the short tasks on these cards since I got them without ever seeing the failure rate I've been experiencing lately. If one card were producing all or most of the errors, I would suspect the card, but the tasks are failing on both cards. |
|
Joined: 17 Jan 09 Posts: 2 Credit: 22,278,452 RAC: 0
|
1232906x8-GERARD_EQUI_26Apr_CXCL-0-1-RND1418_4 causes a lot of GPU driver crashes. Stopped! |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
nanoprobe: Can you supply the exact model of the GPU, to confirm that it's not factory-overclocked? Alternatively, could you use GPU-Z to confirm that the GPU Clock and Default Clock both read 1020 MHz (the stock speed of a GTX 750 Ti, per http://en.wikipedia.org/wiki/GeForce_700_series)? If either is above 1020, then it is in fact overclocked, and I recommend using EVGA Precision X to downclock it back to the reference 1020 MHz to see if that helps. I'm getting frustrated trying to help by offering advice that gets ignored. |
|
Joined: 26 Feb 12 Posts: 184 Credit: 222,376,233 RAC: 0
|
nanoprobe: WOW! Let me offer you some advice. If it doesn't concern life, death or health, then it surely isn't worth getting frustrated over. FWIW, the issue seems to have cleared up. The faulty WUs have been taken care of. Thanks for your help. |
|
Joined: 17 Feb 13 Posts: 181 Credit: 144,871,276 RAC: 0
|
I have had over 20 WUs fail on my GTX 660 Ti devices. I will stop getting tasks and go to bed now... |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
nanoprobe: There were some faulty WUs, but they have nothing to do with tasks erroring out with "Simulation has become unstable." messages and no other error messages. Errors like yours are usually the result of overclocking too much. Please keep my advice (lower clocks to reference clocks) in mind the next time you try to troubleshoot those errors. Good luck, Jacob
©2026 Universitat Pompeu Fabra