Crashes running Gerard WUs
| Author | Message |
|---|---|
|
Joined: 22 Nov 09; Posts: 114; Credit: 589,114,683; RAC: 0
|
Over the weekend I tried to run several WUs that were not the "very long" type. All of them were Gerard WUs. I do not know whether this is a bug, but each of the three I ran errored out on my GTX460 and crashed my PC. The last one somehow removed the GPUGRID project from my list of active projects, so when I rejoined with this machine, it reset the project and "abandoned" another Gerard WU. Here are the links for the WUs that crashed: http://www.gpugrid.net/result.php?resultid=13828124 http://www.gpugrid.net/result.php?resultid=13826449 http://www.gpugrid.net/result.php?resultid=13826422 I note that 13828124 also errored on a Titan Black. Thoughts / comments? |
|
Joined: 4 Jul 09; Posts: 76; Credit: 114,610,402; RAC: 0
|
I had one that was marked Invalid for me and Error while Computing for another user. It is now active on a third. http://www.gpugrid.net/workunit.php?wuid=10656205 |
|
Joined: 17 Mar 10; Posts: 23; Credit: 1,173,824,416; RAC: 0
|
I too see a very high failure rate with GERARD_CXCL12 tasks (something like 70% fail). Other tasks run fine (well, there is the occasional failure with NOELIA tasks...). I also had one GERARD_CXCL12 task stuck for some 40 hours or so. This is clearly a very unstable batch. Edit: to be precise, the GERARD_BENTRYP tasks appear to be OK; this only concerns GERARD_CXCL12 tasks. |
|
Joined: 28 Jul 12; Posts: 819; Credit: 1,591,285,971; RAC: 0
|
I don't think there is anything wrong with the GERARD_CXCL12 work units; they just push the card hard. On four GTX 750 Ti's (on two machines), I have had 18 successes and 0 failures. On one GTX 660 Ti, I have had 8 successes and 0 failures. On one GTX 660, I have had 12 successes and 1 failure, and the one that failed completed successfully on another machine. All of these cards run very cool, but on the GTX 660 I have not bothered to increase the power limit above 100% using Nvidia Inspector as I usually do. I see that the power routinely bumps up against this limit, which is a sure sign that the GERARD_CXCL12 work units are pushing it hard (all results under Win7 64-bit). So my usual advice applies: don't overclock your cards, etc., etc. |
|
Joined: 17 Mar 10; Posts: 23; Credit: 1,173,824,416; RAC: 0
|
My card runs at 79C, well within the tolerance limits. |
|
Joined: 28 Jul 12; Posts: 819; Credit: 1,591,285,971; RAC: 0
|
"My card runs at 79C, well within the tolerance limits." It is not just temperature. Overclocking (even factory overclocking) can, and frequently does, cause errors. Also, if you bump up against the power limit, the card will automatically limit the current to the GPU in order to protect it from excessive temperature. That limit can also cause errors, since the GPU is not getting the current it needs to run at full speed. Your choices then are to increase the power limit (if the card is not running too hot) or to reduce the clock. Usually reducing the GPU clock will eliminate the errors, but sometimes the memory clock needs to be reduced as well. By the way, sometimes it is all of the above. Each card is different. |
|
Joined: 26 Aug 08; Posts: 183; Credit: 10,085,929,375; RAC: 0
|
For the GERARD_CXCL12 tasks on my machines: 32 successes on 2 GTX 980s (no failures); 18 successes on a GTX 780 Ti (no failures). |
|
Joined: 20 Jul 14; Posts: 732; Credit: 130,089,082; RAC: 0
|
GERARD_CXCL12_3GG_CGENFF2: 4 OK (no failure); GERARD_CXCL12_3GG_CGENFF3: 4 OK (no failure); GERARD_CXCL12_3GG_FX_LigAssay21: 4 OK (no failure); GERARD_CXCL12_LIG4_CGENFF2: 4 OK (no failure). "I don't think there is anything wrong with the GERARD_CXCL12 work units, they just push the card hard." +100. "It is not just temperature. Overclocking (even factory overclocking) can, and frequently does cause errors." +100. [CSF] Thomas H.V. Dupont, Founder of the team CRUNCHERS SANS FRONTIERES 2.0, www.crunchersansfrontieres |
|
Joined: 22 Nov 12; Posts: 72; Credit: 14,040,706,346; RAC: 0
|
These all failed immediately: http://www.gpugrid.net/result.php?resultid=13853350 http://www.gpugrid.net/result.php?resultid=13853243 http://www.gpugrid.net/result.php?resultid=13853233 http://www.gpugrid.net/result.php?resultid=13849021 http://www.gpugrid.net/result.php?resultid=13848998 http://www.gpugrid.net/result.php?resultid=13848977 http://www.gpugrid.net/result.php?resultid=13848871 Any ideas? |
|
Joined: 4 Oct 12; Posts: 53; Credit: 333,467,496; RAC: 0
|
I have likewise been having problems with GERARD_CXCL12 (XP x86, driver: 344.65). Error: SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1965. Going by this thread, it looks like Gerard is stressing the cards hard, so I am going to try downclocking. |
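
For anyone who wants to check whether a card is bumping against its power limit while a GERARD_CXCL12 task runs, here is a minimal sketch in Python. It is not from any poster in this thread. It assumes `nvidia-smi` is on the PATH and that the driver reports the `power.draw` and `power.limit` query fields; older cards may report them as unsupported, in which case the sketch returns `None` for those values. The function names and the 5% margin are hypothetical choices for illustration only.

```python
import subprocess

# Query fields passed to `nvidia-smi --query-gpu`; parsing relies on this order.
FIELDS = "name,temperature.gpu,power.draw,power.limit,clocks.sm"


def _num(text):
    """Convert a field to float, or None if the driver did not report a number."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_gpu_line(line):
    """Parse one CSV row produced with --format=csv,noheader,nounits."""
    name, temp, draw, limit, sm_clock = [p.strip() for p in line.split(",")]
    return {
        "name": name,
        "temp_c": _num(temp),
        "power_draw_w": _num(draw),
        "power_limit_w": _num(limit),
        "sm_clock_mhz": _num(sm_clock),
    }


def near_power_limit(gpu, margin=0.05):
    """True when the draw is within `margin` of the limit (5% is an assumption)."""
    draw, limit = gpu["power_draw_w"], gpu["power_limit_w"]
    if draw is None or limit is None:
        return False  # nothing reported, so nothing can be concluded
    return draw >= limit * (1.0 - margin)


def query_gpus():
    """Run nvidia-smi once and return one parsed dict per GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_line(row) for row in out.splitlines() if row.strip()]


if __name__ == "__main__":
    for gpu in query_gpus():
        flag = "NEAR POWER LIMIT" if near_power_limit(gpu) else "ok"
        print(gpu["name"], gpu["temp_c"], gpu["power_draw_w"],
              gpu["power_limit_w"], gpu["sm_clock_mhz"], flag)
```

Running it a few times while a task is crunching gives a rough picture. If the card is repeatedly flagged, the advice given earlier in the thread applies: raise the power limit if temperatures allow, or reduce the clock.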
©2025 Universitat Pompeu Fabra