Crashes running gerard wus

wiyosaya

Message 40183 - Posted: 16 Feb 2015, 19:07:48 UTC
Last modified: 16 Feb 2015, 19:08:13 UTC

Over the weekend I tried to run several WUs that were not of the "very long" type. Each of them was a Gerard WU. I do not know whether this is a bug, but each of the three I ran errored out on my GTX460 and crashed my PC. The last one somehow deleted the GPUGRID project from my list of active projects, so that when I rejoined with this machine, it reset the project and "abandoned" another Gerard WU.

Here are the links for the WUs that crashed:
http://www.gpugrid.net/result.php?resultid=13828124
http://www.gpugrid.net/result.php?resultid=13826449
http://www.gpugrid.net/result.php?resultid=13826422

I note that 13828124 also erred on a Titan Black.

Thoughts / comments?
ID: 40183 · Rating: 0
poppageek

Message 40188 - Posted: 16 Feb 2015, 20:33:09 UTC

I had one that was marked Invalid for me and Error while Computing for another host. It is now active on a third.

http://www.gpugrid.net/workunit.php?wuid=10656205
ID: 40188 · Rating: 0
pvh

Message 40196 - Posted: 17 Feb 2015, 16:06:41 UTC
Last modified: 17 Feb 2015, 16:10:02 UTC

I too see a very high failure rate with GERARD_CXCL12 tasks (roughly 70% fail). Other tasks run fine (well, there is the occasional failure with NOELIA tasks...). I also had one GERARD_CXCL12 task stuck for some 40 hours. This is clearly a very unstable batch.

Edit: to be precise: the GERARD_BENTRYP tasks appear to be OK, this only concerns GERARD_CXCL12 tasks.
ID: 40196 · Rating: 0
Jim1348

Message 40198 - Posted: 17 Feb 2015, 18:24:49 UTC
Last modified: 17 Feb 2015, 18:26:40 UTC

I don't think there is anything wrong with the GERARD_CXCL12 work units; they just push the card hard.

On four GTX 750 Ti's (on two machines), I have had 18 successes and 0 failures.

On one GTX 660 Ti, I have had 8 successes and 0 failures.

On one GTX 660, I have had 12 successes and 1 failure. And the one that failed completed successfully on another machine.
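To put the counts above next to the "70% fails" figure reported earlier in the thread, here is a minimal sketch that tallies the failure rate per card and overall. The numbers are copied from this post; the dictionary layout and the `failure_rate` helper are just illustrative names, not anything from BOINC or GPUGRID.

```python
# Success/failure counts as reported in this post (GERARD_CXCL12 tasks only).
# Keys are free-form labels chosen for this example.
reports = {
    "GTX 750 Ti (x4)": (18, 0),
    "GTX 660 Ti": (8, 0),
    "GTX 660": (12, 1),
}


def failure_rate(successes, failures):
    """Return failures as a fraction of all finished tasks (0.0 if none ran)."""
    total = successes + failures
    return failures / total if total else 0.0


for card, (ok, bad) in reports.items():
    print(f"{card}: {failure_rate(ok, bad):.1%} failed ({bad} of {ok + bad})")

total_ok = sum(ok for ok, _ in reports.values())
total_bad = sum(bad for _, bad in reports.values())
overall = failure_rate(total_ok, total_bad)
print(f"overall: {overall:.1%} failed ({total_bad} of {total_ok + total_bad})")
```

That works out to 1 failure in 39 tasks across these cards, which is far below the roughly 70% another poster sees on a single host.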

All of these cards run very cool, but on the GTX 660 I have not bothered to increase the power limit above 100% using Nvidia Inspector as I usually do. I see that the power routinely bumps up against this limit, which is a sure sign that the GERARD_CXCL12 work units are pushing it hard (all results under Win7 64-bit).

So my usual advice applies: don't overclock your cards, etc., etc.
ID: 40198 · Rating: 0
pvh

Message 40199 - Posted: 17 Feb 2015, 20:02:24 UTC

My card runs at 79C, well within the tolerance limits.
ID: 40199 · Rating: 0
Jim1348

Message 40200 - Posted: 17 Feb 2015, 20:11:56 UTC - in response to Message 40199.  
Last modified: 17 Feb 2015, 20:18:42 UTC

"My card runs at 79C, well within the tolerance limits."

It is not just temperature. Overclocking (even factory overclocking) can, and frequently does, cause errors.

Also, if you bump up against the power limit, the card will automatically limit the current to the GPU, in order to protect it from excessive temperature. That limit can also cause errors, since the GPU is not getting the current it needs to run at full speed. Your choices then are to increase the power limit (if the card is not running too hot), or reduce the clock. Usually reducing the GPU clock will eliminate the errors, but sometimes the memory clock needs to be reduced as well.

By the way, sometimes it is all of the above. Each card is different.
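For anyone who wants to check whether a card is bumping against its power limit while a work unit runs, here is a minimal sketch built around `nvidia-smi`, which ships with the NVIDIA driver. It assumes `nvidia-smi` is on the PATH and that the card reports the queried fields; many GeForce cards return `[N/A]` or `[Not Supported]` for power readings, which the parser turns into `None`. The helper names and the 95% margin are made up for this example.

```python
import subprocess

# Query fields understood by `nvidia-smi --query-gpu`.
FIELDS = ["name", "temperature.gpu", "power.draw", "power.limit",
          "clocks.sm", "clocks.mem"]


def parse_line(line):
    """Turn one CSV line from nvidia-smi into a dict; unreadable values become None.

    Assumes the GPU name itself contains no comma.
    """
    values = [v.strip() for v in line.split(",")]
    row = dict(zip(FIELDS, values))
    for key in FIELDS[1:]:
        try:
            row[key] = float(row[key])
        except ValueError:
            row[key] = None  # e.g. "[N/A]" on cards without power sensors
    return row


def near_power_limit(row, margin=0.95):
    """True if power draw is within `margin` of the limit, None if unknown."""
    draw, limit = row["power.draw"], row["power.limit"]
    if draw is None or limit is None or limit <= 0:
        return None
    return draw >= margin * limit


def query_gpus():
    """Run nvidia-smi once and return one parsed dict per GPU."""
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader,nounits"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return [parse_line(line) for line in out.splitlines() if line.strip()]


# Demonstration on a made-up sample line, so this runs without a GPU present.
sample = parse_line("GeForce GTX 660, 62, 128.50, 130.00, 1032, 3004")
print(sample["name"], "near power limit:", near_power_limit(sample))
```

On a machine with the driver installed, calling `query_gpus()` every few seconds during a GERARD_CXCL12 task and watching `near_power_limit` gives a rough picture of how often the card sits at its cap.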
ID: 40200 · Rating: 0
biodoc

Message 40202 - Posted: 17 Feb 2015, 21:51:05 UTC

For the GERARD_CXCL12 tasks on my machines:

32 successes on two GTX 980s (no failures)
18 successes on a GTX 780 Ti (no failures)
ID: 40202 · Rating: 0
[CSF] Thomas H.V. DUPONT

Message 40203 - Posted: 18 Feb 2015, 7:47:41 UTC

GERARD_CXCL12_3GG_CGENFF2 4 OK (no failure)
GERARD_CXCL12_3GG_CGENFF3 4 OK (no failure)
GERARD_CXCL12_3GG_FX_LigAssay21 4 OK (no failure)
GERARD_CXCL12_LIG4_CGENFF2 4 OK (no failure)

"I don't think there is anything wrong with the GERARD_CXCL12 work units; they just push the card hard."

+100

"It is not just temperature. Overclocking (even factory overclocking) can, and frequently does, cause errors."

+100
[CSF] Thomas H.V. Dupont
Founder of the team CRUNCHERS SANS FRONTIERES 2.0
www.crunchersansfrontieres
ID: 40203 · Rating: 0
RaymondFO*

Message 40214 - Posted: 19 Feb 2015, 18:20:10 UTC - in response to Message 40203.  
Last modified: 19 Feb 2015, 18:37:27 UTC

ID: 40214 · Rating: 0
GoodFodder

Message 40278 - Posted: 27 Feb 2015, 16:35:36 UTC

Likewise, I have been having problems with GERARD_CXCL12 (XP x86, driver: 344.65).

Error SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1965.

Going by this thread, it looks like Gerard is stressing the cards hard, so I am going to try downclocking.
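For reference, the number in that SWAN line is a CUDA driver API result code, and as far as I know 702 is `CUDA_ERROR_LAUNCH_TIMEOUT` in the driver's `CUresult` enum (a kernel took too long to finish). Here is a minimal sketch for pulling the code, file and line out of such messages when going through several stderr logs. The regular expression is modelled only on the single line quoted above, so other SWAN messages may not match, and the lookup table deliberately lists just a few codes rather than the whole enum.

```python
import re

# A small, deliberately incomplete subset of CUDA driver API result codes.
KNOWN_CODES = {
    2: "CUDA_ERROR_OUT_OF_MEMORY",
    701: "CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES",
    702: "CUDA_ERROR_LAUNCH_TIMEOUT",
}

# Pattern based on the one error line quoted in this post (an assumption
# about the general format of SWAN fatal messages).
PATTERN = re.compile(r"Cuda driver error (\d+) in file '([^']+)' in line (\d+)")


def parse_swan_error(line):
    """Extract code, symbolic name, source file and line number, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return {
        "code": code,
        "name": KNOWN_CODES.get(code, "unknown"),
        "file": match.group(2),
        "line": int(match.group(3)),
    }


log_line = ("Error SWAN : FATAL : Cuda driver error 702 "
            "in file 'swanlibnv2.cpp' in line 1965.")
print(parse_swan_error(log_line))
```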
ID: 40278 · Rating: 0
