BEWARE: 2p0m-SDOERR_OPMamber6P2-0-1-RND4183

Message boards : Number crunching : BEWARE: 2p0m-SDOERR_OPMamber6P2-0-1-RND4183
Beyond
Message 45898 - Posted: 24 Dec 2016, 17:41:11 UTC

If you get one of these, be careful. This bad WU failed on 9 machines and added the insult of putting my 1060 into a state where it failed the next WU as well. Luckily I happened to catch it because of the super long download times here, and a reboot fixed the problem. Here's the WU; there are most likely more like it floating around:

https://www.gpugrid.net/workunit.php?wuid=12205143

It's possible that it only locks up the GPU with the CUDA80 app, since all the other machines were running CUDA65.
Beyond
Message 45909 - Posted: 25 Dec 2016, 6:15:16 UTC - in response to Message 45898.  

"This bad WU failed on 9 machines and had the additional insult of putting my 1060 in a state where it failed the next WU."

Edit: now 10 machines...
caffeineyellow5
Message 45924 - Posted: 26 Dec 2016, 2:30:09 UTC - in response to Message 45909.  

"This bad WU failed on 9 machines and had the additional insult of putting my 1060 in a state where it failed the next WU.

Edit: now 10 machines..."

Probably 8 errors and 2 'ghost' downloads. I usually have at least one 'ghost' at all times on my main system. It shows there is a 7th WU out there, but the machine itself shows that task as not being on it, and the logs and XML show that the WU never even existed there.


©2025 Universitat Pompeu Fabra