GPU units failing

Author	Message
cadbane	Message 50313 - Posted: 28 Aug 2018, 13:01:10 UTC
It looks like there is a problem with the current GPU workunits: http://gpugrid.net/workunit.php?wuid=14409360
I've had around 6 fail in a short time, and my quota is now exceeded. Linux here.

Toni (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)	Message 50315 - Posted: 28 Aug 2018, 13:30:34 UTC - in response to Message 50313.
True, "Error reading parmtop file" means something is amiss in the WU.

tullio	Message 50319 - Posted: 28 Aug 2018, 13:51:03 UTC
They all fail immediately, both in Windows and Linux.
Tullio

Erich56	Message 50321 - Posted: 28 Aug 2018, 15:50:34 UTC
Same thing here on Windows:
ERROR: file mdioload.cpp line 229: Error reading parmtop file
:-(

MartinKanne	Message 50324 - Posted: 28 Aug 2018, 18:15:08 UTC
In my case, short runs are working, but long runs are not (Windows).

cadbane	Message 50326 - Posted: 28 Aug 2018, 18:37:46 UTC
Is it a good idea to pull the bad batch out of the queue, since they have a 100% error rate according to the server status page? They also have weird file names compared to "normal" workunits, e.g. AGUAGUAGUA etc. I don't know if that is of any significance, though.

Erich56	Message 50327 - Posted: 28 Aug 2018, 19:29:39 UTC - in response to Message 50326.
Quote: "Is it a good idea to pull the bad batch out of the queue ..."
It definitely is, particularly because once a (low) number of such tasks have failed on a given host, that host won't receive any new tasks (good or bad) for the next 24 hours. So these hosts are "punished", although they didn't do anything wrong.

Erich56	Message 50329 - Posted: 29 Aug 2018, 10:58:58 UTC - Last modified: 29 Aug 2018, 11:00:08 UTC
Can anyone explain to me why, while the Project Status Page shows hundreds of unsent "PABLO_2IDP..." tasks, all my hosts download only these faulty "PABLO_prod_1_UUAUACCUACCA_350K_2_ru" (and similar) tasks, although only a handful of them are left unsent? This is rather annoying :-(
What I don't understand: once it became clear that they are all faulty, why were they not withdrawn from the queue?
As a consequence, I have given up crunching GPUGRID for the time being and will turn to other projects.

Jim1348	Message 50330 - Posted: 29 Aug 2018, 11:33:22 UTC - in response to Message 50329.
It would help if GPUGrid used a short beta test for new work units. The bad ones would show up in a hurry, and it would avoid filling up the main cache with them. It always takes time to clear out the faulty ones in BOINC; I don't think there is a fast way to do it.

Erich56	Message 50338 - Posted: 30 Aug 2018, 10:42:17 UTC - in response to Message 50330.
Quote: "It would help if GPUGrid used a short beta test for new work units. The bad ones would show up in a hurry ..."
In fact, I am surprised that new tasks are not being tested at all before they are sent to the download queue :-(