GPU units failing
| Author | Message |
|---|---|
|
Joined: 7 Jun 09 Posts: 24 Credit: 1,149,643,416 RAC: 0
|
Looks like there is a problem with the current GPU workunits: http://gpugrid.net/workunit.php?wuid=14409360 I've had around 6 fail in a short time, and my quota is now exceeded. Linux here. |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
True, "Error reading parmtop file" means something is amiss in the WU. |
|
Joined: 8 May 18 Posts: 190 Credit: 104,426,808 RAC: 0
|
They all fail immediately, both on Windows and Linux. Tullio |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
same thing here on Windows: ERROR: file mdioload.cpp line 229: Error reading parmtop file :-( |
|
Joined: 27 Dec 16 Posts: 6 Credit: 53,210,225 RAC: 0
|
in my case, short runs are working, long runs not (Windows) |
|
Joined: 7 Jun 09 Posts: 24 Credit: 1,149,643,416 RAC: 0
|
Is it a good idea to pull the bad batch out of the queue, since they have a 100% error rate according to the server status page? They also have weird file names compared to "normal" workunits, e.g. AGUAGUAGUA etc. I don't know if that is of any significance, though. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
"Is it a good idea to pull the bad batch out of the queue ..." It definitely is, particularly since once a (small) number of such tasks have failed on a given host, that host won't receive any new tasks (good or bad) for the next 24 hours. So these hosts are "punished", although they didn't do anything wrong. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
Can anyone explain to me why, while the Project Status Page shows hundreds of unsent "PABLO_2IDP..." tasks, all my hosts download only these faulty "PABLO_prod_1_UUAUACCUACCA_350K_2_ru" (and similar) tasks, although only a handful of them are left unsent? This is rather annoying :-( What I don't understand: once it became clear that they are all faulty, why have they not been withdrawn from the queue? As a consequence, I have given up crunching GPUGRID for the time being and will turn to other projects. |
|
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0
|
It would help if GPUGrid used a short beta test for new work units. The bad ones would show up in a hurry, and it would avoid filling up the main cache with them. It always takes time to clear out the faulty ones in BOINC; I don't think there is a fast way to do it. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
"It would help if GPUGrid used a short beta test for new work units. The bad ones would show up in a hurry ..." In fact, I am surprised that new tasks are not being tested at all before they are sent to the download queue :-( |
©2025 Universitat Pompeu Fabra