Problems with WU's
| Author | Message |
|---|---|
darkstarz1 Joined: 18 Sep 09 Posts: 2 Credit: 108,415,090 RAC: 0
|
I had been running GPUGRID without any problems until the last few days. Now I have over a dozen failed WUs, with these messages:
28/03/2010 11:56:29 GPUGRID Computation for task a62-TONI_HERG79a-15-100-RND2348_0 finished
28/03/2010 11:56:29 GPUGRID Output file a62-TONI_HERG79a-15-100-RND2348_0_1 for task a62-TONI_HERG79a-15-100-RND2348_0 absent
28/03/2010 11:56:29 GPUGRID Output file a62-TONI_HERG79a-15-100-RND2348_0_2 for task a62-TONI_HERG79a-15-100-RND2348_0 absent
28/03/2010 11:56:29 GPUGRID Output file a62-TONI_HERG79a-15-100-RND2348_0_3 for task a62-TONI_HERG79a-15-100-RND2348_0 absent
28/03/2010 11:56:30 GPUGRID Started upload of a62-TONI_HERG79a-15-100-RND2348_0_0
28/03/2010 11:56:30 GPUGRID Started upload of a62-TONI_HERG79a-15-100-RND2348_0_4
28/03/2010 11:56:31 GPUGRID Finished upload of a62-TONI_HERG79a-15-100-RND2348_0_0
28/03/2010 11:56:31 GPUGRID Finished upload of a62-TONI_HERG79a-15-100-RND2348_0_4
28/03/2010 11:56:31 GPUGRID Started upload of a62-TONI_HERG79a-15-100-RND2348_0_7
28/03/2010 11:56:32 GPUGRID Finished upload of a62-TONI_HERG79a-15-100-RND2348_0_7
28/03/2010 11:57:20 GPUGRID Sending scheduler request: To fetch work.
28/03/2010 11:57:20 GPUGRID Reporting 1 completed tasks, requesting new tasks for GPU
28/03/2010 11:57:25 GPUGRID Scheduler request completed: got 1 new tasks
28/03/2010 11:57:27 GPUGRID Started download of a449-TONI_HERG79a-15-LICENSE
28/03/2010 11:57:27 GPUGRID Started download of a449-TONI_HERG79a-15-COPYRIGHT
28/03/2010 11:57:29 GPUGRID Finished download of a449-TONI_HERG79a-15-LICENSE
28/03/2010 11:57:29 GPUGRID Finished download of a449-TONI_HERG79a-15-COPYRIGHT
28/03/2010 11:57:29 GPUGRID Started download of a449-TONI_HERG79a-15-a449-TONI_HERG79a-14-100-RND5529_1
28/03/2010 11:57:29 GPUGRID Started download of a449-TONI_HERG79a-15-a449-TONI_HERG79a-14-100-RND5529_2
28/03/2010 11:57:33 GPUGRID Finished download of a449-TONI_HERG79a-15-a449-TONI_HERG79a-14-100-RND5529_1
28/03/2010 11:57:33 GPUGRID Finished download of a449-TONI_HERG79a-15-a449-TONI_HERG79a-14-100-RND5529_2
28/03/2010 11:57:33 GPUGRID Started download of a449-TONI_HERG79a-15-a449-TONI_HERG79a-14-100-RND5529_3
28/03/2010 11:57:33 GPUGRID Started download of a449-TONI_HERG79a-15-pdb_file
28/03/2010 11:57:36 GPUGRID Finished download of a449-TONI_HERG79a-15-a449-TONI_HERG79a-14-100-RND5529_3
28/03/2010 11:57:36 GPUGRID Started download of a449-TONI_HERG79a-15-psf_file
28/03/2010 11:57:37 GPUGRID Finished download of a449-TONI_HERG79a-15-psf_file
28/03/2010 11:57:37 GPUGRID Started download of a449-TONI_HERG79a-15-par_file
28/03/2010 11:57:40 GPUGRID Finished download of a449-TONI_HERG79a-15-pdb_file
28/03/2010 11:57:40 GPUGRID Started download of a449-TONI_HERG79a-15-conf_file_enc
28/03/2010 11:57:41 GPUGRID Finished download of a449-TONI_HERG79a-15-conf_file_enc
28/03/2010 11:57:41 GPUGRID Started download of a449-TONI_HERG79a-15-metainp_file
28/03/2010 11:57:42 GPUGRID Finished download of a449-TONI_HERG79a-15-metainp_file
28/03/2010 11:57:42 GPUGRID Started download of a449-TONI_HERG79a-15-a449-TONI_HERG79a-14-100-RND5529_7
28/03/2010 11:57:43 GPUGRID Finished download of a449-TONI_HERG79a-15-a449-TONI_HERG79a-14-100-RND5529_7
28/03/2010 11:57:52 GPUGRID Finished download of a449-TONI_HERG79a-15-par_file
28/03/2010 11:57:52 GPUGRID Starting a449-TONI_HERG79a-15-100-RND5529_0
28/03/2010 11:57:52 GPUGRID Starting task a449-TONI_HERG79a-15-100-RND5529_0 using acemd2 version 603
28/03/2010 11:58:30 GPUGRID Computation for task a449-TONI_HERG79a-15-100-RND5529_0 finished
28/03/2010 11:58:30 GPUGRID Output file a449-TONI_HERG79a-15-100-RND5529_0_1 for task a449-TONI_HERG79a-15-100-RND5529_0 absent
28/03/2010 11:58:30 GPUGRID Output file a449-TONI_HERG79a-15-100-RND5529_0_2 for task a449-TONI_HERG79a-15-100-RND5529_0 absent
28/03/2010 11:58:30 GPUGRID Output file a449-TONI_HERG79a-15-100-RND5529_0_3 for task a449-TONI_HERG79a-15-100-RND5529_0 absent
28/03/2010 11:58:31 GPUGRID Started upload of a449-TONI_HERG79a-15-100-RND5529_0_0
28/03/2010 11:58:31 GPUGRID Started upload of a449-TONI_HERG79a-15-100-RND5529_0_4
28/03/2010 11:58:33 GPUGRID Finished upload of a449-TONI_HERG79a-15-100-RND5529_0_0
28/03/2010 11:58:33 GPUGRID Finished upload of a449-TONI_HERG79a-15-100-RND5529_0_4
28/03/2010 11:58:33 GPUGRID Started upload of a449-TONI_HERG79a-15-100-RND5529_0_7
28/03/2010 11:58:34 GPUGRID Finished upload of a449-TONI_HERG79a-15-100-RND5529_0_7
Any ideas, anyone? |
darkstarz1 Joined: 18 Sep 09 Posts: 2 Credit: 108,415,090 RAC: 0
|
...and this, an example from one invalid WU:
<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 260"
# Clock rate: 1.51 GHz
# Total amount of global memory: 939524096 bytes
# Number of multiprocessors: 27
# Number of cores: 216
MDIO ERROR: cannot open file "restart.coor"
</stderr_txt>
]]>
Validate state: Invalid |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Does the problem persist after rebooting? P.S. Moving to the "GPU" thread. |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Before (working):
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 260"
# Clock rate: 1.35 GHz
After (not working):
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 260"
# Clock rate: 1.51 GHz <-------- |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 2
|
The TONI_HERG tasks seem to be particularly problematic - see the hERG: information and issues thread. [But since the run of errors I posted there, I have more recently had some successful runs - with no change in the host configuration] |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
He gets errors on all types of WUs. Please check clock rate. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 2
|
"He gets errors on all types of WUs. Please check clock rate."
The clock rate is clearly a problem. But the message log in the OP, and hence the issue which prompted him to post in the first place, is exclusively about TONI_HERG. I deliberately didn't speculate on the cause of the problem, just pointed out the correlation. From my POV, the jury's still out on whether T_H stresses GPUs more than other tasks, and hence selectively culls the weaker/hotter/badly configured specimens, or whether there's a bug in the application (a code-path which is only followed by particular parameter sets, for example). |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
I didn't mean to be rude. Certain types of WUs may indeed turn out to be more sensitive to a variety of factors (including exposing rare bugs in drivers/hardware combinations, which are close to impossible to spot). My impression is that, at least since the new application, the global error rate of HERGs is in line with the others. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 2
|
I had another task crash on me tonight. Guess which type it was... a68-TONI_HERG77a-17-100-RND2481_1 In this case, very many thanks (and that's genuine, not sarcastic). The aftermath solved a SETI Beta problem which has been bugging me, and the BOINC Alpha bug-report mailing list, for the last three weeks. I learned something new to me, and I think largely forgotten by the BOINC developers. It's in an area of code which is about to undergo major change: hopefully the write-up I've been able to submit as a result of this crash will enable safeguards to be built into the new code to replace the old ones which will no longer function. |
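The before/after check earlier in the thread (1.35 GHz while tasks worked, 1.51 GHz once they failed) can be done mechanically when there are many task results to compare. The following is only a rough sketch: it assumes the stderr text contains a line of the form `# Clock rate: 1.51 GHz`, as in the outputs quoted above, and the function names `clock_rates_ghz` and `clock_changed` and the 0.01 GHz tolerance are made up for illustration, not part of BOINC or acemd.

```python
import re

# Assumed format, taken from the stderr quoted in this thread:
#   "# Clock rate: 1.51 GHz"
CLOCK_RE = re.compile(r"#\s*Clock rate:\s*(\d+(?:\.\d+)?)\s*GHz")


def clock_rates_ghz(stderr_text):
    """Return every clock rate (in GHz) reported in a task's stderr text."""
    return [float(value) for value in CLOCK_RE.findall(stderr_text)]


def clock_changed(before_text, after_text, tolerance_ghz=0.01):
    """Report whether the first clock rate in each text differs by more
    than tolerance_ghz (a hypothetical threshold, chosen arbitrarily)."""
    before = clock_rates_ghz(before_text)
    after = clock_rates_ghz(after_text)
    if not before or not after:
        raise ValueError("no 'Clock rate' line found in one of the inputs")
    return abs(after[0] - before[0]) > tolerance_ghz


# The two outputs compared earlier in the thread.
BEFORE = """# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 260"
# Clock rate: 1.35 GHz"""

AFTER = """# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 260"
# Clock rate: 1.51 GHz"""

if __name__ == "__main__":
    print(clock_rates_ghz(BEFORE), clock_rates_ghz(AFTER))
    print("clock changed:", clock_changed(BEFORE, AFTER))
```

A changed clock rate does not prove the overclock caused the failures; it only flags hosts worth checking first, which is all the comparison in the thread claims.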
©2026 Universitat Pompeu Fabra