Message boards :
Number crunching :
Error on ethTRYP
| Author | Message |
|---|---|
ritterm Joined: 31 Jul 09 Posts: 88 Credit: 244,413,897 RAC: 0
|
I recently returned to GPUGrid after a long absence and haven't been having any problems (that I know of) until this ethTRYP workunit errored out overnight. Stderr out is:

    <core_client_version>6.12.34</core_client_version>
    <![CDATA[
    <message>
     - exit code 98 (0x62)
    </message>
    <stderr_txt>
    # Using device 0
    # There is 1 device supporting CUDA
    # Device 0: "GeForce GTX 570"
    # Clock rate: 1.46 GHz
    # Total amount of global memory: 1275658240 bytes
    # Number of multiprocessors: 15
    # Number of cores: 120
    MDIO: cannot open file "restart.coor"
    ERROR: get_Dvec() element 0 (b)
    called boinc_finish
    </stderr_txt>
    ]]>

I'm not sure if these are to be expected every once in a while or if it's indicative of a problem with the host or GPU. Thanks for any words of advice. MarkR |
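For anyone comparing several failed tasks, the key fields in a stderr dump like the one above can be pulled out with a few regular expressions. This is only a rough sketch: the `summarize` helper and the `STDERR` constant are made up for illustration, and the patterns are based solely on the text of this one post, not on any documented log format.

```python
import re

# Stderr text as posted above, with whitespace collapsed the way the forum shows it.
STDERR = (
    '<core_client_version>6.12.34</core_client_version> <![CDATA[ <message> '
    '- exit code 98 (0x62) </message> <stderr_txt> # Using device 0 '
    '# There is 1 device supporting CUDA # Device 0: "GeForce GTX 570" '
    '# Clock rate: 1.46 GHz # Total amount of global memory: 1275658240 bytes '
    '# Number of multiprocessors: 15 # Number of cores: 120 '
    'MDIO: cannot open file "restart.coor" '
    'ERROR: get_Dvec() element 0 (b) called boinc_finish </stderr_txt> ]]>'
)


def summarize(stderr: str) -> dict:
    """Extract the exit code, GPU name, missing files, and ERROR messages.

    Hypothetical helper: the patterns are guesses from a single example.
    """
    exit_match = re.search(r"exit code (-?\d+) \(0x([0-9a-fA-F]+)\)", stderr)
    gpu_match = re.search(r'Device \d+: "([^"]+)"', stderr)
    return {
        "exit_code": int(exit_match.group(1)) if exit_match else None,
        "gpu": gpu_match.group(1) if gpu_match else None,
        # Files the application reported it could not open.
        "missing_files": re.findall(r'MDIO: cannot open file "([^"]+)"', stderr),
        # Text of each ERROR message, cut off before the trailing boilerplate.
        "errors": re.findall(
            r"ERROR: (.*?)(?:\s+called boinc_finish|\s*</stderr_txt>|$)", stderr
        ),
    }


summary = summarize(STDERR)
print(summary)
```

Comparing these summaries across the hosts that received the same workunit can hint at whether the failure follows the task or the machine.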
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
From the cruncher's perspective, it's just a fact that you get the odd task failure. There are some generic things you can do to deal with an increasing number of failures, but one failure in, say, 50 isn't a big issue; overall performance is reduced by no more than 2%. It might be your setup (overheating, clocks too high, or something else you can change), a bug (for Ignasi to fix), or deprecated clients creating problems in results from previous generations of tasks that silently cause WU corruption on upload. FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
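The "no more than 2%" figure is just the failure count divided by the task count. A minimal sketch, assuming all tasks take about the same time and that a failed task wastes its whole run (the function name is made up for illustration):

```python
def max_throughput_loss(failures: int, tasks: int) -> float:
    """Upper bound on the fraction of crunching time lost to failed tasks.

    Assumes every failed task wastes a full task's worth of time. Tasks that
    fail early waste less, so the real loss is at or below this figure.
    """
    if tasks <= 0 or not 0 <= failures <= tasks:
        raise ValueError("need tasks > 0 and 0 <= failures <= tasks")
    return failures / tasks


print(f"{max_throughput_loss(1, 50):.1%}")  # prints 2.0%
```

A task that dies partway through costs less than a full run, which is why the figure is a ceiling rather than an estimate.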
|
Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0
|
Unfortunately, these tasks have a failure rate of 22%. There are several sources, from tasks aborted via the GUI to unknown app errors. Most of these WUs are running fine, so I'd rule out a bug as the cause. You may want to check out skgiven's suggestions. Cheers, i |
ritterm Joined: 31 Jul 09 Posts: 88 Credit: 244,413,897 RAC: 0
|
Thanks for the feedback, you guys. I don't overclock and I don't believe I have any overheating issues in this host. If I see more errors, I'll try some of the other things like dedicating a CPU, changing the driver, etc. Otherwise, I'll just chalk this one up to bad luck. MarkR |
ritterm Joined: 31 Jul 09 Posts: 88 Credit: 244,413,897 RAC: 0
|
Just an FYI in case it's worthwhile, the WU I referenced in my original post resulted in compute errors for two other hosts before being successfully completed. And, although it's technically off-topic, this metTRYP errored out for my host and two others and has been sent to a fourth. [edit]Now four errors and sent to a fifth host... :-)[/edit] |
|
Joined: 5 Dec 11 Posts: 147 Credit: 69,970,684 RAC: 0
|
"Just an FYI in case it's worthwhile, the WU I referenced in my original post resulted in compute errors for two other hosts before being successfully completed." Yeah, I was one of them :(. That was my second error out of 60-odd tasks. I didn't mind this one too much, as it failed after only 6500 seconds. This one (http://www.gpugrid.net/workunit.php?wuid=3174532), though, failed after 37000 seconds :/ It took 5 errors and 1 abort before finally being completed by the 6th user. |
|
Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0
|
Some tasks end up corrupted, error out on one host after another, and eventually die. Again, the causes are a mystery, but the impact on the batch overall is very small. I apologize for the inconvenience anyway. i |
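A rough way to see why a workunit that fails on host after host points at the task rather than the hosts: if failures were independent across hosts at the 22% rate quoted earlier in this thread (an assumption made purely for illustration, not something the project has stated), five failures in a row would happen only about 0.05% of the time. The function name below is made up.

```python
def prob_all_fail(p_fail: float, hosts: int) -> float:
    """Probability that `hosts` independent attempts all fail.

    Independence between hosts is an assumption for illustration only.
    """
    if not 0.0 <= p_fail <= 1.0 or hosts < 0:
        raise ValueError("p_fail must be in [0, 1] and hosts must be >= 0")
    return p_fail ** hosts


# 0.22 is the per-task failure rate mentioned in this thread.
for hosts in (1, 2, 3, 5):
    print(f"{hosts} failures in a row: {prob_all_fail(0.22, hosts):.4%}")
```

Since such streaks do show up in practice, the independence assumption evidently does not hold for those workunits, which fits the idea that the task itself is corrupted.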
©2026 Universitat Pompeu Fabra