Dozens of Failed Tasks
| Author | Message |
|---|---|
|
Joined: 26 Dec 10 Posts: 115 Credit: 416,576,946 RAC: 0
|
My systems are currently failing more tasks than ever. It looks like most of the work units failed on other systems as well. Do we have a large number of corrupt work units or have I done something wrong on my setups? I just put a new GTX 580 in my farm this weekend and was disappointed to see all of the failed WUs. Again, when I look at the failed WUs, they usually failed 3 or 4 times on other systems as well. Any help is appreciated. |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Hi, from a quick look it seems that only one of your hosts http://www.gpugrid.net/results.php?hostid=119703 has recent failed tasks. New driver? Have you checked the thread on monitor-off corruption? |
|
Joined: 26 Dec 10 Posts: 115 Credit: 416,576,946 RAC: 0
|
Thank you for the quick reply. When I spot check the tasks that failed, most of them look like they failed on other computers as well. http://www.gpugrid.net/workunit.php?wuid=3231807 http://www.gpugrid.net/workunit.php?wuid=3231830 http://www.gpugrid.net/workunit.php?wuid=3228376 http://www.gpugrid.net/workunit.php?wuid=3227851 http://www.gpugrid.net/workunit.php?wuid=3231792 http://www.gpugrid.net/workunit.php?wuid=3231127 There was a discussion of monitor-off issues with the 295 drivers, but I did not see a resolution. I am happy to make changes to help. Thanks. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Your issue is most likely with 295.73. On W7 systems we are recommending that people avoid the 295.x drivers. Most drivers from 260 to 285 should work. I would recommend downloading one, then fully uninstalling 295, restarting, and then installing the downloaded driver. http://www.gpugrid.net/workunit.php?wuid=3231830 All errors on W7 with 295 drivers, except one (258 - too old)! That seems to be the case for most of that list of errors. FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
|
Joined: 26 Dec 10 Posts: 115 Credit: 416,576,946 RAC: 0
|
It looks like 275.33 fixed everything. Now back to crunching!! Thank you |
dskagcommunity Joined: 28 Apr 11 Posts: 463 Credit: 979,266,958 RAC: 84,915
|
Never touch a running system =) good to have a cruncher for real science back :) DSKAG Austria: http://www.dskag.at
|
|
Joined: 26 Dec 10 Posts: 115 Credit: 416,576,946 RAC: 0
|
I am back on the 275.33 drivers, but downclocking has become an issue. I never saw this problem before I upgraded and then downgraded the drivers. 295.73 appears to fix the downclocking but does not work with GPUGRID WUs. 275.33 will downclock, but WUs continue to run. Does anyone have a resolution for the downclocking issue on Windows 7? I saw a batch file for Linux but nothing for Windows. Thank you. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Some sort of 'threadsafe' exit code might do it, though it might also change the code to the extent that the research methods alter; something that reviewers might not be so keen on. Obviously the recent app updates, which added instructions to enable some task types, didn't include this 'threadsafe' code. Maybe next time. This thread might be useful. |
|
Joined: 26 Dec 10 Posts: 115 Credit: 416,576,946 RAC: 0
|
Can someone look at this WU http://www.gpugrid.net/workunit.php?wuid=3261113 and provide some insight into the failure? It really hurts when they run for 13,000 seconds and fail. Everyone else failed the task as well, but they have the 295.73 drivers. Thank you. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
The failure was: ERROR: # Energies have become nan This means the energy value being calculated is 'Not A Number'. I think this may mean zero or that the value just went out of some stipulated check range, and as a result it is being described as nan. Many of Boinc's 'Error codes' are not actually errors, and few elucidate the issue, let alone suggest a solution to crunchers. Consider reducing your shader clocks. See the 'Energies have become nan' thread. |
|
Joined: 19 Apr 09 Posts: 2 Credit: 11,426,878 RAC: 0
|
I am suddenly having a bunch of acemd2 tasks fail. This is the first time this has happened since I started running GPUGRID. http://www.gpugrid.net/results.php?userid=21556&offset=0&show_names=0&state=5&appid= |
|
Joined: 26 Dec 10 Posts: 115 Credit: 416,576,946 RAC: 0
|
This computer http://www.gpugrid.net/show_host_detail.php?hostid=119703 failed a few WUs in a row, followed by at least 2 successful WUs. I did not change anything. My GPU typically never exceeds 71C. Is my GPU just running a little too fast? I had about 10 successful WUs prior to these errors. I read the thread on nan and I don't have a heat issue, but maybe I just need to pull the performance down a little to avoid the nan condition. Thank you. |
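
For readers puzzling over the "Energies have become nan" error discussed above, here is a minimal Python sketch of how a NaN behaves under standard IEEE 754 floating point. It is only an illustration, not the GPUGRID application's actual check: the function name `first_nan_step` and the energy values are invented for the example. It shows that NaN is a distinct "undefined result" marker rather than zero, and how a run of energies could be scanned for the first bad step.

```python
import math

def first_nan_step(energies):
    """Return the index of the first NaN energy, or None if there is none.

    Hypothetical helper for illustration; not part of any GPUGRID code.
    """
    for step, value in enumerate(energies):
        if math.isnan(value):
            return step
    return None

# Invented energy trace; inf - inf is an undefined operation and yields NaN.
energies = [-1052.3, -1051.9, float("inf") - float("inf"), -1050.0]
nan = energies[2]

print(nan == 0.0)                 # NaN is not zero
print(nan == nan)                 # NaN does not even compare equal to itself
print(first_nan_step(energies))   # index of the first NaN in the trace
```

Once a NaN appears in a calculation it generally propagates through every later arithmetic step, which would be consistent with a task running for a long time and then failing outright, though the thread does not confirm the exact cause here.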
©2026 Universitat Pompeu Fabra