Bad batch of TONI-AGGd tasks
| Author | Message |
|---|---|
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 351
|
Both my active hosts - GTX470 and GTX670 - are showing "Energies have become nan" errors for all recent tasks in this batch (short queue). http://www.gpugrid.net/results.php?hostid=43404&offset=0&show_names=1&state=5&appid=18 http://www.gpugrid.net/results.php?hostid=132158&offset=0&show_names=1&state=5&appid=18 Replication numbers are up to _5, _6, _7 - all wingmates are affected too. |
|
Joined: 4 Jan 09 Posts: 13 Credit: 1,382,704,222 RAC: 0
|
Same for me. The problem started November 1st around 22:00 UTC. I had to temporarily stop GPUGRID, as I am receiving only WUs from this faulty "Toni" batch. http://www.gpugrid.net/workunit.php?wuid=3796015 |
|
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0
|
I just had to download 5 before I got a NOELIA that would run. That brings my total to 8 failed TONI WUs. It sucks because GPUGRID doesn't know that a task has failed, and one of my video cards sits idle for some time.
|
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
Yes, these are wrong, I have just notified Toni. gdf |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Sorry guys, I cancelled them. I was fooled because some of them went OK. Those that fail appear to do so at the start. Thanks for your patience. T |
|
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0
|
Thanks, Toni, for your quick action. They seem to be running fine now on all my GPUs.
|
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Thanks to you. It took a week to debug, but was very instructive in the end. Fixed workunits are called "AGGd2", and should have high GPU usage. |
|
Joined: 11 Feb 12 Posts: 1 Credit: 4,090,110 RAC: 0
|
Doesn't appear to be working for me. I just noticed my industrial quantity of errors. I have two slightly different water-cooled 580s running at stock speeds. Two examples: http://www.gpugrid.net/result.php?resultid=6049716 and http://www.gpugrid.net/result.php?resultid=6049638 Thoughts or suggestions? GPUGRID suspended pending advice! |
microchip Joined: 4 Sep 11 Posts: 110 Credit: 326,102,587 RAC: 0
|
I also get TONI WUs that error out, either with "Energies have become nan" errors or, after a short period of crunching, with "output file absent" errors. This is really starting to annoy me, all the more so because I also get NOELIA WUs that run to the end only to report "output file absent". |
|
Joined: 15 Jan 09 Posts: 3 Credit: 171,242,754 RAC: 0
|
I'm also getting Toni and NOELIA WUs failing after several hours. I'm setting my hosts to "no new work" for a few days to see how this shakes out. |
©2025 Universitat Pompeu Fabra