all WUs downloaded recently produce "computation error" right away

Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 46881 - Posted: 14 Apr 2017, 20:17:53 UTC - in response to Message 46879.  

These are simply the failed workunits waiting to be resent to another host, but there are none to send them to, because all have used up their daily quota.

Which means that all the WUs that were faulty to begin with, will be "recycled", so to speak; and at some point, there will be several thousand faulty WUs in the queue :-(
So I am curious how this pile of junk will be successfully cleaned up :-)
ID: 46881 · Rating: 0
Tom Miller

Joined: 21 Nov 14
Posts: 5
Credit: 1,081,640,766
RAC: 0
Message 46882 - Posted: 14 Apr 2017, 22:01:48 UTC

My first failure was at 14:21:01 UTC on the 14th.

50+ and counting.
ID: 46882 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 318
Message 46883 - Posted: 14 Apr 2017, 22:06:43 UTC

Here's an interesting one: WU 12499196.

Three consecutive failures with exit status -44, as we're all seeing. All of those were with the v8.48, cuda65 application.

But the fourth has gone to my (one and only) GTX 1050 Ti running the v9.15, cuda80 application. And it's running just fine - even better than fine, blisteringly fast.

There was an announcement this week that v9.15 was now available to all supported GPU generations: my older ones haven't picked it up, probably because I haven't updated my drivers recently. But just maybe, the current tasks require v9.15? That's one to test in the morning.
ID: 46883 · Rating: 0
Betting Slip

Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 46884 - Posted: 14 Apr 2017, 22:43:35 UTC - in response to Message 46883.  

Richard, I have updated the drivers on my 980 Ti, but it won't pick up the new app. I have reset the project and still get 8.48 (cuda 6.5).
ID: 46884 · Rating: 0
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 46885 - Posted: 14 Apr 2017, 22:58:34 UTC - in response to Message 46869.  
Last modified: 14 Apr 2017, 23:02:23 UTC

I experience the same behavior.
I would like to add that from my experience the workunits in progress will also fail with this error if you restart your PC, right after the restart.


I'd like to further clarify that.

If you suspend in-progress tasks, then resume them, they will fail. I just lost tons of work that way :) I smile, because it's all I can do. It happens. Just wanted to add that suspending and restarting the task itself is also a problem.

Backup projects (attached with 0 resource share) are starting to kick in for me.
ID: 46885 · Rating: 0
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 46886 - Posted: 14 Apr 2017, 23:12:53 UTC - in response to Message 46885.  
Last modified: 14 Apr 2017, 23:22:19 UTC

I experience the same behavior.
I would like to add that from my experience the workunits in progress will also fail with this error if you restart your PC, right after the restart.


I'd like to further clarify that.

If you suspend in-progress tasks, then resume them, they will fail. I just lost tons of work that way :) I smile, because it's all I can do. It happens. Just wanted to add that suspending and restarting the task itself is also a problem.
I'm aware of that problem, but that gives a different error message in stderr.txt
EDIT: maybe I don't remember it right, and the error code / message is the same, but my tasks did not error out after a restart earlier.
ID: 46886 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 318
Message 46887 - Posted: 14 Apr 2017, 23:15:22 UTC - in response to Message 46884.  
Last modified: 14 Apr 2017, 23:17:58 UTC

Richard, I have updated the drivers on my 980 Ti, but it won't pick up the new app. I have reset the project and still get 8.48 (cuda 6.5).

Sampling through a few of the highest-RAC users on my way to bed, it looks as if all their 970/980 cards are erroring tasks, but all their 1070/1080 cards are working normally. There's a debug clue in there somewhere.

Edit - including Retvari's single active 1080, host 23631
ID: 46887 · Rating: 0
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 46888 - Posted: 14 Apr 2017, 23:18:10 UTC - in response to Message 46883.  

Here's an interesting one: WU 12499196.

Three consecutive failures with exit status -44, as we're all seeing. All of those were with the v8.48, cuda65 application.

But the fourth has gone to my (one and only) GTX 1050 Ti running the v9.15, cuda80 application. And it's running just fine - even better than fine, blisteringly fast.

My GTX 1080 is working fine with the 9.15 app under Windows 10.

There was an announcement this week that v9.15 was now available to all supported GPU generations: my older ones haven't picked it up, probably because I haven't updated my drivers recently. But just maybe, the current tasks require v9.15? That's one to test in the morning.

I don't think so.
It's more likely that some dll stopped working after a given date, namely 04.14.2017.
It could be a licensing limitation, or some other time limit that has expired.
ID: 46888 · Rating: 0
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 46889 - Posted: 14 Apr 2017, 23:43:29 UTC - in response to Message 46888.  
Last modified: 14 Apr 2017, 23:44:07 UTC

There was an announcement this week that v9.15 was now available to all supported GPU generations: my older ones haven't picked it up, probably because I haven't updated my drivers recently. But just maybe, the current tasks require v9.15? That's one to test in the morning.

I don't think so.
It's more likely that some dll stopped working after a given date, namely 04.14.2017.
It could be a licensing limitation, or some other time limit that has expired.

I've downloaded 4 new tasks with my main cruncher PC, then I've set the date on this PC to 04.13.2017, and I've started the GPUGrid tasks. Guess what? It's crunching! Yes, the 8.48 app. So there's a date limit somewhere in the 8.48 app.
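
To make the clock test above concrete, here is a minimal Python sketch of how a hard-coded expiry check would behave when the system date is moved back. The cutoff date and the function name are assumptions for illustration only; nothing here is taken from the 8.48 app itself.

```python
from datetime import date

# Hypothetical cutoff, chosen only to match the first failures seen on 14 Apr 2017.
ASSUMED_EXPIRY = date(2017, 4, 14)

def app_would_start(today: date, expiry: date = ASSUMED_EXPIRY) -> bool:
    """Return True if a time-limited component would still run on `today`."""
    return today < expiry

# System clock set back one day: the check passes and the task crunches.
print(app_would_start(date(2017, 4, 13)))
# Real date: the check fails, which would show up as an immediate error.
print(app_would_start(date(2017, 4, 15)))
```

If the real check compares against a timestamp rather than a date, the cutoff could fall somewhere during the 14th instead of at midnight.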
ID: 46889 · Rating: 0
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 46890 - Posted: 15 Apr 2017, 0:03:20 UTC

Great find, Retvari! That should help the devs to solve it as quickly as they can!

On a lighter note, I found another easy workaround too, here:
https://www.youtube.com/watch?v=dQw4w9WgXcQ
ID: 46890 · Rating: 0
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 46892 - Posted: 15 Apr 2017, 6:39:10 UTC - in response to Message 46889.  

I've downloaded 4 new tasks with my main cruncher PC, then I've set the date on this PC to 04.13.2017, and I've started the GPUGrid tasks. Guess what? It's crunching! Yes, the 8.48 app. So there's a date limit somewhere in the 8.48 app.

I've tried to do this, however, I got stuck with "the computer has finished the daily quota of 1 task" - HOW NICE :-(

Slowly but surely I am getting fed up with GPUGRID. I'm getting more and more the impression (like one of the posters above) that they don't take their work seriously enough :-(
ID: 46892 · Rating: 0
[PUGLIA] kidkidkid3
Joined: 23 Feb 11
Posts: 101
Credit: 1,589,749,957
RAC: 876
Message 46895 - Posted: 15 Apr 2017, 7:38:56 UTC - in response to Message 46890.  

Great find, Retvari! That should help the devs to solve it as quickly as they can!



Peace and love, thanks great Retvari, good Easter to all ... be patient !
K.

Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing.
(Martin Luther King)
ID: 46895 · Rating: 0
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 46896 - Posted: 15 Apr 2017, 8:03:17 UTC - in response to Message 46895.  

... be patient !

I am afraid that my patience is overstretched by now - every month a major problem which makes GPUGRID crunching impossible for several days :-(((
ID: 46896 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 318
Message 46897 - Posted: 15 Apr 2017, 8:23:04 UTC - in response to Message 46889.  

So there's a date limit somewhere in the 8.48 app.

My suspicion (unverified) is that the problem might lie with tcl84.dll

That's been replaced with tcl86.dll in v9.14/5, and https://www.activestate.com/activetcl seem to have a rather curious licencing regime:

Business and Enterprise Editions provide access to older Tcl versions:

Although non-production use is permitted for free using our latest Community Edition versions, use of legacy versions on non-production and/or production machines requires a Business Edition or Enterprise Edition license.

I'll play around with some options later.
ID: 46897 · Rating: 0
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 46898 - Posted: 15 Apr 2017, 9:42:21 UTC - in response to Message 46897.  

So there's a date limit somewhere in the 8.48 app.

My suspicion (unverified) is that the problem might lie with tcl84.dll

That's been replaced with tcl86.dll in v9.14/5...

I've tried to replace tcl84.dll with tcl86.dll by renaming the latter (and setting the dont_check_file_sizes option in cc_config.xml), but then I got a different error:
There are no child processes to wait for.
 (0x80) - exit code 128 (0x80)

See this task.
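
For anyone repeating this experiment, the setting referred to above is the `dont_check_file_sizes` option of the BOINC client. Below is a minimal sketch that generates a cc_config.xml body with that option switched on, assuming it sits under `<options>`; the function name `build_cc_config` is hypothetical, and the output still has to be saved into the BOINC data directory by hand.

```python
import xml.etree.ElementTree as ET

def build_cc_config(dont_check_file_sizes: bool) -> str:
    """Build a minimal cc_config.xml body with the file-size check option."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    flag = ET.SubElement(options, "dont_check_file_sizes")
    flag.text = "1" if dont_check_file_sizes else "0"
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")

print(build_cc_config(True))
```

Remember to set the option back to 0 afterwards, since skipping the size check also hides genuinely corrupted downloads.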
ID: 46898 · Rating: 0
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 46899 - Posted: 15 Apr 2017, 9:53:54 UTC - in response to Message 46889.  

Zoltan wrote:
I've downloaded 4 new tasks with my main cruncher PC, then I've set the date on this PC to 04.13.2017, and I've started the GPUGrid tasks. Guess what? It's crunching!

For me, this worked on the two Windows 10 PCs.

On my main crunching PC with two GTX980Ti and XP, I unfortunately had the "limit of daily tasks" problem (as mentioned earlier here). On the other one, with the GTX750Ti and XP, after changing the date backwards (to 04.13.2017), none of the buttons on the left side of the BOINC manager reacted any more. So I could not do what I had intended.
Only after changing the date back to the real one did the BOINC manager work again. So no chance to apply this "date trick" on XP, at least not on mine :-(
ID: 46899 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 318
Message 46900 - Posted: 15 Apr 2017, 10:33:28 UTC - in response to Message 46898.  

So there's a date limit somewhere in the 8.48 app.

My suspicion (unverified) is that the problem might lie with tcl84.dll

That's been replaced with tcl86.dll in v9.14/5...

I've tried to replace tcl84.dll with tcl86.dll by renaming the latter (and setting the dont_check_file_sizes option in cc_config.xml), but then I got a different error:
There are no child processes to wait for.
 (0x80) - exit code 128 (0x80)

I had the same idea, but tried a different route: I wrapped up the existing files in an app_info.xml, and then changed the tcl file reference to supply a copy of tcl86.dll

No dice: instead, I got error

0xC000007B
STATUS_INVALID_IMAGE_FORMAT

(task 16233669 - confirmed that this related to the tcl change with some offline tests)

This machine is Windows 7 with a GTX 970 and (currently) a maximum cuda 7.0 driver. Next steps will be to try a cuda 8.0 driver and see what the project sends me: if it's still v8.48, I'll try putting v9.15 into an app_info.
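
A note on 0xC000007B: STATUS_INVALID_IMAGE_FORMAT is commonly seen when a process loads a DLL built for a different architecture (for example a 64-bit DLL into a 32-bit executable). Whether that is what happened with the tcl swap here is an assumption, not something verified in this thread. A rough Python sketch for checking which architecture a PE file targets is below; `pe_machine` and `fake_pe` are hypothetical helper names, and the synthetic header exists only so the example runs without a real DLL.

```python
import struct

# Machine field values from the PE/COFF header.
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}

def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a PE image's COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # e_lfanew at offset 0x3C points at the 'PE\0\0' signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # The 2-byte Machine field follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, f"unknown (0x{machine:04X})")

def fake_pe(machine: int) -> bytes:
    """Build a tiny synthetic header, for demonstration only."""
    header = bytearray(0x48)
    header[0:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, 0x40)
    header[0x40:0x44] = b"PE\0\0"
    struct.pack_into("<H", header, 0x44, machine)
    return bytes(header)

print(pe_machine(fake_pe(0x014C)))
print(pe_machine(fake_pe(0x8664)))
```

To compare real files, read each one in binary mode and pass the bytes to `pe_machine`, once for the application executable and once for the substituted tcl86.dll.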
ID: 46900 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 318
Message 46902 - Posted: 15 Apr 2017, 11:55:39 UTC - in response to Message 46900.  

Sad to report that both approaches failed. A normal work fetch got me v8.48 even with a cuda 8.0 driver, and it failed with the clock error as before.

A full v9.15 file set (copied from my GTX 1050Ti machine, also running Windows 7/64) under app_info.xml gave repeated iterations of

15/04/2017 12:31:21 | GPUGRID | [cpu_sched] Starting task e14s3_e11s4p0f35-ADRIA_FOLDGREED10_crystal_ss_contacts_20_ubiquitin_4-0-1-RND0892_0 using acemdlong version 915 (cuda80) in slot 1
15/04/2017 12:31:24 | GPUGRID | Task e14s3_e11s4p0f35-ADRIA_FOLDGREED10_crystal_ss_contacts_20_ubiquitin_4-0-1-RND0892_0 exited with zero status but no 'finished' file
15/04/2017 12:31:24 | GPUGRID | If this happens repeatedly you may need to reset the project.

- the app quits silently with no error number, and doesn't even have time to start writing a stderr.txt file or to write anything to the _0_0 result file (aka 'progress.log'). The only evidence that the app has even tried to run is a 'canary' file in the slot directory. The only diagnostics output I can get is from a command prompt:

D:\BOINCdata\slots\1>acemd.915-80.exe
# ACEMD Molecular Dynamics Version [3212]
# CUDA Synchronisation mode: BLOCKING
# CUDA Synchronisation mode: BLOCKING
# SWAN: Created context 0 on GPU 0
SWAN : FATAL : Cuda driver error 35 in file 'swanlibnv2.cpp' in line 448.
# SWAN swan_assert 0

Card data is

15/04/2017 12:28:16 | | CUDA: NVIDIA GPU 0: GeForce GTX 970 (driver version 368.81, CUDA version 8.0, compute capability 5.2, 4096MB, 3066MB available, 4087 GFLOPS peak)

I think I'm stuck until the staff are back in the lab.
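
When comparing hosts, it can help to pull the driver version, CUDA version and compute capability out of the BOINC event log automatically. This is a small sketch that parses a card data line of the form quoted above; the regular expression is an assumption based on that single line and may need adjusting for other client versions.

```python
import re

# Pattern modelled on the one log line quoted in this post.
GPU_LINE = re.compile(
    r"CUDA: NVIDIA GPU (?P<index>\d+): (?P<name>.+?) "
    r"\(driver version (?P<driver>[\d.]+), "
    r"CUDA version (?P<cuda>[\d.]+), "
    r"compute capability (?P<cc>[\d.]+)"
)

def parse_gpu_line(line: str) -> dict:
    """Extract GPU index, name, driver, CUDA version and compute capability."""
    match = GPU_LINE.search(line)
    if match is None:
        raise ValueError("no CUDA GPU description found")
    return match.groupdict()

line = ("15/04/2017 12:28:16 | | CUDA: NVIDIA GPU 0: GeForce GTX 970 "
        "(driver version 368.81, CUDA version 8.0, compute capability 5.2, "
        "4096MB, 3066MB available, 4087 GFLOPS peak)")
info = parse_gpu_line(line)
print(info["name"], info["driver"], info["cuda"], info["cc"])
```

All values are returned as strings, so convert them before doing numeric comparisons between hosts.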
ID: 46902 · Rating: 0
Stefan
Project administrator
Project developer
Project tester
Project scientist

Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 46903 - Posted: 15 Apr 2017, 12:08:18 UTC

I talked with Matt. He says that it's probably the license that has expired. Updating the drivers will get the cuda 8 app, which should fix it.
ID: 46903 · Rating: 0
Stefan
Project administrator
Project developer
Project tester
Project scientist

Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 46904 - Posted: 15 Apr 2017, 12:24:29 UTC

For a more correct solution we will have to wait for Matt to update the old app next week. In the meantime, as I said, updating drivers should do it.
ID: 46904 · Rating: 0

©2025 Universitat Pompeu Fabra