all WUs downloaded recently produce "computation error" right away
| Author | Message |
|---|---|
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
These are simply the failed workunits waiting to be resent to another host, but there are none to send them to, because all hosts have used up their daily quota. This means that all the WUs that were faulty to begin with will be "recycled", so to speak, and at some point there will be several thousand faulty WUs in the queue :-( So I am curious how this pile of junk will be successfully cleaned up :-) |
|
Joined: 21 Nov 14 Posts: 5 Credit: 1,081,640,766 RAC: 0
|
My first failure was at 14:21:01 UTC on the 14th. 50+ and counting. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 318
|
Here's an interesting one: WU 12499196. Three consecutive failures with exit status -44, as we're all seeing. All of those were with the v8.48, cuda65 application. But the fourth has gone to my (one and only) GTX 1050 Ti running the v9.15, cuda80 application. And it's running just fine - even better than fine, blisteringly fast. There was an announcement this week that v9.15 was now available to all supported GPU generations: my older ones haven't picked it up, probably because I haven't updated my drivers recently. But just maybe, the current tasks require v9.15? That's one to test in the morning. |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
I have updated drivers Richard on my 980ti but it won't pick up new app. I have reset project and still 8.48 cuda 6.5 |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
I experience the same behavior. I'd like to further clarify that. If you suspend in-progress tasks, then resume them, they will fail. I just lost tons of work that way :) I smile, because it's all I can do. It happens. Just wanted to add that suspending and restarting the task itself, is also a problem. Backup projects (attached with 0 resource share) are starting to kick in for me. |
Retvari Zoltan, Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
I'm aware of that problem, but that gives a different error message in stderr.txt. I experience the same behavior. EDIT: maybe I don't remember it right, and the error code / message is the same, but my tasks did not error out after a restart earlier. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 318
|
"I have updated drivers Richard on my 980ti but it won't pick up new app. I have reset project and still 8.48 cuda 6.5" Sampling through a few of the highest-RAC users on my way to bed, it looks as if all their 970/980 cards are erroring tasks, but all their 1070/1080 cards are working normally. There's a debug clue in there somewhere. Edit - including Retvari's single active 1080, host 23631 |
Retvari Zoltan, Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"Here's an interesting one: WU 12499196." My GTX 1080 is working fine with the 9.15 app under Windows 10. "There was an announcement this week that v9.15 was now available to all supported GPU generations: my older ones haven't picked it up, probably because I haven't updated my drivers recently. But just maybe, the current tasks require v9.15? That's one to test in the morning." I don't think so. It's more likely that some DLL stopped working after a given date, namely 04.14.2017. It could be a licensing limitation, or some other time limit that has expired. |
Retvari Zoltan, Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"There was an announcement this week that v9.15 was now available to all supported GPU generations: my older ones haven't picked it up, probably because I haven't updated my drivers recently. But just maybe, the current tasks require v9.15? That's one to test in the morning." I've downloaded 4 new tasks with my main cruncher PC, then I've set the date on this PC to 04.13.2017, and I've started the GPUGrid tasks. Guess what? It's crunching! Yes, the 8.48 app. So there's a date limit somewhere in the 8.48 app. |
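The behaviour reported above (tasks run with the clock set to 13 April 2017 and fail with the real date) is what a hard-coded expiry check would produce. The following is a minimal Python sketch of that idea only, not the actual ACEMD or licensing code: the function name `run_task` and the cutoff constant are hypothetical, the exact expiry instant is not known from this thread, and exit status -44 is simply reused from the failures reported earlier.

```python
from datetime import date

# Assumed cutoff for illustration. The thread only establishes that tasks ran
# with the clock set to 13 April 2017 and that failures began on 14 April 2017.
ASSUMED_EXPIRY = date(2017, 4, 14)

# Exit status reported for the failing v8.48 tasks earlier in the thread.
EXIT_EXPIRED = -44


def run_task(today: date) -> int:
    """Return 0 if the imaginary time-limited component accepts the date."""
    if today >= ASSUMED_EXPIRY:
        return EXIT_EXPIRED  # fails immediately, before any real work starts
    return 0


print(run_task(date(2017, 4, 13)))  # clock rolled back
print(run_task(date(2017, 4, 15)))  # real date
```

Under this assumption, rolling the system clock back masks the problem without fixing it, which matches the workaround being described as temporary.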
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Great find, Retvari! That should help the devs to solve it as quickly as they can! On a lighter note, I found another easy workaround too, here: https://www.youtube.com/watch?v=dQw4w9WgXcQ |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
"I've downloaded 4 new tasks with my main cruncher PC, then I've set the date on this PC to 04.13.2017, and I've started the GPUGrid tasks. Guess what? It's crunching! Yes, the 8.48 app. So there's a date limit somewhere in the 8.48 app." I've tried to do this; however, I got stuck with "the computer has finished the daily quota of 1 task" - HOW NICE :-( Slowly but surely I am getting fed up with GPUGRID. I'm getting more and more the impression (like one of the posters above) that they don't take their work seriously enough :-( |
[PUGLIA] kidkidkid3, Joined: 23 Feb 11 Posts: 101 Credit: 1,589,749,957 RAC: 876
|
Great find, Retvari! That should help the devs to solve it as quickly as they can! Peace and love, thanks great Retvari, good Easter to all ... be patient ! K. Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing. (Martin Luther King) |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
"... be patient !" I am afraid that my patience is overstretched by now - every month a major problem which makes GPUGRID crunching impossible for several days :-((( |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 318
|
"So there's a date limit somewhere in the 8.48 app." My suspicion (unverified) is that the problem might lie with tcl84.dll. That's been replaced with tcl86.dll in v9.14/5, and https://www.activestate.com/activetcl seems to have a rather curious licencing regime: "Business and Enterprise Editions provide access to older Tcl versions". I'll play around with some options later. |
Retvari Zoltan, Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"So there's a date limit somewhere in the 8.48 app." I've tried to replace tcl84.dll with tcl86.dll by renaming the latter (and setting "don't check file sizes" in cc_config.xml), but then I got a different error: "There are no child processes to wait for. (0x80) - exit code 128 (0x80)". See this task. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
Zoltan wrote: "I've downloaded 4 new tasks with my main cruncher PC, then I've set the date on this PC to 04.13.2017, and I've started the GPUGrid tasks. Guess what? It's crunching!" For me, this worked on the two Windows 10 PCs. On my main crunching PC with two GTX980Ti and XP, I unfortunately hit the "limit of daily tasks" problem (as mentioned earlier here). On the other one, with the GTX750Ti and XP, after setting the date back (to 04.13.2017), none of the buttons on the left side of the BOINC manager responded any more, so I could not do what I had intended. Only after changing the date back to the real one did the BOINC manager work again. So no chance to apply this "date trick" on XP, at least not on mine :-( |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 318
|
"So there's a date limit somewhere in the 8.48 app." I had the same idea, but tried a different route: I wrapped up the existing files in an app_info.xml, and then changed the tcl file reference to supply a copy of tcl86.dll. No dice: instead, I got error 0xC000007B STATUS_INVALID_IMAGE_FORMAT (task 16233669; offline tests confirmed that this was related to the tcl change). This machine is Windows 7 with a GTX 970 and (currently) a maximum cuda 7.0 driver. Next steps will be to try a cuda 8.0 driver and see what the project sends me: if it's still v8.48, I'll try putting v9.15 into an app_info. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 318
|
Sad to report that both approaches failed. A normal work fetch got me v8.48 even with a cuda 8.0 driver, and it failed with the clock error as before. A full v9.15 file set (copied from my GTX 1050Ti machine, also running Windows 7/64) under app_info.xml gave repeated iterations of

    15/04/2017 12:31:21 | GPUGRID | [cpu_sched] Starting task e14s3_e11s4p0f35-ADRIA_FOLDGREED10_crystal_ss_contacts_20_ubiquitin_4-0-1-RND0892_0 using acemdlong version 915 (cuda80) in slot 1
    15/04/2017 12:31:24 | GPUGRID | Task e14s3_e11s4p0f35-ADRIA_FOLDGREED10_crystal_ss_contacts_20_ubiquitin_4-0-1-RND0892_0 exited with zero status but no 'finished' file
    15/04/2017 12:31:24 | GPUGRID | If this happens repeatedly you may need to reset the project.

- the app quits silently with no error number, and doesn't even have time to start writing a stderr.txt file or to write anything to the _0_0 result file (aka 'progress.log'). The only evidence that the app has even tried to run is a 'canary' file in the slot directory. The only diagnostics output I can get is from a command prompt:

    D:\BOINCdata\slots\1>acemd.915-80.exe
    # ACEMD Molecular Dynamics Version [3212]
    # CUDA Synchronisation mode: BLOCKING
    # CUDA Synchronisation mode: BLOCKING
    # SWAN: Created context 0 on GPU 0
    SWAN : FATAL : Cuda driver error 35 in file 'swanlibnv2.cpp' in line 448.
    # SWAN swan_assert 0

Card data is

    15/04/2017 12:28:16 | | CUDA: NVIDIA GPU 0: GeForce GTX 970 (driver version 368.81, CUDA version 8.0, compute capability 5.2, 4096MB, 3066MB available, 4087 GFLOPS peak)

I think I'm stuck until the staff are back in the lab. |
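Because the app exits without writing stderr.txt, the client event log is the main evidence of these failures. A minimal sketch, assuming the event log line format quoted in the post above, that counts the "exited with zero status but no 'finished' file" messages per task; the function name `count_silent_exits` is hypothetical.

```python
import re
from collections import Counter

# Matches the silent-exit message as it appears in the quoted event log.
SILENT_EXIT = re.compile(
    r"Task (?P<task>\S+) exited with zero status but no 'finished' file"
)


def count_silent_exits(log_text: str) -> Counter:
    """Count silent-exit messages per task name in BOINC event log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SILENT_EXIT.search(line)
        if match:
            counts[match.group("task")] += 1
    return counts


sample = """\
15/04/2017 12:31:21 | GPUGRID | [cpu_sched] Starting task e14s3_e11s4p0f35-ADRIA_FOLDGREED10_crystal_ss_contacts_20_ubiquitin_4-0-1-RND0892_0 using acemdlong version 915 (cuda80) in slot 1
15/04/2017 12:31:24 | GPUGRID | Task e14s3_e11s4p0f35-ADRIA_FOLDGREED10_crystal_ss_contacts_20_ubiquitin_4-0-1-RND0892_0 exited with zero status but no 'finished' file
15/04/2017 12:31:24 | GPUGRID | If this happens repeatedly you may need to reset the project.
"""

for task, n in count_silent_exits(sample).items():
    print(n, task)
```

Run over a longer log, a high count for a single task would indicate the restart loop described above rather than a one-off failure.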
|
Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 |
I talked with Matt. He says that it's probably the license that has expired. Updating the drivers will get you the cuda 8 app, which should fix it. |
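Whether a host can be offered the cuda80 app presumably depends on what the driver reports at client startup. The sketch below, under that assumption, extracts the driver and CUDA versions from a startup line in the format quoted earlier in the thread. The 8.0 threshold is taken only from the "cuda 8 app" wording, the function name `parse_gpu_line` is hypothetical, and the scheduler's real eligibility rules are not known from this thread: one host above reported CUDA 8.0 and was still sent v8.48.

```python
import re

# Matches the GPU description line the BOINC client prints at startup,
# in the format quoted earlier in this thread.
GPU_LINE = re.compile(
    r"CUDA: NVIDIA GPU (?P<index>\d+): (?P<model>.+?) "
    r"\(driver version (?P<driver>[\d.]+), CUDA version (?P<cuda>[\d.]+)"
)


def parse_gpu_line(line: str) -> dict:
    """Extract model, driver version, and CUDA version from a startup line."""
    match = GPU_LINE.search(line)
    if match is None:
        raise ValueError("no CUDA GPU description found")
    info = match.groupdict()
    # Turn "8.0" into (8, 0) so versions compare numerically.
    info["cuda_tuple"] = tuple(int(part) for part in info["cuda"].split("."))
    return info


sample_line = (
    "15/04/2017 12:28:16 | | CUDA: NVIDIA GPU 0: GeForce GTX 970 "
    "(driver version 368.81, CUDA version 8.0, compute capability 5.2, "
    "4096MB, 3066MB available, 4087 GFLOPS peak)"
)

info = parse_gpu_line(sample_line)
print(info["model"], info["driver"], info["cuda"])
print("reports CUDA 8.0 or later:", info["cuda_tuple"] >= (8, 0))
```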
|
Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 |
For a more correct solution we will have to wait for Matt to update the old app next week. In the meantime, as I said, updating the drivers should do it.
©2025 Universitat Pompeu Fabra