Problem - Tasks error when exiting/resuming using 334.67 drivers
| Author | Message |
|---|---|
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
MJH / Admins: I'm getting several task errors (Windows 8.1 x64, 334.67 drivers) that say: <core_client_version>7.2.39</core_client_version> <![CDATA[ <message> The file exists. (0x50) - exit code 80 (0x50) </message> and the last line in the stderr.txt file is: # BOINC suspending at user request (exit) I think that suspending/resuming tasks isn't working very well: tasks error out when they are resumed. http://www.gpugrid.net/result.php?resultid=7747671 http://www.gpugrid.net/result.php?resultid=7749480 http://www.gpugrid.net/result.php?resultid=7750550 http://www.gpugrid.net/result.php?resultid=7751319 Can you please look into this? I'm not sure if it's the application, the new BETA drivers, or an issue that has always been there. But I would like it fixed! Hoping you agree, and available to help, Jacob PS: I originally posted this in the 8.15 app thread, but decided to create a new thread here. Also, I'm not the only one having this problem. |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
MJH: Have you noticed an increase in instability, with 334.67 drivers, when suspending/resuming tasks, or shutting down and restarting BOINC quickly? If so, is there any way to determine if the application is the problem, or if the driver is the problem? |
|
Joined: 18 Oct 13 Posts: 53 Credit: 406,647,419 RAC: 0
|
I can confirm the same problems here with the 332.21 driver. Name 589x-SANTI_MAR422cap310-12-32-RND9315_0, workunit 5177762. Name 369x-SANTI_MAR422cap310-8-32-RND5608_0, workunit 5175511. Simulation unstable. Flag 9 value 375 # The simulation has become unstable. Terminating to avoid lock-up |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
This happened again: suspending the task, then closing BOINC, resulted in the task erroring: http://www.gpugrid.net/result.php?resultid=7810949 Can an admin please help to resolve this issue, or will it go unanswered? I'm willing to offer whatever it takes to help test to get it resolved. MJH? |
Mumak Joined: 7 Dec 12 Posts: 92 Credit: 225,897,225 RAC: 0
|
Same issue here too... |
|
Joined: 14 Oct 11 Posts: 31 Credit: 81,420,504 RAC: 0
|
Snap! GTX660, Win7-64, Driver 311.06 |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Anyone at GPUGrid care to fix this, like we did the previous suspend/resume problems? I'm willing to help test. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 351
|
Have we any more complete idea of the cause yet? I've recently upgraded to the WHQL version of the driver (334.89) for my GTX 670: no crashes yet, but then I don't routinely suspend tasks once they've started. What I have noticed is the reduced CPU demand, and a welcome reduction in the runtime of the SIMAP tasks running at the same time. I note that stderr says "The file exists. (0x50) - exit code 80 (0x50)" but MJH's FAQ says "* -80 Failed to recover after an access violation (Win32)". Any signs of an access violation from Windows, Jacob? I'd be interested if the problem could be narrowed down to a more immediate cause. Candidates are: Windows (I see Jacob using v8.1 - I have 7 here); the driver; the BOINC client (I see Jacob using alpha client v7.3.2); the BOINC API (linked into the application); the application; and of course any combination of the above, plus probably more besides. My instinctive reaction on seeing the thread title was 'API', but I'm not so sure having looked at the full error messages. |
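On the two readings of that code: 0x50 is simply decimal 80, and Windows system error 80 is ERROR_FILE_EXISTS, whose standard message is "The file exists." So the text in the stderr header may just be exit code 80 rendered as if it were a Windows system error, rather than evidence of a real file clash. A minimal sketch of that rendering follows; the lookup table and the function name are made up for illustration and are not taken from BOINC or the application.

```python
# Illustrative only: a tiny subset of Windows system error codes (winerror.h).
WINDOWS_SYSTEM_ERRORS = {
    5: "Access is denied.",
    80: "The file exists.",
}


def describe_exit_code(code: int) -> str:
    """Format an exit code in the same shape as the stderr header line."""
    # Hypothetical helper: look the code up as if it were a system error.
    text = WINDOWS_SYSTEM_ERRORS.get(code, "Unknown error.")
    return f"{text} (0x{code:x}) - exit code {code} (0x{code:x})"


print(describe_exit_code(0x50))
```

If that reading is right, the sign matters: the FAQ entry is for -80, while the tasks here report a positive 80, so the two may not describe the same condition.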
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
I was able to get a task to fail by:
- Run BOINC such that the GPU Task is processing
- Right-click tray, choose "Snooze GPU"
- Verify task now says "GPU suspended"
- Right-click tray, "Exit", with "Stop running tasks" checked, click OK
- Start BOINC

They don't fail all the time, but... if you try those exact steps over and over, eventually you might get a failure. I'd like this thread to focus on failures that are a result of those steps above. I hope we can solve it, but we'll need help from MJH. |
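For anyone who wants to repeat those steps without clicking through the tray menu each time, here is a rough sketch using the `boinccmd` command-line tool. The mapping is an assumption: `--set_gpu_mode never <seconds>` is only approximately what "Snooze GPU" does, and `--quit` is only approximately the tray "Exit" with "Stop running tasks" checked. The function names and the one-hour default are made up for illustration. It defaults to a dry run that just prints the commands.

```python
import subprocess


def build_cycle(snooze_seconds: int = 3600) -> list[list[str]]:
    """Commands approximating one snooze-GPU / exit cycle (assumed mapping)."""
    return [
        # Roughly "Snooze GPU": stop GPU work for a limited time.
        ["boinccmd", "--set_gpu_mode", "never", str(snooze_seconds)],
        # List tasks so the suspended state can be checked by eye.
        ["boinccmd", "--get_tasks"],
        # Roughly tray "Exit" with "Stop running tasks" checked.
        ["boinccmd", "--quit"],
    ]


def run_cycle(dry_run: bool = True) -> None:
    """Print the commands, or run them against a live client if dry_run is False."""
    for argv in build_cycle():
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
    # Starting the client again is platform-specific and is left out here.


run_cycle(dry_run=True)
```

Only run it for real on a task you can afford to lose, since the whole point is to provoke the failure.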
|
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
|
"They don't fail all the time, but... if you try those exact steps over and over, eventually you might get a failure." I have caused GPUgrid tasks to fail on restart by stopping and restarting BOINC quickly 3 or 4 times in a row on Linux, but that was last year, not with the current app and drivers. If I think of it I'll try to replicate it on a newly started task, but I'm not going to try it on a task I've put an hour into. If a single stop-and-restart cycle of BOINC is causing crashes then that's worth fixing, but if it happens only after several stop-and-restart cycles in quick succession then I wonder if it's worth fixing, as that is not a likely operating scenario. BOINC <<--- credit whores, pedants, alien hunters |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
I run applications that I have set up as "exclusive applications" in BOINC. And sometimes I shut down BOINC. These, even in combination, should be supported by the projects. And I hope to have this issue resolved eventually. :) |
MJH Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0
|
I've scheduled some time to sort this out in a week or so, when I'll also be putting out Maxwell support. Matt |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Thank you. I'm not sure how exactly to help, but I'm definitely willing. Last time, we iterated app versions with debug text to solve it, right? We might have to do something similar here. |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
I don't think this is a driver issue. I'd been error free for a long time but in the last 5 days have been seeing errors in SANTI_MAR WUs only. Some of them occur whenever BOINC is exited (gracefully, by exit dialog) for any reason. No other WU types are affected. At first I thought the exit error was only on 1GB cards, but now I see from other users that it's happening on 660 Ti cards also. The SANTI_MAR WUs also seem to be particularly sensitive to other conditions and are failing at too high a rate IMO. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
I've had 10 SANTI_MAR failures on the same Linux system in the past 3 weeks, http://www.gpugrid.net/results.php?hostid=159186&offset=0&show_names=1&state=5&appid= Other than that there has only been the one SANTI_bax2 failure and 2 WUs I aborted since Nov. They are all,
FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
|
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
|
Almost every other day I get an error on a Santi WU on my 660. On the 770 and 780Ti no errors (yet). I agree with Beyond (nice new picture of dog) that it is not the drivers. Santi's seem to be "special". Coincidentally I found a cruncher's task list for a system with a Titan where all the Santi's failed, but the recent Noelia's finished okay. Greetings from TJ |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Matt: It has been a while -- Have you made any progress on this? I'm still regularly failing tasks during suspend and resume operations, especially SANTI_MAR tasks. It's especially painful to see 2 tasks fail simultaneously, which happens to me, because I have 2 GPUs dedicated to GPUGrid computing. Then when the tasks fail, instantly 10-20 hours of work, dead, to "Computation Error". Frustrating. We need a fix! Please help! (Quoting MJH, posted 25 Feb 2014 | 23:29:42 UTC: "I've scheduled some time to sort this out in a week or so, when I'll also be putting out Maxwell support.") |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
I just had another one fail. I had 19 hours invested into it, and needed to restart my machine. I had suspended the task, I had closed BOINC, I restarted the machine, I resumed the task, and poof, Computation Error. 19 hours, wasted. This is very very frustrating.

Stderr output

    <core_client_version>7.3.10</core_client_version>
    <![CDATA[
    <message>
    The file exists. (0x50) - exit code 80 (0x50)
    </message>
    <stderr_txt>
    # GPU [GeForce GTX 460] Platform [Windows] Rev [3203M] VERSION [42]
    # SWAN Device 1 :
    # Name : GeForce GTX 460
    # ECC : Disabled
    # Global mem : 1024MB
    # Capability : 2.1
    # PCI ID : 0000:08:00.0
    # Device clock : 1526MHz
    # Memory clock : 1900MHz
    # Memory width : 256bit
    # Driver version : r334_00 : 33489
    # GPU 0 : 67C
    # GPU 1 : 66C
    # GPU 2 : 78C
    # GPU 1 : 67C
    # GPU 1 : 68C
    # GPU 0 : 68C
    # GPU 1 : 69C
    # GPU 1 : 70C
    # GPU 0 : 69C
    # GPU 2 : 79C
    # BOINC suspending at user request (exit)
    # GPU [GeForce GTX 460] Platform [Windows] Rev [3203M] VERSION [42]
    # SWAN Device 1 :
    # Name : GeForce GTX 460
    # ECC : Disabled
    # Global mem : 1024MB
    # Capability : 2.1
    # PCI ID : 0000:08:00.0
    # Device clock : 1526MHz
    # Memory clock : 1900MHz
    # Memory width : 256bit
    # Driver version : r334_00 : 33489
    # GPU 0 : 66C
    # GPU 1 : 65C
    # GPU 2 : 73C
    # GPU 1 : 66C
    # GPU 2 : 75C
    # GPU 0 : 67C
    # BOINC suspending at user request (exit)
    </stderr_txt>
    ]]>

|
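To compare failed tasks quickly, it can help to boil a log like this down to a few numbers. The sketch below assumes the stderr text is available with one entry per line, as in the original stderr.txt; the function name, the returned keys, and the shortened sample are made up for illustration. It counts application start banners and suspend messages, and reports the last line written.

```python
import re


def summarize_stderr(text: str) -> dict:
    """Count app starts and suspends in a GPUGRID-style stderr dump."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("#")]
    # Temperature lines look like "# GPU 2 : 79C".
    temps = [
        int(m.group(2))
        for ln in lines
        if (m := re.fullmatch(r"# GPU (\d+) : (\d+)C", ln))
    ]
    return {
        # Each (re)start of the app prints a "# GPU [...]" banner line.
        "starts": sum(ln.startswith("# GPU [") for ln in lines),
        "suspends": sum("BOINC suspending at user request" in ln for ln in lines),
        "max_temp_c": max(temps, default=None),
        "last_line": lines[-1] if lines else None,
    }


# Shortened sample taken from the log in this post.
SAMPLE = """\
# GPU [GeForce GTX 460] Platform [Windows] Rev [3203M] VERSION [42]
# GPU 0 : 67C
# GPU 2 : 79C
# BOINC suspending at user request (exit)
# GPU [GeForce GTX 460] Platform [Windows] Rev [3203M] VERSION [42]
# GPU 2 : 75C
# BOINC suspending at user request (exit)
"""

print(summarize_stderr(SAMPLE))
```

In this log the last entry is a suspend message with no start banner after it, which would be consistent with the task failing on resume before the application wrote anything, though that is only a guess from the log.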
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
And another one today. The file exists. (0x50) - exit code 80 (0x50) MJH? |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
"I just had another one fail. I had 19 hours invested into it, and needed to restart my machine. I had suspended the task, I had closed BOINC, I restarted the machine, I resumed the task, and poof, Computation Error." This same thing happens here on every SANTI_MAR WU when I have to exit BOINC and reboot for an update or whatever. 100% chance of error. Frustrating is the word. |
©2025 Universitat Pompeu Fabra