WU: NOELIA_KLEBEs
Author | Message |
---|---|
Joined: 25 May 09 Posts: 224 Credit: 34,057,374,498 RAC: 0 |
So far I've had one long WU error after 13 s, one short WU complete OK, and one long WU stall: it ran for 2 hr on a 660 with little progress, so it got the bullet. Be on your guard! |
Joined: 8 Mar 12 Posts: 411 Credit: 2,083,882,218 RAC: 0 |
Encountering issues with Noelia WUs: http://www.gpugrid.net/result.php?resultid=7218125 http://www.gpugrid.net/result.php?resultid=7218124 http://www.gpugrid.net/result.php?resultid=7218111 All had the same error: "swanMemset failed". |
Joined: 5 Mar 13 Posts: 348 Credit: 0 RAC: 0 |
Since I saw a few error posts popping up about Noelia's new WUs and there was no official thread, I made this thread to collect them all. Once they all come into the office I will inform them. |
Joined: 18 Jun 12 Posts: 297 Credit: 3,572,627,986 RAC: 0 |
I've had 5 NOELIAs fail in the past 24 hours. |
Joined: 5 Jul 12 Posts: 35 Credit: 393,375 RAC: 0 |
Hi, for some reason some of you are having problems with these WUs in the new application, so we've moved them to the beta queue to have a proper look. I've also just sent 50 WUs under the name KLEBEbeta with a much simpler configuration file. These simulations are really important, and fixing this bug will also help with future similar projects in drug discovery. Please report any problems you might have with the KLEBEs and KLEBEbeta groups. |
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 |
Noelia, Thanks for moving these to beta to try to fix the problems! I'm noticing a "new" problem with the NOELIA_KLEBEbeta tasks, though. Essentially, they get assigned to a GPU, and when they try to get going, BOINC shows them running for about 15-40 seconds; then the task resets back to the beginning (with Elapsed back at 0 seconds) and retries. It just keeps retrying until failure. Additionally, if the user closes BOINC, the acemd.800-55.exe process for that task does not close properly (it remains in the Task Manager's process list, even though all other related BOINC processes have exited normally). Also, stderr.txt for one of the tasks that I aborted (http://www.gpugrid.net/result.php?resultid=7221709) contained the following lines, which might give a hint as to what's happening: "swanMemset failed", "Can't acquire lockfile - exiting", "FILE_LOCK::unlock(): close failed.: No error". I have not seen this behavior before today, so I think there is at least 1 new bug here. This happens on both my GTX 660 Ti and my GTX 460, in Windows 8.1 Preview x64. The current task exhibiting this behavior is: 109nx4-NOELIA_KLEBEbeta-0-3-RND0846_0. I hope this information helps you track down and correct the problem(s) quickly, as right now my GPU is spinning in circles and doing no work. Are you able to reproduce the problem in your testing? If there's anything else you might need, please let us know. Thanks, Jacob |
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 |
I've probably fixed the fault. There'll be an updated acemdbeta app very soon. MJH |
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 |
801 is now live. |
Joined: 28 Apr 11 Posts: 462 Credit: 958,266,958 RAC: 31,461 |
Lol, what "luck" ^^ http://www.gpugrid.net/workunit.php?wuid=4729004 One of my machines failed this, and I got exactly this WU on my next machine, where it got stuck O.o I only saw it now, after 2 hours. Phew, early enough before the weekend ^^ DSKAG Austria Research Team: http://www.research.dskag.at |
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 |
Thanks for the prompt response. I got an 801 KLEBEbeta, and it's at least getting off the ground now. I hope that you can see that it is very difficult for us to know if the problem is in the task set, or if the problem is in the application. Will continue to monitor... |
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 |
Would anyone with a cc 1.3 card - Geforce GTX 200 series - please try some of the current acemdbeta v801 Noelia-KLEBE WUs and report back here? MJH |
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 |
I tested suspending one of these KLEBEbeta tasks, and it caused a driver reset. So, the problem still persists. Can you please look into it more closely? The issue has to do with how the KLEBE tasks are exiting: it seems they are not releasing the GPU in a timely fashion, compared to every other GPU task I run (across all my GPU projects). Maybe compare the exit logic of a KLEBE task with the exit logic of other GPUGrid task types? |
Joined: 15 May 11 Posts: 108 Credit: 297,176,099 RAC: 0 |
On my Titan box I've gotten two of these NOELIAs. Both exhibited the same behavior. "8/29/2013 7:36:55 AM | GPUGRID | Task 063px38-NOELIA_KLEBEs-1-3-RND3786_0 exited with zero status but no 'finished' file" Over and over again without making much progress. So I pulled the trigger to kill them. So it's back to babysitting to make sure I only get NATHAN longs for the time being. That NOELIA sure has a reputation! ;-} Operator |
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 |
Jacob, The current beta addresses the "swanMemset failed" and "access violation" errors. The suspend problem I have not yet investigated. (Is it with 'suspend to memory' or 'suspend and exit'? ) MJH |
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 |
Jacob, There's a whole thread about it, where I posted as much detail as I could about the problem on 4/4/2013 (4+ months ago), here: http://www.gpugrid.net/forum_thread.php?id=3333 It happens whenever a NOELIA task (especially KLEBE) is suspended for any reason, including: BOINC set to Snooze; BOINC set to Snooze GPU; BOINC set to Suspend; BOINC set to Suspend GPU; BOINC set to Suspend due to exclusive app running; BOINC set to Suspend GPU due to exclusive GPU app running; GPUGrid project set to Suspend; NOELIA KLEBE task set to Suspend; BOINC exited with "Stop running tasks" checked. Something in the KLEBE exit logic has been causing driver resets and watchdog timeouts, for several months, for many of your Windows users. I sure hope you guys can work together to get a handle on it! Note: I do use the "Leave application in memory when suspended" setting, but so far as I know, that is irrelevant to GPU tasks. When a GPU task is suspended, BOINC has to remove it from memory, regardless of that user setting. It treats GPU tasks differently because there's no PageFile backing the GPU RAM. Thanks for looking into this. It's my biggest problem across all of my 20 BOINC projects. |
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 |
I have two NOELIA KLEBEbeta tasks on my 770. They start, but once they are 0.021% complete they make no more progress yet keep running: 2h57m16s elapsed and 0h0m0s remaining. This was with app 8.00; I have now aborted these WUs and will try the new 8.01 app. I now have 1 with the 8.01 (cuda55) app (NOELIA KLEBEbeta) and it is running normally. Twelve hours to finish, progress runs up, elapsed time runs up, and remaining time runs down. Win7 x64, BOINC 7.0.64, driver 326.80 Greetings from TJ |
Joined: 5 Jun 09 Posts: 38 Credit: 2,880,758,878 RAC: 0 |
"I have two NOELIA KLEBEbeta tasks on my 770. They start, but once they are 0.021% complete they make no more progress yet keep running: 2h57m16s elapsed and 0h0m0s remaining. This was with app 8.00; I have now aborted these WUs and will try the new 8.01 app." Same thing here, but I stopped it after 30 minutes. http://www.gpugrid.net/result.php?resultid=7221521 |
Joined: 28 Apr 11 Posts: 462 Credit: 958,266,958 RAC: 31,461 |
"Would anyone with a cc 1.3 card - Geforce GTX 200 series - please try some of the current acemdbeta v801 Noelia-KLEBE WUs and report back here?" OK, I started one with 8.01. But this can take some time, even on my 670 MHz GTX 285. I normally don't run GPUGrid on this card anymore. It will need about 33 hours. The short run with 8.00 was OK on this card. I don't think anybody still uses a power-hungry 200 series for long runs O.o DSKAG Austria Research Team: http://www.research.dskag.at |
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 |
With the new 8.01 app they run normally! I noticed the following. On the 770 I have this one: 063px79-NOELIA_KLEEBEbeta2-0-3-RND678_0; MEM use: 1003 MB; Clock: 1097 MHz (however, I have set the clock to 1060 MHz!); GPU load: 87%; Temp: 65°C; 7.5% done in 40 minutes. On the 660 I have this one: 109nx37-NOELIA_KLEEBEbeta-0-3-RND0283_0; MEM use: 779 MB; Clock: 1045 MHz (as I set it); GPU load: ~88%; Temp: 67°C; 5.6% done in 40 minutes. I know these are not the same WUs, and the GPUs are not the same either. But it is strange that the WU can push the clock higher; or this may be a result of the faulty WU that I aborted without rebooting afterwards. Given the difference in memory load, it may also be that cards with only 1 GB cannot do these WUs, as before. That may result in some comments ;-) Greetings from TJ |
Joined: 28 Apr 11 Posts: 462 Credit: 958,266,958 RAC: 31,461 |
"Would anyone with a cc 1.3 card - Geforce GTX 200 series - please try some of the current acemdbeta v801 Noelia-KLEBE WUs and report back here?" Oh, and if somebody else has started one on a 200 series too, please tell me. I got my energy bill today, so I would love to stop mine in the next few hours if it isn't needed :p DSKAG Austria Research Team: http://www.research.dskag.at |
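
As a rough sanity check on the progress figures TJ reported earlier in the thread (7.5% done in 40 minutes on the 770, 5.6% done in 40 minutes on the 660), the total runtime can be extrapolated as sketched below. This assumes progress stays linear for the whole WU, which is only an assumption; real tasks may speed up or slow down. The function name `estimate_total_hours` is made up for illustration and is not part of BOINC or ACEMD.

```python
def estimate_total_hours(percent_done: float, elapsed_minutes: float) -> float:
    """Linearly extrapolate the total runtime (in hours) from progress so far.

    Assumption: the task advances at a constant rate, so
    total_time = elapsed_time / fraction_done.
    """
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    fraction_done = percent_done / 100.0
    return elapsed_minutes / fraction_done / 60.0


# Figures quoted in the thread: (percent done, elapsed minutes)
reports = {
    "GTX 770": (7.5, 40.0),
    "GTX 660": (5.6, 40.0),
}

for card, (percent, minutes) in reports.items():
    total = estimate_total_hours(percent, minutes)
    remaining = total - minutes / 60.0
    print(f"{card}: about {total:.1f} h total, {remaining:.1f} h remaining")
```

Under that linear assumption, the 770 task would need roughly 9 hours in total and the 660 task roughly 12 hours.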
©2025 Universitat Pompeu Fabra