WU getting stuck after short time
| Author | Message |
|---|---|
|
Joined: 24 Feb 09 Posts: 14 Credit: 1,261,660 RAC: 0
|
I have a WU, 441130, that is constantly getting stuck. Meaning that it will run for some time and then just stop for no obvious reason. BOINC still shows the WU as running, I see the app in task manager. Yet no progress is ever made. I've paused and restarted the WU several times, but after 1-2 hours it gets stuck again. No changes to my system; this just started happening with this WU. SETI WUs are running w/o issue. I have two 9600 GTs, non-SLI. The WU is stuck right now, and the temp on that GPU is normal, not slightly higher as it is when a WU is actually running. Any ideas or things I can put in the cc_config.xml to help debug this? Thanks, Jim |
Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
"I have a WU, 441130, that is constantly getting stuck. Meaning that it will run for some time and then just stop for no obvious reason. BOINC still shows the WU as running, I see the app in task manager. Yet no progress is ever made." What BOINC version? If 6.6.20, try 6.5.0 or 6.6.23 ... |
|
Joined: 21 Mar 09 Posts: 35 Credit: 591,434,551 RAC: 0
|
I am seeing something very similar with this wu 439696. I was running on 6.6.17. Just upgraded to 6.6.23 but it is still stopping (I need to suspend/resume to kick it back into life for another few minutes). |
Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
"I am seeing something very similar with this wu 439696." Check your preferences for "suspend when computer is in use". Go to preferences here, change something, change it back, save, then do an Update on the client and see if that helps. |
K1atOdessa Joined: 25 Feb 08 Posts: 249 Credit: 444,646,963 RAC: 0
|
I think there is something funny going on with these IBUCH_KID WUs. I've seen errors from various users with these. Maybe they are more "sensitive" to something? |
|
Joined: 21 Mar 09 Posts: 35 Credit: 591,434,551 RAC: 0
|
I'm also seeing it on this wu 434995, which is a "KASHIF" wu. I'll try Paul's suggestion and see what happens. Added: when I installed 6.6.23, "Use GPU while computer in use" was unchecked in local preferences, so I already had to change that to get GPUGRID to run at all. |
|
Joined: 21 Mar 09 Posts: 35 Credit: 591,434,551 RAC: 0
|
Still happening. Suspend/resume and it takes off again for what seems to be somewhere around 30 mins to an hour. I have a single 9800GT with Driver version 182.06. |
Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
"Still happening. Suspend/resume and it takes off again for what seems to be somewhere around 30 mins to an hour." Well, I am at a loss... Sadly, I don't have that many problems so I have less experience ...:) |
|
Joined: 21 Mar 09 Posts: 35 Credit: 591,434,551 RAC: 0
|
I have only been running GPUGrid for 6 weeks, and this is the first processing problem I have encountered. I will try installing the new driver (185.85) to see whether that helps. Update: First WU errored out almost straight away. Second WU now running. Fingers crossed... |
|
Joined: 24 Feb 09 Posts: 14 Credit: 1,261,660 RAC: 0
|
I ended up just aborting that WU. I picked up another and haven't had a single problem with the new WU. I'm on v182.50 drivers now. Really have no clue what the issue was. My .02 as someone in the software development field... It would be nice to know from the GPUGrid/BOINC devs if there are some debug options that can be enabled on the client to get useful information when this happens. That way we (royal we) wouldn't sit for days restarting and restarting a failing WU without getting any useful info or results from the effort. Thanks, Jim |
|
Joined: 21 Mar 09 Posts: 35 Credit: 591,434,551 RAC: 0
|
Good news: After installing 185.85 no more tasks have hung. Bad news: The two tasks that suffered from the hang both errored out (one pretty quickly, the second after probably an hour or two of further processing), though this may not be due to the new driver (see this thread). I now have a GIANNI running. It is up to 13% with no hang so far so ... |
|
Joined: 21 Mar 09 Posts: 35 Credit: 591,434,551 RAC: 0
|
Good news: The GIANNI finished and validated. Now one third of the way through a KASHIF (with an IBUCH up next). On a sample of 1, the 185.85 driver may also be a little faster (perhaps 5%). I also ran a SETI WU on 185.85 and it processed and validated fine. |
|
Joined: 1 Feb 09 Posts: 139 Credit: 575,023 RAC: 0
|
Euhm, the GIANNI ones have never been a problem for me, just the newer ones like IBUCH and KASHIF. And btw, running SETI and GPUGRID together has been an absolute no-go for me. After I ran 3 SETI units, all my GPUGRID units died, so I never run them without rebooting to switch from one to the other. |
|
Joined: 21 Mar 09 Posts: 35 Credit: 591,434,551 RAC: 0
|
The SETI WU didn't seem to cause a problem (although I was prepared for that to be the case). The KASHIF WU finished fine and validated, and the IBUCH is now 36% done and seems to be running fine. Somewhere in between the KASHIF starting and the IBUCH starting the SETI WU processed and validated. |
|
Joined: 24 Dec 08 Posts: 1 Credit: 8,653,364 RAC: 0
|
Thank you all for the solution! |
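
On Jim's question about debug options for cc_config.xml: nobody in the thread answered it, so the following is only a minimal sketch, not something the GPUGrid or BOINC devs recommended here. It assumes the log-flag names listed in the BOINC client configuration documentation (`task_debug`, `cpu_sched_debug`, `coproc_debug`, `checkpoint_debug`); whether every one of them is available in the 6.6.x clients discussed above is an assumption.

```xml
<cc_config>
  <log_flags>
    <!-- Assumed flag: details of task start, suspend and resume -->
    <task_debug>1</task_debug>
    <!-- Assumed flag: why the client schedules or preempts a task -->
    <cpu_sched_debug>1</cpu_sched_debug>
    <!-- Assumed flag: GPU (coprocessor) detection and assignment -->
    <coproc_debug>1</coproc_debug>
    <!-- Assumed flag: one log line each time an app checkpoints -->
    <checkpoint_debug>1</checkpoint_debug>
  </log_flags>
</cc_config>
```

The file goes in the BOINC data directory, and the client has to re-read its config file (or be restarted) before the flags take effect. If the checkpoint messages for a WU stop while BOINC still lists it as running, that would at least narrow down when the hang began.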
©2025 Universitat Pompeu Fabra