Stalled WUs?
| Author | Message |
|---|---|
|
Joined: 21 Jan 10 Posts: 46 Credit: 1,388,234,528 RAC: 0
|
I have a WU which has been running for several days and seems to get stuck at a certain percentage complete. When I shut down BOINC and relaunch it, the accumulated work disappears and the WU restarts from a much lower percentage. Rinse, repeat. The WU is crunching, but the percentage does not advance. |
|
Joined: 21 Mar 16 Posts: 513 Credit: 4,673,458,277 RAC: 0
|
Just abort it |
JStateson Joined: 31 Oct 08 Posts: 186 Credit: 3,578,903,157 RAC: 0
|
I had transfers stalled for weeks on a system that had "no more work" on GPUGRID (board too slow). Aborting worked only until the next reboot. Only detaching and reattaching cleared it. It may have been stuck for months, as I rarely check that feature. Maybe this was the "1" task the server status shows as ready to be sent. I finally got rid of it a few minutes ago. |
|
Joined: 21 Jan 10 Posts: 46 Credit: 1,388,234,528 RAC: 0
|
Two more work units stalled, and I had to abort them. Methinks there's a systemic problem managing WUs. |
|
Joined: 21 Jan 10 Posts: 46 Credit: 1,388,234,528 RAC: 0
|
It seems related to running Firefox (knowing it has h/w acceleration options) -- I'm still playing with the settings, but I can get GPUGRID work units to stall simply by opening up YouTube and playing a video. The WU percentage stops increasing, but the task still shows as active. After 10 hours I exit BOINC and restart, and the hours worked drop back down to the point where it stalled. So it's still happening, and I can recreate the failure consistently. I've restarted the project to refresh resources. |
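The symptom described above (task shown as active, percentage frozen) can be checked mechanically from a series of progress readings. The following is only a minimal sketch: the `is_stalled` helper, the one-hour window, and the 0.1% threshold are illustrative assumptions, not part of BOINC or GPUGRID, and the readings are assumed to be collected separately (for example, noted by hand from the BOINC Manager) and sorted by time.

```python
from typing import List, Tuple

# One reading: (seconds since the first reading, fraction done in the range 0..1).
Sample = Tuple[float, float]


def is_stalled(samples: List[Sample],
               window_s: float = 3600.0,   # assumed observation window
               min_gain: float = 0.001) -> bool:  # assumed minimum progress
    """Return True if progress rose by less than min_gain over the last window_s seconds.

    Assumes samples are sorted by time, oldest first.
    """
    if len(samples) < 2:
        return False
    latest_t, latest_frac = samples[-1]
    # Keep only readings that are at least window_s older than the latest one.
    older = [s for s in samples if latest_t - s[0] >= window_s]
    if not older:
        return False  # not enough history to judge yet
    _, ref_frac = older[-1]  # most recent reading outside the window
    return (latest_frac - ref_frac) < min_gain


healthy = [(0, 0.10), (1800, 0.15), (3600, 0.20)]
frozen = [(0, 0.21), (1800, 0.21), (3600, 0.21), (7200, 0.21)]
print(is_stalled(healthy), is_stalled(frozen))
```

A task flagged this way is only a candidate for a stall; some tasks report progress in coarse steps, so the window would need tuning.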
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"It seems related to running Firefox (knowing it has h/w acceleration options) -- I'm still playing with the settings but I can get GPUGRID work units to stall simply by opening up YouTube and playing a video." I suppose that this card is your GTX 980Ti. If it's overclocked, you should reduce its clock speed by 100 MHz to see if that makes it more stable. Your card reaches 78°C (172°F), which could be too much when it is used for crunching and other purposes simultaneously. It is also recommended to dust off its fins with compressed air. |
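To watch temperature and clocks while testing a lower clock speed, the readings can be pulled from `nvidia-smi`. This is a rough sketch under stated assumptions: it uses the `--query-gpu` fields `temperature.gpu`, `clocks.sm`, and `clocks.mem` with `--format=csv,noheader,nounits`; the helper names and the 75°C warning limit are illustrative choices, not values from this thread or from NVIDIA. Fields reported as `[N/A]` are not handled.

```python
import subprocess
from typing import Dict, List

QUERY = "temperature.gpu,clocks.sm,clocks.mem"


def parse_gpu_csv(text: str) -> List[Dict[str, int]]:
    """Parse 'temp, sm_clock, mem_clock' CSV lines (one line per GPU)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        temp_c, sm_mhz, mem_mhz = (int(field.strip()) for field in line.split(","))
        rows.append({"temp_c": temp_c, "sm_mhz": sm_mhz, "mem_mhz": mem_mhz})
    return rows


def read_gpus() -> List[Dict[str, int]]:
    """Run nvidia-smi once and return one dict per GPU (requires an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


def too_hot(rows: List[Dict[str, int]], limit_c: int = 75) -> List[int]:
    """Return indices of GPUs above limit_c (the limit is an assumed value)."""
    return [i for i, row in enumerate(rows) if row["temp_c"] > limit_c]


# Example with canned output, so it runs without a GPU present.
sample = "78, 1404, 3505\n"
print(too_hot(parse_gpu_csv(sample)))
```

Calling `read_gpus()` every minute or so during a video playback test would show whether the stall coincides with a temperature or clock change.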
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
"It seems related to running Firefox (knowing it has h/w acceleration options) -- I'm still playing with the settings but I can get GPUGRID work units to stall simply by opening up YouTube and playing a video." I had to roll back the driver to 385.41, which is the latest driver that does not have issues with the Firefox browser. It is on the Nvidia forums. I had the driver "stopped responding and recovered" error while browsing with Firefox. |
|
Joined: 21 Jan 10 Posts: 46 Credit: 1,388,234,528 RAC: 0
|
"I had to roll back driver to 385.41 which is the latest driver not to have issues with Firefox browser. It is on Nvidia forums, I had driver 'stopped responding and recovered' while browsing with Firefox." That did it. However, I have in my notes that 385.41 caused WU errors with Einstein@Home -- I'm waiting for that project to issue me new WUs to verify. But as for GPUGRID, it fixed the problem. FWIW, it never crashed the driver or Firefox -- it just caused GPUGRID WUs to stall without erroring out. |
|
Joined: 19 Apr 18 Posts: 1 Credit: 149,850 RAC: 0
|
Just wanted to say I had this problem too. The work unit stalled three times between 0 and 50%. Then I reduced my GTX 970's memory clock from 3800 MHz (which never had any issues running another project, Milkyway@Home) to the default of 3500 MHz, and the last 50% didn't stall. I don't know if it's just a coincidence or not. I stopped running GPUGRID because of this, so if the admins want to know which work unit it was, just look at the last one I turned over. |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
A lot of the work on this project is far more demanding of video RAM and GPUs than the projects you mention. Overclocking your VRAM was your problem, not this project's. When you see "The simulation has become unstable. Terminating to avoid lock-up" it's almost always due to overclocking. |
|
Joined: 14 Oct 11 Posts: 31 Credit: 81,420,504 RAC: 0
|
I have seen similar stalls. Just now: - Before suspending: 10 hr done, 10 min to go, yet 21% progress. - After resuming: 2 hr done, 8.5 hr to go (also 21% progress). Firefox was definitely playing YouTube, although I didn't investigate whether it was a cause. I think my driver (391.35) just came via a Windows 10 update, so it could be quite old now. Can anyone confirm whether later drivers resolve this? I'd prefer to go forwards rather than roll back, if possible. |
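The numbers in the post above can be sanity-checked with simple arithmetic: if elapsed and remaining time were both accurate, the implied progress would be elapsed / (elapsed + remaining). The sketch below applies that to the two readings. The `implied_fraction` helper is illustrative, and it assumes a linear relationship that BOINC's own remaining-time estimate does not necessarily follow.

```python
def implied_fraction(elapsed_h: float, remaining_h: float) -> float:
    """Progress implied by elapsed and remaining time, assuming a constant rate."""
    total = elapsed_h + remaining_h
    if total <= 0:
        raise ValueError("elapsed + remaining must be positive")
    return elapsed_h / total


# Before suspending: 10 hr done, 10 min to go, reported progress 21%.
before = implied_fraction(10.0, 10.0 / 60.0)
# After resuming: 2 hr done, 8.5 hr to go, reported progress 21%.
after = implied_fraction(2.0, 8.5)

print(f"before: {before:.1%}, after: {after:.1%}")
```

Under that assumption, the reading before suspending implies roughly 98% while the task reported 21%, whereas the reading after resuming implies roughly 19%, close to the reported 21%. That is consistent with about eight hours of elapsed time having produced no saved progress.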
|
Joined: 16 Jul 12 Posts: 98 Credit: 386,043,752 RAC: 0
|
"I have seen similar stalls. Just now:" I have a 1060 6GB with the 416.34 driver and don't have this issue, but I use Chrome, not Firefox, for YouTube. Another thing: I use an extension that makes YouTube stream in H.264 instead of VP9. I don't know how this compares to Firefox though, sorry. Maybe test it with Chrome. Here is the same extension for Firefox; you should try it as well. Could help, could do nothing. https://addons.mozilla.org/en-US/firefox/addon/h264ify/ |
©2025 Universitat Pompeu Fabra