Message boards : Number crunching : Work unit stuck?
| Author | Message |
|---|---|
| [BAT] tutta55 (Joined: 5 Apr 07, Posts: 11, Credit: 11,175,619, RAC: 0) | I have a work unit that is stuck at 48.352%. CPU time is already at 9h26 and counting, which is at least twice as long as usual. Time to completion is counting UP as well. This is the WU: http://www.gpugrid.net/workunit.php?wuid=156801 What should I do? Wait? Abort? |
| Stefan Ledwina (Joined: 16 Jul 07, Posts: 464, Credit: 298,573,998, RAC: 0) | I never had a stuck WU, but have you already tried to stop and re-start BOINC? Or reboot the computer and see if it continues to crunch? |
| [BAT] tutta55 | [quote]I never had a stuck WU, but have you already tried to stop and re-start BOINC? Or reboot the computer and see if it continues to crunch?[/quote] I already tried a stop/restart. Now I have also done a reboot. The situation remains the same: CPU time counts up, but the % done does not move. |
| [BAT] tutta55 | I finally aborted the WU. There are some strange messages in the error log. A whole series of "MDIO ERROR: illegal value: incorrect value for stepnum". Maybe that can tell the devs something? |
| ignasi (Joined: 10 Apr 08, Posts: 254, Credit: 16,836,000, RAC: 0) | [quote]I finally aborted the WU. There are some strange messages in the error log. A whole series of "MDIO ERROR: illegal value: incorrect value for stepnum". Maybe that can tell the devs something?[/quote] It seems to be a fairly random error, as the WU has already been finished successfully. Thanks though, ignasi |
| [BAT] tutta55 | [quote]I finally aborted the WU. There are some strange messages in the error log. A whole series of "MDIO ERROR: illegal value: incorrect value for stepnum". Maybe that can tell the devs something?[/quote] I can't say for sure, since I only noticed late that % done had stopped moving, but I think it happened after a reboot of the system. I suspect that resuming the work unit made it go wrong. |
| [BAT] tutta55 | It happened again. A work unit with % done stuck: http://www.gpugrid.net/result.php?resultid=218594 This time I am sure it happened after a stop/restart. I had to reboot after a Windows update. After that the WU's % done stopped moving. The error log again contains "MDIO ERROR: illegal value: incorrect value for stepnum". I have the impression that something can go wrong when resuming a WU. Not in all cases though; most of the time WUs resume normally. |
| GDF (Joined: 14 Mar 07, Posts: 1958, Credit: 629,356, RAC: 0) | Do you suspend the WU, or do you just reboot the machine while the app is running? The state might get corrupted in the latter case. gdf |
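
To illustrate the point about the state getting corrupted when the machine goes down while the app is running, here is a minimal Python sketch of one common way to make restart files robust against interruption. This is not how the GPUGRID application works internally, which the thread does not show. The file layout, the JSON format, and the names `save_checkpoint`, `load_checkpoint` and `total_steps` are all assumptions made for the example; only the `stepnum` field name is borrowed from the error message quoted above.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write `state` so a reader sees either the old file or the new one.

    Hypothetical helper: the data goes to a temporary file in the same
    directory first, is flushed to disk, and is then renamed over the
    real checkpoint in a single step.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic swap of old and new file
    except BaseException:
        # Leave the previous checkpoint untouched and clean up the temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path, total_steps):
    """Return the saved state, or None if it is missing or implausible."""
    try:
        with open(path) as f:
            state = json.load(f)
    except (OSError, ValueError):
        return None  # unreadable or truncated file: start over
    stepnum = state.get("stepnum") if isinstance(state, dict) else None
    if not isinstance(stepnum, int) or isinstance(stepnum, bool):
        return None
    if not 0 <= stepnum <= total_steps:
        return None  # an illegal step number is treated as corruption
    return state
```

With this pattern, a reboot in the middle of a write leaves the previous checkpoint in place, and a restart file that fails validation is rejected up front instead of being resumed from.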
©2025 Universitat Pompeu Fabra