Message boards : Number crunching : WU No Progress

sslickerson
Joined: 18 Nov 07
Posts: 57
Credit: 4,319,898
RAC: 0
Message 722 - Posted: 9 Jan 2008 | 14:47:02 UTC

This Work Unit 13594 ran for approximately 120,000 seconds with no progress, so I aborted it. I watch each project fairly closely for exactly this type of error, but that one still cost me about 5,000 credits. My question is: is this just a random error (it has happened to me twice in two months), or can we expect it to happen more often? If the latter, perhaps it is time to invest in a watchdog like the one ROSETTA uses: if the application does not advance within a set interval (I think it is 900 seconds for Rosetta), the WU is halted and another is started without any intervention from the user.
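The watchdog idea above could look roughly like the following. This is a hypothetical Python sketch, not ROSETTA's or BOINC's actual code: the `ProgressWatchdog` class, its `update()` method and the 900-second default (the figure recalled above) are all assumptions. It only flags a stall; halting the WU and starting another would be left to the client.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ProgressWatchdog:
    """Hypothetical sketch: flag a task as stalled when its reported
    fraction done has not increased for `timeout_s` seconds."""

    timeout_s: float = 900.0  # assumed interval, per the Rosetta figure recalled above
    clock: Callable[[], float] = time.monotonic  # injectable clock, handy for testing
    _last_fraction: float = field(default=-1.0, init=False)
    _last_change: Optional[float] = field(default=None, init=False)

    def update(self, fraction_done: float) -> bool:
        """Record one progress sample; return True if the task looks stalled."""
        now = self.clock()
        if self._last_change is None or fraction_done > self._last_fraction:
            # First sample, or real progress: reset the stall timer.
            self._last_fraction = fraction_done
            self._last_change = now
            return False
        # No progress since the last change: stalled once the timeout has elapsed.
        return now - self._last_change >= self.timeout_s
```

Fed periodic fraction-done samples, this would catch the case described in this thread, where a WU appears to be running but its progress never moves.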

Just something to think about while we are still in BETA.

Timothy

sslickerson
Joined: 18 Nov 07
Posts: 57
Credit: 4,319,898
RAC: 0
Message 725 - Posted: 10 Jan 2008 | 7:11:08 UTC

Ok, so I attempted to run a yoyo WU about 16 hours ago and this too has resulted in zero progress. I aborted that WU and I have now started a PS3Grid WU and I will let it run overnight.

So I have had to abort 2 yoyo and 2 PS3Grid WUs so far, and the results for the current WU are already not very promising. This is clearly not an application problem but a BOINC USB build problem. What should I do? Reinstall the entire USB image again (for the third time, actually)? Is anyone else having this problem?

Thanks,

Tim

Krunchin-Keith [USA]
Joined: 17 May 07
Posts: 512
Credit: 111,288,061
RAC: 0
Message 727 - Posted: 10 Jan 2008 | 12:10:20 UTC - in response to Message 725.

I have seen it rarely in the past, and not just with PS3GRID. I have had the same problem on other hosts, with other clients and other projects.

Whenever I have seen this, it has been necessary to exit BOINC completely and restart it. Sometimes I would just restart the entire system.

Have you tried any restarts since the problem started?

Okan
Joined: 18 Sep 07
Posts: 7
Credit: 6,548,368
RAC: 0
Message 743 - Posted: 13 Jan 2008 | 1:05:18 UTC - in response to Message 727.

I just aborted a PS3 job after 37 hours of work time.
I had two pending PS3 jobs and did some yoyo work in between.
I will check tomorrow whether yoyo works fine (I am restarting BOINC as well now).

zombie67 [MM]
Joined: 16 Jul 07
Posts: 209
Credit: 4,095,161,456
RAC: 12,331,765
Message 744 - Posted: 13 Jan 2008 | 2:53:08 UTC

I agree. Restarting BOINC, or rebooting has always fixed it for me. No need to abort.
____________
Reno, NV
Team: SETI.USA

sslickerson
Joined: 18 Nov 07
Posts: 57
Credit: 4,319,898
RAC: 0
Message 745 - Posted: 13 Jan 2008 | 2:55:16 UTC - in response to Message 743.

After the restart, my PS3Grid WUs are working again. The YoYo WUs, however, were not.

sslickerson
Joined: 18 Nov 07
Posts: 57
Credit: 4,319,898
RAC: 0
Message 746 - Posted: 13 Jan 2008 | 2:59:15 UTC - in response to Message 744.

This is a bug that needs to be looked at, though, given that 24+ hours of lost work "costs" us over 3,700 credits. This would not be such a huge deal for almost any other BOINC project, but here it is a massive waste of resources.
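For a rough sense of scale, the two losses quoted in this thread imply a similar hourly rate. This is back-of-the-envelope Python using only the posted figures (about 5,000 credits for roughly 120,000 seconds, and over 3,700 credits for 24+ hours); `credits_per_hour` is a hypothetical helper, and the results are only as precise as those approximate numbers.

```python
def credits_per_hour(credits: float, hours: float) -> float:
    """Average credit rate implied by a credit figure and a run time in hours."""
    return credits / hours


# Approximate figures as posted in this thread.
rate_first = credits_per_hour(5_000, 120_000 / 3_600)  # ~120,000 s run, ~5,000 credits
rate_later = credits_per_hour(3_700, 24)               # 24+ h of work, over 3,700 credits
print(f"{rate_first:.0f} vs {rate_later:.0f} credits/hour")  # prints "150 vs 154 credits/hour"
```

Either way, each stalled WU that goes unnoticed for a day throws away roughly 150 credits per hour of host time.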

seaking57
Joined: 6 Jan 08
Posts: 1
Credit: 1,068,345
RAC: 0
Message 747 - Posted: 13 Jan 2008 | 5:04:55 UTC - in response to Message 725.

I'm having the same problems. After a few hours the WU looks like it's running, but really it's not. Restarting was only a temporary fix, and rebuilding the thumb drive image didn't help either. I formatted an SD card and copied the files from the thumb drive over to the card, and all seems to be working now. I'm going to let it run overnight to see how it does, but so far, after three hours, it's doing much better. BTW, the thumb drive was a SanDisk Cruzer 2GB.

Okan
Joined: 18 Sep 07
Posts: 7
Credit: 6,548,368
RAC: 0
Message 813 - Posted: 3 Feb 2008 | 22:31:06 UTC - in response to Message 747.
Last modified: 3 Feb 2008 | 22:31:56 UTC

I'm having trouble again.
A PS3 job has been running for 30+ hours with no progress.
It shows 89% done now, but it has stayed there for maybe 10 hours.
I can hear the fan, so it must be crunching.
Does it always start from the beginning after doing a yoyo job?
Is a bug fix being worked on to make switching between two projects stable?

sslickerson
Joined: 18 Nov 07
Posts: 57
Credit: 4,319,898
RAC: 0
Message 814 - Posted: 4 Feb 2008 | 3:18:54 UTC - in response to Message 813.

I ditched the PS3GRID LIVE USB install and installed the full YDL (Yellow Dog Linux), and everything has been fine since. Before, all of my yoyo and PS3GRID jobs ran for dozens of hours with no progress. It seems you are going through the same thing. It also seems there is no intention of fixing the USB pen drive build any time soon, so try the full YDL instead. Good luck.

Okan
Joined: 18 Sep 07
Posts: 7
Credit: 6,548,368
RAC: 0
Message 815 - Posted: 4 Feb 2008 | 9:12:21 UTC - in response to Message 814.

The pen drive just boots up and shuts down quite fast.
YDL was installed, and I think I just have to change the bootloader again :-/
