Message boards : Number crunching : Full-atom molecular dynamics for Cell processor 5.03

Profile Krunchin-Keith [USA]
Message 901 - Posted: 28 Feb 2008 | 13:14:04 UTC

I see we have a new application running. See front page for a description.

Question:
My first one ran for 23 hours 8 minutes. Credited OK.

My second one has already run 1 day 6 hours and is showing 10 more hours to go. Progress shows 74% done, which is about right for those times.

Is this OK?

Are these tasks going to have variable run lengths?


____________
Alpha Tester ~~ BOINCin since 10-Apr-2004 (2.28) ~~~ Join team USA
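
The "about right" check in the post above can be reproduced with a simple linear extrapolation. This is only a sketch: it assumes progress advances at a constant rate, which is not necessarily how the BOINC client computes its own estimate, and the function name is made up for illustration.

```python
def remaining_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear estimate of the remaining run time from elapsed time and progress."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    # Projected total run time if progress keeps its average rate so far.
    total = elapsed_hours / fraction_done
    return total - elapsed_hours


# Figures from the post: 1 day 6 hours elapsed, 74% done.
print(round(remaining_hours(30.0, 0.74), 1))  # prints 10.5
```

That is close to the 10 hours the client was showing, so the progress figure and the time estimate were at least consistent with each other at that point.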

Profile Stefan Ledwina
Message 902 - Posted: 28 Feb 2008 | 16:30:20 UTC

I also had one which took a little bit longer (one day and 4 hours or so) than the normal 22-23 hours for the new WUs, and it was ok and I got the credits for it... ;)
____________

pixelicious.at - my little photoblog

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Message 905 - Posted: 28 Feb 2008 | 18:22:59 UTC - in response to Message 902.

The new workunits should last approximately 24 hours, like the previous ones.
gdf

Profile Krunchin-Keith [USA]
Message 910 - Posted: 28 Feb 2008 | 22:34:23 UTC - in response to Message 905.

If you say so, but they are not.

I have not touched my PS3 or changed anything. I was wondering if there was a problem or something with my PS3, but since someone else had one run longer I guess it's not me.

The one I mentioned earlier ran 36 hours.

The next one I have running now is estimating at about 32 hours (5 done, 27 to go), although the estimate can be off.

It's not a problem if they run longer, I was just wondering.

AnRM
Message 911 - Posted: 29 Feb 2008 | 2:54:17 UTC
Last modified: 29 Feb 2008 | 3:54:01 UTC

Hi KK. FWIW we've processed 3 'Full-atoms' so far and they have run approx. 20, 23, and 21 hours on our PS3 (we are using a pen drive)... Cheers, Rog.

Fred
Message 912 - Posted: 29 Feb 2008 | 2:54:53 UTC - in response to Message 901.

I ran a new work unit for about 22 hours. Its progress went down from 98% to 8% and it ran for another day. Then it started over again, so I aborted it.

-Fred

Profile Krunchin-Keith [USA]
Message 913 - Posted: 29 Feb 2008 | 13:24:44 UTC - in response to Message 912.

I think I see the same thing. The one I have running, which had an estimate of 32 hours and was at 20% done, has now run for 18 hours, with an estimate of 50 more hours to go and only 4% done. I'll let it run to see what happens, as the estimates can be wrong.

Mitchell
Message 915 - Posted: 29 Feb 2008 | 21:22:28 UTC - in response to Message 913.

I have one that has now been running for 40 hours. It seems to use only up to 100% of the CPU according to the top utility, and it shows 0.0% in the progress bar. I've considered aborting it, but would hate to lose 40 hours of time.

AnRM
Message 916 - Posted: 29 Feb 2008 | 22:21:06 UTC
Last modified: 29 Feb 2008 | 22:37:06 UTC

I'm new to this PS3 process and hardware (so don't laugh... :)) but is it possible that the PS3s throttle back or have processing problems if they get too hot? I have cooling fans on our PS3 and it seems to run these WUs without problems (at least so far! Touch wood!) Just a thought... Cheers, Rog.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Message 918 - Posted: 1 Mar 2008 | 10:23:29 UTC - in response to Message 916.

This is an interesting hypothesis. It could be that the hardware reduces the clock or something when it gets too hot. I have never understood the differences between runs when people are not using the machine, as is the case for many of you. The runs on our machines, which are in a climate-controlled room, are very regular. I will start crunching some of the new WUs and see.

Profile Krunchin-Keith [USA]
Message 919 - Posted: 1 Mar 2008 | 15:10:11 UTC - in response to Message 918.

This should not be a factor for me at this time. We have very cool weather here, temps from 0-15C now, and I keep my windows open a little to let cool air in. I do not run the heat, as my PS3 and computers put out enough to keep my room comfortable. The weather has not changed that much, and all of my past ga 5.04 tasks ran very close to the same time, 24 hours plus 47 to 53 minutes, with a variance of only about 5 or so minutes, even during the hotter months. My other computers are very quiet, with fans not running on high like in the summer.

I have now completed 3 in a row of the new cellmd 5.03 with run times of 23, 36 and 40 hours.

I see two cellmd 5.03 from 2/20 and 2/21 that ran 33 and 32 hours. Yet before those, from 2/10 to 2/18, I ran 9 ga 5.04 tasks, all within 5 or so minutes of each other at about 25 hours (24 hours plus 50 to 54 minutes). After those 2 cellmd tasks I ran 4 more ga 5.04, again all within 5 minutes of each other and at just about 25 hours (24 hours plus 45 to 48 minutes).

I have ga 5.04 running again now. I'll see if it runs as predicted; it has run 6 hours, so it should have 19 more to go.

I have not touched my PS3 during this time. I was on vacation and just got back, so I'm catching up on other stuff. I was thinking maybe I should restart it, but I will let the running task finish first to see what happens.

Profile Stefan Ledwina
Message 920 - Posted: 1 Mar 2008 | 17:56:14 UTC

Seems like I now also have one of the faulty Full-atom molecular dynamics WUs...
It looks like it is stuck at 64%. I noticed this a few hours ago and have now watched it for a few minutes. Here's what I observed:

[Screenshot not preserved.]

The WU in question is the first one. The screenshot was taken at 18:37; the WU had a CPU time of 14:34:45, and its speed shows approx. 200 Mflops more than the other two PS3s, but there is no info for the CPU efficiency.

Here's a screenshot one minute later:

[Screenshot not preserved.]

The WU in question is still the first one. The screenshot was taken at 18:38; the WU had a CPU time of 14:34:41, its speed in Mflops shows almost nothing compared to the other two PS3s, and the CPU efficiency is only 0.0237%.

After some time the CPU time jumps back to 14:34:45, and after some minutes back to 14:34:41... A looping workunit?

I'll abort it now, because I have wasted almost half a day on it without any change.
____________

pixelicious.at - my little photoblog
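
The symptom described above, CPU time going backwards between readings, is easy to check for mechanically. The sketch below is not part of BOINC or BOINCview; the function names are made up, and it assumes the readings have simply been noted down by hand as H:MM:SS strings.

```python
def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string such as '14:34:45' to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def cpu_time_regressions(samples):
    """Return (previous, current) pairs where the CPU time went backwards."""
    secs = [to_seconds(s) for s in samples]
    return [
        (samples[i - 1], samples[i])
        for i in range(1, len(secs))
        if secs[i] < secs[i - 1]
    ]


# Readings reported in the post, in the order they were observed.
readings = ["14:34:45", "14:34:41", "14:34:45", "14:34:41"]
print(cpu_time_regressions(readings))
```

CPU time for a healthy task should never decrease, so any pair returned here is a hint that the task has rolled back to an earlier state.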

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Message 921 - Posted: 1 Mar 2008 | 18:18:49 UTC - in response to Message 919.
Last modified: 1 Mar 2008 | 18:23:25 UTC

OK, it's not heat. I am looking at returned results.
Write down the names of the faulty ones.
All but one of the returned results for the new TIM3 workunits lasted between 69,000 and 71,000 seconds.
g
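
For comparison with the run times people quote in hours, the range above can be converted and used as a rough outlier check. A minimal sketch: the 69,000 to 71,000 second range is the one given in the post, everything else (names, the idea of flagging anything outside it) is an assumption for illustration.

```python
NORMAL_RANGE_S = (69_000, 71_000)  # range quoted for returned TIM3 results


def hours(seconds: float) -> float:
    """Convert seconds to hours."""
    return seconds / 3600.0


def is_outlier(run_seconds: float, normal=NORMAL_RANGE_S) -> bool:
    """True if a run time falls outside the normal range."""
    low, high = normal
    return not (low <= run_seconds <= high)


print(round(hours(69_000), 1), round(hours(71_000), 1))  # prints 19.2 19.7
print(is_outlier(36 * 3600))  # the 36-hour run reported earlier: prints True
```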

Profile Stefan Ledwina
Message 923 - Posted: 1 Mar 2008 | 18:59:24 UTC

Ok, the faulty one I had was this one -> http://www.ps3grid.net/PS3GRID/workunit.php?wuid=14880

I also noticed that after aborting this faulty one, I got errors on the GA WU I had already downloaded. The error was:


localhost.localdomain PS3GRID 02.03.2008 02:46:53 Restarting task uq20246-W5R2_5-7-nodelete_3 using ga version 504
localhost.localdomain PS3GRID 02.03.2008 02:46:41 If this happens repeatedly you may need to reset the project.
localhost.localdomain PS3GRID 02.03.2008 02:46:38 Task uq20246-W5R2_5-7-nodelete_3 exited with zero status but no 'finished' file
localhost.localdomain PS3GRID 02.03.2008 02:43:53 Restarting task uq20246-W5R2_5-7-nodelete_3 using ga version 504
localhost.localdomain PS3GRID 02.03.2008 02:43:49 If this happens repeatedly you may need to reset the project.

The HDD light of the PS3 stayed lit for a few minutes and I couldn't access it with BOINCview, so I hooked the PS3 up to my TV and plugged in a mouse and a keyboard, but I couldn't do anything. I had to reset the PS3, and now the GA WU that showed the errors about exiting with zero status is running again...
____________

pixelicious.at - my little photoblog

Fred
Message 924 - Posted: 1 Mar 2008 | 23:57:39 UTC - in response to Message 921.

I agree. I was running my PS3 on a NYKO Intercooler from day 1, and the back side seemed very warm, hence I thought it was doing some real good. After my 2nd Intercooler failed because of cheap fans, I took a chance and ran the PS3 without any additional cooling. Now I notice the air coming out the back is a little cooler! I think that because of the scare over early XBOX units overheating, everyone thought the PS3 needed help cooling.

I have not noticed a difference in its operation now w/o extra fans.

-Fred

AnRM
Message 925 - Posted: 2 Mar 2008 | 7:17:45 UTC

Thanks for the information re. the NYKO Intercooler, Fred. That's the one I was using. I also checked on-line and the consensus seems to be that the PS3 has good cooling as long as you keep the dust out (once every month was one suggestion) and don't restrict the air flow. Sooo... I have removed the Intercooler and we'll see what happens... Cheers, Rog.

Profile Krunchin-Keith [USA]
Message 926 - Posted: 2 Mar 2008 | 15:48:10 UTC - in response to Message 919.

Continuing from my previous post.

The ga 5.04 ran exactly as predicted 24 hours 47 minutes.

I would think if there was something wrong with my PS3 that after 3 of the cellmd 5.03 with variable times, that this one would also be off, it was not, it is within 1 to 2 minutes of the 2 that ran before the cellmd tasks.

I think there is a problem with the cellmd tasks, if they are supposed to run a fixed length of time.

---

For Fred: I too had one of those cheap, worthless NYKO Intercoolers in the beginning. Mine ran for about 90 days after this project started and then stopped. My PS3 has run fine at full processing since then, through summer etc., without it. I got it because I'm used to fans on PCs. I think the PS3 is built entirely differently, they have compensated well for air flow, and extra fans are not needed.

For everyone else: just make sure your PS3 has good airflow around it, with nothing blocking any vents, and that the hot air can escape somewhere (don't put it inside a closet or cabinet). Keep it closer to the floor if possible, where the air is usually cooler.

Cheers,
Keith

Profile [AF>HFR>RR] Jim PROFIT
Message 928 - Posted: 2 Mar 2008 | 21:38:56 UTC

I am seeing some strange things with this application.

When the WU started, nothing seemed to happen after 1 hour.
So I restarted my PS3, and after 20 minutes the WU showed 2%.

Then I had to try some things on my PS3, and Yoyo restarted its WU to finish it.

Later I saw that the PS3 WU was the one being calculated, and after about 2 hours the percentage had not increased.
I rebooted the PS3 and after 20 minutes a checkpoint was made.
But the WU now shows 2%!
Before this, it showed 24%!

Why?

Profile Bender10
Message 929 - Posted: 3 Mar 2008 | 15:01:47 UTC

I have a TIM3 wu (#15017) that seems to be running long.

I checked my boxes this morning before heading to werk...This wu was up to 34 hours and still going strong (http://www.ps3grid.net/result.php?resultid=19218).

I'll let it run and see what happens.....

____________


Consciousness: That annoying time between naps......

Experience is a wonderful thing: it enables you to recognize a mistake every time you repeat it.

AnRM
Message 930 - Posted: 3 Mar 2008 | 18:39:39 UTC
Last modified: 3 Mar 2008 | 19:37:09 UTC

First problem: maybe a 'Full atom' WU problem or maybe just a BOINC/WU system one, but our WU errored out after a BOINC-initiated benchmarking run. It never recovered properly and 'hung'. The WU resumed OK after a PS3 shutdown and reboot. Anyone seen this problem before? The WU restarted at about 6 hours and indicates another 18 to go, which is in the ballpark. Hope it completes OK. We are using a 1GB pen drive setup... Rog.
EDIT: Same problem as others: the indicated time dropped to 4 hours and 'time to completion' is climbing. Aborted the WU and now can't download new WUs. All too sad, as we have gone over to yoyo to keep the PS3 crunching. Will try again tomorrow.

Profile Bender10
Message 932 - Posted: 3 Mar 2008 | 23:36:45 UTC
Last modified: 3 Mar 2008 | 23:45:30 UTC

HOLY Crap!!!

That TIM3 wu (#15017), finally finished successfully. 45.6 hours.

Here is the Task data (again) http://www.ps3grid.net/result.php?resultid=19218.

At least it finished....

It shows a message (error?) in the middle of the stderr out file.

<core_client_version>5.10.6</core_client_version>
<![CDATA[
<stderr_txt>
ENC
# number of SPEs used 6
B no 0 ./restart.coor ./restart.vel
B no 0 ./restart.coor ./restart.vel
.....
B no 0 ./restart.coor ./restart.vel
B no 0 ./restart.coor ./restart.vel
FILE_LOCK::unlock(): close failed.: Bad file descriptor
called boinc_finish
ENC
# number of SPEs used 6
B no 0 ./restart.coor ./restart.vel
B no 0 ./restart.coor ./restart.vel
.....
</stderr_txt>

EDIT: It almost looks like it ran twice the normal time due to an error at the end of a normal run??

____________


Consciousness: That annoying time between naps......

Experience is a wonderful thing: it enables you to recognize a mistake every time you repeat it.
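
The pattern in the stderr excerpt above (a second "ENC" header appearing after "called boinc_finish") can be picked out with a few lines of Python. This is only a sketch: it assumes that an "ENC" line marks a fresh start of the application, which is a guess based on it always preceding "# number of SPEs used" in the excerpt, and the function name is made up.

```python
def summarize_stderr(text: str) -> dict:
    """Count application starts and finishes in a stderr excerpt."""
    lines = [ln.strip() for ln in text.splitlines()]
    # Assumption: a bare "ENC" line marks the application starting up.
    starts = [i for i, ln in enumerate(lines) if ln == "ENC"]
    finishes = [i for i, ln in enumerate(lines) if ln == "called boinc_finish"]
    restarted_after_finish = any(s > f for f in finishes for s in starts)
    return {
        "starts": len(starts),
        "finishes": len(finishes),
        "restarted_after_finish": restarted_after_finish,
    }


excerpt = """ENC
# number of SPEs used 6
B no 0 ./restart.coor ./restart.vel
.....
FILE_LOCK::unlock(): close failed.: Bad file descriptor
called boinc_finish
ENC
# number of SPEs used 6
B no 0 ./restart.coor ./restart.vel
.....
"""
print(summarize_stderr(excerpt))
```

For the excerpt above this reports two starts, one finish, and a start after the finish, which fits the guess that the task ran roughly twice.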

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Message 934 - Posted: 4 Mar 2008 | 9:35:07 UTC - in response to Message 932.

I will check on this. Thanks for letting it run.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Message 935 - Posted: 4 Mar 2008 | 9:42:44 UTC - in response to Message 934.

YES. There is a problem. Restart does not work properly. If you run it continuously it should work, but if you suspend it, it starts from the beginning. I will patch it now.
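
To illustrate the difference between a working and a broken restart, here is a generic checkpoint/resume sketch. It is not the cellmd code: the JSON checkpoint file, the function names and the step counter are all hypothetical (the real application appears to write restart.coor and restart.vel files, judging by the stderr output earlier in the thread). The point is only that a resume which fails to read its checkpoint silently falls back to step 0, which looks like what happens after a suspend.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, step: int) -> None:
    """Write the checkpoint atomically so a suspend cannot leave a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump({"step": step}, fh)
    os.replace(tmp, path)


def load_start_step(path: str) -> int:
    """Return the step to resume from, or 0 if no usable checkpoint exists."""
    try:
        with open(path) as fh:
            return int(json.load(fh)["step"])
    except (OSError, ValueError, KeyError):
        return 0  # this fallback is what a broken restart looks like


def run(path: str, total_steps: int, stop_at=None) -> int:
    """Run from the last checkpoint; stop early at stop_at to simulate a suspend."""
    step = load_start_step(path)
    while step < total_steps:
        step += 1  # one unit of simulated work
        save_checkpoint(path, step)
        if stop_at is not None and step == stop_at:
            break
    return step


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "checkpoint.json")  # hypothetical file name
    run(ckpt, total_steps=100, stop_at=60)   # suspended at step 60
    print(load_start_step(ckpt))             # prints 60
    print(run(ckpt, total_steps=100))        # prints 100
```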

Profile UL1
Message 939 - Posted: 5 Mar 2008 | 19:58:56 UTC

WU14776 ran for about 56 hours with zero % progress and 24 hours of remaining time before I restarted it. But again only the calculation time is advancing, and the other two indicators seem to be stuck. Should I abort this WU? The deadline is nearing...

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Message 940 - Posted: 6 Mar 2008 | 8:39:24 UTC - in response to Message 939.

Yes please. See the thread on workunits *TIM3*.
g
