Message boards : Number crunching : Long Running Tasks

Paul Raney
Joined: 26 Dec 10
Posts: 115
Credit: 416,576,946
RAC: 0
Level
Gln
Message 21511 - Posted: 19 Jun 2011 | 3:23:52 UTC

I just completed task 4092363, and it ran for 51,000 seconds, or about 14 hours. I have never had a task run this long. Are these longer-running tasks? In the past, the 8-12 hour tasks required about 5-6 hours.

Has anyone else run into this situation?

Thank you.


MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Level
Leu
Message 21512 - Posted: 19 Jun 2011 | 3:31:22 UTC - in response to Message 21511.
Last modified: 19 Jun 2011 | 3:42:38 UTC

Yes, they vary a bit depending on the task. I've had "long" work units that take my factory-OC'ed 570 over 12 hours and others that come in under 6. It just depends on the work unit. Check the task name: they usually create a batch of work units with the same name, so the quicker ones will be from a different batch with a different name.

Looking at the WU you linked to, there were a couple of exits due to "no heartbeat from core client". I also notice you have the card running at 1.81 GHz, so presumably you've OC'ed it, which might be affecting it.
____________
BOINC blog

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 21517 - Posted: 19 Jun 2011 | 8:37:49 UTC - in response to Message 21512.
Last modified: 19 Jun 2011 | 10:28:48 UTC

Paul, your GTX570 must be downclocking. I expect the OC is too high, or the GPU is being overtaxed (Voltage/Power). Most likely you are running at half speed or less, as the card tries to protect itself.

TONI_AGGdense tasks use lots of power and push GPUs close to their limit. So when you OC, you might be using more power than these GPUs were designed to take.

I'm running a TONI_AGGdense task now on my GTX470 and it will take less than 7h. So your GPU should take less than 6h. I'm using SWAN_SYNC and have 2 CPU cores freed up, so it's well optimized. Utilization on W2003 is 98%, no OC used. I occasionally get typing lag, but I'm not using the system for anything other than light work.

I suggest you run with at most a 5% OC for now.

Good luck,

Paul Raney
Joined: 26 Dec 10
Posts: 115
Credit: 416,576,946
RAC: 0
Level
Gln
Message 21529 - Posted: 20 Jun 2011 | 3:26:13 UTC - in response to Message 21517.

Thanks for the note. How do I dedicate more CPUs to the GPU? I have Swan_Sync=0. Can I set it to 1 and dedicate two processors to the GPU?

Thank you.

Dagorath
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Level
Ile
Message 21531 - Posted: 20 Jun 2011 | 5:24:36 UTC - in response to Message 21529.
Last modified: 20 Jun 2011 | 5:25:49 UTC

No. Go into the Preferences in BOINC manager, click on the Processor Usage tab and set the value for "On multiprocessor systems use at most _ % of the processors" to the appropriate value. Your computer has 4 processors so to leave 2 processors free for feeding the GPU you would put 50 in the box.
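
As a rough sketch of the arithmetic behind that setting, the percentage can be worked out for any core count. The function name `boinc_cpu_percent` is made up for illustration, and the round-up is an assumption made so that a client which truncates the result to a whole number of processors still ends up using the intended count:

```python
def boinc_cpu_percent(total_cpus: int, free_cpus: int) -> int:
    """Percentage to enter in "use at most _ % of the processors"
    so that free_cpus processors stay available for feeding the GPU.

    Hypothetical helper, not part of BOINC. The result is rounded up
    so that a truncating client still uses total_cpus - free_cpus.
    """
    if total_cpus < 1 or not 0 <= free_cpus < total_cpus:
        raise ValueError("need 0 <= free_cpus < total_cpus")
    usable = total_cpus - free_cpus
    # Integer ceiling of 100 * usable / total_cpus, avoiding float rounding.
    return (100 * usable + total_cpus - 1) // total_cpus


# The case from this thread: 4 processors, 2 left free for the GPU.
print(boinc_cpu_percent(4, 2))
```

For the 4-processor machine discussed here this gives 50, matching the value above.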

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,225,565,968
RAC: 1,955,187
Level
Trp
Message 21676 - Posted: 14 Jul 2011 | 11:05:25 UTC
Last modified: 14 Jul 2011 | 11:06:50 UTC

I had a strangely long GIANNI_KKFREE5 type WU.
It was restarted several times because of system restarts for different reasons (a Windows update, and possibly two crash-recovery restarts; maybe the latter explains the strange behavior of this WU). In the end it took 61077s (2 minutes less than 17 hours) to process. (The normal running time is 9 hours for this kind of WU.)
My GPU was not downclocking and was not overheated, and the progress indicator didn't go back to 0% during processing (as it usually does after crash-recovery restarts, possibly when the GPUGrid client crashes).
There are two relevant error messages in the WU's log:
1. MDIO: read error for file "restart.coor", byte number 4: number of atoms (0) != (36497) expected
(at least now I know the number of atoms for this one :) )
2. No heartbeat from core client for 30 sec - exiting
Any ideas to comfort me? :)

Bedrich Hajek
Joined: 28 Mar 09
Posts: 485
Credit: 11,117,516,377
RAC: 15,422,935
Level
Trp
Message 21678 - Posted: 15 Jul 2011 | 2:07:48 UTC
Last modified: 15 Jul 2011 | 2:27:05 UTC

Long-running tasks are supposed to take 8 to 12 hours on the fastest cards, but the latest TONI long units have been running about 6 to 7 hours on my GTX 480 under Windows 7. That is not exactly the fastest combination of card and platform; the fastest combinations can finish the tasks in under 5 hours. So shouldn't these work units be enlarged (incorporating more useful computational work, not just slowing them down or adding useless stuff)?

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 21685 - Posted: 16 Jul 2011 | 19:32:54 UTC - in response to Message 21676.

Zoltan, updates run at the highest CPU priority, so if they took a while (especially if you are running BOINC with higher priority using efmer) they could have caused the no-heartbeat problem. If BOINC autostarts while Windows is still finishing installing updates after a reboot and trying to recover, there is a good chance this can happen.
The read error sounds like a corrupt file (more likely the system rebooted when the file was in use and then the file was restored to an earlier state).
So next time, close BOINC before doing the updates.
