New CPU work units

Trotador

Message 38788 - Posted: 1 Nov 2014, 14:02:57 UTC

I'm crunching some of these units on my dual-processor machines with 32 and 48 threads. They are Sandy Bridge (32 threads) and Ivy Bridge (48 threads) Xeon-based machines.

On the 32-thread machine it has been quite straightforward: it finished the first unit in 3h5m running CPU MD v9.02 (mtavx), with the CPUs kicking in at turbo speed (3.3 GHz). No other BOINC project was running.

On the 48-thread machine it has been a little bit funnier :). The first units all crashed right at the beginning. Reading the stderr output, I learnt that the GROMACS application does not work well with more than 32 threads but will try anyway, so launching with 46 threads available (the other two reserved for two GPUGRID GPU units) ended in error (11 units in a row).

So, while I investigated how to set up an app_config.xml file for MT units, I reduced the percentage of available processors until it reached 32 threads and started another MT unit, which this time ran properly and finished in slightly under 3h.

Then I copied the app_config.xml file into the GPUGRID folder, enabled 46 threads again and crossed my fingers. It worked fine: 1 MT task using 32 threads, the rest of the threads running Rosetta units, and additionally 2 GPUGRID GPU tasks. This time it needed about 3h10m, which I think is due to the overall load on the machine.
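For readers who want to try the same approach, a minimal sketch of such an app_config.xml follows. The actual file was not posted, so everything here is an assumption: APP_NAME_HERE is a placeholder for the application's short name (it can be looked up in client_state.xml), mtavx is the plan class shown in the application name above, and whether the application really limits its thread count to avg_ncpus depends on the project.

```xml
<!-- Hypothetical app_config.xml, placed in the GPUGRID project folder. -->
<app_config>
  <app_version>
    <!-- Placeholder: replace with the real short name of the CPU MD app. -->
    <app_name>APP_NAME_HERE</app_name>
    <!-- Plan class as it appears in the application name, e.g. "(mtavx)". -->
    <plan_class>mtavx</plan_class>
    <!-- Reserve 32 CPUs for each MT task instead of all available threads. -->
    <avg_ncpus>32</avg_ncpus>
  </app_version>
</app_config>
```

After saving the file, the client has to re-read its configuration files or be restarted before the change takes effect.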

I'll run some more units and report any further findings worth noting.




TJ

Message 38789 - Posted: 1 Nov 2014, 15:17:58 UTC

I powered on my old workstation with two Xeons and two slow GTX 660s.
I have allowed BOINC to use all 8 cores, requested new work for GPUGRID, and got two SR GPU WUs and one CPU WU. The CPU WU says it runs on 4 cores, but in Task Manager it actually uses 92% of the CPU. I don't mind, as I allowed it to use all cores, but I would have expected it to use 6 cores, since 6 cores are free: two are for the GPU WUs, so 8-2=6.

Am I thinking wrong here?
Greetings from TJ
captainjack

Message 38894 - Posted: 13 Nov 2014, 0:54:04 UTC

Matt,

Scheduling suggestion: One of my PCs has 16 threads. I just ran it dry so I could install Windows updates uninterrupted. When I started back up, the first project I allowed to download new CPU tasks was GPUGRID. It downloaded 16 of the multi-thread tasks. Each task is estimated to take 70 hours. Since the tasks run one at a time, it will take a while to work through 16 tasks at 70 hours per task.

My suggestion is to limit the number of tasks downloaded at one time to 2 or 3. The number of tasks downloaded should not equal the number of threads on the machine. I think the BOINC default is to initially download the same number of CPU tasks as there are threads on the machine. That may need to be changed for multi-thread tasks.

Thanks for all the effort you put in.
captainjack
Jacob Klein

Send message
Message 38895 - Posted: 13 Nov 2014, 3:26:21 UTC - in response to Message 38894.  
Last modified: 13 Nov 2014, 3:28:27 UTC

I am actually very familiar with BOINC Work Fetch.

Essentially, it works like this: you have two preferences, the "Min Buffer" and the "Additional Buffer".
- When BOINC doesn't have enough work to keep all devices busy for "Min Buffer", or currently has an idle device, it asks projects for work.
- When it asks, it asks for enough work to fill the idle devices, plus enough work to saturate the devices for [Min Buffer + Additional Buffer] time, properly taking into account that some tasks are MT and some aren't. It asks for that full amount in one go because doing so minimizes the RPC web calls to the projects.

When BOINC contacted GPUGrid, it likely worked correctly, to satisfy your cache settings. If you think otherwise, then turn on <work_fetch_debug>, abort all of the unstarted tasks, and then let work fetch run, then copy the Event Log data to show us what happened.

Feel free to turn on the <work_fetch_debug> flag to see what BOINC is doing during work fetch. http://boinc.berkeley.edu/wiki/Client_configuration
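For reference, the flag goes in the log_flags section of cc_config.xml in the BOINC data directory. A minimal file with only this flag set would look like this:

```xml
<cc_config>
  <log_flags>
    <!-- Log the client's work-fetch decisions to the Event Log. -->
    <work_fetch_debug>1</work_fetch_debug>
  </log_flags>
</cc_config>
```

Set the value back to 0 once the log has been captured, since the output is verbose.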

Regards,
Jacob
captainjack

Message 38903 - Posted: 13 Nov 2014, 22:06:28 UTC

Jacob,

Per your suggestion, I aborted all tasks, disabled the app_config file, turned on the work_fetch_debug option, started BOINC, and allowed new GPUGRID tasks. It downloaded one task.

Then I aborted that task, enabled the app_config file, restarted BOINC and allowed new tasks. It downloaded one task.

Then I turned off the work_fetch_debug option, aborted the task, restarted BOINC, and allowed new tasks. It downloaded one task.

No idea why it downloaded 16 tasks at one time yesterday. Must have been sun spots or something like that. Anyway, it seems to be working today.

Thanks for the suggestion.
captainjack
Jacob Klein

Message 38904 - Posted: 13 Nov 2014, 22:12:28 UTC - in response to Message 38903.  
Last modified: 13 Nov 2014, 22:13:45 UTC

Strange.
The only things I can think of, offhand, would be:
- maybe your local cache of work-on-hand had been much lower during the "16-task-work-fetch", as compared to the "1-task-work-fetch"
- maybe your cache settings ("Min buffer" and "Max additional buffer") were different between the fetches.

Anyway, I'm glad to hear it's working for you!

If you have any questions or problems related to work fetch, grab some <work_fetch_debug> Event Log data, and feel free to PM me. I am a work fetch guru -- I helped David A (the main BOINC designer) make sure work fetch works well across projects, resources (CPUs, GPUs, ASICs), task types (ST single-threaded, MT multi-threaded), etc. The current BOINC 7.4.27 release includes a handful of work fetch fixes compared to the prior release. You should make sure all your devices are using 7.4.27.

Regards,
Jacob
robertmiles
Message 39012 - Posted: 23 Nov 2014, 19:49:32 UTC

This application appears to have problems restarting from a checkpoint -
I suspended it for a few days, then when I told it to resume, it gave a
computation error less than a second later.

Test application for CPU MD v9.01 (mtsse2)
http://www.gpugrid.net/result.php?resultid=13426589
http://www.gpugrid.net/workunit.php?wuid=10302711
skgiven
Volunteer moderator
Volunteer tester
Message 39018 - Posted: 24 Nov 2014, 21:59:37 UTC - in response to Message 39012.  

Robert, was LAIM on?

Trotador

Message 39081 - Posted: 5 Dec 2014, 20:43:58 UTC
Last modified: 5 Dec 2014, 20:44:23 UTC

So, is it still a Test application? Not ready for science production yet?
robertmiles
Message 39084 - Posted: 5 Dec 2014, 23:51:06 UTC - in response to Message 39018.  

Robert, was LAIM on?


What's LAIM? How do I tell if it's on?
Retvari Zoltan
Message 39085 - Posted: 6 Dec 2014, 0:03:53 UTC - in response to Message 39084.  

Robert, was LAIM on?

What's LAIM? How do I tell if it's on?

LAIM
ExpeditionHope

Message 39086 - Posted: 6 Dec 2014, 3:15:25 UTC - in response to Message 39085.  

LAIM stands for the "Leave application in memory" (while suspended) setting in the BOINC client, under the disk and memory usage tab.
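The same setting can also be pinned locally through global_prefs_override.xml in the BOINC data directory. A minimal sketch, assuming no other local overrides are needed:

```xml
<global_preferences>
  <!-- 1 = keep suspended applications in memory, 0 = remove them. -->
  <leave_apps_in_memory>1</leave_apps_in_memory>
</global_preferences>
```

This only helps while the client keeps running; after a reboot the task has to restart from its last checkpoint regardless.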
robertmiles
Message 39093 - Posted: 8 Dec 2014, 0:51:59 UTC - in response to Message 39086.  

LAIM stands for the "Leave application in memory" (while suspended) setting in the BOINC client, under the disk and memory usage tab.


It's on. However, I may have installed some updates and rebooted while the workunit was suspended.
Jim1348

Message 39309 - Posted: 26 Dec 2014, 2:08:34 UTC

The CPU work units apparently use GROMACS 4.6, which has provisions for GPU acceleration also. Is that being planned?
Jim1348

Message 39325 - Posted: 28 Dec 2014, 17:03:00 UTC

It looks like the work units have now gone from 4 cores to 6 cores. It is possible that the difference is due to an increase in the number of cores I allowed in BOINC, but I think it is more likely to be a change in the work units themselves.
GPUGRID	9.03 Test application for CPU MD (mtavx)	73801-MJHARVEY_CPUDHFR2-0-1-RND3693_0	- (-)	6C 

That is perfectly OK with me, and I am glad to find a project that uses AVX.
Jim1348

Message 39328 - Posted: 28 Dec 2014, 19:01:04 UTC - in response to Message 39325.  

To answer my own question, it looks like it is due to the changes that I made in BOINC. It is now up to 8 cores with the latest WU download, though it seems to me that it was limited by something else when I first started, but that was a while ago.
Jacob Klein

Message 39329 - Posted: 28 Dec 2014, 21:01:36 UTC

I'm pretty sure that the thread count of the task is set either at time-of-download, or time-of-task-start... And it's based on the "Use at most X% of CPUs" setting.
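As a rough illustration, that preference corresponds to max_ncpus_pct in global_prefs_override.xml. The sketch below assumes a 48-thread host like the one described earlier in the thread; 67% of 48 is 32.16, which should leave 32 usable CPUs.

```xml
<global_preferences>
  <!-- "Use at most X% of the CPUs": 67% of 48 threads is about 32. -->
  <max_ncpus_pct>67</max_ncpus_pct>
</global_preferences>
```

Whether a task that has already been downloaded picks up the new value is exactly the open question here, so it may only show on tasks fetched or started after the change.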
eXaPower

Message 39555 - Posted: 21 Jan 2015, 15:56:06 UTC - in response to Message 39081.  

So, is it still a Test application? Not ready for science production yet?

MJH:
Is the CPUMD MJHARVEY_CPUDHFR2 batch finished? Will a new batch of (test) CPUMD work be available? Or is CPUMD transitioning to production?
Jonathan Figdor

Message 39890 - Posted: 30 Jan 2015, 4:59:47 UTC - in response to Message 39555.  

Bump. Are there new CPU workunits coming? Hope all is well with the project.
MJH
Message 39897 - Posted: 30 Jan 2015, 9:55:39 UTC - in response to Message 39890.  

The CPU work is temporarily in abeyance while we prepare a new application.
Check back later or, if you have an AMD GPU, please participate in testing the new app.

Matt

©2025 Universitat Pompeu Fabra