New CPU work units
Author | Message |
---|---|
Joined: 25 Mar 12 Posts: 103 Credit: 14,948,929,771 RAC: 11,649 |
I'm crunching some of these units on my dual-processor machines with 32 and 48 threads. They are Sandy Bridge (32 threads) and Ivy Bridge (48 threads) Xeon-based machines. On the 32-thread machine it has been quite straightforward: it finished the first unit in 3h5m, running CPU MD v9.02 (mtavx) with the CPU kicking in at turbo speed (3.3 GHz) and no other BOINC project running. On the 48-thread machine it has been a little bit funnier :). The first units all crashed right at the beginning. Reading the stderr output, I learnt that the GROMACS application cannot work well with more than 32 threads, but it will try anyway, so launching with 46 threads available (the other two reserved for two GPUGRID GPU units) ended in error (11 units in a row). So, while I investigated how to set up an app_config.xml file for MT units, I reduced the % of available processors until it reached 32 threads and started another MT unit, which this time executed properly and finished in a little under 3h. Then I copied the app_config.xml file into the GPUGRID folder, enabled 46 threads again and crossed my fingers. It worked fine: 1 MT task using 32 threads, the rest of the threads running Rosetta units, plus 2 GPUGRID GPU tasks. This time it needed about 3h10m, which I think is due to the overall load on the machine. I'll run some more units and report any further findings worth noting. |
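
For anyone wanting to try the same thing, here is a minimal sketch of generating an app_config.xml that caps the MT application at 32 CPUs. It does not reproduce the poster's actual file, which is not shown in the thread. The `app_config`, `app_version`, `app_name`, `plan_class` and `avg_ncpus` elements are standard BOINC ones; the app name `cpumd` is a hypothetical placeholder (check client_state.xml for the real one), and `mtavx` is the plan class quoted in the post. Whether the application also needs a thread-count argument passed through `<cmdline>` is not stated in the thread, so it is left out.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str, max_threads: int) -> str:
    """Return app_config.xml text that budgets max_threads CPUs for one app version."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # avg_ncpus tells the BOINC client how many CPUs to budget for each task.
    ET.SubElement(version, "avg_ncpus").text = str(max_threads)
    # Pretty-print when available (ET.indent was added in Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# "cpumd" is a hypothetical app name; "mtavx" comes from the post above.
config_text = build_app_config("cpumd", "mtavx", 32)
print(config_text)
```

The resulting text would be saved as app_config.xml in the GPUGRID project folder, as described in the post, and picked up after restarting the client or re-reading the config files.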
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 |
I powered on my old workstation with two Xeons and two slow GTX 660s. I have allowed BOINC to use all 8 cores, requested new work for GPUGRID and got 2 GPU WUs (SR) and one CPU WU. This CPU WU says it runs on 4 cores, but in Task Manager it actually used 92%. I don't mind, as I allowed it to use all cores, but I would have expected it to use 6 cores, as there are 6 cores free: two are for the GPU WUs, so 8-2=6. Am I thinking wrong here? Greetings from TJ |
Joined: 9 May 13 Posts: 171 Credit: 4,594,296,466 RAC: 117,924 |
Matt, Scheduling suggestion: One of my PCs has 16 threads. I just ran it dry so I could install Windows updates uninterrupted. When I started back up, the first project I allowed to download new CPU tasks was GPUGRID. It downloaded 16 of the multi-thread tasks. Each task is scheduled to take 70 hours. Since the tasks run one at a time, it will take a while to work through 16 tasks at 70 hours per task. My suggestion is that the number of tasks downloaded at one time be 2 or 3. The number of tasks downloaded should not equal the number of threads on the machine. I think the BOINC default is to initially download the same number of CPU tasks as there are threads on the machine. That may need to be changed for multi-thread tasks. Thanks for all the effort you put in. captainjack |
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 |
Matt, I am actually very familiar with BOINC Work Fetch. Essentially, what it does is: You have 2 preferences, the "Min Buffer" and the "Additional Buffer". - When BOINC doesn't have enough work to keep all devices busy for "Min Buffer", or has an idle device presently, it will ask projects for work. - When it asks, it asks for "Enough work to fill the idle devices, plus enough work to saturate the devices for [Min Buffer + Additional Buffer] time.", properly taking into account that some tasks are MT and some aren't. It correctly asks for that amount, because it minimizes the RPC web calls to the projects. When BOINC contacted GPUGrid, it likely worked correctly, to satisfy your cache settings. If you think otherwise, then turn on <work_fetch_debug>, abort all of the unstarted tasks, and then let work fetch run, then copy the Event Log data to show us what happened. Feel free to turn on the <work_fetch_debug> flag to see what BOINC is doing during work fetch. http://boinc.berkeley.edu/wiki/Client_configuration Regards, Jacob |
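
The work fetch description in the post above can be turned into a rough back-of-the-envelope calculation. This is a toy model under stated assumptions, not BOINC's actual algorithm: it only compares the CPU time needed to keep every CPU busy for "Min Buffer + Additional Buffer" with the CPU time already queued. The function name, the buffer values (0.1 and 0.5 days) and the treatment of a multi-thread task as "threads × runtime" are all assumptions made for illustration; the 16 threads and 70-hour estimate come from the earlier post.

```python
import math

SECONDS_PER_DAY = 86_400


def tasks_to_request(ncpus, min_buffer_days, extra_buffer_days,
                     est_hours_per_task, threads_per_task,
                     queued_cpu_seconds=0.0):
    """Estimate how many tasks cover the CPU-time shortfall (toy model)."""
    buffer_seconds = (min_buffer_days + extra_buffer_days) * SECONDS_PER_DAY
    # CPU-seconds needed to keep every CPU busy for the whole buffer period.
    wanted = ncpus * buffer_seconds
    shortfall = max(0.0, wanted - queued_cpu_seconds)
    # An MT task is assumed to occupy threads_per_task CPUs for its full runtime.
    per_task = est_hours_per_task * 3600 * threads_per_task
    return math.ceil(shortfall / per_task)


# Empty cache on a 16-thread host, hypothetical buffers of 0.1 + 0.5 days.
mt_tasks = tasks_to_request(16, 0.1, 0.5, 70, threads_per_task=16)
st_tasks = tasks_to_request(16, 0.1, 0.5, 70, threads_per_task=1)
print(mt_tasks, st_tasks)
```

With these assumed numbers, a single 16-thread task already covers the buffer, while the same 70-hour estimate counted as single-threaded work would call for four tasks. Larger buffer settings or an emptier cache raise the count, which is in line with the two possible causes suggested later in the thread.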
Joined: 9 May 13 Posts: 171 Credit: 4,594,296,466 RAC: 117,924 |
Jacob, Per your suggestion, I aborted all tasks, disabled the app_config file, turned on the work_fetch_debug option, started BOINC, and allowed new GPUGRID tasks. It downloaded one task. Then I aborted that task, enabled the app_config file, restarted BOINC and allowed new tasks. It downloaded one task. Then I turned off the work_fetch_debug option, aborted the task, restarted BOINC, and allowed new tasks. It downloaded one task. No idea why it downloaded 16 tasks at one time yesterday. Must have been sun spots or something like that. Anyway, it seems to be working today. Thanks for the suggestion. captainjack |
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 |
Strange. The only things I can think of, offhand, would be: - maybe your local cache of work-on-hand had been much lower during the "16-task-work-fetch", as compared to the "1-task-work-fetch" - maybe your cache settings ("Min buffer" and "Max additional buffer") were different between the fetches. Anyway, I'm glad to hear it's working for you! If you have any questions/problems related to work fetch, grab some <work_fetch_debug> Event Log data, and feel free to PM me. I am a work fetch guru -- I helped David A (the main BOINC designer) make sure work fetch works well across projects, resources (cpus, gpus, asics), task types (st single threaded, mt multi threaded), etc. The current BOINC 7.4.27 release does include a handful of work fetch fixes compared to the prior release. You should make sure all your devices are using 7.4.27. Regards, Jacob |
Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0 |
This application appears to have problems restarting from a checkpoint - I suspended it for a few days, then when I told it to resume, it gave a computation error less than a second later. Test application for CPU MD v9.01 (mtsse2) http://www.gpugrid.net/result.php?resultid=13426589 http://www.gpugrid.net/workunit.php?wuid=10302711 |
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
|
Joined: 25 Mar 12 Posts: 103 Credit: 14,948,929,771 RAC: 11,649 |
So, is it still a Test application? Not ready for science production yet? |
Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0 |
"Robert, was LAIM on?" What's LAIM? How do I tell if it's on? |
Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 1 |
|
Joined: 1 Jun 14 Posts: 1 Credit: 12,837,497 RAC: 0 |
LAIM stands for the "Leave application in memory" (while suspended) setting in the BOINC client, under the disk and memory usage tab. |
Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0 |
"LAIM stands for 'Leave application in memory' (while suspended) setting in boinc client under disk and memory usage tab." It's on. However, I may have installed some updates and rebooted while the workunit was suspended. |
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 |
The CPU work units apparently use GROMACS 4.6, which has provisions for GPU acceleration also. Is that being planned? |
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 |
It looks like the work units have now gone from 4 cores to 6 cores. It is possible that the difference is due to an increase in the number of cores I allowed in BOINC, but I think it is more likely to be a change in the work units themselves. GPUGRID 9.03 Test application for CPU MD (mtavx) 73801-MJHARVEY_CPUDHFR2-0-1-RND3693_0 - (-) 6C That is perfectly OK with me, and I am glad to find a project that uses AVX. |
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 |
To answer my own question, it looks like it is due to the changes that I made in BOINC. It is now up to 8 cores with the latest WU download, though it seems to me that it was limited by something else when I first started, but that was a while ago. |
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0 |
I'm pretty sure that the thread count of the task is set either at time-of-download, or time-of-task-start... And it's based on the "Use at most X% of CPUs" setting. |
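
As a rough illustration of how the "Use at most X% of CPUs" setting maps to a thread count, here is a small sketch. It assumes the client rounds down and never goes below one CPU; that rounding rule is an assumption, not something confirmed in this thread, and the function name and percentages are made up for the example.

```python
def usable_cpus(host_threads: int, max_cpu_percent: float) -> int:
    """CPUs available under 'Use at most X% of CPUs' (assumed: round down, minimum 1)."""
    return max(1, int(host_threads * max_cpu_percent / 100))


# Hypothetical settings for the hosts mentioned in this thread.
for threads, percent in [(48, 67), (48, 96), (8, 100)]:
    print(threads, percent, usable_cpus(threads, percent))
```

Under that assumption, a 48-thread host set to 67% would offer 32 CPUs and the same host at 96% would offer 46, which matches the two thread counts mentioned in the first post.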
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 |
"So, is it still a Test application? Not ready for science production yet?" MJH: Is CPUMD MJHARVEY_CPUDHFR2 finished? Will a new batch of (test) CPUMD units be available? Or is CPUMD transitioning to production? |
Joined: 8 Sep 08 Posts: 14 Credit: 425,295,955 RAC: 0 |
Bump. Are there new CPU workunits coming? Hope all is well with the project. |
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 |
The CPU work is temporarily in abeyance while we prepare a new application. Check back later or, if you have an AMD GPU, please participate in testing the new app. Matt |
©2025 Universitat Pompeu Fabra