New CPU work units

Author	Message
Trotador Joined: 25 Mar 12 Posts: 103 Credit: 14,948,929,771 RAC: 17	Message 38788 - Posted: 1 Nov 2014, 14:02:57 UTC
I'm crunching some of these units on my dual-processor 32/48-thread machines. They are Sandy Bridge (32 threads) and Ivy Bridge (48 threads) Xeon-based machines.
On the 32-thread machine it has been quite straightforward: it finished the first unit in 3h5m running CPU MD v9.02 (mtavx), with the CPU kicking in at turbo speed (3.3 GHz). No other BOINC project was running.
On the 48-thread machine it has been a little bit funnier :). The first units all crashed right at the beginning. Reading the stderr output, I learnt that the GROMACS application cannot work well with more than 32 threads but will try anyway, so launching with 46 threads available (the other two were reserved for two GPUGRID GPU units) ended in error (11 units in a row). So, while I investigated how to set up an app_config.xml file for MT units, I reduced the percentage of available processors until it reached 32 threads and started another MT unit, which this time executed properly and finished in a little under 3h.
Then I copied the app_config.xml file into the GPUGRID folder, enabled 46 threads again and crossed my fingers. It worked fine: 1 MT task using 32 threads, the rest of the threads running Rosetta units, plus 2 GPUGRID GPU tasks. This time it needed about 3h10m, which I think is due to the overall load on the machine.
I'll run some more units and report further findings if they are noteworthy.
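
The post does not show the app_config.xml that was used, so the following Python sketch only illustrates what such a file might look like. The `<app_config>`, `<app_version>`, `<app_name>`, `<plan_class>` and `<avg_ncpus>` elements are standard BOINC client configuration; the application name `cpumd_placeholder` is a hypothetical stand-in (the real short name is not given in this thread), and `mtavx` is taken from the version string quoted in the post.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str, threads: int) -> str:
    """Return app_config.xml text that budgets `threads` CPUs per MT task."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return (
        "<app_config>\n"
        "  <app_version>\n"
        f"    <app_name>{app_name}</app_name>\n"      # hypothetical name
        f"    <plan_class>{plan_class}</plan_class>\n"
        f"    <avg_ncpus>{threads}</avg_ncpus>\n"     # 32 was the working limit reported
        "  </app_version>\n"
        "</app_config>\n"
    )


config_text = build_app_config("cpumd_placeholder", "mtavx", 32)
ET.fromstring(config_text)  # raises ParseError if the text is not well-formed
print(config_text)
```

Whether the application also needs a thread-count option passed through a `<cmdline>` element is not stated in the thread, so that part is left out here.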

TJ Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0	Message 38789 - Posted: 1 Nov 2014, 15:17:58 UTC
I powered on my old workstation with two Xeons and two slow GTX 660s. I have allowed BOINC to use all 8 cores, requested new work for GPUGRID and got 2 GPU WUs (SR) and one CPU WU. BOINC says this CPU WU runs on 4 cores, but in Task Manager it actually uses 92%. I don't mind, as I allowed it to use all cores, but I would have expected it to use 6 cores, since 6 cores are free: two are taken by the GPU WUs, so 8-2=6. Am I thinking wrong here?
Greetings from TJ

captainjack Joined: 9 May 13 Posts: 171 Credit: 4,594,296,466 RAC: 171	Message 38894 - Posted: 13 Nov 2014, 0:54:04 UTC
Matt, Scheduling suggestion: One of my PCs has 16 threads. I just ran it dry so I could install Windows updates uninterrupted. When I started back up, the first project I allowed to download new CPU tasks was GPUGRID. It downloaded 16 of the multi-thread tasks. Each task is scheduled to take 70 hours. Since the tasks run one at a time, it will take a while to work through 16 tasks at 70 hours per task.
My suggestion is that the number of tasks downloaded at one time be 2 or 3. The number of tasks downloaded should not equal the number of threads on the machine. I think the BOINC default is to initially download the same number of CPU tasks as there are threads on the machine. That may need to be changed for multi-thread tasks.
Thanks for all the effort you put in.
captainjack
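
To put a number on "a while": the short Python sketch below multiplies the figures from the post (16 tasks, an estimated 70 hours each, run one at a time). The function name is made up for illustration, and the 70-hour figure is BOINC's estimate rather than a measured run time.

```python
def serial_queue_hours(task_count: int, hours_per_task: float) -> float:
    """Total wall-clock hours when tasks run strictly one after another."""
    return task_count * hours_per_task


total_hours = serial_queue_hours(16, 70.0)
print(f"{total_hours:.0f} hours, about {total_hours / 24:.1f} days")
# prints "1120 hours, about 46.7 days"
```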

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38895 - Posted: 13 Nov 2014, 3:26:21 UTC - in response to Message 38894. Last modified: 13 Nov 2014, 3:28:27 UTC
Quote: Matt, Scheduling suggestion: One of my PCs has 16 threads. I just ran it dry so I could install Windows updates uninterrupted. When I started back up, the first project I allowed to download new CPU tasks was GPUGRID. It downloaded 16 of the multi-thread tasks. Each task is scheduled to take 70 hours. Since the tasks run one at a time, it will take a while to work through 16 tasks at 70 hours per task. My suggestion is that the number of tasks downloaded at one time be 2 or 3. The number of tasks downloaded should not equal the number of threads on the machine. I think the BOINC default is to initially download the same number of CPU tasks as there are threads on the machine. That may need to be changed for multi-thread tasks. Thanks for all the effort you put in. captainjack
I am actually very familiar with BOINC Work Fetch. Essentially, what it does is this. You have 2 preferences, the "Min Buffer" and the "Additional Buffer".
- When BOINC doesn't have enough work to keep all devices busy for "Min Buffer", or has an idle device at present, it will ask projects for work.
- When it asks, it asks for "enough work to fill the idle devices, plus enough work to saturate the devices for [Min Buffer + Additional Buffer] time", properly taking into account that some tasks are MT and some aren't.
It is correct to ask for that amount, because doing so minimizes the RPC web calls to the projects.
When BOINC contacted GPUGrid, it likely worked correctly to satisfy your cache settings. If you think otherwise, turn on the <work_fetch_debug> flag, abort all of the unstarted tasks, let work fetch run, and then copy the Event Log data to show us what BOINC did during work fetch.
http://boinc.berkeley.edu/wiki/Client_configuration
Regards, Jacob
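
The two rules above can be sketched in a few lines of Python. This is a simplified model of the description in the post, not the BOINC client's actual code: the function names, the per-instance list of already-queued seconds, and the half-day buffer values are all assumptions made for illustration, and MT handling is ignored.

```python
SECONDS_PER_DAY = 86400


def needs_work(busy_seconds, min_buffer_days):
    """True if any device instance is idle or would run dry within Min Buffer."""
    threshold = min_buffer_days * SECONDS_PER_DAY
    return any(busy <= 0 or busy < threshold for busy in busy_seconds)


def request_seconds(busy_seconds, min_buffer_days, additional_buffer_days):
    """Instance-seconds needed to fill every instance up to Min + Additional."""
    target = (min_buffer_days + additional_buffer_days) * SECONDS_PER_DAY
    return sum(max(0.0, target - busy) for busy in busy_seconds)


# A 16-thread host that has been run dry: every instance has 0 s queued.
idle_cpu = [0.0] * 16
print(needs_work(idle_cpu, 0.5))            # True
print(request_seconds(idle_cpu, 0.5, 0.5))  # 1382400.0
```

In this model an empty 16-thread host with a combined one-day buffer asks for 16 instance-days of work in a single request, which is one way a large first download could come about.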

captainjack Joined: 9 May 13 Posts: 171 Credit: 4,594,296,466 RAC: 171	Message 38903 - Posted: 13 Nov 2014, 22:06:28 UTC
Jacob, Per your suggestion, I aborted all tasks, disabled the app_config file, turned on the work_fetch_debug option, started BOINC, and allowed new GPUGRID tasks. It downloaded one task.
Then I aborted that task, enabled the app_config file, restarted BOINC and allowed new tasks. It downloaded one task.
Then I turned off the work_fetch_debug option, aborted the task, restarted BOINC, and allowed new tasks. It downloaded one task.
No idea why it downloaded 16 tasks at one time yesterday. Must have been sun spots or something like that. Anyway, it seems to be working today. Thanks for the suggestion.
captainjack

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38904 - Posted: 13 Nov 2014, 22:12:28 UTC - in response to Message 38903. Last modified: 13 Nov 2014, 22:13:45 UTC
Strange. The only things I can think of, offhand, would be:
- maybe your local cache of work-on-hand was much lower during the "16-task-work-fetch" than during the "1-task-work-fetch"
- maybe your cache settings ("Min buffer" and "Max additional buffer") were different between the fetches.
Anyway, I'm glad to hear it's working for you! If you have any questions/problems related to work fetch, grab some <work_fetch_debug> Event Log data, and feel free to PM me. I am a work fetch guru -- I helped David A (the main BOINC designer) make sure work fetch works well across projects, resources (CPUs, GPUs, ASICs), task types (ST single-threaded, MT multi-threaded), etc.
The current BOINC 7.4.27 release does include a handful of work fetch fixes compared to the prior release. You should make sure all your devices are using 7.4.27.
Regards, Jacob
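
For anyone wanting to collect that Event Log data, the flag lives in the client's cc_config.xml under `<log_flags>`, as documented on the Client configuration wiki page linked earlier in the thread. The Python sketch below only holds such a file as a string and checks that it parses; where the file is saved (the BOINC data directory) and how the client is made to reread it are left to the reader.

```python
import xml.etree.ElementTree as ET

# Minimal cc_config.xml that switches on work-fetch logging.
CC_CONFIG = """<cc_config>
  <log_flags>
    <work_fetch_debug>1</work_fetch_debug>
  </log_flags>
</cc_config>
"""

root = ET.fromstring(CC_CONFIG)  # raises ParseError if not well-formed
flag = root.find("log_flags/work_fetch_debug")
print(flag.text)  # 1
```

Setting the value back to 0 (or removing the element) turns the extra logging off again.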

robertmiles Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0	Message 39012 - Posted: 23 Nov 2014, 19:49:32 UTC
This application appears to have problems restarting from a checkpoint. I suspended it for a few days; then, when I told it to resume, it gave a computation error less than a second later.
Test application for CPU MD v9.01 (mtsse2)
http://www.gpugrid.net/result.php?resultid=13426589
http://www.gpugrid.net/workunit.php?wuid=10302711

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 39018 - Posted: 24 Nov 2014, 21:59:37 UTC - in response to Message 39012.
Robert, was LAIM on?

Trotador Joined: 25 Mar 12 Posts: 103 Credit: 14,948,929,771 RAC: 17	Message 39081 - Posted: 5 Dec 2014, 20:43:58 UTC Last modified: 5 Dec 2014, 20:44:23 UTC
So, is it still a Test application? Not ready for science production yet?

robertmiles Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0	Message 39084 - Posted: 5 Dec 2014, 23:51:06 UTC - in response to Message 39018.
Quote: Robert, was LAIM on?
What's LAIM? How do I tell if it's on?

Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0	Message 39085 - Posted: 6 Dec 2014, 0:03:53 UTC - in response to Message 39084.
Quote: Robert, was LAIM on? What's LAIM? How do I tell if it's on?
LAIM

ExpeditionHope Joined: 1 Jun 14 Posts: 1 Credit: 12,837,497 RAC: 0	Message 39086 - Posted: 6 Dec 2014, 3:15:25 UTC - in response to Message 39085.
LAIM stands for the "Leave application in memory" (while suspended) setting in the BOINC client, under the disk and memory usage tab.

robertmiles Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0	Message 39093 - Posted: 8 Dec 2014, 0:51:59 UTC - in response to Message 39086.
Quote: LAIM stands for "Leave application in memory"(while suspended) setting in boinc client under disk and memory usage tab.
It's on. However, I may have installed some updates and rebooted while the workunit was suspended.

Jim1348 Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0	Message 39309 - Posted: 26 Dec 2014, 2:08:34 UTC
The CPU work units apparently use GROMACS 4.6, which has provisions for GPU acceleration also. Is that being planned?

Jim1348 Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0	Message 39325 - Posted: 28 Dec 2014, 17:03:00 UTC
It looks like the work units have now gone from 4 cores to 6 cores. It is possible that the difference is due to an increase in the number of cores I allowed in BOINC, but I think it is more likely to be a change in the work units themselves.
GPUGRID 9.03 Test application for CPU MD (mtavx) 73801-MJHARVEY_CPUDHFR2-0-1-RND3693_0 - (-) 6C
That is perfectly OK with me, and I am glad to find a project that uses AVX.

Jim1348 Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0	Message 39328 - Posted: 28 Dec 2014, 19:01:04 UTC - in response to Message 39325.
To answer my own question, it looks like it is due to the changes that I made in BOINC. It is now up to 8 cores with the latest WU download. It seems to me that it was limited by something else when I first started, but that was a while ago.

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 39329 - Posted: 28 Dec 2014, 21:01:36 UTC
I'm pretty sure that the thread count of the task is set either at time-of-download, or time-of-task-start... And it's based on the "Use at most X% of CPUs" setting.
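
As a rough illustration of how a percentage setting maps to a core count, here is a small Python sketch. The function name, the truncation toward zero and the floor of one CPU are assumptions made for this example; the thread does not say exactly how the client rounds.

```python
def usable_cpus(logical_cpus: int, max_cpu_percent: float) -> int:
    """CPUs made available by a 'Use at most X% of CPUs' style setting."""
    # Assumption: the fractional part is dropped and at least 1 CPU remains.
    return max(1, int(logical_cpus * max_cpu_percent / 100))


for percent in (50, 75, 100):
    print(percent, usable_cpus(8, percent))
# prints "50 4", "75 6", "100 8"
```

On a hypothetical 8-thread host, raising the setting from 50% to 75% to 100% would produce the same 4, 6 and 8 core counts reported earlier in the thread.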

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 39555 - Posted: 21 Jan 2015, 15:56:06 UTC - in response to Message 39081.
Quote: So, is it still a Test application? Not ready for science production yet?
MJH: Is the CPUMD MJHARVEY_CPUDHFR2 batch finished? Will a new batch of (test) CPUMD work be available? Or is CPUMD transitioning to production?

Jonathan Figdor Joined: 8 Sep 08 Posts: 14 Credit: 425,295,955 RAC: 0	Message 39890 - Posted: 30 Jan 2015, 4:59:47 UTC - in response to Message 39555.
Bump. Are there new CPU workunits coming? Hope all is well with the project.

MJH Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 39897 - Posted: 30 Jan 2015, 9:55:39 UTC - in response to Message 39890.
The CPU work is temporarily in abeyance while we prepare a new application. Check back later or, if you have an AMD GPU, please participate in testing the new app.
Matt