New CPU work units

Message boards : News : New CPU work units

MJH
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 38499 - Posted: 14 Oct 2014, 13:41:13 UTC - in response to Message 38498.  
Last modified: 14 Oct 2014, 13:41:44 UTC

It's on the TODO list, yes. It'll appear in the Linux version first, as that's the easier one to develop.

Matt
ID: 38499
TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 38500 - Posted: 14 Oct 2014, 14:00:58 UTC - in response to Message 38491.  
Last modified: 14 Oct 2014, 14:07:00 UTC

I am thinking about hard-coding the CPU use to [number-of-CPU-cores] - [number-of-GPUs]. Opinions?

Matt

For GPUGRID-only crunchers this is great, but I think those who run other projects on the CPU won't be as happy.
As far as I'm concerned, you can do it.

Edit: there are also projects that use the iGPU, which also occupies a CPU thread.
Greetings from TJ
ID: 38500
TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 38501 - Posted: 14 Oct 2014, 14:04:18 UTC

I have set BOINC to use 100% of the CPUs and only the GPUGRID CPU app is in the queue, but strangely only 5 CPUs are used. It should be 8.
I noticed this in the progress file:

Detecting CPU-specific acceleration.
Present hardware specification:
Vendor: GenuineIntel
Brand: Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz
Family: 6 Model: 26 Stepping: 5
Features: apic clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pdcm popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
Acceleration most likely to fit this hardware: SSE4.1
Acceleration selected at GROMACS compile time: SSE2


Binary not matching hardware - you might be losing performance.
Acceleration most likely to fit this hardware: SSE4.1
Acceleration selected at GROMACS compile time: SSE2
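As an aside, for anyone wanting to check which of these instruction sets their own CPU reports, a quick sketch (assumes Linux, where /proc/cpuinfo exists):

    # Print which SIMD instruction sets the CPU advertises (Linux only).
    flags = set()
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                break
    for isa in ("sse2", "ssse3", "sse4_1", "sse4_2", "avx"):
        print(isa, "yes" if isa in flags else "no")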

Also, since the run time estimation is not correct yet, the first 99% goes rather quickly and then the last 1% takes between 20 and 28 hours to finish. But Matt knows this already.

Greetings from TJ
ID: 38501
eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 38502 - Posted: 14 Oct 2014, 14:09:58 UTC - in response to Message 38423.  
Last modified: 14 Oct 2014, 14:15:32 UTC

TJ, from a post earlier:

Only SSE2 at the moment. Will probably make builds with higher levels of optimisation later but for now I'm concerned about correctness rather than performance.

Matt


TJ or Jacob: for your SLI systems, have you noticed any task being kicked out while running an older CPUMD task with a new 10x-compute-cost task in the cache? For me, with an old CPUMD running in high-priority mode and a new CPUMD in the cache, one of the two running GPU tasks goes into "waiting to run" mode.
ID: 38502
TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 38503 - Posted: 14 Oct 2014, 14:28:19 UTC - in response to Message 38502.  

TJ, from a post earlier:

Only SSE2 at the moment. Will probably make builds with higher levels of optimisation later but for now I'm concerned about correctness rather than performance.

Matt


Aha, thanks, now I understand.

TJ or Jacob: for your SLI systems, have you noticed any task being kicked out while running an older CPUMD task with a new 10x-compute-cost task in the cache? For me, with an old CPUMD running in high-priority mode and a new CPUMD in the cache, one of the two running GPU tasks goes into "waiting to run" mode.


I only have "old" ones running and in the queue; I will let them finish first. However, none is running at high priority yet.
Greetings from TJ
ID: 38503
eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 38504 - Posted: 14 Oct 2014, 14:37:06 UTC - in response to Message 38503.  
Last modified: 14 Oct 2014, 14:38:06 UTC

TJ, from a post earlier:

Only SSE2 at the moment. Will probably make builds with higher levels of optimisation later but for now I'm concerned about correctness rather than performance.

Matt


Aha, thanks, now I understand.

TJ or Jacob: for your SLI systems, have you noticed any task being kicked out while running an older CPUMD task with a new 10x-compute-cost task in the cache? For me, with an old CPUMD running in high-priority mode and a new CPUMD in the cache, one of the two running GPU tasks goes into "waiting to run" mode.


I only have "old" ones running and in the queue; I will let them finish first. However, none is running at high priority yet.


Currently, the CPU task from the first batch is NOT running at high priority, but when I download a "new" 10x-compute CPUMD task, the "old" task goes into high priority and kicks out one of the two running GPU tasks.
ID: 38504
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 38505 - Posted: 14 Oct 2014, 16:03:30 UTC

OK, let's get one thing straight here: client scheduling.

The order, I believe, goes something like this:
1) "High Priority" coprocessor (GPU/ASIC) tasks
2) "High Priority" CPU tasks (up to ncpus + 1) (MT tasks allowed to overcommit)
3) "Regular" coprocessor (GPU/ASIC) tasks (up to ncpus + 1)
4) "Regular" CPU tasks (up to ncpus + 1) (MT tasks allowed to overcommit)

So...
When one of the new GPUGrid MT CPU tasks comes in, if it is set to use all of the CPUs and it runs high-priority, it gets scheduled in "order 2", above the GPU tasks, which come in at "order 3".

And then, it will additionally schedule as many "order 3" GPU tasks as it can, but only up to the point that it budgets 1 additional CPU. (So, if your GPU tasks are set to use 0.667 CPUs like I have scheduled mine via app_config, then it will run 1 GPU task, but not 2).
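For reference, that per-task CPU budget is set with an app_config.xml in the project directory. A minimal sketch (the app name here is just a placeholder; use the names your client actually reports):

    <app_config>
      <app>
        <name>acemdlong</name>
        <gpu_versions>
          <gpu_usage>1.0</gpu_usage>
          <cpu_usage>0.667</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>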

This is NOT a problem of "oh wow, GPUGrid MT tasks are scheduling too many CPUs."

This IS a problem of "oh wow, GPUGrid MT tasks go high-priority immediately. That throws off all of the scheduling on the client."

Hopefully that helps clarify.
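To make the ordering concrete, here is a toy model in Python. This is my reading of the client's behaviour, not actual BOINC source code:

    from collections import namedtuple

    # Minimal stand-in for a BOINC task (illustrative only).
    Task = namedtuple("Task", "uses_gpu high_priority cpus")

    def schedule(tasks, ncpus):
        # Buckets per the order above: 0 = HP coproc, 1 = HP CPU,
        # 2 = regular coproc, 3 = regular CPU.
        def bucket(t):
            if t.high_priority:
                return 0 if t.uses_gpu else 1
            return 2 if t.uses_gpu else 3

        running, budget = [], ncpus + 1   # roughly 1 CPU of overcommit allowed
        for t in sorted(tasks, key=bucket):
            if t.cpus <= budget:
                running.append(t)
                budget -= t.cpus
        return running

    # An 8-core box: one high-priority MT task using all 8 cores, plus two
    # GPU tasks at 0.667 CPUs each. Only one GPU task fits the leftover budget.
    mt = Task(uses_gpu=False, high_priority=True, cpus=8)
    gpu = Task(uses_gpu=True, high_priority=False, cpus=0.667)
    print(sum(t.uses_gpu for t in schedule([mt, gpu, gpu], 8)))   # prints 1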

PS: Here is some dated info that is a useful read:
http://boinc.berkeley.edu/trac/wiki/ClientSched
http://boinc.berkeley.edu/trac/wiki/ClientSchedOctTen
ID: 38505
eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 38508 - Posted: 14 Oct 2014, 16:55:05 UTC
Last modified: 14 Oct 2014, 16:57:45 UTC

Jacob- thank you for the information about client scheduling.

Matt, I see you released a CPUMD app for Linux with support for SSE4/AVX. Will Windows also see an upgrade? Do you have any idea what the speed-up of the SSE4/AVX app will be compared to the standard SSE2 app?
ID: 38508
MJH
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 38509 - Posted: 14 Oct 2014, 17:20:49 UTC - in response to Message 38508.  

Will Windows also see an upgrade?


Probably within the week.


Do you have any idea what the speed-up of the SSE4/AVX app will be compared to the standard SSE2 app?


10-30% for AVX on Intel, I think.
ID: 38509
=Lupus=
Joined: 10 Nov 07
Posts: 10
Credit: 12,777,491
RAC: 0
Message 38522 - Posted: 14 Oct 2014, 22:08:35 UTC

ohmyohmy...

http://www.gpugrid.net/result.php?resultid=13195959

running on 3 out of 4 CPU cores, nsteps=5000000

at 57 hours:

Writing checkpoint, step 3770270 at Tue Oct 14 23:57:52 2014

seems it will finish... in 24 more hours.

Seems something went really weird with the estimated runtime. Question: should I abort the 6 other workunits?
ID: 38522
Chilean
Joined: 8 Oct 12
Posts: 98
Credit: 385,652,461
RAC: 0
Message 38524 - Posted: 14 Oct 2014, 23:50:27 UTC

Step = 1,744,000

After 9 hrs 20 min running on all 8 threads. This might be the most computationally expensive WU I've run since I started DC'ing.

ID: 38524
boinc127
Joined: 31 Aug 13
Posts: 11
Credit: 7,952,212
RAC: 0
Message 38525 - Posted: 15 Oct 2014, 4:08:20 UTC

I noticed this bit of info from the task I ran, using 3 of the 4 available cores on my computer:

http://www.gpugrid.net/result.php?resultid=13201370


Using 1 MPI thread
Using 3 OpenMP threads

NOTE: The number of threads is not equal to the number of (logical) cores
and the -pin option is set to auto: will not pin thread to cores.
This can lead to significant performance degradation.
Consider using -pin on (and -pinoffset in case you run multiple jobs).
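(For a standalone GROMACS run, the note's suggestion would look something like the line below; under BOINC the wrapper controls the command line, so we can't set it ourselves.)

    mdrun -nt 3 -pin on -pinoffset 0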

Can this become an issue for computers that aren't running the task with all cores?
ID: 38525
TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 38526 - Posted: 15 Oct 2014, 14:46:03 UTC

This is the last bit of the stderr file:

starting mdrun 'Protein in water'
5000000 steps, 10000.0 ps (continuing from step 3283250, 6566.5 ps).

Writing final coordinates.

               Core t (s)   Wall t (s)        (%)
       Time:    32457.503    32458.000      100.0
                         9h00:58
                 (ns/day)    (hour/ns)
Performance:        9.140        2.626

gcq#0: Thanx for Using GROMACS - Have a Nice Day

16:39:45 (4332): called boinc_finish(0)

It ran on 5 CPUs (8 were allowed). Am I right in seeing that it took 9 hours to finish?
It took a bit more; see this:

Task 1345-MJHARVEY_CPUDHFR-0-1-RND9787_0 (workunit 10159887, host 153309): sent 12 Oct 2014 17:58:49 UTC, reported 15 Oct 2014 14:38:34 UTC, Completed and validated, run time 94,864.92 s, CPU time 567,905.50 s, credit 2,773.48, Test application for CPU MD v8.46 (mt)
Greetings from TJ
ID: 38526
MJH
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 38529 - Posted: 15 Oct 2014, 19:52:17 UTC - in response to Message 38526.  


It ran on 5 CPUs (8 were allowed). Am I right in seeing that it took 9 hours to finish?


No - it took just over a day. The performance was ~9 ns/day and the sim was 10 ns in length.
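(10 ns ÷ 9.140 ns/day ≈ 1.09 days, about 26 hours of wall time, which matches the 94,864.92 s run time in the result row above. The 9h00:58 in the stderr covers only the final segment, since the log shows that run continuing from step 3,283,250 after a checkpoint restart.)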

Matt
ID: 38529
Chilean
Joined: 8 Oct 12
Posts: 98
Credit: 385,652,461
RAC: 0
Message 38530 - Posted: 15 Oct 2014, 21:25:42 UTC

Had 3 errors on one of my PCs:

http://www.gpugrid.net/results.php?hostid=185425

All errored out with:

"Program projects/www.gpugrid.net/mdrun.846, VERSION 4.6.3
Source code file: ..\..\..\gromacs-4.6.3\src\gmxlib\checkpoint.c, line: 1562

File input/output error:
Cannot read/write checkpoint; corrupt file, or maybe you are out of disk space?
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors"

This computer has no problems running other projects... including vLHC@Home, Rosetta, etc.
ID: 38530
eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 38534 - Posted: 16 Oct 2014, 10:17:32 UTC

A few observations about CPUMD task runtimes:
- a dual-core CPU needs ~96 hours (4 days) to complete a task
- a dual-core with HT (4 threads) requires ~48 hours (2 days)
- a quad-core without HT (4 threads) takes ~16-36 hours
- a quad-core with HT (8 threads) completes a task in ~24 hours
- a 6-core (12 threads) finishes a task in ~8-16 hours
- a 16-thread CPU manages CPUMD tasks in under ~12 hours

Some CPUs finish faster from being overclocked or from having RAM clocked at 1833 MHz or higher. Disk usage is low for CPUMD; note that when running GPU tasks, disk usage can be higher for certain tasks (unfold_Noelia).

CPU temps are low with the SSE2 app; when the AVX CPUMD app is released, temps will be higher. For people running an AVX-capable Intel CPU, there is a possible 10-30% speed-up coming when the AVX app is released.

Some info-- http://en.wikipedia.org/wiki/Dihydrofolate_reductase
ID: 38534
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 38535 - Posted: 16 Oct 2014, 11:26:37 UTC
Last modified: 16 Oct 2014, 11:33:56 UTC

I completed my first CPU task on my main rig on Windows 10 Technical Preview x64.
http://www.gpugrid.net/result.php?resultid=13206356

Observations:
- It used app: Test application for CPU MD v8.46 (mtsse2).
- It had horrible estimates, along with an inability to report progress correctly, and it had a 1-week deadline, so it ran as high-priority the entire time, interfering with the BOINC client's scheduling of my GPUGrid GPU tasks. I will not be running this type of task again unless the estimation is fixed.
- It did not report progress correctly.
- It ran using 6 (of my 8) logical CPUs, as I had BOINC set to use 75% of CPUs, since I am running 2 RNA World VM tasks outside of BOINC.
- It took 162,768.17 s (45.2 hours) of wall time.
- It consumed 721,583.90 s (200.4 hours) of CPU time.
- It did checkpoint every so often, which I was happy to see. It appeared to resume from checkpoints just fine.
- It completed successfully, with the output text below.
- It validated successfully, and granted credit.
- It seems weird that the time values in the output do not match either the wall time or CPU time values that BOINC reported. Bug?

               Core t (s)   Wall t (s)        (%)
       Time:    18736.176    18736.000      100.0
                         5h12:16
                 (ns/day)    (hour/ns)
Performance:        5.491        4.371
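(Doing the math: 721,583.90 s CPU ÷ 162,768.17 s wall ≈ 4.4 effective cores of the 6 allowed. My guess is that the 5h12:16 above covers only the final run segment after the last checkpoint restart, which would explain the mismatch.)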


Let us know when the estimation and progress problems are fixed, and then maybe I'll run another one for you!

Thanks,
Jacob
ID: 38535
eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 38537 - Posted: 16 Oct 2014, 11:37:27 UTC - in response to Message 38535.  
Last modified: 16 Oct 2014, 11:54:54 UTC

Deleted post
ID: 38537
Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 38539 - Posted: 16 Oct 2014, 11:52:02 UTC - in response to Message 38537.  
Last modified: 16 Oct 2014, 12:03:43 UTC

eXaPower: Your questions about Windows 10 should have been a PM. I'll send you a PM response.
ID: 38539
eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 38544 - Posted: 16 Oct 2014, 13:59:49 UTC

For the last couple of days I've had two GPU tasks and one CPUMD task running at high priority, and up until now all ran with no issues. Just now, seemingly at random, BOINC decided to preempt one of the GPU tasks, sending it to "waiting to run" mode. If I suspend the CPUMD task, both GPU tasks will run; allowing the CPUMD task to run shuts down a GPU task.
ID: 38544