New CPU work units

Author	Message
TJ Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0	Message 38466 - Posted: 13 Oct 2014, 17:56:19 UTC Last modified: 13 Oct 2014, 17:57:41 UTC Found it, I have some data:
Step Time Lambda
4891000 9782.00000 0.00000 (99.895% done, 5CPU, 24:23:20h)
3856000 7712.00000 0.00000 (99.887% done, 5CPU, 23:53:20h)
2938000 5876.00000 0.00000 (97.075% done, 4CPU, 18:51:13h)
So these will take more time to finish. On one rig I got 4 more; they will not meet the deadline of 17 October. Greetings from TJ

TJ Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0	Message 38467 - Posted: 13 Oct 2014, 18:56:44 UTC Finally got the first one finished in 24h55m! Greetings from TJ

MJH Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38472 - Posted: 13 Oct 2014, 20:29:02 UTC - in response to Message 38451. exapower, Actually it's dihydrofolate reductase http://en.wikipedia.org/wiki/Dihydrofolate_reductase Matt

MJH Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38473 - Posted: 13 Oct 2014, 20:31:04 UTC - in response to Message 38467. TJ - odd that it ran with 5 threads -- do you have a CPU limit set? Matt

MJH Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38474 - Posted: 13 Oct 2014, 20:56:11 UTC For the WUs going on tonight I've increased the expected compute cost by 10x, and the deadline to 7 days. Matt

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 38477 - Posted: 13 Oct 2014, 22:23:32 UTC - in response to Message 38472. Last modified: 13 Oct 2014, 23:23:17 UTC Thanks for the correction. It gives me another enzyme to learn about. These CPUMD tasks look to have a ~48 hr runtime on Ivy Bridge with 4 threads. On my Westmere-generation Pentium the runtime is about 90 hr. (It is really a dual-core Xeon L3403 with a 30 W TDP. The reason I say so: it has a 4.8 GT/s internal QPI link and an external DMI link, and no Westmere Pentium has QPI, only DMI links. Intel has been rebadging Xeons for a while now.) I just received a new task with the 10x compute cost: 1084 hours estimated runtime! The old task was estimated at 5 hr while taking ~48 hr to finish. Going to be very interesting. http://www.gpugrid.net/workunit.php?wuid=10165559
Edit: Task http://www.gpugrid.net/workunit.php?wuid=10157944 is now running in high priority mode, pushing one of my two GPU tasks into "waiting to run". None of my BOINC settings impose any limits. Manually restarting the task and shutting off SLI have no effect. This started when the task went into high priority mode after I downloaded the new CPUMD task with the higher compute cost. I aborted http://www.gpugrid.net/workunit.php?wuid=10165559 and the waiting GPU task started back up. A new task http://www.gpugrid.net/workunit.php?wuid=10165641 downloaded and stopped the GPU task again, with BOINC showing it as "waiting to run". I aborted that task and the second GPU started its task again. What has changed from the prior batch of CPUMD? Having a new 10x compute cost task in the cache suspends one of the two running GPU tasks. If no 10x compute cost tasks are in the cache, both GPU tasks run normally. All this happens while the older CPUMD task is running in high priority mode with two GPU SDOERR tasks and one 10x compute cost CPUMD task in the cache.

TJ Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0	Message 38478 - Posted: 13 Oct 2014, 22:25:22 UTC - in response to Message 38473. "TJ - odd that it ran with 5 threads -- do you have a CPU limit set? Matt" Yes, it is set to 70% so that 2 threads are free to feed the GPUs and 1 is left for the system to stay responsive, so I can use this site and do some other things. I have another rig set to 100%, so the next one should run at 100% when the current one has finished. Greetings from TJ

Chilean Joined: 8 Oct 12 Posts: 98 Credit: 385,652,461 RAC: 0	Message 38481 - Posted: 14 Oct 2014, 0:57:49 UTC Last modified: 14 Oct 2014, 1:08:25 UTC Running one @ 8 threads (4 cores + HT). EDIT: is there a way to let it use only 6 cores and free up two cores (so vLHC can run as well)?

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38482 - Posted: 14 Oct 2014, 2:18:51 UTC Last modified: 14 Oct 2014, 2:23:16 UTC Hmm... I got a task, which will correctly run on 6 of the 8 CPUs on my rig (since I'm using "Use at most 75% CPUs" to presently accommodate 2 VM tasks outside of BOINC): http://www.gpugrid.net/workunit.php?wuid=10166037 BUT... It immediately started running in "High Priority" "Earliest Deadline First" mode. Could this mean that the 1-week deadline is too short? The estimated runtime is 1130+ hours, which is, uhh, 6.7 weeks? Does that sound right? This configuration must be incorrect. The deadline should never be less than the expected initial runtime, right?
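The arithmetic in the post above can be checked with a short sketch. It only compares the quoted estimate with the quoted deadline; the function names are made up for illustration, and the real BOINC client decides on high-priority mode with a more detailed scheduling simulation than this single comparison.

```python
HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY


def hours_to_weeks(hours: float) -> float:
    """Convert a runtime estimate in hours to weeks."""
    return hours / HOURS_PER_WEEK


def fits_deadline(estimate_hours: float, deadline_days: float) -> bool:
    """True if the estimated runtime is no longer than the deadline.

    Simplification: assumes the task computes around the clock.
    """
    return estimate_hours <= deadline_days * HOURS_PER_DAY


if __name__ == "__main__":
    estimate_hours = 1130  # estimate quoted in the post
    deadline_days = 7      # deadline set for the new batch
    print(f"{hours_to_weeks(estimate_hours):.1f} weeks")
    print("fits deadline:", fits_deadline(estimate_hours, deadline_days))
```

With the numbers from the thread, the estimate is roughly 6.7 weeks against a 1-week deadline, which would be consistent with the client switching to "Earliest Deadline First".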

MJH Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38484 - Posted: 14 Oct 2014, 7:08:38 UTC - in response to Message 38482. Yeah, no idea what went wrong with the runtime estimates. The last set got an estimate of 5h, but when I increased the cost by 10x the estimate went up 200x. Matt

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 38487 - Posted: 14 Oct 2014, 9:41:02 UTC Last modified: 14 Oct 2014, 9:41:47 UTC Task http://www.gpugrid.net/workunit.php?wuid=10157944 has been at 99.978% complete, with an estimated 3 minutes left, for the last 12 hr, and currently has 2 million steps to go before it finishes. This task has been past 98% for about 25 of the 32 hr it has been running. A newer CPUMD task sitting in the cache stops one of the GPU tasks, keeping it in "waiting to run" mode. Once the 10x compute cost task is booted, all is normal again.

TJ Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0	Message 38489 - Posted: 14 Oct 2014, 12:38:52 UTC - in response to Message 38481. "Running one @ 8 threads (4 cores + HT). EDIT: is there a way to let it use only 6 cores and free up two cores (so vLHC can ran as well...)" Yes, that can be achieved, but not with the one currently running. Eight logical cores, so 12.5% per core. You want to use 6 cores, thus 12.5 x 6 = 75. So in BOINC Manager go to Tools, Computing preferences, and then at the bottom, "On multiprocessor systems, use at most", set it to 75%. This works only for new WUs that have not been started. Once your current WU has finished, the next one will only use 75%, thus 6 threads. Greetings from TJ
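The percentage calculation above can be sketched as follows. The helper names are hypothetical, and the rounding-down in `threads_for_percent` is an assumption about how the client turns a percentage into a thread count; it does match the 5 threads reported earlier in the thread for a 70% setting on what appears to be an 8-thread host.

```python
import math


def percent_for_threads(wanted_threads: int, total_threads: int) -> float:
    """Percentage to enter under "use at most" so that wanted_threads are used."""
    if not 1 <= wanted_threads <= total_threads:
        raise ValueError("wanted_threads must be between 1 and total_threads")
    return 100.0 * wanted_threads / total_threads


def threads_for_percent(percent: float, total_threads: int) -> int:
    """Threads available at a given percentage.

    Assumption: the result is rounded down and never drops below 1.
    """
    return max(1, math.floor(total_threads * percent / 100.0))


if __name__ == "__main__":
    print(percent_for_threads(6, 8))   # 6 of 8 threads
    print(threads_for_percent(70, 8))  # the 70% setting mentioned earlier
```

For 6 of 8 threads this gives 75%, and a 70% limit on 8 threads gives 5 threads under the rounding assumption.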

MJH Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38490 - Posted: 14 Oct 2014, 12:47:54 UTC - in response to Message 38489. I am thinking about hard-coding the CPU use to [number-of-CPU-cores] - [number-of-GPUs]. Opinions? Matt
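A minimal sketch of the proposed rule, for illustration only. The function name is hypothetical, the GPU count is passed in by hand rather than detected, and the floor of one thread is an added assumption so that a host with as many GPUs as cores still gets a runnable task.

```python
import os


def cpumd_threads(n_cpu_cores: int, n_gpus: int) -> int:
    """Thread count under the rule [number-of-CPU-cores] - [number-of-GPUs].

    Assumption: never return fewer than 1 thread.
    """
    return max(1, n_cpu_cores - n_gpus)


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # logical cores on this machine
    gpus = 2                     # hypothetical value, set by hand
    print(f"{cores} cores, {gpus} GPUs -> {cpumd_threads(cores, gpus)} threads")
```

On an 8-thread host with 2 GPUs the rule gives 6 threads, leaving one thread per GPU but none for the system.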

MJH Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38491 - Posted: 14 Oct 2014, 12:47:59 UTC - in response to Message 38489. I am thinking about hard-coding the CPU use to [number-of-CPU-cores] - [number-of-GPUs]. Opinions? Matt

captainjack Joined: 9 May 13 Posts: 171 Credit: 4,739,796,466 RAC: 2,142	Message 38493 - Posted: 14 Oct 2014, 12:56:51 UTC "Running one @ 8 threads (4 cores + HT). EDIT: is there a way to let it use only 6 cores and free up two cores (so vLHC can ran as well...)" Another way to accomplish this is to set up an app_config.xml file like the following:
  <app_config>
    <app>
      <name>android</name>
      <max_concurrent>8</max_concurrent>
    </app>
    <app_version>
      <app_name>cpumd</app_name>
      <plan_class>mt</plan_class>
      <avg_ncpus>6</avg_ncpus>
      <cmdline>--nthreads 6</cmdline>
    </app_version>
  </app_config>
The "avg_ncpus" parameter sets the number of threads reserved in the BOINC scheduler, and the "--nthreads" parameter sets the number of threads you want a task to use when it runs. On an 8-thread machine that will leave 2 threads free for BOINC to schedule other tasks. The app_config.xml file goes into the /Boinc/projects/www.gpugrid.net folder. And just as a reminder, if you are using Windows, use Notepad to edit the app_config.xml file. Do not use Word, as it puts in extra formatting characters that confuse the XML interpreter. If you are using Ubuntu, use Gedit to edit the app_config.xml file. Sometimes you can make this effective by opening BOINC Manager and clicking on "Advanced" then "Read config files". In the message log, you should get a message that says something like "app_config.xml found for www.gpugrid.net". You might need to shut down BOINC and start it back up again to make the app_config.xml effective. Then the parameters will apply to any new work that is downloaded after the parameters are effective. Hope that helps.
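Before restarting BOINC, a file like the one above can be sanity-checked with a short script. This is a sketch, not a BOINC tool: `check_app_config` is a hypothetical helper that only tests whether the text parses as XML and whether "avg_ncpus" agrees with the "--nthreads" value in "cmdline". It knows nothing else about what the client accepts.

```python
import xml.etree.ElementTree as ET
from typing import List

SAMPLE = """<app_config>
  <app>
    <name>android</name>
    <max_concurrent>8</max_concurrent>
  </app>
  <app_version>
    <app_name>cpumd</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>6</avg_ncpus>
    <cmdline>--nthreads 6</cmdline>
  </app_version>
</app_config>"""


def check_app_config(xml_text: str) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]

    problems = []
    for version in root.iter("app_version"):
        name = version.findtext("app_name", default="?")
        avg_ncpus = version.findtext("avg_ncpus")
        args = (version.findtext("cmdline") or "").split()
        nthreads = None
        if "--nthreads" in args:
            idx = args.index("--nthreads")
            if idx + 1 < len(args):
                nthreads = args[idx + 1]
        if avg_ncpus is None or nthreads is None:
            continue  # nothing to compare for this app_version
        try:
            mismatch = float(avg_ncpus) != float(nthreads)
        except ValueError:
            problems.append(f"{name}: non-numeric avg_ncpus or --nthreads")
            continue
        if mismatch:
            problems.append(
                f"{name}: avg_ncpus={avg_ncpus} but --nthreads {nthreads}"
            )
    return problems


if __name__ == "__main__":
    print(check_app_config(SAMPLE) or "no problems found")
```

To check a real file, read its contents and pass the text to `check_app_config`; stray formatting characters that break the XML show up as a parse error.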

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38494 - Posted: 14 Oct 2014, 12:57:59 UTC Last modified: 14 Oct 2014, 13:10:15 UTC MJH: "Yeah, no idea what went wrong with the runtime estimates. The last set got an estimate of 5h, but when I increased the cost by 10x the estimate went up 200x." Did you increase the <rsc_flops_bound> value appropriately? I believe it's used for task size, and hence task runtime estimation. MJH: "I am thinking about hard-coding the CPU use to [number-of-CPU-cores] - [number-of-GPUs]. Opinions?" I personally don't care either way, but I absolutely care that you make absolutely sure that the number of cores used (via commandline) matches the number of cores budgeted (via ncpus), so BOINC doesn't overcommit or undercommit the CPU. I hope that makes sense.

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 38495 - Posted: 14 Oct 2014, 13:03:10 UTC - in response to Message 38491. "I am thinking about hard-coding the CPU use to [number-of-CPU-cores] - [number-of-GPUs]. Opinions? Matt" Doesn't hurt to try. Only Nvidia GPUs? How much would a GPU help with task time? If the runtime is lowered by half or more, then I'd say this is workable. Would it be an option to run a task on half of the GPU cores, so that maybe two MD tasks can run at a time? Or is the whole GPU required?

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38496 - Posted: 14 Oct 2014, 13:13:37 UTC Side note: Is it possible that progress % is not hooked up correctly? For instance, progress.log says:
  nsteps = 5000000
and if I scroll to the bottom:
  Step Time Lambda
  1254000 2508.00000 0.00000
... yet the BOINC UI only says "1.777% done". Shouldn't it say 25% done?
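The 25% figure follows directly from the two numbers quoted from progress.log. A minimal sketch of that calculation, assuming the log contains an "nsteps = N" line and that each "Step Time Lambda" header is followed by a row whose first field is the current step; the function names are hypothetical and this is not how the app itself reports progress.

```python
import re
from typing import Optional, Tuple

SAMPLE_LOG = """nsteps = 5000000
Step Time Lambda
1254000 2508.00000 0.00000
"""


def parse_progress(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (last_step, nsteps), or None if either value is missing."""
    total = re.search(r"nsteps\s*=\s*(\d+)", log_text)
    # The step number is the first value after each "Step Time Lambda" header.
    steps = re.findall(r"Step\s+Time\s+Lambda\s+(\d+)", log_text)
    if total is None or not steps:
        return None
    return int(steps[-1]), int(total.group(1))


def percent_done(step: int, nsteps: int) -> float:
    """Progress as a percentage of the total number of steps."""
    return 100.0 * step / nsteps


if __name__ == "__main__":
    parsed = parse_progress(SAMPLE_LOG)
    if parsed is not None:
        step, nsteps = parsed
        print(f"{percent_done(step, nsteps):.2f}% done")
```

With the values from the post, 1254000 of 5000000 steps is 25.08%, not the 1.777% shown in the BOINC UI.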

MJH Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38497 - Posted: 14 Oct 2014, 13:20:16 UTC - in response to Message 38496. The app isn't reporting its progress to the client, yes. It's just being estimated from the flopses. Matt

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 38498 - Posted: 14 Oct 2014, 13:35:34 UTC Is it possible to change it so that the app can report a better progress %?