MJHProject administrator Project developer Project scientist Send message
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level
Scientific publications
|
Hi,
There's a new CPU app available for Linux clients. A few WUs are out now, with some more to come after I've received the first results back.
The app is multithreaded; I think the default behaviour of the BOINC client is to allocate all cores to it.
Please report any observations here.
Matt
|
|
|
|
Hi Matt,
So far, all tasks are ending in error. Some stop immediately, some run for a while.
Here's my main question. Over at Milkyway@home, they are able to control the threads assigned to a multi-thread task using an app_config.xml file. I put together a similar app_config.xml file for the cpumd tasks and tried it on a few tasks.
My BOINC client has 10 threads assigned. I have an app_config.xml set up to limit the threads assigned to cpumd tasks to 6 threads. In BOINC Manager, the task shows that it is using 6 CPUs. However, the stderr.txt shows that it is using 10 OpenMP threads, and system usage indicates that it is using all 10 threads.
My app_config:
<app_config>
<app>
<name>acemdlong</name>
<max_concurrent>9999</max_concurrent>
<gpu_versions>
<gpu_usage>1</gpu_usage>
<cpu_usage>1.5</cpu_usage>
</gpu_versions>
</app>
<app>
<name>acemdbeta</name>
<max_concurrent>9999</max_concurrent>
<gpu_versions>
<gpu_usage>1</gpu_usage>
<cpu_usage>1.5</cpu_usage>
</gpu_versions>
</app>
<app>
<name>acemdshort</name>
<max_concurrent>9999</max_concurrent>
<gpu_versions>
<gpu_usage>1.0</gpu_usage>
<cpu_usage>1.5</cpu_usage>
</gpu_versions>
</app>
<app>
<name>android</name>
<max_concurrent>4</max_concurrent>
</app>
<app_version>
<app_name>cpumd</app_name>
<plan_class>mt</plan_class>
<avg_ncpus>6</avg_ncpus>
</app_version>
</app_config>
Can you see anything that needs to be adjusted in my app_config? BOINC Manager is not giving me any error messages when it reads the app_config file.
Sure would be nice if this would work here so we could leave some threads open to support GPU tasks.
Thanks for all the help. |
|
|
|
Refer to the Application configuration documentation.
For most multi-threaded applications you need
<cmdline>--nthreads 6</cmdline>
to control the behaviour of the application, in addition to the <avg_ncpus>6</avg_ncpus> (which only controls BOINC's scheduling, as you've found).
I've only ever used --nthreads under Windows: I'm not sure whether it's applicable under Linux. Perhaps you or Matt could find out for us. |
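A minimal sketch of the point above, assuming the stanza layout from captainjack's file: the `<app_version>` entry needs both `<avg_ncpus>` (BOINC's scheduling figure) and `<cmdline>` (the application's own thread count), and the two numbers should agree. The helper name `build_app_version` is made up for illustration; it only generates the XML text and does not talk to BOINC.

```python
import xml.etree.ElementTree as ET


def build_app_version(app_name: str, plan_class: str, threads: int) -> str:
    """Return an <app_version> stanza that keeps avg_ncpus and --nthreads in step.

    Hypothetical helper: it only builds the XML text for app_config.xml.
    """
    app_version = ET.Element("app_version")
    ET.SubElement(app_version, "app_name").text = app_name
    ET.SubElement(app_version, "plan_class").text = plan_class
    # avg_ncpus tells the BOINC scheduler how many CPUs to budget for the task.
    ET.SubElement(app_version, "avg_ncpus").text = str(threads)
    # cmdline is passed to the application, which decides how many threads to start.
    ET.SubElement(app_version, "cmdline").text = f"--nthreads {threads}"
    return ET.tostring(app_version, encoding="unicode")


# Prints the stanza on a single line; paste it inside <app_config> ... </app_config>.
print(build_app_version("cpumd", "mt", 6))
```

Generating both values from one `threads` argument is just a way to avoid the mismatch described earlier, where BOINC budgeted 6 CPUs while the application started 10 threads.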
|
|
|
<cmdline>--nthreads 6</cmdline>
That's right - it's controlled by the command line option "--nthreads". It should default to using a single core if that's not specified. You'll be able to see in the stderr of the task's tombstone web page what arguments it received.
Matt
|
|
|
|
Matt and Richard,
Thanks for the advice. The cmdline option seemed to do the trick. It is now running on 6 threads.
Thanks for the help,
captainjack |
|
|
|
I've just updated the app to correct for crashes on clients with venerable Core2 CPUs.
Matt |
|
|
|
Could someone please report on the success or otherwise of suspend/resume of WUs?
Matt |
|
|
floydSend message
Joined: 17 Dec 11 Posts: 11 Credit: 105,502,570 RAC: 0 Level
Scientific publications
|
Is there any real advantage in making the app multithreaded? It saves memory, but that's all that comes to mind. On the other hand I expect it to be less efficient than running several single-threaded tasks. Plus the BOINC scheduler seems to be bad at managing multithreaded apps.
|
|
|
|
Is there any real advantage in making the app multithreaded?
Yes. For the use we intend for it we need the results back in a timely manner. Running these WUs on a single core will work, but the results are likely to come back too late to be useful. This application scales linearly for small N - I'm estimating 4-8 cores on most machines.
Plus the BOINC scheduler seems to be bad at managing multithreaded apps.
Well that's another thing entirely, and our problem to solve. I'm more concerned that the client doesn't give the user the desired level of control.
Matt
|
|
|
floyd
|
For the use we intend for it we need the results back in a timely manner. Running these WUs on a single core will work, but the results are likely to come back too late to be useful.
This can only work if you don't have to compete for resources. You'll lose your advantage at every task switch or when BOINC decides to delay the start of a cached task in favour of some other. A very short deadline could possibly avoid this but I'm not sure if I could tolerate such hijacking.
Plus the BOINC scheduler seems to be bad at managing multithreaded apps.
Well that's another thing entirely, and our problem to solve. I'm more concerned that the client doesn't give the user the desired level of control.
I was talking about the client's task scheduler actually. As you already mentioned, it will run multithreaded apps on all cores, effectively interrupting all other work.
|
|
|
|
Matt said,
Could someone please report on the success or otherwise of suspend/resume of WUs?
Matt
Just successfully finished an 8.43 task and it shows validated. It was running with an app_config.xml which allocated 6 threads to the task. After it was running for a few minutes, I suspended then resumed the task. Looks like it worked fine. Link to task http://www.gpugrid.net/result.php?resultid=12864432
I have another task running now. After it had been running for ~10 minutes, I shut down BOINC then restarted BOINC. It restarted from the beginning. Looks like it is running fine now. Will report back in later.
Let me know if you need more information. |
|
|
|
Looks like it worked fine.
Super, thanks!
Matt |
|
|
|
Matt,
I had a second task that was running when I shut down BOINC and started it back up again. That task has finished and validated.
Ubuntu 14.04
BOINC 7.2.42 installed using the Berkeley installer
Let us know if you want us to run some other tests. |
|
|
|
Hi Everyone,
The new test application is now available for Windows as well as Linux. Please do help us test it!
This application does the same type of simulations as ACEMD, our GPU application. The reason we are testing it now is that, in principle, modern CPUs are finally fast enough to process some of our WUs within an acceptable amount of time.
To get to that point though, it is essential to use multiple CPU cores simultaneously, so this is a multithreaded app. I'd encourage you to let the WUs run on all cores (which it will do by default). The performance scales linearly with core count.
The main objective of this first phase is to test application stability and measure achieved simulation rates and total throughput.
It's a Beta application - to get work for it, you'll need to have your profile set to allow CPU work, allow beta work and enable the application "Molecular Dynamics on CPU".
The app is largely feature-complete. The only issue I know is outstanding is that % completion statistics are wrong. I'm sure you'll all find other issues -
please post your experiences and observations here.
Matt |
|
|
TJSend message
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 Level
Scientific publications
|
Hi Matt,
My Haswell 4771 on Win7 did one, but with an error. Five of my wing(wo)men had errors too. You can see it here: http://www.gpugrid.net/result.php?resultid=12877823
Off topic: my error page also still has one error on it from 31 Aug 2013 and one from 19 Nov 2013.
____________
Greetings from TJ |
|
|
|
Hi Matt,
Just ran one of the multithread CPU tasks on a Windows 7 machine with 16 threads.
Matt said:
I'd encourage you to let the WUs run on all cores (which it will do by default). The performance scales linearly with core count.
Per your request, I let it run on all 16 threads. CPU Utilization was pegged at 100% throughout the run. Task has uploaded and validated.
Just started another one. I will keep an eye on it and let you know if anything changes.
Let me know if you want me to try a different kind of test.
captainjack
|
|
|
|
My Haswell 4771 on win7 did one but with error.
Those were with the previous version. 844 is current, and should already have fixed those problems.
Matt |
|
|
TJ
|
My Haswell 4771 on win7 did one but with error.
Those were with the previous version. 844 is current, and should already have fixed those problems.
Matt
Thanks, will wait for new ones.
____________
Greetings from TJ |
|
|
|
Thanks, will wait for new ones.
There are plenty of unsent tasks - if you're not receiving them, best check your project settings as below.
Matt |
|
|
floyd
|
Matt,
you already know about the incorrect progress display, together with the elapsed and remaining run times, but you haven't mentioned if you see a way to fix this. If not, I'd like to point out that all calculations based on those values are of course wrong too, like computing speed, estimated run times and credits, possibly affecting system operations and user acceptance.
For completeness, all my 8.43 WUs so far have finished and validated without further issues, including one that restarted after a system shutdown. WU size seems reasonable.
|
|
|
|
There'll be a new version out later today, after stumps. |
|
|
|
As promised, version 845 should report progress correctly.
Matt |
|
|
|
Or not, because I don't know the difference between percentages and fractions. 846 out now, which should also correctly report to the client when a checkpoint was performed.
Matt |
|
|
|
Just aborted a V845 that has been at 100% complete forever...
According to BOINC it was 4,706.000% DONE
Application Test application for CPU MD 8.45 (mt)
Workunit name 6_745-MJHARVEY_gpugrid10z4-0-1-RND3120
State Waiting to run
Received 28-07-2014 16:10
Report deadline 02-08-2014 16:08
Estimated app speed 3.82 GFLOPs/sec
Estimated task size 5,000,000 GFLOPs
Resources 2 CPUs
CPU time at last checkpoint 00:00:00
CPU time 21:05:58
Elapsed time 13:59:25
Estimated time remaining 00:00:00
Fraction done 4,706.000%
Virtual memory size 49.34 MB
Working set size 0.35 MB
Directory slots/5
Process ID 5716
____________
|
|
|
|
According to BOINC it was 4,706.000% DONE
845 misreported by a factor of 100. Your job was 47% complete when you killed it.
This is fixed in 846.
MJH |
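The factor of 100 can be sketched as follows, assuming the client expects progress as a fraction between 0 and 1 and multiplies by 100 for display (which is what the "percentages and fractions" remark suggests). The function name and the 47.06 figure are illustrative; the latter is simply the reported 4,706.000% divided by 100.

```python
def displayed_progress(value_passed_to_client: float) -> str:
    """Render a progress value the way a client expecting a 0..1 fraction would."""
    return f"{value_passed_to_client * 100:,.3f}%"


true_percent = 47.06  # assumed real progress, inferred from the 4,706.000% display

# Passing the percentage itself reproduces the inflated figure.
print(displayed_progress(true_percent))

# Passing the fraction gives the intended display.
print(displayed_progress(true_percent / 100))
```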
|
|
petebeSend message
Joined: 19 Nov 12 Posts: 31 Credit: 1,549,545,867 RAC: 0 Level
Scientific publications
|
Matt,
currently running a v 8.46 -
http://www.gpugrid.net/result.php?resultid=12898547
It has been running for 1 hour and reports that it is 1% complete, with time to completion of 5 days and 36 minutes?
EDIT - Also, there are no other BOINC tasks or non-BOINC apps running. BOINC reports CPU usage at 100%, but Win XP Task Manager reports only 50% CPU usage. |
|
|
Presrvd Send message
Joined: 6 Jul 14 Posts: 5 Credit: 41,548,910 RAC: 0 Level
Scientific publications
|
Mine ran for roughly 5 1/2 hours, and went from 8732% to complete instantly, and the part that is really bothering me is that I have the test applications turned off. |
|
|
|
Hi Matt,
Just started up an 8.46 job in Windows 7. My app_config is listed below, but the app seems to ignore the app_config settings for the number of CPUs. Can you see something that needs to be changed in my app_config? BOINC Manager is not showing any errors when it reads the app_config.
<app_config>
<app>
<name>android</name>
<max_concurrent>8</max_concurrent>
</app>
<app_version>
<app_name>cpumd</app_name>
<plan_class>mt</plan_class>
<avg_ncpus>8</avg_ncpus>
<cmdline>--nthreads 8</cmdline>
</app_version>
</app_config>
The Linux tasks seem to be running well and running within the number of CPUs specified in the app_config. As a reminder, I really like using the app_config so I can reserve a few CPU threads to support GPU processing.
Thanks,
captainjack |
|
|
|
The 8.46 task on Windows 7 just finished. It was running on 16 threads. Here are the estimates:
Elapsed minutes: 42
% complete: 1.75
Estimated to finish: 40:53:39
Elapsed minutes: 103
% complete: 4.2
Estimated to finish: 39:53:44
Elapsed minutes: 107
Finished
Hope that helps. |
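A rough check of those readings, assuming progress grows linearly with time (the client's real estimator may work differently). The function name is made up for illustration.

```python
def naive_remaining_minutes(elapsed_min: float, percent_done: float) -> float:
    """Extrapolate remaining run time, assuming progress is linear in time."""
    return elapsed_min * (100.0 - percent_done) / percent_done


# The two progress readings reported above: (elapsed minutes, % complete).
for elapsed, pct in [(42, 1.75), (103, 4.2)]:
    remaining = naive_remaining_minutes(elapsed, pct)
    print(f"{elapsed} min at {pct}% -> about {remaining / 60:.1f} h remaining")
```

Both readings extrapolate to roughly 39 hours, in the same region as the displayed estimates, yet the task finished after 107 minutes. That is consistent with the progress figure, not the run time, being the value that is off.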
|
|
|
Captainjack,
That app config looks ok to me. Have a look in the stderr reported by the job, that will say at the top how many threads the program is using.
Matt
|
|
|
skgivenVolunteer moderator Volunteer tester
Send message
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
The app_config probably won't take effect immediately (even if you read the config files from Boinc) as work units occupy slots and cores are already allocated to started work. So it should kick in after a WU finishes. The trouble with this is that the WU's are long (just like the GPU work units).
____________
FAQ's
HOW TO:
- Opt out of Beta Tests
- Ask for Help |
|
|
PDHSend message
Joined: 15 Oct 10 Posts: 1 Credit: 340,914,168 RAC: 175,146 Level
Scientific publications
|
Excellent job. The multithreaded application works fine on my host, and V846 reports progress correctly. Now GPUGRID is like the Folding@Home project, with apps for GPUs and multi-core CPUs. Thx. |
|
|
|
I don't have time to check my BOINC client all that often so I am not sure how long this has been going on. I had "Test Applications" disabled but I still found these work units running on my system. It appears GPU tasks will not run with these WUs using all the cores. At least that is my assumption, possibly an incorrect one, since no GPU WU was running or even pending.
I disabled the "Molecular Dynamics on CPU" jobs, updated the BOINC client, aborted all the MT WUs and updated the client again. No GPU WUs were loading so I tried updating the client again, with the same results. The BOINC client log says no short or long GPU tasks are available. I checked the server and there are thousands of short and long WUs that are unsent. I rebooted and I am still getting the same thing.
Any guesses on how long my GPU will remain idle?
|
|
|
|
The client log also says I have processed my daily quota of 16 tasks. :-(
I suppose that means I won't get any new tasks until tomorrow. I guess my GPU will get another day of vacation. :-) |
|
|
|
Could someone confirm that a client without any special configuration will obtain and execute both CPU and GPU WUs simultaneously?
Matt
|
|
|
|
Could someone confirm that a client without any special configuration will obtain and execute both CPU and GPU WUs simultaneously?
Matt
Only if GPUGrid is the only project the client is attached to. Look at this Event Log snippet from one of my GPUGrid attached machines.
02/08/2014 15:51:51 | boincsimap | [work_fetch] REC 0.000 prio -0.000000 can req work
02/08/2014 15:51:51 | LHC@home 1.0 | [work_fetch] REC 0.429 prio -0.000007 can req work
02/08/2014 15:51:51 | Einstein@Home | [work_fetch] REC 2309.624 prio -0.044407 can req work
02/08/2014 15:51:51 | NumberFields@home | [work_fetch] REC 5219.505 prio -0.088347 can req work
02/08/2014 15:51:51 | GPUGRID | [work_fetch] REC 232810.246 prio -1.957289 can req work
02/08/2014 15:51:51 | SETI@home | [work_fetch] REC 242781.786 prio -2.030169 can req work
<work_fetch_debug> lists projects in priority order for the next work fetch. SIMAP is highest priority because they've been off-stream for about the last month between batches. When SIMAP comes back on-stream on Thursday, work will be fetched preferentially from there to even up resource share and make up for that missing month.
LHC and NumberFields are my other two active CPU projects (Einstein is intel_gpu only on this machine, so let's leave it out for now). When all three are active and have work available, their REC and priority figures will all be jostling around the same levels, and work will be fetched turn-and-turn-about to maintain resource share.
But my two GPU projects - SETI and GPUGrid, they're allocated one GPU each - are in a class of their own. The REC (Recent Estimated Credit - bears no relationship to actual granted credit) from a GPU is so much higher than from a CPU that work fetch priority is driven extremely low - CPU work will only be fetched as a last resort when all other possible sources of supply have been exhausted.
Even if you force it to fetch work by blocking other projects, it still faces a similar priority hurdle before actually running. I've been trying to get a couple of test CPU tasks from SETI to run this afternoon, and I've had to tweak a lot of my normal settings to force them into action.
In short: if a client is running GPU work from GPUGrid, it will have a strong bias towards running "anything except GPUGrid" on its CPUs. |
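For anyone who wants to check their own Event Log, here is a small sketch that pulls the `[work_fetch]` lines quoted above into a fetch order. It assumes, as described above, that the priority closest to zero is served first; the regular expression is written against the line layout shown in this post only.

```python
import re
from typing import List

LOG = """\
02/08/2014 15:51:51 | boincsimap | [work_fetch] REC 0.000 prio -0.000000 can req work
02/08/2014 15:51:51 | LHC@home 1.0 | [work_fetch] REC 0.429 prio -0.000007 can req work
02/08/2014 15:51:51 | Einstein@Home | [work_fetch] REC 2309.624 prio -0.044407 can req work
02/08/2014 15:51:51 | NumberFields@home | [work_fetch] REC 5219.505 prio -0.088347 can req work
02/08/2014 15:51:51 | GPUGRID | [work_fetch] REC 232810.246 prio -1.957289 can req work
02/08/2014 15:51:51 | SETI@home | [work_fetch] REC 242781.786 prio -2.030169 can req work
"""

# Project name sits between the first two pipes; REC and prio follow the tag.
PATTERN = re.compile(
    r"\|\s*(?P<project>[^|]+?)\s*\|\s*\[work_fetch\] "
    r"REC (?P<rec>[\d.]+) prio (?P<prio>-?[\d.]+)"
)


def fetch_order(log: str) -> List[str]:
    """Return project names ordered from highest to lowest work-fetch priority."""
    rows = []
    for line in log.splitlines():
        match = PATTERN.search(line)
        if match:
            rows.append((float(match.group("prio")), match.group("project")))
    # Priorities are negative; the value closest to zero is fetched from first.
    rows.sort(key=lambda row: row[0], reverse=True)
    return [project for _, project in rows]


print(fetch_order(LOG))
```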
|
|
floyd
|
I don't know about obtaining but I can confirm that Linux BOINC 7.2.47 does not suspend my Einstein GPU WUs when running cpumd on all cores. That must be a BOINC error however.
|
|
|
|
Matt asked:
Could someone confirm that a client without any special configuration will obtain and execute both CPU and GPU WUs simultaneously?
I just reconfigured my Ubuntu box to not have an app_config, to test Matt's question. It is currently running two GPUGRID GPU tasks and one CPUMD task using all available CPUs.
The GPU tasks show that they are using 0.756 CPU each and the CPUMD task is using 11 CPUs (all that are allocated to that BOINC client). BOINC has over-committed the available CPUs, but it does seem to be working.
Let us know if you want us to perform a different test with a different configuration.
Hope that helps,
captainjack
|
|
|
skgiven
|
Matt, the harsh reality is that GPU WU's use the CPU and the more you use the CPU for other work the more it impacts upon GPU work.
Your app is fine for systems without an NVidia GPU. Otherwise it's a bang-your-head-off-a-wall exercise!
|
|
|
MJH
|
The Linux version of CPUMD has been updated. Changes:
* Rebase from gmx 4.6 to 5.0
* There are now optimised builds for SSE2, SSE4 and AVX.
* BOINC progress reporting
Matt |
|
|
totoshiSend message
Joined: 30 Aug 14 Posts: 2 Credit: 3,679,508 RAC: 0 Level
Scientific publications
|
Hey there,
I received a very long WU (4377-MJHARVEY_CPUDHFR-0-1-RND8587_1) on my windows machine which has a runtime of approx. 305 hours. Really? The deadline is in one week. ;)
After ~ 10 min my computer has crunched ~ 0,042 %.
I guess, I will not be able to send this WU back in time. ;)
If I remember correctly, then one of my linux machines has such a WU as well.
Should I abort these ones? |
|
|
MJH
|
totoshi,
Ignore both the estimated runtime and also the progress monitor - both are wrong. The task ought to take around 24-50h, depending on your machine. Feel free to kill it if it's causing you inconvenience.
Matt |
|
|
totoshi
|
Hi Matt,
Alrighty. I will ignore both. :)
Thx for the information. |
|
|
|
I got one of the test applications for CPUMD.
http://www.gpugrid.net/result.php?resultid=13209824
http://www.gpugrid.net/workunit.php?wuid=10166393
Now at 0.442% progress (occasionally increasing by about 0.002%), 06:43:9 elapsed, 2930:20:57 estimated remaining.
Running high priority. Using 3 CPU cores, the maximum I allow BOINC to use on that computer.
If the estimated time to completion is anything close to accurate, I probably won't allow any more of these to run on that computer.
Running under 64-bit Windows Vista. |
|
|
MJH
|
Robert,
Both the runtime estimates and the completion progress are wrong. The task ought to take no more than 2 days, depending on the spec of your machine.
Matt |
|
|
MJH
|
New CPUMD version 900 for AVX. Major change to a 64bit build.
You'll need a dev version of BOINC to get the AVX app - 7.2.42 doesn't report cpu caps correctly.
This version of the app reports its progress correctly.
(NB - there's also a 900 SSE2 which is the same as the older 850)
Matt |
|
|
eXaPowerSend message
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level
Scientific publications
|
Haven't been able to receive any new AVX CPUMD tasks with BOINC version 7.4.21 even though 8000 CPUMD tasks are available.
14/10/21 04:21:30 | GPUGRID | No tasks are available for Test application for CPU MD
14/10/21 04:34:59 | GPUGRID | No tasks are available for Test application for CPU MD
14/10/21 07:42:57 | GPUGRID | No tasks are available for Test application for CPU MD
14/10/21 07:49:08 | GPUGRID | No tasks are available for Test application for CPU MD
|
|
|
|
I was also *unable* to get a new CPU task, on BOINC 7.4.22 x64.
Why?
10/21/2014 8:03:18 AM | GPUGRID | [work_fetch] set_request() for CPU: ninst 6 nused_total 2.00 nidle_now 6.00 fetch share 1.00 req_inst 6.00 req_secs 41472.00
10/21/2014 8:03:18 AM | GPUGRID | [sched_op] Starting scheduler request
10/21/2014 8:03:18 AM | GPUGRID | [work_fetch] request: CPU (41472.00 sec, 6.00 inst) miner_asic (0.00 sec, 0.00 inst) NVIDIA GPU (0.00 sec, 0.00 inst)
10/21/2014 8:03:18 AM | GPUGRID | Sending scheduler request: To fetch work.
10/21/2014 8:03:18 AM | GPUGRID | Requesting new tasks for CPU
10/21/2014 8:03:18 AM | GPUGRID | [sched_op] CPU work request: 41472.00 seconds; 6.00 devices
10/21/2014 8:03:18 AM | GPUGRID | [sched_op] miner_asic work request: 0.00 seconds; 0.00 devices
10/21/2014 8:03:18 AM | GPUGRID | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
10/21/2014 8:03:19 AM | GPUGRID | Scheduler request completed: got 0 new tasks
10/21/2014 8:03:19 AM | GPUGRID | [sched_op] Server version 613
10/21/2014 8:03:19 AM | GPUGRID | No tasks sent
10/21/2014 8:03:19 AM | GPUGRID | Project requested delay of 31 seconds
10/21/2014 8:03:19 AM | GPUGRID | [work_fetch] backing off CPU 603 sec
10/21/2014 8:03:19 AM | GPUGRID | [sched_op] Deferring communication for 00:00:31
10/21/2014 8:03:19 AM | GPUGRID | [sched_op] Reason: requested by project |
|
|
MJH
|
Jacob, what's the host ID? |
|
|
|
For my work fetch request which yielded 0 CPU tasks...
Computer ID was: 153764
It's an 8-logical-CPU machine, setup to run 75% of CPUs (since I'm running 2 cpu-intensive VM tasks outside of BOINC)... so I usually get MT tasks that run 6 CPUs on this machine. The machine also has 3 NVIDIA GPUs. |
|
|
MJH
|
According to the logs, as of 50 minutes ago you should have been receiving work, via the mtsse4 app (which actually maps to the 845 sse2 binary).
The host features reported by your BOINC client are:
host [153764] client [70422] plan_class [mtsse4] effective_cpus [6] [fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt syscall nx lm vmx tm2 pbe]
As you can see, no AVX!
I am pretty sure this represents a bug in the 7.4.22 BOINC app, since your CPU and OS are manifestly AVX-capable.
Matt
|
|
|
|
2 questions:
Are you sure my computer is avx capable? [This machine just celebrated its 5-year-birthday, with no upgrades to mobo nor CPU] */**
Are you sure the BOINC client is supposed to be able to detect that and relay that to the server scheduler?
If the answer is yes to both, then I'll pass the info to the BOINC Alpha mailing list to see if it's a bug.
* This website seems to indicate I might legitimately NOT have avx, based on dates:
http://en.wikipedia.org/wiki/Advanced_Vector_Extensions
** Here's my CPU, I believe: http://ark.intel.com/products/37149/Intel-Core-i7-965-Processor-Extreme-Edition-8M-Cache-3_20-GHz-6_40-GTs-Intel-QPI |
|
|
MJH
|
Actually, I misread your CPU model - the i7 CPU 965 doesn't have AVX, only SSE4.
But are you saying that you are receiving no work at all?
Matt
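The feature list the client reported for this host can be checked mechanically instead of by eye. A minimal sketch, using the exact string from the scheduler log quoted earlier; the helper name `has_feature` is made up for illustration.

```python
# Feature string reported by the BOINC client for host 153764 (from the log above).
REPORTED = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 "
    "popcnt syscall nx lm vmx tm2 pbe"
)


def has_feature(features: str, name: str) -> bool:
    """Return True if `name` appears as a whole token in the feature string."""
    return name in features.split()


for flag in ("sse2", "sse4_2", "avx"):
    print(flag, has_feature(REPORTED, flag))
```

Splitting on whitespace and comparing whole tokens avoids false matches such as `sse` inside `ssse3`.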
|
|
|
|
I am not receiving any CPU tasks from GPUGrid, as evidenced by my work fetch request/response that I pasted a couple posts up. |
|
|
eXaPower
|
BOINC 7.4.21 properly reporting CPU feature set for Intel Ivy Bridge.
14/10/20 18:23:06 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes f16c rdrandsyscall nx lm avx vmx tm2 pbe fsgsbase smep
14/10/21 09:05:00 | GPUGRID | No tasks are available for Test application for CPU MD |
|
|
|
On this page:
http://www.gpugrid.net/apps.php
... it seems that the only x64 version offered is avx.
Or, is it supposed to send me the x86 (mtsse4) app, even though I'm using x64? Because, that did not happen :( |
|
|
MJH
|
Getting 901 now? |
|
|
MJH
|
Jacob,
As I understand it, you should get the 32b app, even if using a 64bit BOINC client. (Although, for maximum confusion, the 32b app is actually a 64b one)
Matt |
|
|
|
Getting 901 now?
No :(
10/21/2014 9:28:58 AM | GPUGRID | [work_fetch] set_request() for CPU: ninst 6 nused_total 2.00 nidle_now 6.00 fetch share 1.00 req_inst 6.00 req_secs 41472.00
10/21/2014 9:28:58 AM | GPUGRID | [sched_op] Starting scheduler request
10/21/2014 9:28:58 AM | GPUGRID | [work_fetch] request: CPU (41472.00 sec, 6.00 inst) miner_asic (0.00 sec, 0.00 inst) NVIDIA GPU (0.00 sec, 0.00 inst)
10/21/2014 9:28:58 AM | GPUGRID | Sending scheduler request: To fetch work.
10/21/2014 9:28:58 AM | GPUGRID | Requesting new tasks for CPU
10/21/2014 9:28:58 AM | GPUGRID | [sched_op] CPU work request: 41472.00 seconds; 6.00 devices
10/21/2014 9:28:58 AM | GPUGRID | [sched_op] miner_asic work request: 0.00 seconds; 0.00 devices
10/21/2014 9:28:58 AM | GPUGRID | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
10/21/2014 9:29:00 AM | GPUGRID | Scheduler request completed: got 0 new tasks
10/21/2014 9:29:00 AM | GPUGRID | [sched_op] Server version 613
10/21/2014 9:29:00 AM | GPUGRID | No tasks sent
10/21/2014 9:29:00 AM | GPUGRID | Project requested delay of 31 seconds
10/21/2014 9:29:00 AM | GPUGRID | [work_fetch] backing off CPU 658 sec
10/21/2014 9:29:00 AM | GPUGRID | [sched_op] Deferring communication for 00:00:31
10/21/2014 9:29:00 AM | GPUGRID | [sched_op] Reason: requested by project |
|
|
MJH
|
The server has arbitrarily decided that the new app versions are "not reliable". I wonder what that means? |
|
|
|
My first avx with error:
http://www.gpugrid.net/result.php?resultid=13227076
Name 86-MJHARVEY_TEST1001-0-1-RND5489_1
Workunit 10157681
Created 17 Oct 2014 | 11:36:40 UTC
Sent 21 Oct 2014 | 15:34:02 UTC
Received 21 Oct 2014 | 15:35:57 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 255 (0xff) Unknown error number
Computer ID 169357
Report deadline 26 Oct 2014 | 15:34:02 UTC
Run time 0.41
CPU time 0.00
Validate state Invalid
Credit 0.00
Application version Test application for CPU MD v9.01 (mtavx)
Stderr output
<core_client_version>7.4.22</core_client_version>
<![CDATA[
<message>
The extended attributes are inconsistent.
(0xff) - exit code 255 (0xff)
</message>
]]>
____________
|
|
|
eXaPower
|
Multiple Test application for CPU MD v9.01 (mtavx) failures- all with the same error message: The extended attributes are inconsistent. (0xff) - exit code 255 (0xff)
All tasks have 0.00 CPU/run time. |
|
|
|
My machine got work now, a 9.01 mtsse4 task. I will start it soon, and reply with the result. |
|
|
MJH
|
Multiple Test application for CPU MD v9.01 (mtavx) failures- all with the same error message: The extended attributes are inconsistent. (0xff) - exit code 255 (0xff)
All tasks have 0.00 CPU/run time.
Yes, my fault. The app build is bad.
Matt |
|
|
MJH
|
902 |
|
|
eXaPower
|
9.02 is working--BOINC progress is 0.023 at fifteen minutes. Task has 10000hr estimated runtime throwing the 20mgx GPU task to the side. Estimated runtime is going up as task computes- currently @ 12000hr.
If BOINC is correctly reporting task progress-- CPU time till complete-- 10hr for every 1% computed > 1000hr/44days total runtime?
14/10/21 17:06:26 | GPUGRID | [cpu_sched_debug] unfoldx476-NOELIA_UNFOLD-3-72-RND3785_0 sched state 2 next 2 task state 1
14/10/21 17:06:26 | GPUGRID | [cpu_sched_debug] 20mgx396-NOELIA_20MG2-2-50-RND9100_1 sched state 1 next 1 task state 0
14/10/21 17:06:26 | GPUGRID | [cpu_sched_debug] 4083-MJHARVEY_CPUDHFR-0-1-RND9529_1 sched state 2 next 2 task state 1
14/10/21 17:10:45 | GPUGRID | [rr_sim_detail] 0.00: starting 4083-MJHARVEY_CPUDHFR-0-1-RND9529_1 (4.00 CPU) (514480914.05G/12.54G)
14/10/21 17:10:45 | | [rrsim_detail] rpbest: 4083-MJHARVEY_CPUDHFR-0-1-RND9529_1 (finish delay 40205433.52)
14/10/21 17:11:45 | GPUGRID | [rr_sim] 4083-MJHARVEY_CPUDHFR-0-1-RND9529_1 misses deadline by 40416714.74 |
|
|
MJH
|
Ok, so the length reporting is still wrong. |
|
|
klepelSend message
Joined: 23 Dec 09 Posts: 189 Credit: 4,531,556,793 RAC: 2,860,019 Level
Scientific publications
|
does it work for pricese puppy 5.7? I do not get any WUs. Thanks. |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
Just tried one of these WUs. My BOINC preference is set to use 50% of the processors, which is four of my eight. The WU grabbed four; Task Manager showed this task using 50%.
Thermal Radar normally shows my CPU temp at 41C. Within five minutes the temp had shot up to 65C and the red warning came on. I killed the WU.
Anyone else having CPU overheating problems with these WUs? |
|
|
MJH
|
65C is hardly an unreasonable operating temperature for a CPU.
Matt |
|
|
MJHProject administrator Project developer Project scientist Send message
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level
Scientific publications
|
does it work for pricese puppy 5.7? I do not get any WUs. Thanks.
Who is Princess Puppy?
Seriously, tell me the machine ID and I can take a look.
|
|
|
|
I did. I changed "Use at most 40.00% of CPU time (0 means no restriction)".
|
|
|
|
Just tried one of these WUs. My BOINC preference is set to use 50% of the processors, which is four of my eight. The WU grabbed four; Task Manager showed this task using 50%.
Thermal Radar normally shows my CPU temp at 41C. Within five minutes the temp had shot up to 65C and the red warning came on. I killed the WU.
Anyone else having CPU overheating problems with these WUs?
Note 1: On my 8-logical-CPU, Intel i7 965 XE, all 4 cores routinely run at 86°C and near-100% CPU usage, 24/7, for 5 years straight, unless I'm gaming. Core Temp shows that TjMax (thermal limiting temperature) is 100°C, and I've never hit that mark.
Note 2: MT tasks are allowed to overcommit the CPU, especially in cases where they are running alongside coprocessor (GPU/ASIC) tasks or other high-priority CPU tasks. |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
65C is hardly an unreasonable operating temperature for a CPU.
Thanks for your responses, Matt & Jacob.
OK. I downloaded another WU and have been running for 2+ hours, with Core Temp running for most of that. Before the WU started, CPU temp was 40C and CPU fan was 2700rpm. The fan is now at 3500rpm and you can see the CPU temps here:
A bit worried about that 75C max...
Am I OK to continue, with the Thermal Radar temp well into the red??
|
|
|
|
Who says red is bad? Maybe red is just "hi", and then ultraviolet neon green is "nuclear"?
:)
Basically, your TjMax is 90. You should feel comfortable going up to 80°C or maybe even 85°C, I'd think, before worrying about heat and stability. |
|
|
Astiesan Send message
Joined: 8 Jun 10 Posts: 3 Credit: 829,088,770 RAC: 4,380,839 Level
Scientific publications
|
Is the time estimate for these work units incorrect? I have no slouch of a processor, a 4790K @ stock speeds, and it estimates 66 hours to completion with just shy of four hours of work done. If this is accurate, I will barely be able to complete the two I have downloaded before their turn in time on the 28th assuming I use my computer for a standard 8 hours a day. |
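For anyone wanting to check a deadline the same way, the arithmetic in the post above can be sketched in Python. This is only a rough illustration: it assumes the remaining-time estimate is accurate and that the task makes progress only while the machine is switched on. The function name is made up for this example.

```python
def days_to_finish(remaining_hours: float, hours_per_day: float) -> float:
    """Calendar days needed if the task only runs hours_per_day each day."""
    if hours_per_day <= 0:
        raise ValueError("hours_per_day must be positive")
    return remaining_hours / hours_per_day


# Figures from the post: about 66 h estimated remaining, machine on 8 h a day.
print(days_to_finish(66, 8))  # 8.25
```

A second task queued behind the first adds its own remaining hours to the total before dividing.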
|
|
Chilean Send message
Joined: 8 Oct 12 Posts: 98 Credit: 385,652,461 RAC: 0 Level
Scientific publications
|
I am getting "Download failed" when downloading the CPU app (avx version) on a Windows 7 x64.
EDIT: MD5 check fail.
EDIT2: Checking the "skip image file verification" under BOINC preferences fixed the error... but I'm pretty sure this is not how things should be done.
EDIT3: I re-enabled the image file verification and it managed to download it correctly, yet it only uses 1 core (although BOINC says 8), and the progress report seems to have gotten stuck with:
Log file opened on Fri Oct 24 01:12:46 2014
Host: unknown pid: 6588 rank ID: 0 number of ranks: 1
GROMACS: mdrun-502-902-avx-64, VERSION 5.0.2
GROMACS is written by:
Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian Fritsch
Gerrit Groenhof Christoph Junghans Peter Kasson Carsten Kutzner
Per Larsson Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff
Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
Peter Tieleman Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2014, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
GROMACS: mdrun-502-902-avx-64, VERSION 5.0.2
Executable: D:\ProgramData\BOINC\slots\8\projects\www.gpugrid.net\mdrun-502-902-avx-64.exe
Library dir: C:\Program Files\Gromacs\share\gromacs\top
Command line:
mdrun-502-902-avx-64 -ntomp 8 -nt 8 -x traj.xtc -s topol.tpr -g progress.log -cpi state.cpt
Gromacs version: VERSION 5.0.2
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled
GPU support: disabled
invsqrt routine: gmx_software_invsqrt(x)
SIMD instructions: AVX_256
FFT library: fftw3
RDTSCP usage: enabled
C++11 compilation: disabled
TNG support: enabled
Tracing support: disabled
Built on: Unknown date
Built by: Anonymous@unknown [CMAKE]
Build OS/arch: Windows-6.1 AMD64
Build CPU vendor: GenuineIntel
Build CPU brand: Intel(R) Core(TM) i3-2365M CPU @ 1.40GHz
Build CPU family: 6 Model: 42 Stepping: 7
Build CPU features: apic clfsh cmov cx8 lahf_lm mmx msr pse rdtscp sse2 sse3 ssse3
C compiler: C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin/x86_amd64/cl.exe MSVC 16.0.30319.1
C compiler flags: /arch:AVX /DWIN32 /D_WINDOWS /W3 /MD /O2 /Ob2 /D NDEBUG
C++ compiler: C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin/x86_amd64/cl.exe MSVC 16.0.30319.1
C++ compiler flags: /arch:AVX /DWIN32 /D_WINDOWS /W3 /GR /EHsc /wd4800 /wd4355 /wd4996 /wd4305 /wd4244 /wd4101 /wd4267 /wd4090 /MD /O2 /Ob2 /D NDEBUG
Boost version: 1.55.0 (internal)
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------
Changing nstlist from 10 to 20, rlist from 0.9 to 0.928
Input Parameters:
integrator = md-vv
tinit = 0
dt = 0.002
nsteps = 5000000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = 1993
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 1000
nstcalcenergy = 100
nstenergy = 1000
nstxout-compressed = 25000
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 20
ns-type = Grid
pbc = xyz
periodic-molecules = FALSE
verlet-buffer-tolerance = 0.005
rlist = 0.928
rlistlong = 0.928
nstcalclr = 10
coulombtype = PME
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 0.9
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-switch
rvdw-switch = 0.75
rvdw = 0.9
DispCorr = No
table-extension = 1
fourierspacing = 0.1
fourier-nx = 64
fourier-ny = 64
fourier-nz = 64
pme-order = 4
ewald-rtol = 1e-005
ewald-rtol-lj = 1e-005
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
implicit-solvent = No
gb-algorithm = Still
nstgbradii = 1
rgbradii = 1
gb-epsilon-solvent = 80
gb-saltconc = 0
gb-obc-alpha = 1
gb-obc-beta = 0.8
gb-obc-gamma = 4.85
gb-dielectric-offset = 0.009
sa-algorithm = Ace-approximation
sa-surface-tension = 2.05016
tcoupl = Nose-Hoover
nsttcouple = 10
nh-chain-length = 10
print-nose-hoover-chain-variables = FALSE
pcoupl = No
pcoupltype = Isotropic
nstpcouple = -1
tau-p = 1
compressibility (3x3):
compressibility[ 0]={0.00000e+000, 0.00000e+000, 0.00000e+000}
compressibility[ 1]={0.00000e+000, 0.00000e+000, 0.00000e+000}
compressibility[ 2]={0.00000e+000, 0.00000e+000, 0.00000e+000}
ref-p (3x3):
ref-p[ 0]={0.00000e+000, 0.00000e+000, 0.00000e+000}
ref-p[ 1]={0.00000e+000, 0.00000e+000, 0.00000e+000}
ref-p[ 2]={0.00000e+000, 0.00000e+000, 0.00000e+000}
refcoord-scaling = No
posres-com (3):
posres-com[0]=0.00000e+000
posres-com[1]=0.00000e+000
posres-com[2]=0.00000e+000
posres-comB (3):
posres-comB[0]=0.00000e+000
posres-comB[1]=0.00000e+000
posres-comB[2]=0.00000e+000
QMMM = FALSE
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Shake
continuation = FALSE
Shake-SOR = FALSE
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = no
rotation = FALSE
interactiveMD = FALSE
disre = No
disre-weighting = Conservative
disre-mixed = FALSE
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={0.00000e+000, 0.00000e+000, 0.00000e+000}
deform[ 1]={0.00000e+000, 0.00000e+000, 0.00000e+000}
deform[ 2]={0.00000e+000, 0.00000e+000, 0.00000e+000}
simulated-tempering = FALSE
E-x:
n = 0
E-xt:
n = 0
E-y:
n = 0
E-yt:
n = 0
E-z:
n = 0
E-zt:
n = 0
swapcoords = no
adress = FALSE
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
grpopts:
nrdf: 48414
ref-t: 300
tau-t: 0.8
annealing: No
annealing-npoints: 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0
and that's all there is... I'll reset the project once the GPU WU is completed and see what happens then.
Also, my other AVX-capable machine only downloads SSE2 WUs, not AVX.
|
|
|
|
I'm currently running the mdrun-502-902-avx-64.exe program for Windows 64 bit. The program has been running for about 7 hours and is about to go to 6% complete. However, if I start the task manager it reports the program is only using 27-28% cpu, even though it should be running all 4 cores.
Does it take a while for the program to start using all 4 of the cores?
Here is the stderr.txt output:
BOINC wrapper for GROMACS.
Arg 0 [projects/www.gpugrid.net/mdrun-502-902-avx-64.exe]
Arg 1 [--nthreads]
Arg 2 [4]
BOINC running with [4] threads
BOINC resolving [traj.xtc] to [traj.xtc]
BOINC resolving [topol.tpr] to [topol.tpr]
BOINC resolving [progress.log] to [progress.log]
BOINC resolving [state.cpt] to [state.cpt]
GROMACS: mdrun-502-902-avx-64, VERSION 5.0.2
GROMACS is written by:
Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian Fritsch
Gerrit Groenhof Christoph Junghans Peter Kasson Carsten Kutzner
Per Larsson Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff
Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
Peter Tieleman Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2014, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
GROMACS: mdrun-502-902-avx-64, VERSION 5.0.2
Executable: C:\ProgramData\BOINC\slots\1\projects\www.gpugrid.net\mdrun-502-902-avx-64.exe
Library dir: C:\Program Files\Gromacs\share\gromacs\top
Command line:
mdrun-502-902-avx-64 -ntomp 4 -nt 4 -x traj.xtc -s topol.tpr -g progress.log -cpi state.cpt
Reading file topol.tpr, VERSION 4.6.1 (single precision)
Note: file tpx version 83, software tpx version 100
Changing nstlist from 10 to 20, rlist from 0.9 to 0.928
|
|
|
eXaPower Send message
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level
Scientific publications
|
Astiesan wrote:
Is the time estimate for these work units incorrect?
Yes, the estimates are incorrect.
With AVX on 8 threads, the task will take about 16 hr in total.
Chilean wrote:
Also, my other AVX-capable machine only downloads SSE2 WUs, not AVX.
Upgrade to the BOINC development version 7.4.22, as you did with the 3610QM machine. Non-development clients are reading the CPU feature set incorrectly for CPUMD tasks.
Boinc127 wrote:
Does it take a while for the program to start using all 4 of the cores?
No, the task should use the number of threads you've set in BOINC. If heavy background processes are running concurrently with the task, they could eat away at cycles. |
|
|
|
I'm using the mdrun-502-902-avx-64.exe program and it should be running on 4 cores, but it's only running on one core. Task Manager says it's only using 27% CPU, so about 1 core. I also tested multithreading with the only other MT app I can think of, Milkyway N-Body Simulation, and it runs at about 80-85% of total CPU (all 4 CPUs). I've reset, removed, and reattached the project, and it still uses only 1 core. There are no other programs taking heavy compute cycles in the background. |
|
|
|
So the mdrun-502-902-avx-64.exe program is tying up 4 cores but only using one (about 27%). However, if I use 1 core (refreshing the GPUGrid account in BOINC Manager after telling it to use 1 core), it uses the same amount of CPU (27% according to Task Manager), but the time estimate quadruples and the progress is markedly slower. I don't get it. It's obvious that it isn't using 4 cores when it is supposed to... I would expect AVX running on 4 cores to run moderately warm or hot. Even Process Explorer reports only about 26 to 27% CPU usage. |
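The "27% is about one core" reasoning can be written out as a small Python sketch. It assumes Task Manager reports a process's CPU use as a share of all logical CPUs combined, which is why a single saturated thread on a 4-CPU machine shows up near 25%. The function name is hypothetical.

```python
def busy_cores(total_cpu_percent: float, logical_cpus: int) -> float:
    """Convert a whole-machine CPU percentage into an equivalent core count."""
    return total_cpu_percent / 100.0 * logical_cpus


# Figures from the posts above, on a 4-CPU machine.
print(round(busy_cores(27, 4), 2))  # 1.08 (the AVX cpumd task)
print(round(busy_cores(85, 4), 2))  # 3.4  (Milkyway N-Body at the upper figure)
```

A task that really used all four threads would be expected to land much closer to the second figure than the first.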
|
|
|
Install Process Explorer. Find the process that is running the task. Double click it. Click the Threads tab. Sort that Threads tab by CPU descending.
If it's running 4-threaded, then the Threads tab should show 4 threads utilizing some CPU.
What do you see there? |
|
|
|
Install Process Explorer. Find the process that is running the task. Double click it. Click the Threads tab. Sort that Threads tab by CPU descending.
If it's running 4-threaded, then the Threads tab should show 4 threads utilizing some CPU.
What do you see there?
mdrun-502-902-avx-64.exe properties
3 Threads.
Thread 1 is 4664 using 24% cpu called mdrun-502-902-avx-64.exe!bwlzh_decompress_verbose+0x131ac
Thread 2 is 4432 using < 0.01% cpu called mdrun-502-902-avx-64.exe!bwlzh_decompress_verbose+0x12880
Thread 3 is 2124 called MSVCR100.dll!endthreadex+0x60
|
|
|
MJHProject administrator Project developer Project scientist Send message
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level
Scientific publications
|
Looks buggered. Would you kill it and start it again?
Matt |
|
|
|
Sorry I literally killed the process without thinking about it. However, I did download another one and it is doing the exact same thing (using 3 threads, 24.8% cpu). The thread IDs have changed but they are still using the same start addresses:
mdrun-502-902-avx-64.exe!bwlzh_decompress_verbose+0x131ac
mdrun-502-902-avx-64.exe!bwlzh_decompress_verbose+0x12880
MSVCR100.dll!endthreadex+0x60
|
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
Yes, the estimates are incorrect.
With AVX on 8 threads, the task will take about 16 hr in total.
I've been through this thread. I installed BOINC 7.4.22 and the estimated remaining time dropped dramatically.
I also Googled AVX. I guess my AMD FX 8350 has it, but do I need to do anything to activate it?
How many hours for AVX 4 threads? In 24 hours I only did 20%.
|
|
|
|
I also Googled AVX. I guess my AMD FX 8350 has it, but do I need to do anything to activate it?
According to the AMD website, FX processors do have the AVX instruction set. All you need then is a compatible operating system. Windows 7 SP1 and Windows 8/8.1 do use and recognize the AVX instruction set so you should be set.
|
|
|
Chilean Send message
Joined: 8 Oct 12 Posts: 98 Credit: 385,652,461 RAC: 0 Level
Scientific publications
|
After resetting the project, same thing happened. It says "using 8 threads", yet it only uses one (13% CPU usage).
The progress.log file doesn't update at all after this:
Log file opened on Fri Oct 24 16:28:33 2014
Host: unknown pid: 5364 rank ID: 0 number of ranks: 1
GROMACS: mdrun-502-902-avx-64, VERSION 5.0.2
GROMACS is written by:
Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian Fritsch
Gerrit Groenhof Christoph Junghans Peter Kasson Carsten Kutzner
Per Larsson Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff
Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
Peter Tieleman Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2014, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
GROMACS: mdrun-502-902-avx-64, VERSION 5.0.2
Executable: D:\ProgramData\BOINC\slots\0\projects\www.gpugrid.net\mdrun-502-902-avx-64.exe
Library dir: C:\Program Files\Gromacs\share\gromacs\top
Command line:
mdrun-502-902-avx-64 -ntomp 8 -nt 8 -x traj.xtc -s topol.tpr -g progress.log -cpi state.cpt
Gromacs version: VERSION 5.0.2
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled
GPU support: disabled
invsqrt routine: gmx_software_invsqrt(x)
SIMD instructions: AVX_256
FFT library: fftw3
RDTSCP usage: enabled
C++11 compilation: disabled
TNG support: enabled
Tracing support: disabled
Built on: Unknown date
Built by: Anonymous@unknown [CMAKE]
Build OS/arch: Windows-6.1 AMD64
Build CPU vendor: GenuineIntel
Build CPU brand: Intel(R) Core(TM) i3-2365M CPU @ 1.40GHz
Build CPU family: 6 Model: 42 Stepping: 7
Build CPU features: apic clfsh cmov cx8 lahf_lm mmx msr pse rdtscp sse2 sse3 ssse3
C compiler: C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin/x86_amd64/cl.exe MSVC 16.0.30319.1
C compiler flags: /arch:AVX /DWIN32 /D_WINDOWS /W3 /MD /O2 /Ob2 /D NDEBUG
C++ compiler: C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin/x86_amd64/cl.exe MSVC 16.0.30319.1
C++ compiler flags: /arch:AVX /DWIN32 /D_WINDOWS /W3 /GR /EHsc /wd4800 /wd4355 /wd4996 /wd4305 /wd4244 /wd4101 /wd4267 /wd4090 /MD /O2 /Ob2 /D NDEBUG
Boost version: 1.55.0 (internal)
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------
Changing nstlist from 10 to 20, rlist from 0.9 to 0.928
Input Parameters:
integrator = md-vv
tinit = 0
dt = 0.002
nsteps = 5000000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = 1993
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 1000
nstcalcenergy = 100
nstenergy = 1000
nstxout-compressed = 25000
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 20
ns-type = Grid
pbc = xyz
periodic-molecules = FALSE
verlet-buffer-tolerance = 0.005
rlist = 0.928
rlistlong = 0.928
nstcalclr = 10
coulombtype = PME
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 0.9
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-switch
rvdw-switch = 0.75
rvdw = 0.9
DispCorr = No
table-extension = 1
fourierspacing = 0.1
fourier-nx = 64
fourier-ny = 64
fourier-nz = 64
pme-order = 4
ewald-rtol = 1e-005
ewald-rtol-lj = 1e-005
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
implicit-solvent = No
gb-algorithm = Still
nstgbradii = 1
rgbradii = 1
gb-epsilon-solvent = 80
gb-saltconc = 0
gb-obc-alpha = 1
gb-obc-beta = 0.8
gb-obc-gamma = 4.85
gb-dielectric-offset = 0.009
sa-algorithm = Ace-approximation
sa-surface-tension = 2.05016
tcoupl = Nose-Hoover
nsttcouple = 10
nh-chain-length = 10
print-nose-hoover-chain-variables = FALSE
pcoupl = No
pcoupltype = Isotropic
nstpcouple = -1
tau-p = 1
compressibility (3x3):
compressibility[ 0]={0.00000e+000, 0.00000e+000, 0.00000e+000}
compressibility[ 1]={0.00000e+000, 0.00000e+000, 0.00000e+000}
compressibility[ 2]={0.00000e+000, 0.00000e+000, 0.00000e+000}
ref-p (3x3):
ref-p[ 0]={0.00000e+000, 0.00000e+000, 0.00000e+000}
ref-p[ 1]={0.00000e+000, 0.00000e+000, 0.00000e+000}
ref-p[ 2]={0.00000e+000, 0.00000e+000, 0.00000e+000}
refcoord-scaling = No
posres-com (3):
posres-com[0]=0.00000e+000
posres-com[1]=0.00000e+000
posres-com[2]=0.00000e+000
posres-comB (3):
posres-comB[0]=0.00000e+000
posres-comB[1]=0.00000e+000
posres-comB[2]=0.00000e+000
QMMM = FALSE
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Shake
continuation = FALSE
Shake-SOR = FALSE
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = no
rotation = FALSE
interactiveMD = FALSE
disre = No
disre-weighting = Conservative
disre-mixed = FALSE
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={0.00000e+000, 0.00000e+000, 0.00000e+000}
deform[ 1]={0.00000e+000, 0.00000e+000, 0.00000e+000}
deform[ 2]={0.00000e+000, 0.00000e+000, 0.00000e+000}
simulated-tempering = FALSE
E-x:
n = 0
E-xt:
n = 0
E-y:
n = 0
E-yt:
n = 0
E-z:
n = 0
E-zt:
n = 0
swapcoords = no
adress = FALSE
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
grpopts:
nrdf: 48414
ref-t: 300
tau-t: 0.8
annealing: No
annealing-npoints: 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0
And the stderr.txt gives this:
BOINC wrapper for GROMACS.
Arg 0 [projects/www.gpugrid.net/mdrun-502-902-avx-64.exe]
Arg 1 [--nthreads]
Arg 2 [8]
BOINC running with [8] threads
BOINC resolving [traj.xtc] to [traj.xtc]
BOINC resolving [topol.tpr] to [topol.tpr]
BOINC resolving [progress.log] to [progress.log]
BOINC resolving [state.cpt] to [state.cpt]
GROMACS: mdrun-502-902-avx-64, VERSION 5.0.2
GROMACS is written by:
Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
Aldert van Buuren Rudi van Drunen Anton Feenstra Sebastian Fritsch
Gerrit Groenhof Christoph Junghans Peter Kasson Carsten Kutzner
Per Larsson Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff
Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
Peter Tieleman Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2014, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
GROMACS: mdrun-502-902-avx-64, VERSION 5.0.2
Executable: D:\ProgramData\BOINC\slots\0\projects\www.gpugrid.net\mdrun-502-902-avx-64.exe
Library dir: C:\Program Files\Gromacs\share\gromacs\top
Command line:
mdrun-502-902-avx-64 -ntomp 8 -nt 8 -x traj.xtc -s topol.tpr -g progress.log -cpi state.cpt
Reading file topol.tpr, VERSION 4.6.1 (single precision)
Note: file tpx version 83, software tpx version 100
Changing nstlist from 10 to 20, rlist from 0.9 to 0.928
So... apparently it gets stuck running on 1 thread doing apparently nothing at all.
|
|
|
|
After resetting the project, same thing happened. It says "using 8 threads", yet it only uses one (13% CPU usage).
The progress.log file doesn't update at all after this:
Excellent point... I couldn't remember how often the progress.log file updates after the program starts. As workunits do for some other projects, I figured it had some preliminary work to do before it really started crunching with all threads. I thought it was a problem with my computer. It doesn't seem like the program is spinning up enough threads.
And reviewing the older MT tasks I've crunched, there is usually a note in stderr.txt or progress.log that mentions running 1 MPI thread and 4 OpenMP threads. I don't see the same note with the AVX program. Perhaps it's something very simple, like a missing command line argument? |
|
|
eXaPower Send message
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level
Scientific publications
|
Yes, the estimates are incorrect.
With AVX on 8 threads, the task will take about 16 hr in total.
I've been through this thread. I installed BOINC 7.4.22 and the estimated remaining time dropped dramatically.
I also Googled AVX. I guess my AMD FX 8350 has it, but do I need to do anything to activate it?
How many hours for AVX 4 threads? In 24 hours I only did 20%.
Note: These estimates are for Intel AVX. I'm unsure about AMD AVX CPUMD times.
AMD computes AVX instructions differently from Intel. AMD has more integer execution ports than floating-point ones.
In FX modules, threads share an AVX FP unit: for every two integer cores there is one 128-bit AVX-capable FP unit, so completing a 256-bit AVX instruction requires a second 128-bit cycle. Intel Sandy Bridge/Ivy Bridge/Haswell, by contrast, has a 256-bit AVX FP unit.
http://www.anandtech.com/show/5831/amd-trinity-review-a10-4600m-a-new-hope
http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/2
http://www.anandtech.com/show/6355/intels-haswell-architecture/8 |
|
|
Chilean Send message
Joined: 8 Oct 12 Posts: 98 Credit: 385,652,461 RAC: 0 Level
Scientific publications
|
After resetting the project, same thing happened. It says "using 8 threads", yet it only uses one (13% CPU usage).
The progress.log file doesn't update at all after this:
Excellent point... I couldn't remember how often the progress.log file updates after the program starts. As workunits do for some other projects, I figured it had some preliminary work to do before it really started crunching with all threads. I thought it was a problem with my computer. It doesn't seem like the program is spinning up enough threads.
And reviewing the older MT tasks I've crunched, there is usually a note in stderr.txt or progress.log that mentions running 1 MPI thread and 4 OpenMP threads. I don't see the same note with the AVX program. Perhaps it's something very simple, like a missing command line argument?
The SSE2 WUs that I ran on this exact same machine updated their progress.log every 10 seconds or so, showing the step number they were on. This one, though, seems to get stuck, so I'm pretty sure it's some kind of bug. The stderr.txt file does show that the WU is asking for 8 cores, yet it only "uses" 1, so it seems the bug occurs after the initialization of the WU.
|
|
|
eXaPower Send message
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level
Scientific publications
|
Are CPUMD SSE2/4 tasks being sent to AVX hosts? |
|
|
|
Are CPUMD SSE2/4 tasks being sent to AVX hosts?
I do not believe so. I have requested a few CPU test apps (over the past couple of days) and have always gotten the AVX ones. I suppose I could downgrade the BOINC client so it stops recognizing AVX, or set up a specific app_info.xml file, but I wouldn't have any idea what to put in it. |
|
|
Chilean Send message
Joined: 8 Oct 12 Posts: 98 Credit: 385,652,461 RAC: 0 Level
Scientific publications
|
I keep getting CPU WUs (that don't work...) even though I unchecked "Use CPU?" on the settings page for GPUGRID (along with unchecking everything BUT ACEMD LONG RUNS). If the AVX WU problem and the ignored settings are not fixed, I'm going to be forced to detach from the project entirely (I don't want to play cop and manually abort the bugged CPU WUs; that defeats the whole set-it-and-forget-it purpose of BOINC...).
|
|
|
|
does it work for pricese puppy 5.7? I do not get any WUs. Thanks.
Who is Princess Puppy?
I think Puppy Linux Precise 5.7.1
http://www.puppylinux.com/ and http://bkhome.org/blog2/?viewDetailed=00346 for precise
:D |
|
|
eXaPower Send message
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level
Scientific publications
|
Does anybody else have an unusually high task duration correction factor for GPUGRID since computing or downloading a CPUMD task?
Since I completed or downloaded a CPUMD task, the GPUGRID task duration correction factor (9.829525) has skyrocketed. Other projects have the correct factor, and their task estimates are within normal ranges for CPU/GPU completion times.
Currently there are no CPUMD tasks running or in the cache, yet the factor has stayed the same. I just downloaded a new GPU task and its estimated time is 370 hr. (All CPU/GPU task estimates from GPUGRID have been abnormal for a week or so.) The 7.4.21 client isn't reverting the factor to the proper number, despite resetting, letting all tasks in the cache finish, and completing 10 GPU tasks since.
What is causing the correction factor to stay high?
Also, will future MD tasks have GPU support enabled? |
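To see how much of the 370 hr figure could come from the correction factor alone, here is a small Python sketch. It rests on an assumption: that the client simply multiplies its uncorrected runtime estimate by the project's duration correction factor. The function name is hypothetical and the real client logic may differ.

```python
def uncorrected_estimate(shown_hours: float, correction_factor: float) -> float:
    """Back out the estimate before the correction factor was applied.

    Assumes shown_hours = uncorrected_hours * correction_factor.
    """
    if correction_factor <= 0:
        raise ValueError("correction_factor must be positive")
    return shown_hours / correction_factor


# Figures from the post: 370 hr shown, factor 9.829525.
print(round(uncorrected_estimate(370, 9.829525), 1))  # 37.6
```

Under that assumption, a factor near 1.0 would bring the same task's estimate down to roughly a tenth of what is displayed.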
|
|
MJHProject administrator Project developer Project scientist Send message
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level
Scientific publications
|
Also, will future MD tasks have GPU support enabled?
Not in the short-to-medium term - the point of this application is to use CPU.
In the long term, it might support AMD GPUs, but don't quote me on that.
Matt |
|
|
MJHProject administrator Project developer Project scientist Send message
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level
Scientific publications
|
The buggy Windows AVX app is gone now. Please abort any instances of it still running. It's replaced with the working SSE2 app. |
|
|
Speedy Send message
Joined: 19 Aug 07 Posts: 43 Credit: 28,391,082 RAC: 0 Level
Scientific publications
|
I am running the CPU application version 9.01. When I opened the progress text file, I noticed it says I am running the following CPU:
Build CPU brand: Intel(R) Core(TM) i3-2365M CPU @ 1.40GHz
However, I am actually running an i7 980X. The task is currently 62.9% complete, with an estimated 16 hours to go. Task name: 1981-MJHARVEY_CPUDHFR-0-1-RND0908_2
Has anyone else noticed this reporting of the wrong CPU?
Also, could somebody please explain to me how the time is worked out in the progress file? E.g. after 3686000 steps it says 7372.00000 under time. |
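One possible reading of the time column, offered as a guess rather than an answer from the project: if it is simulated time in picoseconds and every step advances the simulation by the same amount, the two numbers quoted imply a fixed step size. The variable names below are illustrative.

```python
# Assumption: "time" is simulated time in picoseconds and the step size
# is constant. Neither point is confirmed in the thread.
steps = 3_686_000      # steps reported in the progress file
time_ps = 7372.0       # value shown under "time"

timestep_ps = time_ps / steps        # picoseconds per step
timestep_fs = timestep_ps * 1000.0   # femtoseconds per step

print(f"{timestep_ps:.4f} ps per step = {timestep_fs:.1f} fs per step")
```

On that reading the time column is just the step count multiplied by the step size.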
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
The buggy Windows AVX app is gone now. Please abort any instances of it still running. It's replaced with the working SSE2 app.
Working? Yes, but it stops one of my two GPU tasks:
I gave BOINC another CPU thread to play with and the waiting-to-run task restarted, but I immediately got another MJHARVEY, which is hardly likely to complete before its deadline...
|
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
15 years ago, when I started BOINCing SETI, it was the case, and as far as I know still is, that a feature of BOINC was/is to use only spare cycles, by running at low priority.
Why do MJHARVEYs run at high priority? |
|
|
|
15 years ago, when I started BOINCing SETI, it was the case, and as far as I know still is, that a feature of BOINC was/is to use only spare cycles, by running at low priority.
Why do MJHARVEYs run at high priority?
Different usage and meaning of the word 'priority'.
In the first case, when SETI first started (long before the BOINC platform was created and opened up for other projects), 'priority' referred to the thread/process priority of the task running on the CPU - and it was (and remains) low by comparison to the other primary tasks running on the computer - writing documents, surfing the web, reading emails etc. etc.
In the second case - where you are seeing it displayed in BOINC Manager - the word priority merely refers to the relative processing order of the BOINC tasks in the queue: there is some urgency to run that particular task because the time estimate is suggesting that it might not be completed before deadline. |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
15 years ago, when I started BOINCing SETI, it was the case, and as far as I know still is, that a feature of BOINC was/is to use only spare cycles, by running at low priority.
Why do MJHARVEYs run at high priority?
Different usage and meaning of the word 'priority'.
In the first case, when SETI first started (long before the BOINC platform was created and opened up for other projects), 'priority' referred to the thread/process priority of the task running on the CPU - and it was (and remains) low by comparison to the other primary tasks running on the computer - writing documents, surfing the web, reading emails etc. etc.
In the second case - where you are seeing it displayed in BOINC Manager - the word priority merely refers to the relative processing order of the BOINC tasks in the queue: there is some urgency to run that particular task because the time estimate is suggesting that it might not be completed before deadline.
Many thanks for the clarification, Richard :)
|
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
WU started 8.3 hours ago. It's done 5%. 8.3x20÷24=6.9 days to finish, two days after its deadline.
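The arithmetic above is a straight-line extrapolation from the progress so far. A minimal sketch, assuming progress stays linear for the rest of the task (the function name is made up for illustration):

```python
def projected_total_days(elapsed_hours: float, fraction_done: float) -> float:
    """Total runtime in days if progress continues at the same rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done / 24.0


total = projected_total_days(8.3, 0.05)   # 8.3 hours elapsed, 5% done
print(f"{total:.1f} days")
```

Dividing by 0.05 is the same as multiplying by 20, so this reproduces the 6.9 days quoted.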
|
|
|
eXaPower Send message
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level
Scientific publications
|
Tomba- runtime estimates for MDCPU are incorrect. For SSE2/SSE4/AVX tasks, your AMD CPU completes a task in under 24 hr with 8 threads. For 4 threads: 48-72 hr.
Unless something changed with app 9.03, a progress file exists showing how many steps have been computed. The progress file is located in the slot allotted to MDCPU. Each work unit has 5 million steps in total. The number of steps computed is updated every ten minutes or so.
http://www.gpugrid.net/forum_thread.php?id=3898 |
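Given the 5 million step total mentioned above, the step count in the progress file can be turned into a percentage by hand. A small sketch; the helper name and the example step count are made up for illustration.

```python
TOTAL_STEPS = 5_000_000   # total steps per work unit, as stated in the post


def percent_complete(steps_done: int, total_steps: int = TOTAL_STEPS) -> float:
    """Percentage of the work unit finished, from the reported step count."""
    if not 0 <= steps_done <= total_steps:
        raise ValueError("steps_done must be between 0 and total_steps")
    return 100.0 * steps_done / total_steps


# Hypothetical reading from a progress file: 1,250,000 steps done.
print(f"{percent_complete(1_250_000):.1f}%")
```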
|
|
MJHProject administrator Project developer Project scientist Send message
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level
Scientific publications
|
I've tweaked the estimated cost, but it'll take a while for the change to propagate.
Matt
|
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
For 4 threads: ~48hr.
OK. I had aborted that WU and stopped any more, but because of the above I decided to have another go.
At that time BOINC use-at-most preference was at 62.5% = five CPU threads. Darn me if the next MJHARVEY grabbed six! That is not gentlemanly!
I aborted that WU, set the preference to 50%, and the next one grabbed four, but stopped one of my GPU tasks. I gave BOINC another 12.5% and the stopped GPU task resumed, but, lo and behold, I immediately got another MJHARVEY!! (See my post below).
More thought needed, methinks...
|
|
|
MJHProject administrator Project developer Project scientist Send message
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 Level
Scientific publications
|
At that time BOINC use-at-most preference was at 62.5% = five CPU threads. Darn me if the next MJHARVEY grabbed six! That is not gentlemanly!
That's a real number to integer rounding problem. Might be able to fix that, depending on where the conversion's made.
Mjh
|
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
Holy Moses! Just finished my dinner and checked. One of my GPU tasks had stopped!
Suspended the active MJHARVEY and the other one, which should never have been downloaded, started.
I've had enough nurse-maiding. I'm out of here. Sorry... |
|
|
|
Tomba:
Please understand what is happening with the task scheduling, by reading this post:
http://www.gpugrid.net/forum_thread.php?id=3898&nowrap=true#38505
And if you feel the need to go into the web prefs to temporarily disable the GPUGrid MT CPU app, then by all means, feel free. It's obviously got some time estimation issues that are erroneously making these tasks run in "high-priority" (earliest deadline first) mode, scheduled ahead of GPU jobs, which could interfere with your normal scheduling.
Kind regards,
Jacob |
|
|
Speedy Send message
Joined: 19 Aug 07 Posts: 43 Credit: 28,391,082 RAC: 0 Level
Scientific publications
|
I am running the CPU application version 9.01. When I opened the progress text file, I noticed it says I am running the following CPU:
Build CPU brand: Intel(R) Core(TM) i3-2365M CPU @ 1.40GHz
However, I am actually running an i7 980X. The task is currently 62.9% complete, with an estimated 16 hours to go. Task name: 1981-MJHARVEY_CPUDHFR-0-1-RND0908_2
For those interested, the above task finished much sooner than I expected: 16.4 hours of run time and 193.15 hours of CPU time. |
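Those two figures give a rough idea of how many threads the task kept busy. A quick sketch, assuming the CPU time is summed over all threads (the variable names are illustrative):

```python
run_time_h = 16.4     # wall-clock hours reported for the task
cpu_time_h = 193.15   # CPU hours, assumed to be summed over all threads

# Average number of threads that were busy over the whole run.
avg_busy_threads = cpu_time_h / run_time_h
print(f"average busy threads: {avg_busy_threads:.1f}")
```

That works out to a little under 12, which fits a task using every thread of a 6-core, 12-thread CPU such as the i7 980X.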
|
|
Chilean Send message
Joined: 8 Oct 12 Posts: 98 Credit: 385,652,461 RAC: 0 Level
Scientific publications
|
I am running the CPU application version 9.01. When I opened the progress text file, I noticed it says I am running the following CPU:
Build CPU brand: Intel(R) Core(TM) i3-2365M CPU @ 1.40GHz
However, I am actually running an i7 980X. The task is currently 62.9% complete, with an estimated 16 hours to go. Task name: 1981-MJHARVEY_CPUDHFR-0-1-RND0908_2
Has anyone else noticed this reporting of the wrong CPU?
Also, could somebody please explain to me how the time is worked out in the progress file? E.g. after 3686000 steps it says 7372.00000 under time.
I think that CPU is the CPU used to compile the software...
In other words it might be MJH's CPU.
|
|
|
Speedy Send message
Joined: 19 Aug 07 Posts: 43 Credit: 28,391,082 RAC: 0 Level
Scientific publications
|
I am running the CPU application version 9.01. When I opened the progress text file, I noticed it says I am running the following CPU:
Build CPU brand: Intel(R) Core(TM) i3-2365M CPU @ 1.40GHz
However, I am actually running an i7 980X. The task is currently 62.9% complete, with an estimated 16 hours to go. Task name: 1981-MJHARVEY_CPUDHFR-0-1-RND0908_2
Has anyone else noticed this reporting of the wrong CPU?
Also, could somebody please explain to me how the time is worked out in the progress file? E.g. after 3686000 steps it says 7372.00000 under time.
I think that CPU is the CPU used to compile the software...
In other words it might be MJH's CPU.
Thank you for the explanation, that would make sense. Interesting that the application doesn't get this information directly from BOINC. However, I can understand how the above works. |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
Tomba:
Please understand what is happening with the task scheduling, by reading this post:
http://www.gpugrid.net/forum_thread.php?id=3898&nowrap=true#38505
Thanks for the post, Jacob. I checked that thread but must confess most of it went right over my head!
Yesterday I reactivated the new CPU WUs. There were none, but overnight I got an MJHARVEY. I was happy to see that, with BOINC given 50% of my CPUs, no GPU task had been suspended.
The MJHARVEY just completed in a little over 16 hours, having used four CPUs.
I was not a little disappointed to get a mere 922 credits for my PC's efforts vs. 12k-19k for my GPUs. Perhaps the difference is a measure of the relative importance to the project of these new CPU tasks? |
|
|
|
I encourage you to try to read through the thread one more time. Basically, when a task has a risk of not being able to meet a deadline unless it is given priority, BOINC will run it in "high-priority" mode.
And, if you read that post, paying special attention to the ordering, you will see that "high-priority CPU tasks" get scheduled BEFORE any "regular-priority GPU tasks".
So, if an MT task happens to go high-priority, then you can expect BOINC to only schedule up-to-1-CPU-worth of regular-priority GPU tasks. And, if *2* MT tasks go high-priority, then you can expect BOINC to not run any GPU tasks. I suspect that is the behavior that you saw.
Hope that helps,
Jacob |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
I encourage you to try to read through the thread one more time. Basically, when a task has a risk of not being able to meet a deadline unless it is given priority, BOINC will run it in "high-priority" mode.
I guess priority is a dead duck for these WUs. The task I got here did not say "high priority", and the deadline was Jan 5!!
|
|
|
eXaPower Send message
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 Level
Scientific publications
|
Runtime is excellent with 4 threads. As is performance:
7.436 ns/day
3.228 hour/ns
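The two performance figures are the same measurement expressed two ways: hours per nanosecond is 24 divided by nanoseconds per day. A quick check (the variable names are illustrative):

```python
ns_per_day = 7.436    # simulated nanoseconds per day, from the post

# One day has 24 hours, so hours needed per simulated nanosecond is:
hours_per_ns = 24.0 / ns_per_day
print(f"{hours_per_ns:.3f} hour/ns")
```

The result agrees with the 3.228 hour/ns reported alongside it.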
AVX or SSE2 task-- stderr text states: projects/www.gpugrid.net/mdrun-463-901-sse-32.exe
If task priority is a "dead duck" then a system with two or more GPUs won't need "nurse-maiding"!
Are more January 5 deadline MJH tasks available for testing? |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
Runtime is excellent with 4 threads.
That's my trusty AMD FX-8350!
If task priority is a "dead duck" then a system with two or more GPUs won't need "nurse-maiding"!
In fact, it looks like Matt has fixed "That's a real number to integer rounding problem. Might be able to fix that, depending on where the conversion's made." |
|
|
|
Why does the running CPU MD task not show the percentage of the job completed?
Is there an AVX version of this app for new CPUs? |
|
|
|
Why does the running CPU MD task not show the percentage of the job completed?
Is there an AVX version of this app for new CPUs? |
|
|
|
Excuse me! Some problems with Mozilla Browser. |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
65C is hardly an unreasonable operating temperature for a CPU.
Thanks for your responses, Matt & Jacob.
OK. I downloaded another WU and have been running for 2+ hours, with Core Temp running for most of that. Before the WU started, CPU temp was 40C and CPU fan was 2700rpm. The fan is now at 3500rpm and you can see the CPU temps here:
A bit worried about that 75C max...
Am I OK to continue, with the Thermal Radar temp well into the red??
Earlier today I noticed that, with four cores allocated to MJHARVEYs, my Thermal Radar temp had dropped from a steady 68C to 55C. Quite a surprise!!
I gave BOINC two more cores to play with (= six) and my temp is a steady 65C.
I did nothing to bring about this change....
|
|
|
Chilean Send message
Joined: 8 Oct 12 Posts: 98 Credit: 385,652,461 RAC: 0 Level
Scientific publications
|
PLEASE fix the problem that allows GPUGrid to send CPU WUs even though I explicitly stated in the settings to not do so.
|
|
|
|
What are your exact settings? |
|
|
Astiesan Send message
Joined: 8 Jun 10 Posts: 3 Credit: 829,088,770 RAC: 4,380,839 Level
Scientific publications
|
mdrun-463-901-sse-32 occasionally causes a soft system freeze when the machine goes from the active state into the sleeping state, i.e. when the screensaver switches from off to on.
By soft system freeze, I mean that all parts of the start bar/menu are locked (I do use start8, but it's confirmed to occur without it active as well). Windows-R can bring up the Run dialog, and I can use cmd to taskkill mdrun, after which the start menu itself returns to normal; however, the bar remains unresponsive. Killing explorer.exe to reset the start bar results in a hard freeze requiring a reboot. During the soft freeze, alt-tab and other windows are VERY slow to respond until mdrun is killed. Afterwards all other windows work fine, but the start bar is unusable and forces a reboot of the system.
I've found through further testing that boincmgr is also wholly unresponsive during this. There doesn't seem to be a consistent cause: I've had it go a few days without doing this, and other times it happens literally every 3-5 minutes (my screensaver is set to start after 3 minutes of idle).
There is nothing in the error logs at all, they are 0KB.
Any assistance or ideas in resolving this would be appreciated.
My system:
Windows 8.1 64-bit
i7 4790K @ stock
ASRock Z97-Extreme4
EVGA GTX 970 SC ACX @ stock 344.60 drivers
2x8GB HyperX Fury DDR3-1866 @ stock
I posted this in the other CPU WU thread, but it seems to have not been seen. |
|
|
tomba Send message
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0 Level
Scientific publications
|
PLEASE fix the problem that allows GPUGrid to send CPU WUs even though I explicitly stated in the settings to not do so.
I fixed it by setting my PC's location to "School" and setting "Molecular Dynamics on CPU: no" in the school preferences.
|
|
|