Message boards :
Graphics cards (GPUs) :
GPU Task Performance (vs. CPU core usage, app_config, multiple GPU tasks on 1 GPU, etc.)
| Author | Message |
|---|---|
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Hello everyone, I'm creating this thread to document my GPUGrid GPU task performance variances, while testing things such as:

- GPU task with no other tasks
- GPU task with full CPU load
- GPU task with overloaded CPU load
- Multiple GPU tasks on 1 video card

My system (as of right now) is:

CPU: Intel Core i7 965 Extreme (quad-core, hyper-threaded; Windows sees 8 processors)
Memory: 6GB
GPU device 0: eVGA GeForce GTX 660 Ti 3GB FTW (primary display)
GPU device 1: eVGA GeForce GTX 460 (not connected to any display)
OS: Windows 8 Pro x64 with Media Center

So far, I have some interesting results to share, and would like to "get the word out". If you'd like to share your results within this thread, feel free. Regards, Jacob |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
I originally did some performance testing in another thread, but wanted the results consolidated into this "GPU Task Performance" thread. That thread is titled "app_config.xml", and is located here: http://www.gpugrid.net/forum_thread.php?id=3319 Note: The post within that thread, which contains the app_config values that I recommend using, can be found here: http://www.gpugrid.net/forum_thread.php?id=3319#29216 |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Here are the first results (from running only on my GTX 660 Ti), copied from that thread:

========================================================================
Running with no other tasks (every other BOINC task and project was suspended, so the single GPUGrid task was free to use the whole CPU core):

Task: 6669110
Name: I23R54-NATHAN_dhfr36_3-17-32-RND2572_0
URL: http://www.gpugrid.net/result.php?resultid=6669110
Run time (sec): 19,085.32
CPU time (sec): 19,043.17

========================================================================
Running at <cpu_usage>0.001</cpu_usage>, BOINC set at 100% processors, along with a full load of other GPU/CPU tasks:

Task: 6673077
Name: I11R21-NATHAN_dhfr36_3-18-32-RND5041_0
URL: http://www.gpugrid.net/result.php?resultid=6673077
Run time (sec): 19,488.65
CPU time (sec): 19,300.91

Task: 6674205
Name: I25R97-NATHAN_dhfr36_3-13-32-RND4438_0
URL: http://www.gpugrid.net/result.php?resultid=6674205
Run time (sec): 19,542.35
CPU time (sec): 19,419.97

Task: 6675877
Name: I25R12-NATHAN_dhfr36_3-19-32-RND6426_0
URL: http://www.gpugrid.net/result.php?resultid=6675877
Run time (sec): 19,798.77
CPU time (sec): 19,606.33

========================================================================
CONCLUSION: As expected, there is some minor CPU contention under full load, but not much (task run time is maybe ~3% slower). It isn't affected much because the ACEMD process actually runs at a higher priority than other BOINC task processes; it is therefore never starved for CPU, and likely loses only a little time to contention during CPU process context switching. |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Here are some more results, where I focused on the "short" Nathan units:

========================================================================
Running with no other tasks (every other BOINC task and project was suspended, so the single GPUGrid task was free to use the whole CPU core):

Task: 6678769
Name: I1R110-NATHAN_RPS1_respawn3-10-32-RND4196_2
URL: http://www.gpugrid.net/result.php?resultid=6678769
Run time (sec): 8,735.43
CPU time (sec): 8,710.61

Task: 6678818
Name: I1R42-NATHAN_RPS1_respawn3-12-32-RND1164_1
URL: http://www.gpugrid.net/result.php?resultid=6678818
Run time (sec): 8,714.75
CPU time (sec): 8,695.18

========================================================================
Running at <cpu_usage>0.001</cpu_usage>, BOINC set at 100% processors, along with a full load of other GPU/CPU tasks:

Task: 6678817
Name: I1R436-NATHAN_RPS1_respawn3-13-32-RND2640_1
URL: http://www.gpugrid.net/result.php?resultid=6678817
Run time (sec): 8,949.63
CPU time (sec): 8,897.27

Task: 6679874
Name: I1R414-NATHAN_RPS1_respawn3-7-32-RND6785_1
URL: http://www.gpugrid.net/result.php?resultid=6679874
Run time (sec): 8,828.17
CPU time (sec): 8,786.48

Task: 6679828
Name: I1R152-NATHAN_RPS1_respawn3-5-32-RND8187_0
URL: http://www.gpugrid.net/result.php?resultid=6679828
Run time (sec): 8,891.22
CPU time (sec): 8,827.11

========================================================================
CONCLUSION: Again, as expected, there is only slight contention under full CPU load, because the ACEMD process actually runs at a higher priority than other BOINC task processes; it is therefore never starved for CPU, and likely loses only a little time to contention during CPU process context switching. |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
So, previously I was only running 1 GPU task on that GPU (and the GPU load would usually be around 87%-88%). But I wanted to find out what would happen when I run 2. So, the following tests use <gpu_usage>0.5</gpu_usage> in my app_config.xml. Note: The GPU load goes to ~97% when I do this, and I believe that's a good thing!

========================================================================
Long-run Nathan tasks...
Running at <cpu_usage>0.001</cpu_usage>, <gpu_usage>0.5</gpu_usage>, BOINC set at 100% processors, along with a full load of other GPU/CPU tasks:

Name: I19R1-NATHAN_dhfr36_3-22-32-RND2354_0
URL: http://www.gpugrid.net/result.php?resultid=6684711
Run time (sec): 35,121.51
CPU time (sec): 34,953.33

Name: I6R6-NATHAN_dhfr36_3-18-32-RND0876_0
URL: http://www.gpugrid.net/result.php?resultid=6685136
Run time (sec): 39,932.98
CPU time (sec): 39,549.67

Name: I22R42-NATHAN_dhfr36_3-15-32-RND5482_0
URL: http://www.gpugrid.net/result.php?resultid=6685907
Run time (sec): 35,077.12
CPU time (sec): 34,889.61

Name: I31R89-NATHAN_dhfr36_3-21-32-RND1236_0
URL: http://www.gpugrid.net/result.php?resultid=6687190
Run time (sec): 35,070.94
CPU time (sec): 34,901.26

Name: I8R42-NATHAN_dhfr36_3-22-32-RND2877_1
URL: http://www.gpugrid.net/result.php?resultid=6688517
Run time (sec): 32,339.90
CPU time (sec): 32,082.15

========================================================================
Short-run Nathan tasks...
Running at <cpu_usage>0.001</cpu_usage>, <gpu_usage>0.5</gpu_usage>, BOINC set at 100% processors, along with a full load of other GPU/CPU tasks:

Name: I1R318-NATHAN_RPS1_respawn3-11-32-RND9241_0
URL: http://www.gpugrid.net/result.php?resultid=6684931
Run time (sec): 12,032.03
CPU time (sec): 11,959.47

Name: I1R303-NATHAN_RPS1_respawn3-14-32-RND0610_0
URL: http://www.gpugrid.net/result.php?resultid=6690144
Run time (sec): 14,621.04
CPU time (sec): 10,697.88

========================================================================
CONCLUSIONS:

Long-run Nathan units:
1-at-a-time + full CPU load: ~19,600 sec run time per task
2-at-a-time + full CPU load: ~35,100 sec run time per task
Speedup: 1 - (35,100 / (19,600 * 2)) = 10.5% improvement

Short-run Nathan units:
1-at-a-time + full CPU load: ~8,900 sec run time per task
2-at-a-time + full CPU load: ~13,300 sec run time per task
Speedup: 1 - (13,300 / (8,900 * 2)) = 25.3% improvement

So far, it looks like running multiple tasks at a time... GETS WORK DONE QUICKER! Now, admittedly, I am estimating from very few results here, but I'll continue using this "2-at-a-time" approach, and will reply here if I find anything different. |
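The speedup arithmetic in the conclusions above can be sketched as a small calculation. (This is an illustrative Python helper of my own, not part of any BOINC tooling; the run times are the approximate figures quoted in the post.)

```python
def multi_task_speedup(t_single, t_multi, n=2):
    """Fractional throughput improvement from running n tasks
    concurrently on one GPU.

    t_single: average run time per task running 1-at-a-time (sec)
    t_multi:  average run time per task running n-at-a-time (sec)

    With n tasks finishing together, the effective time per task is
    t_multi / n; the speedup is the fraction by which that beats the
    1-at-a-time run time.
    """
    return 1.0 - t_multi / (n * t_single)

# Long-run Nathan units: ~19,600 s alone vs ~35,100 s 2-at-a-time
long_gain = multi_task_speedup(19_600, 35_100)

# Short-run Nathan units: ~8,900 s alone vs ~13,300 s 2-at-a-time
short_gain = multi_task_speedup(8_900, 13_300)

print(f"long: {long_gain:.1%}, short: {short_gain:.1%}")
# -> long: 10.5%, short: 25.3%
```

A negative result from this helper would mean 2-at-a-time is actually hurting throughput, which is a quick sanity check to apply to any new task-type combination.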
|
Joined: 16 Jul 07 Posts: 209 Credit: 5,496,860,456 RAC: 8,998
|
This is very good info. However, I need to point out a couple of potential downside issues: 1) Even with 2 tasks per GPU via app_config.xml, it does not increase the number of tasks you can download. For example, my 4-GPU machine normally has 4 tasks running and 4 waiting to run. Running 8 at once means all 8 are running, so now there is a delay between the time a task completes, uploads, reports, a new task is downloaded (a big file), and starts running. That *may* wipe out any utilization advantage. 2) The longer run time with 2 tasks per GPU *may* cause them to miss the credit bonus for early returns. YMMV Reno, NV Team: SETI.USA |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Point 1: ideally this would average out after some time, so that the different WUs per GPU finish at different times. Depending on your upload speed this might provide enough overlap to avoid running dry. Having more GPUs & WUs in flight should help with this issue. Point 2: correct! MrS Scanning for our furry friends since Jan 2002 |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
Point 1: ideally this would average out after some time, so that the different WUs per GPU finish at different times. Depending on your upload speed this might provide enough overlap to avoid running dry. Having more GPUs & WUs in flight should help with this issue. To clarify, a simple example: a machine with 1 GPU would get 2 WUs and if these are not in sync, then while uploading/downloading 1 WU the other WU would run at 2x the speed. A real workaround would be to run the 2x WUs on a box with 1 NV and 1 ATI running on a different project, then 4 WUs would be allocated for the machine. As an aside I think running GPUGrid WUs 2x is a bad idea due to longer turn around time and possible errors. A machine reboot or GPU error (or as Jacob pointed out on the BOINC list, a BOINC restart) would be more likely to take out 2 of these long WUs instead of 1. |
|
Joined: 17 Feb 13 Posts: 181 Credit: 144,871,276 RAC: 0
|
After some setup difficulties, I now have two long run tasks running - one on each of my GTX 650 Ti GPUs. GPUGrid runs 24/7 on this AMD A10 based PC and there are always two tasks running with either one or two waiting to run. As each GTX 650 processes at a slightly different rate the number of tasks waiting to run varies. I believe this will maximize output from my PC enabling me to make the maximum contribution to the research. |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
John, my research indicates that you might be able to contribute more to the project if you run 2 tasks on each of your GPUs, assuming the tasks don't result in computation errors. You might try that, using the app_config.xml file, and see if your overall performance increases. I was able to see gains in GPU load (seen via a program called GPU-Z), as well as increased throughput (seen by looking at task times, as noted within this thread). Regards, Jacob |
|
Joined: 17 Feb 13 Posts: 181 Credit: 144,871,276 RAC: 0
|
Hi, Jacob. I am very inexperienced in writing .xml files and fear losing running tasks through syntax errors. I would like to take it one step at a time for now and, maybe in a couple of weeks, try your suggestion. I will likely ask for help..... Thanks for the suggestion. John |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
No problem. It's really not that hard, so don't be afraid, and... when you're ready, I encourage you to read this entire thread, which has details and examples: "app_config.xml" located here: http://www.gpugrid.net/forum_thread.php?id=3319 - Jacob |
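For anyone who shares John's worry about syntax errors: one low-risk habit is to run the file through a standard XML parser before restarting BOINC, so a typo is caught before the client rejects the file. Below is an illustrative Python sketch of my own (not a BOINC tool); the checks it performs beyond raw XML parsing are just the obvious ones for an app_config.xml.

```python
import xml.etree.ElementTree as ET

def check_app_config(xml_text):
    """Parse app_config.xml content and report obvious problems.
    Returns a list of warning strings (an empty list means it looks OK)."""
    problems = []
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as e:
        # Malformed XML -- the most common hand-editing mistake.
        return ["XML syntax error: %s" % e]
    if root.tag != "app_config":
        problems.append("root element is <%s>, expected <app_config>" % root.tag)
    for app in root.findall("app"):
        if app.find("name") is None:
            problems.append("an <app> block is missing its <name> element")
    return problems

# A minimal (valid) example:
good = "<app_config><app><name>acemdlong</name></app></app_config>"
# A typo'd version -- mismatched closing tag:
bad = "<app_config><app><name>acemdlong</app></app_config>"

print(check_app_config(good))  # []
print(check_app_config(bad))   # ['XML syntax error: ...']
```

To use it on a real file, read the file's contents into a string first (BOINC reads app_config.xml from the project's directory under the BOINC data folder) and print whatever the checker returns.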
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Careful, guys. The GTX 650 Ti (John's GPUs) sounds like it's almost the same as a GTX 660 Ti (Jacob's GPUs), but it's actually about a factor of 2 slower. Currently, 70k-credit long-runs take John 33,000 seconds; running 2 of them might require ~60,000 seconds. That's almost one day, so we're getting close to missing the deadline for the credit bonus here, and some even longer tasks give 150k credits, so those should take over twice as long. And this is not only about credits: the credit bonus is there to encourage people to return results early. The project needs this as much as it needs many WUs done in parallel. As long as we're still making the deadline for the credit bonus, we can be sure we're returning results as quickly as the project wants them. MrS Scanning for our furry friends since Jan 2002 |
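MrS's arithmetic can be checked directly with a rough estimate. This is an illustrative Python helper of my own; the ~10% overlap gain and the per-task run times come from the measurements quoted earlier in the thread, and the 24-hour window is the early-return bonus threshold being discussed.

```python
def fits_bonus_window(single_run_s, n=2, overlap_gain=0.10,
                      window_s=24 * 3600):
    """Rough check: if n tasks share one GPU, does each still finish
    inside the early-return bonus window?

    n tasks finishing together take roughly n * single_run_s, reduced
    by the measured overlap gain (~10% for long-runs in this thread).
    Returns (estimated seconds per task, fits_in_window).
    """
    est = n * single_run_s * (1.0 - overlap_gain)
    return est, est <= window_s

# Jacob's GTX 660 Ti long-runs: ~19,600 s each
print(fits_bonus_window(19_600))   # ~35,280 s -> fits in 24 h easily
# John's GTX 650 Ti long-runs: ~33,000 s each
print(fits_bonus_window(33_000))   # ~59,400 s -> still fits, less margin
# A task twice as long (like the 150k-credit units mentioned above)
print(fits_bonus_window(66_000))   # ~118,800 s -> misses the 24 h window
```

So on slower cards, 2-at-a-time can be fine for ordinary long-runs yet push the longest units past the bonus cutoff, which matches MrS's caution.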
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Sure, in order to get maximum bonus credits, you'll have to be careful to complete all your tasks within 24 hours. And, in general, they want results returned quickly. But, in order to help the project the most, throughput (how quickly you can complete tasks) is the factor to measure, and the real "deadline" is the task's deadline, which is usually a few days, I think. If the administrators deem that a task must be done by a certain time, then I hope they are setting task deadlines appropriately. |
|
Joined: 17 Feb 13 Posts: 181 Credit: 144,871,276 RAC: 0
|
Thanks, Gentlemen: I will leave this alone for now..... With falling prices for the GTX 660 Ti, I may add one to my other AMD A10 based PC in September around my birthday. John |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
You still have plenty of testing to do; all the possible same and mixed WU combinations would need to be looked at:
NATHAN_dhfr36 (Long) + NOELIA_TRYP (Short) NATHAN_dhfr36 (Long) + NATHAN_stpwt1 (Short) NOELIA_148n (Long) + NOELIA_TRYP (Short) NOELIA_148n (Long) + NATHAN_stpwt1 (Short) NOELIA_TRYP (Short) + NATHAN_stpwt1 (Short)
FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Yes, I still have testing to do. You can/should test too! It's not easy to cherry-pick certain task-type combinations -- I usually just let any task types run together. Maybe once I find even more time to test, I'll attempt the specific-combination testing, using custom suspending and more vigilant monitoring. As far as "freeing up a core" goes, my research indicates that, at this point, doing so is COMPLETELY UNNECESSARY, at least for me. If you look at the acemd processes in Process Explorer, you'll see that the process priority is 6, and the CPU-intensive thread's priority is either 6 or 7. This ensures that the thread and process do not get swapped out of the processor, even when I'm running a full load of other CPU tasks, since those CPU tasks are usually priority 1 or 4. Watching how the CPU time gets divvied up (in Process Explorer, or in Task Manager) also proves it -- you'll see the other processes getting less than a core, but you won't see the acemd process suffer much. Plus, as you said, sometimes the GPUGrid tasks don't require much CPU at all (like when a NATHAN long-run is on my GTX 460), so reserving a core is sheer waste at that point, at least for my goals. So I won't do it. I'm not trying to speculate here, and I'm certainly not trying to find reasons not to run multiple tasks on the same GPU. I think it's worth it. What I'm trying to do is show the results that I have achieved, given my goals (maximize throughput for GPUGrid, without sacrificing any throughput for my other projects), and I encourage others to do the same. Thanks, Jacob |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
I don't have much time to test, but OK, I'll do a little bit... System: GTX 660 Ti @ 1202 MHz, i7-3770K CPU @ 4.2 GHz, 8GB DDR3-2133, SATA III drive, W7 x64, 310.90 drivers, BOINC 7.0.60. I've started using your suggested app_config.xml file:

<app_config>
  <app>
    <name>acemdbeta</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.001</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemdlong</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.001</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemd2</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.001</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemdshort</name>
    <max_concurrent>9999</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.001</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
|
FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Sounds good, thanks for testing. Note: When running 2-at-a-time, I expect tasks to take slightly less than double their normal time, which would mean they are being processed faster overall. |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Ah, you bring up a good point; I forgot to mention my clocking experiences with my Kepler-architecture eVGA GTX 660 Ti 3GB FTW card...

- Its base clock is 1045 MHz, which I think is the lowest clock it runs at while a 3D application or GPU task is active.
- When GPU load is not great (~60-75%), it usually upclocks a little (maybe up to 1160 MHz), but because it sees the application as "not demanding a lot", it doesn't try hard to upclock.
- When GPU load is decent-ish (86%), it auto-upclocks a bit (usually to around 1215 MHz or 1228 MHz, I think), with power consumption around 96-98% TDP.
- When GPU load is better saturated (97%-99%), it usually tries to upclock higher, but reaches a thermal limit. It usually ends up clocked at around 1180-1215 MHz, with a temperature of 84°C-89°C, at a power consumption around 96%.
- TIP: At that saturation, if you want, you can usually allow it to auto-upclock just a tad more by using whatever overclocking tool you have (I have eVGA Precision X) and adjusting the "Power Target". By default, I think the driver sets a Power Target of 100%; what I usually do is adjust it to 140%. This lets it auto-clock higher, until it really starts hitting those thermal limits.

My end result: My card usually runs at 1215 MHz, 86°C-90°C, with power consumption around 106% TDP. So, running at higher GPU load keeps it clocked high, as high as the thermal limits allow... which is a good thing, if you care more about GPUGrid throughput than the lifespan of your GPU. :) Regards, Jacob |
©2025 Universitat Pompeu Fabra