GPU Task Performance (vs. CPU core usage, app_config, multiple GPU tasks on 1 GPU, etc.)

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 29297 - Posted: 31 Mar 2013, 15:37:02 UTC
Last modified: 31 Mar 2013, 15:39:24 UTC

Hello everyone,

I'm creating this thread to document my GPUGrid GPU Task performance variances, while testing things such as:
- GPU task with no other tasks
- GPU task with full CPU load
- GPU task with overloaded CPU load
- Multiple GPU tasks on 1 video card

My system (as of right now) is:
Intel Core i7 965 Extreme (quad-core, hyper-threaded, Windows sees 8 processors)
Memory: 6GB
GPU device 0: eVGA GeForce GTX 660 Ti 3GB FTW (primary display)
GPU device 1: eVGA GeForce GTX 460 (not connected to any display)
OS: Windows 8 Pro x64 with Media Center

So far, I have some interesting results to share, and would like to "get the word out". If you'd like to share your results within this thread, feel free.

Regards,
Jacob

Jacob Klein
Message 29298 - Posted: 31 Mar 2013, 15:42:40 UTC - in response to Message 29297.  
Last modified: 31 Mar 2013, 15:45:34 UTC

I originally did some performance testing in another thread, but wanted the results consolidated into this "GPU Task Performance" thread.

That thread is titled "app_config.xml", and is located here:
http://www.gpugrid.net/forum_thread.php?id=3319

Note: The post within that thread, which contains the app_config values that I recommend using, can be found here:
http://www.gpugrid.net/forum_thread.php?id=3319#29216

Jacob Klein
Message 29299 - Posted: 31 Mar 2013, 15:43:03 UTC - in response to Message 29298.  
Last modified: 31 Mar 2013, 16:03:54 UTC

Here are the first results (from running only on my GTX 660 Ti), copied from that thread:

========================================================================
Running with no other tasks (every other BOINC task and project was suspended, so the single GPUGrid task was free to use up the whole CPU core):

Task: 6669110
Name: I23R54-NATHAN_dhfr36_3-17-32-RND2572_0
URL: http://www.gpugrid.net/result.php?resultid=6669110
Run time (sec): 19,085.32
CPU time (sec): 19,043.17

========================================================================
Running at <cpu_usage>0.001</cpu_usage>, BOINC set at 100% processors, along with a full load of other GPU/CPU tasks:

Task: 6673077
Name: I11R21-NATHAN_dhfr36_3-18-32-RND5041_0
URL: http://www.gpugrid.net/result.php?resultid=6673077
Run time (sec): 19,488.65
CPU time (sec): 19,300.91

Task: 6674205
Name: I25R97-NATHAN_dhfr36_3-13-32-RND4438_0
URL: http://www.gpugrid.net/result.php?resultid=6674205
Run time (sec): 19,542.35
CPU time (sec): 19,419.97

Task: 6675877
Name: I25R12-NATHAN_dhfr36_3-19-32-RND6426_0
URL: http://www.gpugrid.net/result.php?resultid=6675877
Run time (sec): 19,798.77
CPU time (sec): 19,606.33
========================================================================

CONCLUSION:
So, as expected, there is some minor CPU contention under full load, but not much (task run time is maybe ~3% slower). The impact is small because the ACEMD process runs at a higher priority than other BOINC task processes, and therefore is never starved for CPU; it likely loses only a little time to CPU context switching.
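
For anyone who wants to redo the arithmetic, here is a minimal sketch in Python that works out the slowdown from the long-run times listed above. The function and variable names are mine, and the only inputs are the run times quoted in this post.

```python
# Run times (sec) copied from the long-run results above.
baseline_run_time = 19085.32                       # single task, nothing else running
loaded_run_times = [19488.65, 19542.35, 19798.77]  # full load of other GPU/CPU tasks


def percent_slower(baseline: float, observed: float) -> float:
    """Return how much longer `observed` is than `baseline`, in percent."""
    return (observed / baseline - 1.0) * 100.0


slowdowns = [percent_slower(baseline_run_time, t) for t in loaded_run_times]
average_slowdown = sum(slowdowns) / len(slowdowns)

for run_time, slowdown in zip(loaded_run_times, slowdowns):
    print(f"{run_time:>10,.2f} sec -> {slowdown:.1f}% slower")
print(f"average: {average_slowdown:.1f}% slower")
```

The individual slowdowns land between roughly 2% and 4%, which is where the "~3%" figure comes from. With only one baseline task, this is a rough indication and not a measurement.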

Jacob Klein
Message 29300 - Posted: 31 Mar 2013, 15:44:03 UTC - in response to Message 29299.  
Last modified: 31 Mar 2013, 16:04:21 UTC

Here are some more results, where I focused on the "short" Nathan units:

========================================================================
Running with no other tasks (every other BOINC task and project was suspended, so the single GPUGrid task was free to use up the whole CPU core):

Task: 6678769
Name: I1R110-NATHAN_RPS1_respawn3-10-32-RND4196_2
URL: http://www.gpugrid.net/result.php?resultid=6678769
Run time (sec): 8,735.43
CPU time (sec): 8,710.61

Task: 6678818
Name: I1R42-NATHAN_RPS1_respawn3-12-32-RND1164_1
URL: http://www.gpugrid.net/result.php?resultid=6678818
Run time (sec): 8,714.75
CPU time (sec): 8,695.18

========================================================================
Running at <cpu_usage>0.001</cpu_usage>, BOINC set at 100% processors, along with a full load of other GPU/CPU tasks:

Task: 6678817
Name: I1R436-NATHAN_RPS1_respawn3-13-32-RND2640_1
URL: http://www.gpugrid.net/result.php?resultid=6678817
Run time (sec): 8,949.63
CPU time (sec): 8,897.27

Task: 6679874
Name: I1R414-NATHAN_RPS1_respawn3-7-32-RND6785_1
URL: http://www.gpugrid.net/result.php?resultid=6679874
Run time (sec): 8,828.17
CPU time (sec): 8,786.48

Task: 6679828
Name: I1R152-NATHAN_RPS1_respawn3-5-32-RND8187_0
URL: http://www.gpugrid.net/result.php?resultid=6679828
Run time (sec): 8,891.22
CPU time (sec): 8,827.11
========================================================================

CONCLUSION:
So, again, as expected, there is only slight contention under full CPU load, because the ACEMD process runs at a higher priority than other BOINC task processes, and therefore is never starved for CPU; it likely loses only a little time to CPU context switching.

Jacob Klein
Message 29302 - Posted: 31 Mar 2013, 16:01:12 UTC - in response to Message 29300.  
Last modified: 31 Mar 2013, 16:15:36 UTC

So, previously, I was only running 1 GPU Task on that GPU (and the GPU Load would usually be around 87%-88%). But I wanted to find out what would happen when I run 2.

So, the following tests will use <gpu_usage>0.5</gpu_usage> ... in my app_config.xml.

Note: The GPU Load goes to ~97% when I do this, and I believe that's a good thing!

========================================================================
Long-run Nathan tasks...
Running at <cpu_usage>0.001</cpu_usage>, <gpu_usage>0.5</gpu_usage>, BOINC set at 100% processors, along with a full load of other GPU/CPU tasks:

Name: I19R1-NATHAN_dhfr36_3-22-32-RND2354_0
URL: http://www.gpugrid.net/result.php?resultid=6684711
Run time (sec): 35,121.51
CPU time (sec): 34,953.33

Name: I6R6-NATHAN_dhfr36_3-18-32-RND0876_0
URL: http://www.gpugrid.net/result.php?resultid=6685136
Run time (sec): 39,932.98
CPU time (sec): 39,549.67

Name: I22R42-NATHAN_dhfr36_3-15-32-RND5482_0
URL: http://www.gpugrid.net/result.php?resultid=6685907
Run time (sec): 35,077.12
CPU time (sec): 34,889.61

Name: I31R89-NATHAN_dhfr36_3-21-32-RND1236_0
URL: http://www.gpugrid.net/result.php?resultid=6687190
Run time (sec): 35,070.94
CPU time (sec): 34,901.26

Name: I8R42-NATHAN_dhfr36_3-22-32-RND2877_1
URL: http://www.gpugrid.net/result.php?resultid=6688517
Run time (sec): 32,339.90
CPU time (sec): 32,082.15

========================================================================
Short-run Nathan tasks...
Running at <cpu_usage>0.001</cpu_usage>, <gpu_usage>0.5</gpu_usage>, BOINC set at 100% processors, along with a full load of other GPU/CPU tasks:

Name: I1R318-NATHAN_RPS1_respawn3-11-32-RND9241_0
URL: http://www.gpugrid.net/result.php?resultid=6684931
Run time (sec): 12,032.03
CPU time (sec): 11,959.47

Name: I1R303-NATHAN_RPS1_respawn3-14-32-RND0610_0
URL: http://www.gpugrid.net/result.php?resultid=6690144
Run time (sec): 14,621.04
CPU time (sec): 10,697.88

========================================================================

CONCLUSIONS:

Long-run Nathan units:
1-at-a-time + full CPU load: ~19,600 run time per task
2-at-a-time + full CPU load: ~35,100 run time per task
Speedup: 1 - (35,100 / (19,600 * 2)) = 10.5% improvement

Short-run Nathan units:
1-at-a-time + full CPU load: ~8,900 run time per task
2-at-a-time + full CPU load: ~13,300 run time per task
Speedup: 1 - (13,300 / (8,900 * 2)) = 25.3% improvement

So far, it looks like running multiple tasks at a time... GETS WORK DONE QUICKER!

Now, admittedly, I am estimating on very few results here, but.. I'll continue using this "2-at-a-time" approach, and will reply here if I find anything different.
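
The speedup figures in the conclusions can be reproduced with a few lines of Python. This is only a sketch of the formula used in this post, 1 - shared / (single * 2); the function name is made up for illustration, and the inputs are the rounded per-task run times quoted above.

```python
def throughput_gain(single_run_time: float, shared_run_time: float,
                    tasks_at_once: int = 2) -> float:
    """Fraction of time saved by running `tasks_at_once` tasks together.

    Compares the shared run time against running the same number of tasks
    one after another at the single-task run time.
    """
    sequential_time = single_run_time * tasks_at_once
    return 1.0 - shared_run_time / sequential_time


# Rounded run times (sec) from the conclusions above.
long_gain = throughput_gain(single_run_time=19_600, shared_run_time=35_100)
short_gain = throughput_gain(single_run_time=8_900, shared_run_time=13_300)

print(f"Long-run:  {long_gain:.1%} improvement")
print(f"Short-run: {short_gain:.1%} improvement")
```

A negative result would mean that running two at a time is slower overall than running them back to back.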

zombie67 [MM]
Joined: 16 Jul 07
Posts: 209
Credit: 5,496,860,456
RAC: 9,935
Message 29313 - Posted: 3 Apr 2013, 1:18:10 UTC

This is very good info. However, I need to point out a couple potential down-side issues:

1) even with 2 tasks per GPU via app_config.xml, it does not increase the number of tasks you can download. For example, on my 4 GPU machine, it normally has 4 running, and 4 waiting to run. Running 8 at once means all 8 are running. So now there is a delay between the time a task completes, uploads, reports, a new task is downloaded (big file), and starts running. That *may* wipe out any utilization advantage.

2) The longer run-time with 2 tasks per GPU *may* cause them to miss the credit bonus for early returns.

YMMV
Reno, NV
Team: SETI.USA

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 29314 - Posted: 3 Apr 2013, 17:49:45 UTC - in response to Message 29313.  

Point 1: ideally this would average out after some time, so that the different WUs per GPU finish at different times. Depending on your upload speed this might provide enough overlap to avoid running dry. Having more GPUs & WUs in flight should help with this issue.

Point 2: correct!

MrS
Scanning for our furry friends since Jan 2002

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 29320 - Posted: 5 Apr 2013, 12:24:35 UTC - in response to Message 29314.  

ExtraTerrestrial Apes wrote: "Point 1: ideally this would average out after some time, so that the different WUs per GPU finish at different times. Depending on your upload speed this might provide enough overlap to avoid running dry. Having more GPUs & WUs in flight should help with this issue."

To clarify, a simple example: a machine with 1 GPU would get 2 WUs and if these are not in sync, then while uploading/downloading 1 WU the other WU would run at 2x the speed. A real workaround would be to run the 2x WUs on a box with 1 NV and 1 ATI running on a different project, then 4 WUs would be allocated for the machine. As an aside I think running GPUGrid WUs 2x is a bad idea due to longer turnaround time and possible errors. A machine reboot or GPU error (or as Jacob pointed out on the BOINC list, a BOINC restart) would be more likely to take out 2 of these long WUs instead of 1.

John C MacAlister
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Message 29321 - Posted: 5 Apr 2013, 15:17:53 UTC

After some setup difficulties, I now have two long run tasks running - one on each of my GTX 650 Ti GPUs. GPUGrid runs 24/7 on this AMD A10 based PC and there are always two tasks running with either one or two waiting to run. As each GTX 650 processes at a slightly different rate the number of tasks waiting to run varies. I believe this will maximize output from my PC enabling me to make the maximum contribution to the research.

Jacob Klein
Message 29322 - Posted: 5 Apr 2013, 15:22:46 UTC - in response to Message 29321.  
Last modified: 5 Apr 2013, 15:23:17 UTC

John,

My research indicates that you might be able to contribute more to the project, if you run 2 tasks on each of your GPUs, assuming the tasks don't result in computation errors.

You might try that, using the app_config.xml file, and see if your overall performance increases. I was able to see gains in GPU Load (seen via a program called GPU-Z), as well as increased throughput (seen by looking at task times, as noted within this thread).

Regards,
Jacob

John C MacAlister
Message 29323 - Posted: 5 Apr 2013, 15:45:27 UTC

Hi, Jacob.

I am very inexperienced in writing .xml files and fear losing running tasks through syntax errors.

I would like to take it one step at a time for now and, maybe in a couple of weeks, try your suggestion. I will likely ask for help.....

Thanks for the suggestion.

John

Jacob Klein
Message 29324 - Posted: 5 Apr 2013, 15:47:45 UTC - in response to Message 29323.  

No problem. It's really not that hard, so don't be afraid, and... when you're ready, I encourage you to read this entire thread, which has details and examples:

"app_config.xml" located here:
http://www.gpugrid.net/forum_thread.php?id=3319
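
If syntax errors are the worry, one option is to check the file before putting it in place. The following is a minimal sketch that uses only Python's standard library; `check_app_config` is a name made up for this example. It only tests that the text is well-formed XML with the expected root element, and says nothing about what the BOINC client itself will accept.

```python
import xml.etree.ElementTree as ET


def check_app_config(xml_text: str) -> list:
    """Return a list of problems; an empty list means none were found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Typical causes: a missing closing tag or a mistyped tag name.
        return [f"not well-formed XML: {err}"]

    problems = []
    if root.tag != "app_config":
        problems.append(f"root element is <{root.tag}>, expected <app_config>")
    for app in root.findall("app"):
        if app.find("name") is None:
            problems.append("an <app> block has no <name>")
    return problems


good = """
<app_config>
    <app>
        <name>acemdlong</name>
        <gpu_versions>
            <gpu_usage>0.5</gpu_usage>
            <cpu_usage>0.001</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
"""
# Same text with one closing tag deleted, to simulate a typo.
broken = good.replace("</gpu_versions>", "")

print(check_app_config(good))    # no problems expected
print(check_app_config(broken))  # reports a parse error
```

To check a real file, read its contents into a string and pass that to the function.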

- Jacob

ExtraTerrestrial Apes
Message 29326 - Posted: 5 Apr 2013, 19:42:21 UTC
Last modified: 5 Apr 2013, 19:44:24 UTC

Careful, guys. The GTX 650 Ti (John's GPUs) sounds like it's almost the same as a GTX 660 Ti (Jacob's GPU), but it's actually about a factor of 2 slower. Currently, 70k-credit long-runs take John 33k seconds, so running 2 of them at once might require ~60k seconds. That's almost one day, so we're getting close to missing the deadline for the credit bonus here for even longer tasks (some give 150k credits, so they should take over twice as long).

And this is not only about credits: the credit bonus is there to encourage people to return results early. The project needs this as much as it needs many WUs done in parallel. As long as we're still making the deadline for the credit bonus we can be sure to return results as quickly as they want us to return them.
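
To put rough numbers on this, here is a small Python sketch. It rests on two assumptions that are mine and not project rules: that the full bonus needs a return within 24 hours, and that run time scales with credit (so a 150k task takes 150/70 as long as a 70k task). The names are hypothetical, and queue and transfer time are left at zero unless you pass them in.

```python
BONUS_WINDOW_SEC = 24 * 60 * 60  # assumption: full bonus needs a return within 24 hours


def fits_bonus_window(run_time_sec: float, overhead_sec: float = 0.0) -> bool:
    """True if run time plus any queue/transfer overhead fits in the window."""
    return run_time_sec + overhead_sec <= BONUS_WINDOW_SEC


# Estimate from the post above: two 70k-credit long-runs at once on a GTX 650 Ti.
two_at_once_70k = 60_000.0
# Assumption: run time scales with credit, so a 150k task takes 150/70 as long.
two_at_once_150k = two_at_once_70k * 150 / 70

for label, seconds in [("70k credit", two_at_once_70k),
                       ("150k credit", two_at_once_150k)]:
    verdict = "within" if fits_bonus_window(seconds) else "outside"
    print(f"{label}: {seconds / 3600:.1f} h, {verdict} the bonus window")
```

Under these assumptions the 70k tasks still fit when run two at a time, while the 150k tasks do not. Any time a task spends waiting in the queue or uploading counts against the window as well.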

MrS
Scanning for our furry friends since Jan 2002

Jacob Klein
Message 29327 - Posted: 5 Apr 2013, 19:46:42 UTC - in response to Message 29326.  
Last modified: 5 Apr 2013, 19:56:45 UTC

Sure, in order to get maximum bonus credits, you'll have to be careful to make sure you complete all your tasks within 24 hours. And, in general, they want results returned quickly.

But, in order to help the project the most, throughput (how fast can you do tasks) is the factor to measure, and the "deadline" is the task's deadline, which usually is a few days I think. If the administrators deem that a task must be done at a certain time, then I hope they are setting task deadline appropriately.

John C MacAlister
Message 29328 - Posted: 5 Apr 2013, 20:59:54 UTC
Last modified: 5 Apr 2013, 21:00:31 UTC

Thanks, Gentlemen:

I will leave this alone for now.....

With falling prices for the GTX 660 Ti, I may add one to my other AMD A10 based PC in September around my birthday.

John

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 29332 - Posted: 6 Apr 2013, 6:42:01 UTC - in response to Message 29328.  
Last modified: 6 Apr 2013, 6:50:56 UTC

You still have plenty of testing to do; all the possible same and mixed WU combinations would need to be looked at:
    NATHAN_dhfr36 (Long) + NOELIA_148n (Long)
    NATHAN_dhfr36 (Long) + NOELIA_TRYP (Short)
    NATHAN_dhfr36 (Long) + NATHAN_stpwt1 (Short)
    NOELIA_148n (Long) + NOELIA_TRYP (Short)
    NOELIA_148n (Long) + NATHAN_stpwt1 (Short)
    NOELIA_TRYP (Short) + NATHAN_stpwt1 (Short)


... plus any I've missed and whatever else turns up...

Basically, how do the various Long and Short tasks perform running together, how do mixed WU types perform and as there are several apps in use (16.16app, 16.18, 16.49, 6.52) - how do they get on together?

You might want to start 'freeing up' a CPU thread/core when running two WU's; the ~3% loss could well be exponential (more like 9%). Note also that some apps might ask for a full CPU core, while others won't (I think this is also GPU specific; needed for Kepler's but not Fermi's).

When you do all that, then you will be in a position to look at the error rates and thus determine overall gain, or loss :))

You have to remember that all this depends on the operating system. It's a well discussed fact that Linux/WinXP/2003 are faster for crunching at GPUGrid (11%+). Your numbers probably won't hold up on these operating systems, but should be true for Vista and W7. The Win 2008 servers are somewhere in between in terms of performance loss.

This would all have to be tested for Fermi's and Titan's (which might offer more).

I wouldn't be keen on running two long WU's but two short tasks looks interesting.


FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Jacob Klein
Message 29333 - Posted: 6 Apr 2013, 7:00:23 UTC - in response to Message 29332.  
Last modified: 6 Apr 2013, 7:02:16 UTC

Yes, I still have testing to do. You can/should test too!

It's not easy to cherry-pick certain task type combinations -- I usually just let any task types run together. Maybe once I find even more time to test, I'll attempt doing the specific-combination testing, using custom suspending, and more vigilant monitoring.

As far as "freeing up a core", my research indicates that, at this point, doing so is COMPLETELY UNNECESSARY, at least for me. If you look at the acemd processes in Process Explorer, you'll see that process priority is 6, and the CPU-intensive-thread priority is either 6 or 7. This ensures that the thread and process do not get swapped out of the processor, even when I'm running a full load of other CPU tasks, since those CPU tasks are usually priority 1 or 4. Watching how the CPU time gets divvied up (in Process Explorer, or in Task Manager), also proves it -- you'll see the other processes getting less-than-a-core, but you won't see the acemd process "suffer" much. Plus, as you said, sometimes the GPUGrid tasks don't require much CPU at all (like when a NATHAN Long-run is on my GTX 460), so, reserving a core is sheer waste at that point, at least for my goals. So I won't do it.

I'm not trying to speculate here, and I'm certainly not trying to find reasons not to run multiple tasks on the same GPU. I think it's worth it.

What I'm trying to do is show the results that I have achieved, using my goals (maximize throughput for GPUGrid, without sacrificing any throughput for my other projects), and I encourage others to do the same.

Thanks,
Jacob

skgiven
Message 29337 - Posted: 6 Apr 2013, 8:17:11 UTC - in response to Message 29333.  
Last modified: 6 Jun 2013, 12:15:21 UTC

I don't have much time to test, but OK, I'll do a little bit...

System:
GTX660Ti @1202MHz, i7-3770K CPU @4.2GHz, 8GB DDR3 2133, SATAIII drive, W7x64, 310.90 drivers, Boinc 7.0.60.

I've started using your suggested app_config.xml file:

    <app_config>
        <app>
            <name>acemdbeta</name>
            <max_concurrent>9999</max_concurrent>
            <gpu_versions>
                <gpu_usage>0.5</gpu_usage>
                <cpu_usage>0.001</cpu_usage>
            </gpu_versions>
        </app>
        <app>
            <name>acemdlong</name>
            <max_concurrent>9999</max_concurrent>
            <gpu_versions>
                <gpu_usage>0.5</gpu_usage>
                <cpu_usage>0.001</cpu_usage>
            </gpu_versions>
        </app>
        <app>
            <name>acemd2</name>
            <max_concurrent>9999</max_concurrent>
            <gpu_versions>
                <gpu_usage>0.5</gpu_usage>
                <cpu_usage>0.001</cpu_usage>
            </gpu_versions>
        </app>
        <app>
            <name>acemdshort</name>
            <max_concurrent>9999</max_concurrent>
            <gpu_versions>
                <gpu_usage>0.5</gpu_usage>
                <cpu_usage>0.001</cpu_usage>
            </gpu_versions>
        </app>
    </app_config>

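Since the four app blocks differ only in the app name, the file can also be generated, which avoids hand-typing mismatched tags. This is a sketch only, using Python's standard library; `build_app_config` and its parameter names are made up for this example, and the values are the ones from the file above. A gpu_usage of 0.5 tells BOINC that each task needs half a GPU, which is what lets two tasks share one card.

```python
import xml.etree.ElementTree as ET

# App names taken from the app_config.xml shown above.
APP_NAMES = ["acemdbeta", "acemdlong", "acemd2", "acemdshort"]


def build_app_config(app_names, gpu_usage=0.5, cpu_usage=0.001,
                     max_concurrent=9999) -> str:
    """Build app_config.xml text with one <app> block per name."""
    root = ET.Element("app_config")
    for name in app_names:
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
        versions = ET.SubElement(app, "gpu_versions")
        ET.SubElement(versions, "gpu_usage").text = str(gpu_usage)
        ET.SubElement(versions, "cpu_usage").text = str(cpu_usage)
    ET.indent(root, space="    ")  # pretty-printing; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


xml_text = build_app_config(APP_NAMES)
print(xml_text)
```

Changing gpu_usage back to 1 in the call gives a one-task-per-GPU file again.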

I was running one Long NATHAN_dhfr36 task. It had reached ~33% when I added the app_config file. GPU Utilization was around 87% (as you observed), power was about 87% and the temp ~60°C. CPU was set to only use 75% (free 2 threads), also running POGS. Note that I was using swan_sync=0.

I increased the Boinc cache and downloaded a Short NATHAN_stpwt1 task.

When I restarted Boinc, I had 4 POGS CPU tasks running (50% of the CPU). The two GPUGrid tasks used 25% of the CPU; a full CPU thread each (not due to swan_sync). GPU utilization rose to 98%, power to 97%, GPU temp to 65°C and the system Wattage went up by around 15 or 20W.

On my system these NATHAN_dhfr36 tasks (6.18 app) have varied in runtime from between ~18,400s and ~19,000s and the only two previous NATHAN_stpwt1 tasks (6.16 app) took 5,166 and 5,210s.

I expect that by just running another task you force the Kepler GPU's to run at higher clocks; they try to self-adjust their frequency!

- The Short NATHAN_stpwt1 task completed in 8,112s, so it took 56% longer, but not twice as long...
- Didn't automatically get another GPUGrid WU (Boinc cache set too low??), but did when I updated; a NATHAN_RPS1_respawn (6.52app)
Both GPUGrid tasks each still using a full CPU thread (swan on). Will try to run a few with swan on and then off, for comparison.

The NATHAN_RPS1_respawn took 12,021sec. On average they take 8876sec, but have varied from 8,748 to 9,215sec. That's 35% longer than normal but a good bit less than twice as long.

The third task to run along with the Long WU is NATHAN_RPS1_respawn3-25-32-RND4658_0.

The Long task took 39,842sec, over twice as long as normal (2.13 times as long). Given that the first 33% was run by itself, the remaining ~67% took roughly 2.7 times as long as normal. That's a big loss when running Long and Short tasks together. Even considering the Short tasks were >0.5 as fast, in this case it looks less efficient overall.
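
A rough way to check the slowdown of the shared portion is sketched below in Python. The assumptions are mine: the normal run time is taken as the midpoint of the 18,400 to 19,000 sec range quoted earlier, and the first 33% is assumed to have run at normal speed. The function name is made up for this example.

```python
def shared_phase_slowdown(total_run_time: float, normal_run_time: float,
                          fraction_done_alone: float) -> float:
    """Slowdown factor for the part of the task that shared the GPU."""
    time_alone = normal_run_time * fraction_done_alone
    time_shared = total_run_time - time_alone
    normal_time_for_rest = normal_run_time * (1.0 - fraction_done_alone)
    return time_shared / normal_time_for_rest


total = 39_842                   # run time (sec) of the Long task above
normal = (18_400 + 19_000) / 2   # assumption: midpoint of the usual range
overall = total / normal
shared = shared_phase_slowdown(total, normal, fraction_done_alone=0.33)

print(f"overall: {overall:.2f}x normal, shared phase: {shared:.2f}x normal")
```

With these inputs the shared phase comes out a little under 3 times normal; the exact figure shifts with the baseline you assume.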

Warning! Running two NATHAN_RPS1_respawn3 tasks together caused dangerously bad lag. GPU utilization fell to 33% and GPU temp dropped to 41°C. After 55min the second Short task had only reached 1.7% complete. Just one of these tasks runs at 94% GPU utilization on my system, so there is no way running two would be beneficial. I've since retested this, and found the same results. I was also able to run 4 POEM tasks as well as one respawn3, but they were very slow. Alone these 4 POEM tasks used 88% of the GPU and with the respawn3 WU that went up to 99%. For now I have disabled app_config, as I'm just getting these respawn3 WU's.

I ran a single NOELIA_Klebe_Equ task and then two at the same time.
While running the single task GPU utilization was 87% and while two tasks were running it rose to 97%.

Basically it's not any faster running two tasks:

Name: 041px21x3-NOELIA_Klebe_Equ-0-1-RND6607_0
Work unit: 4338582
Sent: 7 Apr 2013, 6:07:54 UTC
Reported: 7 Apr 2013, 10:08:02 UTC
Status: Completed and validated
Run time (sec): 13,243.66
CPU time (sec): 5,032.64
Credit: 23,700.00
Application: Short runs (2-3 hours on fastest card) v6.52 (cuda42)

Name: 041px2x1-NOELIA_Klebe_Equ-0-1-RND3215_0
Work unit: 4338518
Sent: 7 Apr 2013, 6:43:58 UTC
Reported: 7 Apr 2013, 10:27:29 UTC
Status: Completed and validated
Run time (sec): 13,320.82
CPU time (sec): 4,158.10
Credit: 23,700.00
Application: Short runs (2-3 hours on fastest card) v6.52 (cuda42)

Name: 005px46x3-NOELIA_Klebe_Equ-0-1-RND6629_0
Work unit: 4338501
Sent: 7 Apr 2013, 2:28:54 UTC
Reported: 7 Apr 2013, 4:50:37 UTC
Status: Completed and validated
Run time (sec): 6,288.30
CPU time (sec): 2,656.45
Credit: 23,700.00
Application: Short runs (2-3 hours on fastest card) v6.52 (cuda42)
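
The same comparison can be expressed as tasks per day. This Python sketch assumes that the two tasks with run times around 13,300 sec were the pair run together and that the 6,288 sec task ran alone, which is what the overlapping sent and reported times suggest. The function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(run_time_sec: float, tasks_at_once: int = 1) -> float:
    """Tasks completed per day if every task takes `run_time_sec`."""
    return tasks_at_once * SECONDS_PER_DAY / run_time_sec


single = tasks_per_day(6288.30)                   # one task on the GPU
paired_run_time = (13243.66 + 13320.82) / 2       # average of the two shared tasks
paired = tasks_per_day(paired_run_time, tasks_at_once=2)

print(f"one at a time: {single:.1f} tasks/day")
print(f"two at a time: {paired:.1f} tasks/day ({paired / single - 1:+.1%})")
```

On these three results, two at a time finishes slightly fewer tasks per day than one at a time, despite the higher GPU utilization.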

- Running another two with swan off. The first of the two NOELIA_Klebe_Equ tasks started using 934MB and the second used an additional 808MB GDDR5. That 1742MB dropped to 1630MB before they reached 10% complete. With two tasks running the clock stabilized at 1189MHz.

Note that I'll just edit this post with any further results.


Jacob Klein
Message 29347 - Posted: 6 Apr 2013, 11:00:42 UTC - in response to Message 29337.  
Last modified: 6 Apr 2013, 11:04:03 UTC

Sounds good, thanks for testing.

Note: When running 2-at-a-time, I expect tasks to take slightly-less-than-double what they normally take, which would mean they are being processed faster over-all.

Jacob Klein
Message 29348 - Posted: 6 Apr 2013, 11:01:39 UTC - in response to Message 29337.  
Last modified: 6 Apr 2013, 11:49:41 UTC

Ah, you bring up a good point, I forgot to mention my clocking experiences with my Kepler architecture eVGA GTX 660 Ti 3GB FTW card...

- Its base clock is 1045 MHz, which I think is the lowest clock it can be while running a 3D application or GPU task.
- When GPU Load is not great (~60-75%), I think it usually upclocks a little (maybe up to 1160 MHz), but because it sees the application as "not demanding a lot", it doesn't try hard to upclock.
- When GPU Load is decent-ish (86%), it auto-upclocks a bit (usually to around 1215 MHz or 1228 MHz I think), with Power Consumption around 96-98% TDP.
- When GPU Load is better-saturated (97%-99%), it usually tries to upclock higher, but reaches a thermal limit. It usually ends up clocked at around 1180-1215 MHz, with a temperature of 84°C-89°C, at a Power Consumption around 96%.
- TIP: At that saturation, if you want, you can usually allow it to auto-upclock just a tad more, by using whatever overclock tools you have (I have eVGA Precision X), and just adjust the "Power Target". By default, I think the driver sets a Power Target of 100%, but what I usually do is adjust it to 140%. This lets it auto-clock higher, until it starts really hitting those thermal limits. My end result: My card usually runs at 1215 MHz, 86°C-90°C, with Power Consumption around 106% TDP.

So, running at higher GPU Load keeps it clocked high, as high as the thermal limits can let it... which is a good thing, if you care more about GPUGrid throughput than the lifespan of your GPU. :)

Regards,
Jacob