Pascal Settings and Performance

Message boards : Graphics cards (GPUs) : Pascal Settings and Performance
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 45114 - Posted: 2 Nov 2016, 13:44:45 UTC

nanoprobe wrote:
JoergF wrote:
Strange. My 1070 does not get better than 90% and the 1080 is even worse, maybe 75%. No matter how many tasks and the CPU/GPU ratio. As if the algorithm is not yet Pascal optimized.

But this gets off topic a little... maybe I should bring that question forward somewhere else.

FWIW my 1060 runs @ 95% with 2 tasks at a time at stock settings. What's truly amazing is that according to my UPS it's only pulling 45 watts.

Nanoprobe, you are using your card under Linux, which doesn't have WDDM, while JoergF is using his card under Windows 7, which does.
eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 45115 - Posted: 2 Nov 2016, 14:34:35 UTC

Run one at a time, ADRIA_1JWP_dist tasks (Win8.1) show higher GPU usage on a GTX 1070 and a GTX 1060 (3GB) than SDOERR CASP (a3d) or GERALD CXCL12 tasks do.

ADRIA_1JWP_dist:
-- GTX 1070 (PCIe 3.0 x8) @ 2.1GHz: 78% GPU / 32% MCU / 52% BUS / 109W power
-- GTX 1060 (PCIe 3.0 x4) @ 2.1GHz: 82% GPU / 32% MCU / 63% BUS / 96W power

The GTX 1070 completes an ADRIA_1JWP_dist ~30% faster (8 hr vs. 11.6 hr).
On non-ADRIA WUs the GTX 1060 is closer to 20~25% slower than the GTX 1070.

Pascal's power consumption really shines, though, even when the card is OCed.
My 970s are starting to feel old, as their power-consumption/performance ratio struggles to keep up with Pascal.
Pascal already has a very high out-of-the-box boost, which largely negates overclocking. Even at 2.1GHz the Pascal core scales only slightly on PCIe 3.0 x4. For those who fine-tune their cards to locate the highest stable operating point, Pascal is the easiest, and the least fun, to manage.


Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 45116 - Posted: 2 Nov 2016, 15:16:13 UTC - in response to Message 45115.  
Last modified: 2 Nov 2016, 18:28:06 UTC

Thanks,

The proof-of-concept CASP runs have fewer atoms, so GPU utilization is generally lower. It's probably the case that the bigger the GPU, the lower the utilization for such work units. This might even be exacerbated on the Pascals compared to the Maxwell GPUs (or not)...

Even within the CASP runs there are different types of sub_run:

e13s6_e9s4p0f5-SDOERR_CASP22SnoS_crystal_ss_5ns_ntl9_0-0-1-RND7413_0 1,337.92 - runtime
e12s1_e11s5p0f27-SDOERR_CASP22SnoS_crystal_ss_contacts_5ns_a3D_1-0-1-RND6418_0 2,152.56 - runtime

The runs that include atomic contacts take longer because they involve calculating contact interactions too.
In theory these should utilize the GPU more (but I haven't looked to see if that's the case or not).

The CASP runs have rather low PCIe Bandwidth Utilization (for all GPUs).

For reference, I'm seeing ~81% GPU utilization on a GTX 970 on Linux (Ubuntu x64 16.04 LTS) crunching a CASP22SnoS_crystal_ss_contacts task. PCIe bandwidth utilization is ~17% (PCIe 2.0 x16). It's a low-spec system, but I'm not using the CPU for anything else; when I do, the GPU's performance drops off. When I run a certain MT CPU app on another system (W10), the GPU utilization drops to ~15%!

- edit - I see I'm getting 88% GPU utilization on a non-contacts CASP22SnoS_crystal_ss task. PCIe bandwidth ~25%. Temps are higher too.
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
eXaPower
Message 45118 - Posted: 2 Nov 2016, 17:24:00 UTC - in response to Message 45116.  

Thanks,

You're welcome!

The proof-of-concept CASP runs have fewer atoms, so GPU utilization is generally lower. It's probably the case that the bigger the GPU, the lower the utilization for such work units. This might even be exacerbated on the Pascals compared to the Maxwell GPUs (or not)...

[..]The runs that include atomic contacts take longer because they involve calculating contact interactions too.
In theory these should utilize the GPU more (but I haven't looked to see if that's the case or not).

What I reported in the CASP thread:
CASP runtimes vary (with atom and step counts), so this is just a general reference.

GTX 1070 (PCIe 3.0 x8) CASP runtimes:

-- 1ns (ntl9) 600 credits = 240/sec @ 2.1GHz / 59% GPU / 15% MCU / 37% BUS / 78W power
-- 1ns (a3d)= 1,350 credits = 330/sec @ 2.1GHz / 70% GPU / 24% MCU / 39% BUS / 96W power

-- 5ns (ntl9) 3,150 credits = 1,200/sec @ same usage and power numbers as 1ns
-- 5ns (a3d) 6,900 credits = 1,600/sec @ same usage and power numbers as 1ns

GTX 1060 (3GB) PCIe 3.0 x4 CASP runtimes:

-- 1ns (ntl9) 600 credits = 300/sec @ 2.1GHz / 63% GPU / 17% MCU / 51% BUS / 74W power
-- 1ns (a3d) 1,350 credits = 450/sec @ 2.1GHz / 74% GPU / 24% MCU / 59% BUS / 88W power

-- 5ns (ntl9) 3,150 credits = 1,500/sec @ same GPU usage and power as 1ns
-- 5ns (a3d) = 6,900 credits = 2,275/sec @ same GPU usage and power as 1ns

IMO: a (1152-CUDA GTX 1060) is on par with a (2048-CUDA GTX 980) and ~20% faster than a (1664-CUDA GTX 970). The (1920-CUDA GTX 1070) is on par with, if not ~5% faster than, a (2816-CUDA GTX 980Ti).

CASP WUs on my (2) GTX 970s at 1.5GHz are 2.5~2.7x slower on PCIe 2.0 x1 than on PCIe 3.0 x4.
WUs require PCIe 3.0 x8 for proper OC scaling.

-- 1ns: 900/sec vs. 350/sec
-- 5ns: 4770/sec vs. 1761/sec

-- PCIe 2.0 x1: 46% GPU / 7% MCU / 80% BUS usage / 75W GPU power
-- PCIe 3.0 x4: 57% GPU / 12% MCU / 40% BUS / 107W

I haven't noticed any difference in GPU usage with or without contacts, though CPU usage is ~5% higher, at ~15% per WU, when contacts are included.

For reference, I'm seeing ~81% GPU utilization on a GTX 970 on Linux (Ubuntu x64 16.04 LTS) crunching a CASP22SnoS_crystal_ss_contacts task. PCIe bandwidth utilization is ~17% (PCIe 2.0 x16). It's a low-spec system, but I'm not using the CPU for anything else; when I do, the GPU's performance drops off. When I run a certain MT CPU app on another system (W10), the GPU utilization drops to ~15%!

I see >10% CPU usage on each Pascal WU, mostly around 25% on average (4C/4T Haswell S series) crunching (2) WUs.
When shooting for the most efficient runtimes it helps to have a CPU clock speed above 3GHz (preferably >3.5GHz).
Every GTX 1070 host faster than mine has an overclocked 'K'-series CPU, even though my Pascals are at 2.1GHz (1.5GHz on Maxwell).

WDDM performance degradation versus Linux or XP is comparable to the effect of PCIe width on runtimes: PCIe 3.0 x4 runtimes can be ~10% slower when PCIe 3.0 x8 is not an option. (Maybe the AMD Zen platform will offer more than the 16/28/40 CPU PCIe 3.0 lanes Intel currently does.)
PCIe 2.0 carries a 20% line-code overhead (8b/10b encoding), while PCIe 3.0 uses 128b/130b. In reality PCIe 2.0 has at most 80% of its raw bandwidth available; PCIe 3.0 provides 98.4%. Intel's DMI link of (4) bi-directional lanes at 1GB/s per lane on Haswell and Ivy Bridge is supposed to be faster than AMD's 500MB/s(?) per lane. Skylake doubled that bandwidth, with (4) lanes at 2GB/s per lane.
Maybe during MT CPU compute the DMI link becomes a bottleneck, causing the dramatic GPU-utilization loss? Or the PCIe bus is flooded out completely.
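The encoding arithmetic above can be checked directly. A quick sketch (per-lane signalling rates from the PCIe 2.0/3.0 specs, treating 1 transfer as 1 bit per lane):

```python
# Effective per-lane payload bandwidth after line-code overhead.
# PCIe 2.0: 5 GT/s with 8b/10b encoding   -> 80% efficiency.
# PCIe 3.0: 8 GT/s with 128b/130b encoding -> ~98.5% efficiency.

def effective_mb_per_s(gt_per_s, payload_bits, total_bits):
    """Payload bandwidth of one lane in MB/s."""
    usable_gbit = gt_per_s * payload_bits / total_bits
    return usable_gbit * 1000 / 8  # Gb/s -> MB/s

print(effective_mb_per_s(5.0, 8, 10))     # PCIe 2.0: 500.0 MB/s per lane
print(effective_mb_per_s(8.0, 128, 130))  # PCIe 3.0: ~984.6 MB/s per lane
```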
Profile Retvari Zoltan
Message 45121 - Posted: 2 Nov 2016, 19:37:40 UTC

Since the 9.14 app is CUDA 8.0, and there are a couple of CUDA 8.0 drivers for Windows XP, I installed my GTX 1080 under Windows XP x64 with the latest XP driver available for the GTX 960 (368.81), but the 9.14 app did not work with this setup. It said:
Task blablabla exited with zero status but no 'finished' file
If this happens repeatedly you may need to reset the project.
But the task did not actually error out, so these two lines repeated indefinitely until I suspended the task.
When I booted this host into Windows 10, the task resumed normally.
So now I really have to learn how to set up a Linux-based BOINC host.
Profile Retvari Zoltan
Message 45128 - Posted: 2 Nov 2016, 23:44:48 UTC
Last modified: 2 Nov 2016, 23:45:47 UTC

Is there anyone using a GTX 1080 or TITAN X (Pascal) under Linux with swan_sync on?
Does the new (9.14) Linux app support swan_sync?
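For reference, on earlier ACEMD apps swan_sync was switched on by exporting an environment variable before the BOINC client starts; whether 9.14 honours it is exactly the open question here. A sketch, with the value an assumption since it has varied between app versions:

```
# e.g. appended to /etc/environment, or exported in the script
# that launches the BOINC client
SWAN_SYNC=1
```

The app only sees it if it is set in the environment the client is launched from.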
3de64piB5uZAS6SUNt1GFDU9dRhY
Joined: 20 Apr 15
Posts: 285
Credit: 1,102,216,607
RAC: 0
Message 45138 - Posted: 3 Nov 2016, 15:08:22 UTC - in response to Message 45118.  
Last modified: 3 Nov 2016, 15:09:06 UTC

WDDM performance degradation versus Linux or XP is comparable to the effect of PCIe width on runtimes: PCIe 3.0 x4 runtimes can be ~10% slower when PCIe 3.0 x8 is not an option. (Maybe the AMD Zen platform will offer more than the 16/28/40 CPU PCIe 3.0 lanes Intel currently does.)


I admit I have not looked into the matter yet. Is there any way to bypass the WDDM degradation? It is somewhat frustrating to see a 1080 performing worse than a 1070 or 980Ti just because of low utilization. Actually I don't get more than 75% load on my 1080.
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
eXaPower
Message 45139 - Posted: 3 Nov 2016, 16:01:12 UTC - in response to Message 45138.  

WDDM performance degradation versus Linux or XP is comparable to the effect of PCIe width on runtimes: PCIe 3.0 x4 runtimes can be ~10% slower when PCIe 3.0 x8 is not an option. (Maybe the AMD Zen platform will offer more than the 16/28/40 CPU PCIe 3.0 lanes Intel currently does.)


I admit I have not looked into the matter yet. Is there any way to bypass the WDDM degradation? It is somewhat frustrating to see a 1080 performing worse than a 1070 or 980Ti just because of low utilization. Actually I don't get more than 75% load on my 1080.

Yes: by moving to Linux. There are a few remedies on WDDM OSes that help regain GPU utilization: enable SWAN_SYNC (I don't use this); have a CPU above 3.5GHz with a PCIe 3.0 x16 GPU connection (single-GPU setups seem to be faster than systems with 2 or 3 of the same CPU and GPUs); or compute 2 tasks at a time, accepting 30 to 50% longer runtimes than one at a time.

PCIe 3.0 x8 is the bare minimum for overclock scaling. A GTX 1060 or above on PCIe 3.0 x4 will see a 4~12% performance drop-off from x8, depending on the type of WU.
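For readers wanting to try two tasks per GPU: in BOINC this is done with an app_config.xml in the GPUGrid project directory. A minimal sketch; the <name> value below is an assumption, so check the <app> entries in client_state.xml for the actual app names:

```xml
<app_config>
  <app>
    <name>acemdlong</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

Save it as projects/www.gpugrid.net/app_config.xml and use the BOINC Manager's "Read config files" option (or restart the client) to apply it.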
3de64piB5uZAS6SUNt1GFDU9dRhY
Message 45154 - Posted: 4 Nov 2016, 8:08:37 UTC
Last modified: 4 Nov 2016, 8:30:22 UTC

Thanks... to my mind the config cannot be the bottleneck. I run 2 tasks with 1 virtual CPU core per task [1CPU/0.5GPU] on an i7-3770S, which should be fast enough. But now that you mention it, I still have an old socket-1155 board in my primary PC that is PCIe 2.0 only, so the GTX 1080 is linked at PCIe 2.0 x16, which is equal to PCIe 3.0 x8 in terms of throughput. Can this be the reason?
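The claimed x16/x8 equivalence checks out numerically; a sketch using the per-lane payload rates after line-code overhead:

```python
# Per-lane payload bandwidth in MB/s after encoding overhead
# (5 GT/s with 8b/10b for PCIe 2.0; 8 GT/s with 128b/130b for PCIe 3.0).
pcie2_lane = 5.0e3 * (8 / 10) / 8      # 500 MB/s
pcie3_lane = 8.0e3 * (128 / 130) / 8   # ~984.6 MB/s

pcie2_x16 = 16 * pcie2_lane   # 8000 MB/s
pcie3_x8 = 8 * pcie3_lane     # ~7877 MB/s

# The two configurations land within ~2% of each other.
print(round(pcie2_x16), round(pcie3_x8))
```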

My other system is less affected, maybe because the board is a newer socket-1151 and the GTX 1070 therefore gets PCIe 3.0 x16. The CPU is also a little faster (6700K), but I guess that has nothing to do with it.

Besides, it would be interesting to know how the task algorithms work. If large data blocks are transferred, the bandwidth doesn't really matter (provided the GPU's work time per subtask is long enough); you would just see occasional downward spikes in GPU utilization. But if the CPU/memory and GPU are permanently exchanging little slices of data, it could explain why limited bus bandwidth slows down the entire run. If so, maybe something can be done on the code side to keep Pascal (and the upcoming Volta) GPUs busy.
Profile skgiven
Message 45156 - Posted: 4 Nov 2016, 9:44:18 UTC - in response to Message 45154.  

Thanks... to my mind the config cannot be the bottleneck. I run 2 tasks with 1 virtual CPU core per task [1CPU/0.5GPU] on an i7-3770S, which should be fast enough. But now that you mention it, I still have an old socket-1155 board in my primary PC that is PCIe 2.0 only, so the GTX 1080 is linked at PCIe 2.0 x16, which is equal to PCIe 3.0 x8 in terms of throughput. Can this be the reason?

The i7-3770S is PCIe 3.0 x16 capable, but I guess there could be some LGA 1155 motherboards that are PCIe 2.0 only. Which motherboard model do you have? If it is PCIe 2.0, that could be the issue, or one of the main ones. You could probably get a replacement PCIe 3.0-capable motherboard if that's the case.

If you crunch using your integrated HD Graphics 4000 iGPU, that would impact the GTX 1080's performance, as would crunching lots of CPU projects. Basically, for optimal GPUGrid performance (especially with such a high-end GPU) you want to be crunching for as few CPU projects as possible. MT apps are a no-no, and running apps in a VM can bog the system down.

CPU speed and RAM speed also have an impact, but while there are faster processors, that CPU isn't bad, and it does have a PCIe 3.0 controller on board (it probably just isn't being used). HT off and SWAN_SYNC might help a little too.
3de64piB5uZAS6SUNt1GFDU9dRhY
Message 45158 - Posted: 4 Nov 2016, 10:13:53 UTC

Thanks ... the mainboard is an ASUS P8P67-M (socket 1155) and definitely PCIe 2.0 only. Yes, the CPU does support 3.0 but the board doesn't, so it could be part of the issue, as you wrote.

No, I don't use the iGPU or a VM in the background. So I guess I should simply upgrade my system components. Frankly, I am waiting for AMD Zen or Intel Cannonlake in order to get noticeable extra speed. Upgrading from Ivy Bridge to Kaby Lake now does not make much sense to me, maybe aside from DDR4 memory, which could be some advantage.

Profile skgiven
Message 45164 - Posted: 4 Nov 2016, 11:47:53 UTC - in response to Message 45158.  

If I were you I would swap the 1070 with the 1080, or try to pick up a second-hand PCIe 3.0 socket-1155 motherboard for now.

Zen will bring more PCIe 3.0 lanes but won't be out for a while yet.
A good time to upgrade will be when the GTX 1080 Ti and Zen have arrived and are available in sufficient quantities, with enough competition for prices to be reasonable. That might be in 6 to 9 months, possibly more, depending on the competition.

I've noticed all the 1080s and 1070s are reporting 4GB of graphics memory. IIRC it's not an issue.
3de64piB5uZAS6SUNt1GFDU9dRhY
Message 45165 - Posted: 4 Nov 2016, 12:06:32 UTC - in response to Message 45164.  

If I were you I would swap the 1070 with the 1080, or try to pick up a second-hand PCIe 3.0 socket-1155 motherboard for now.


I have already considered that, but sometimes changing the motherboard also means different SATA controllers and drivers, and then Windows will no longer boot. Which means, well, it is surely possible but not that easy.


I've noticed all the 1080s and 1070s are reporting 4GB of graphics memory. IIRC it's not an issue.


Yes, I have noticed that as well. Does it affect performance in any way?

Profile skgiven
Message 45167 - Posted: 4 Nov 2016, 13:04:14 UTC - in response to Message 45165.  

I've noticed all the 1080s and 1070s are reporting 4GB of graphics memory. IIRC it's not an issue.


Yes, I have noticed that as well. Does it affect performance in any way?

I think it's BOINC that reports this; the app reads the details directly from the hardware/system itself, so it wouldn't impact performance. Most tasks tend to use less than 1GB of GDDR, and the most I can recall is around 1.7GB.
3de64piB5uZAS6SUNt1GFDU9dRhY
Message 45168 - Posted: 4 Nov 2016, 13:29:28 UTC - in response to Message 45164.  
Last modified: 4 Nov 2016, 13:30:57 UTC

A good time to upgrade will be when the GTX 1080 Ti and Zen have arrived and are available in sufficient quantities, with enough competition for prices to be reasonable. That might be in 6 to 9 months, possibly more, depending on the competition.


Yes... and don't forget the AMD RX 490. If that one performs well, which I hope, it will have a positive influence on Nvidia pricing in general. Which means: prices DOWN ;-)
Profile skgiven
Message 45169 - Posted: 4 Nov 2016, 13:29:57 UTC - in response to Message 45167.  

Pulled my GTX 970 from my Ubuntu x64 16.04 LTS system and replaced it with a GTX 1060-3GB.
Comparing two tasks, one that ran on the 970 and one that ran on the 1060-3GB, I can say that on my setup the 1060-3GB is ~3% faster than the 970 and uses ~9% more CPU. Obviously this is a comparison of only two tasks, but they are similar task types and give the same amount of credit:

e47s4_e35s2p0f0-SDOERR_CASP10_crystal_ss_5ns_ntl9_1-0-1-RND0325_1 : 1,454.48 804.24 3,150.00 v9.14 (cuda80)
e25s5_e20s7p0f45-SDOERR_CASP22S_crystal_ss_5ns_ntl9_2-0-1-RND0908_0 : 1,498.21 736.19 3,150.00 v8.48 (cuda65)

When watching the runs, CPU usage, GPU usage and PCIE usage all looked about the same for each GPU. I expect the Pascal uses less power to do the same work, but I haven't measured the power draw just yet...

I think a few other people have reported increased CPU usage on the Pascals. If CPU usage is higher with the CUDA 8.0 app/Pascal cards, it's probably going to be more noticeable with larger GPUs. Certainly something to take into account when building a system for GPU crunching.
3de64piB5uZAS6SUNt1GFDU9dRhY
Message 45170 - Posted: 4 Nov 2016, 13:37:57 UTC - in response to Message 45169.  

Comparing two tasks, one that ran on the 970 and one that ran on the 1060-3GB, I can say that on my setup the 1060-3GB is ~3% faster than the 970 and uses ~9% more CPU.


Great... which means the comparison by SP GFLOPS from the specifications works for this kind of job, more or less. What is the average GPU usage of the 1060?

Profile skgiven
Message 45175 - Posted: 4 Nov 2016, 15:12:50 UTC - in response to Message 45170.  
Last modified: 4 Nov 2016, 15:39:55 UTC

The GPU usage is ~78%, similar, but the utilization is spiky and might vary during the run. When I had the 970 in, I used NVidia X Server Settings to observe GPU utilization. However, I've just noticed that keeping the NV X Server Settings window open increases the apparent GPU utilization:
Watching the graphics % in Psensor, there is roughly a 10% difference in GPU utilization depending on whether the X Server window is maximized or minimized.
CPU usage is also ~65% when the NV X Server Settings window is open and ~22% when it's minimized. So I would conclude that threads on both the CPU and GPU are kept live while X Server isn't minimized. Power usage at the wall doesn't change.
That sort of throws a spanner in the works for some of my observations!
- update - leaving the Boinc Manager window open has the same effect (increased apparent GPU utilization).

I suspect using SWAN_SYNC or a nice value might improve performance a bit as would running two tasks.

My GPU clock is supposed to be 1911MHz but it's 1904MHz (a bit shy of 2000 but not far off). The memory is at 7604MHz.

My GTX 1060-3GB's power usage is about 75W, so it's about 45% more efficient than a GTX 970 here:

System idle using 38W at the wall.
With Boinc maximized, but only running some NCI apps, the system's power usage is 50W.
When I start to crunch on the GPU the system uses ~125W, so the GPU is using ~75W.
Of note: my 1060 has only one 6-pin power connector (which delivers only up to ~75W)! So perhaps I'm being power-capped? It's a small PCIe 2.0 x16 motherboard, and although there's a 12-pin ATX power connector, I wouldn't be surprised if the GPU isn't drawing any power from the PCIe slot.

I had to re-enter cool-bits=4 and reboot to be able to alter the fan speed and manually control the GPU's temperature; before I did this, the GPU temps rose to 63C and the fan didn't go over 50%. Now I'm running the fans at 60% and the GPU is ~58C. The fans aren't audible over the system's case fan.
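For anyone repeating this on Linux: the cool-bits setting lives in the Device section of xorg.conf. A sketch of the relevant fragment (the Identifier is arbitrary; other options omitted):

```
Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    Option     "Coolbits" "4"
EndSection
```

Running nvidia-xconfig --cool-bits=4 writes the same option, and X needs a restart before the fan-speed controls appear in nvidia-settings.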
Profile skgiven
Message 45177 - Posted: 4 Nov 2016, 15:53:25 UTC - in response to Message 45169.  
Last modified: 4 Nov 2016, 16:01:59 UTC

Pulled my GTX 970 from my Ubuntu x64 16.04 LTS system and replaced it with a GTX 1060-3GB.
Comparing two tasks, one that ran on the 970 and one that ran on the 1060-3GB, I can say that on my setup the 1060-3GB is ~3% faster than the 970 and uses ~9% more CPU. Obviously this is a comparison of only two tasks, but they are similar task types and give the same amount of credit:

e47s4_e35s2p0f0-SDOERR_CASP10_crystal_ss_5ns_ntl9_1-0-1-RND0325_1 : 1,454.48 804.24 3,150.00 v9.14 (cuda80)
e25s5_e20s7p0f45-SDOERR_CASP22S_crystal_ss_5ns_ntl9_2-0-1-RND0908_0 : 1,498.21 736.19 3,150.00 v8.48 (cuda65)

When watching the runs, CPU usage, GPU usage and PCIe usage all looked about the same for each GPU. I expect the Pascal uses less power to do the same work, but I haven't measured the power draw just yet...

I think a few other people have reported increased CPU usage on the Pascals. If CPU usage is higher with the CUDA 8.0 app/Pascal cards, it's probably going to be more noticeable with larger GPUs. Certainly something to take into account when building a system for GPU crunching.


The difference is more noticeable with the 20ns tasks:

e35s7_e32s8p0f82-SDOERR_CASP10_crystal_ss_20ns_ntl9_2-0-1-RND9465_0 : 5,365.27 3,215.99 12,750.00 v9.14 (cuda80)

e14s4_e9s6p0f34-SDOERR_CASP22S_crystal_ss_20ns_ntl9_0-0-1-RND4064_0 : 5,946.69 2,911.63 12,750.00 (cuda65)

e15s3_e14s5p0f90-SDOERR_CASP22S_crystal_contacts_20ns_ntl9_2-0-1-RND8066_0 : 5,966.27 2,947.58 12,750.00 v8.48 (cuda65)

In this case the 1060-3GB is ~10% faster than the GTX 970, and the CPU usage is also around 10% greater. As most tasks at GPUGrid tend to be longer, 10% probably reflects the difference between the cards more accurately than the short tasks, which spend as much time loading but less time running. So: ~10% faster for ~45% less energy, roughly 60% better in terms of performance/Watt.
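The closing arithmetic, as a sketch; the assumption here is that "~45% less energy" means the GTX 970 draws roughly 45% more power for the same work:

```python
# Sanity check on the performance/Watt claim.
speedup = 1.10       # 1060-3GB ~10% faster on the 20ns tasks
power_ratio = 1.45   # GTX 970 power draw relative to the 1060-3GB (assumption)

perf_per_watt_gain = speedup * power_ratio - 1.0
print(f"~{perf_per_watt_gain:.0%} better performance/Watt")
```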
Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 45178 - Posted: 4 Nov 2016, 16:48:57 UTC - in response to Message 45177.  

Thanks. You have saved me the trouble. That is a nice improvement for the GTX 1060, but not enough to buy a new card. I will leave my GTX 970 on GPUGrid, and my 1060 on Folding, where it gets as much improvement, if not a little more, due to the Quick Return Bonus.

©2025 Universitat Pompeu Fabra