Message boards : Graphics cards (GPUs) : Titan X

Michael P. Gainor
Joined: 23 Jan 16
Posts: 15
Credit: 69,388,188
RAC: 0
Message 44082 - Posted: 2 Aug 2016 | 19:27:19 UTC

Well, today at 9:00 AM EST I bought the new NVidia Pascal Titan X. According to wccftech, it may at times hit 12 Tflops, thanks to something similar to Intel Turbo Boost. It should arrive tomorrow, so we should see what it does. Will the new Titan work out of the box with GPUGrid, or will I have to wait for CUDA 8?

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2076
Credit: 15,119,928,983
RAC: 4,859,660
Message 44084 - Posted: 2 Aug 2016 | 19:32:31 UTC - in response to Message 44082.

You (like all of us with lesser Pascal GPUs) have to wait for the public release of CUDA 8.

Michael P. Gainor
Joined: 23 Jan 16
Posts: 15
Credit: 69,388,188
RAC: 0
Message 44085 - Posted: 2 Aug 2016 | 20:01:01 UTC - in response to Message 44084.
Last modified: 2 Aug 2016 | 20:01:24 UTC

I was afraid of as much; I was hoping someone had found a workaround :)

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2688
Credit: 1,177,140,599
RAC: 172,656
Message 44089 - Posted: 3 Aug 2016 | 13:41:03 UTC

You can expect 25-30% more performance from a new Titan X versus a GTX1080 for a ~70% higher price. That's actually "kind of reasonable" in the Titan world, as the previous ones were even more expensive relative to the best high-end GPU.

In past and current versions, GPU-Grid has struggled to fully utilize those ever wider GPUs, though. If this holds true for the CUDA 8 version (and I have no reason to assume otherwise), the real performance advantage may be down to ~20%.

MrS
____________
Scanning for our furry friends since Jan 2002

skgiven
Volunteer moderator
Project tester
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 12,305
Message 44090 - Posted: 4 Aug 2016 | 9:42:27 UTC - in response to Message 44089.
Last modified: 4 Aug 2016 | 9:54:09 UTC

The GTX980Ti didn't scale well here (and perhaps the 980 didn't either), and the GTX1080 (and possibly the 1070) outperforms the GTX980Ti in most trials, so it's likely that the GTX1080 won't scale well here, never mind the Titan X (Pascal).

While we can't show this here yet, it's already apparent at Folding:

Folding@home (sp)

GPU        ns/day   %      Watts   Perf/W %   GFlops (sp)    GFlops %
Titan X    ?        ?      250     ?          10974 boost    124
GTX1080    135.6    100    180     100        8873 boost     100
GTX1070    125.8    92.8   150     111.4      6463 boost     72.8
GTX1060    84.8     62.5   120     93.75      4372 boost     49.3
GTX980Ti   119.4    88.1   250     63.4       6060 boost     68.3
GTX980     94.1     69.4   165     75.7       4981 boost     56.1
GTX970     76.7     56.6   145     70.3       3920 boost     44.2
GTX960     47.0     34.7   120     51.6       2413 boost     27.2

Observed performance per Watt as a relative percentage (GTX1070 vs GTX1080):
Observed relative performance = 92.8%
Relative power usage = 150 W / 180 W = 83.33%
Relative performance per Watt = 92.8% / 83.33% = 111.4%
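
The same performance-per-Watt arithmetic can be sketched in a few lines of Python. This is only an illustration: the ns/day and Watt figures are copied from the table above, while the function and variable names are made up for this example. Because it works from unrounded ratios, its results may differ slightly from the posted figures, which were derived from already-rounded percentages.

```python
# Relative performance per Watt, normalised to a reference card (GTX1080 = 100%).
# (ns/day, Watts) pairs are copied from the Folding@home table in the post.
CARDS = {
    "GTX1080": (135.6, 180),
    "GTX1070": (125.8, 150),
    "GTX980Ti": (119.4, 250),
}


def relative_perf_per_watt(card, reference="GTX1080", cards=CARDS):
    """Return the performance per Watt of `card` as a percentage of `reference`."""
    ns_day, watts = cards[card]
    ref_ns_day, ref_watts = cards[reference]
    relative_perf = ns_day / ref_ns_day    # e.g. about 0.928 for the GTX1070
    relative_power = watts / ref_watts     # e.g. about 0.833 for the GTX1070
    return 100.0 * relative_perf / relative_power


if __name__ == "__main__":
    for name in CARDS:
        print(f"{name}: {relative_perf_per_watt(name):.1f}%")
```

Swapping the `reference` argument lets you normalise against any other card in the dictionary.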

GFlops (SP) = 2*shaders*clock speed. Reference boost frequencies used!
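
As a rough check of that formula, here is a small Python sketch. The shader counts and reference boost clocks are not given in the post; they are assumed from NVIDIA's published reference specifications, and the helper name `sp_gflops` is made up for this example.

```python
# Theoretical single-precision throughput: 2 FLOPs (one fused multiply-add)
# per shader per clock cycle.
def sp_gflops(shaders, clock_mhz):
    """Return theoretical SP GFlops for a shader count and a clock in MHz."""
    return 2 * shaders * clock_mhz / 1000.0


# (shaders, reference boost clock in MHz): assumed reference specs, not from the post.
SPECS = {
    "Titan X (Pascal)": (3584, 1531),
    "GTX1080": (2560, 1733),
    "GTX1070": (1920, 1683),
}

if __name__ == "__main__":
    baseline = sp_gflops(*SPECS["GTX1080"])
    for name, (shaders, clock) in SPECS.items():
        gflops = sp_gflops(shaders, clock)
        print(f"{name}: {gflops:.0f} GFlops ({100 * gflops / baseline:.1f}% of GTX1080)")
```

Real cards usually boost above the reference clock, so treat these as lower-bound theoretical figures rather than measured throughput.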

Note that both series boost higher than reference values, but boost varies by model and conditions, and can be controlled and constrained.
While this is based on actual observed performances, it’s still somewhat theoretical. To be accurate you would need to use actual observed power usages and actual boosted GFlops (calculated from reference). That said it’s still a good indicator.
Numbers taken from AnandTech’s GPU 2016 Benchmarks, http://www.anandtech.com/bench/product/1715?vs=1714

Although the primary observation is that the GTX1070 offers the best performance/Watt, it's likely that both it and the 1080 could be significantly tweaked for performance/Watt by capping the power and/or temps, and it's also possible to run 2 apps on the one big GPU to improve overall throughput (when there is an abundance of apps).

With more basic apps such as Einstein (CUDA 3.2 & 5.5) and MW you may see more linear performance scaling relative to the Pascals' GFlops (as these apps don’t utilize the more complex CUDA instruction sets).
GPUGrid is more similar to Folding but the app is different so it may bottleneck in different places. For that reason a performance chart will likely look similar but the choicest card(s) might be different...

Other hardware factors: the Titan has 3MB of L2 cache whereas the GTX1080 has 2MB. The Titan’s bus width (& ROPs) ratio is slightly (7%) higher, so there are fewer potential hardware bottlenecks. That should help it scale, but it is still 24% bigger than a 1080.
Note that the GTX1060’s (GP106) cache is only 1.5MB, which might explain the slightly poorer performance at Folding. While 1.5MB is likely to be a factor at GPUGrid too, how significant that is remains to be seen.
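
One way to read the 7% figure is as memory bus width (or ROP count) per shader, compared between the two cards. The Python sketch below works through that reading. It is an interpretation, not something the post spells out, and the bus widths and shader counts are assumed from NVIDIA's reference specifications.

```python
# Assumed reference specs (not stated in the post, apart from the L2 sizes).
TITAN_X = {"shaders": 3584, "bus_bits": 384, "rops": 96, "l2_mb": 3.0}
GTX1080 = {"shaders": 2560, "bus_bits": 256, "rops": 64, "l2_mb": 2.0}


def per_shader_gain(big, small, resource):
    """Relative gain in `resource` per shader when moving from `small` to `big`."""
    resource_ratio = big[resource] / small[resource]   # e.g. 384 / 256 = 1.5
    shader_ratio = big["shaders"] / small["shaders"]   # 3584 / 2560 = 1.4
    return resource_ratio / shader_ratio - 1.0


if __name__ == "__main__":
    for resource in ("bus_bits", "rops", "l2_mb"):
        gain = per_shader_gain(TITAN_X, GTX1080, resource)
        print(f"{resource}: {100 * gain:+.1f}% per shader")
```

Under these assumptions the Titan has about 7% more bus width, ROPs and L2 cache per shader than the GTX1080.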

PS: the Titan X (Pascal) isn’t full-fat; the Quadro P6000 has two more SMs for 3840 CUDA cores (not that I recommend either card for here – both are far too costly).

Jim1348
Joined: 28 Jul 12
Posts: 704
Credit: 1,375,171,968
RAC: 123,539
Message 44091 - Posted: 4 Aug 2016 | 11:29:11 UTC - in response to Message 44090.

That is a great comparison. Thanks.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2688
Credit: 1,177,140,599
RAC: 172,656
Message 44097 - Posted: 6 Aug 2016 | 20:59:58 UTC - in response to Message 44090.

Quoting skgiven: "With more basic apps such as Einstein (CUDA 3.2 & 5.5)"

Don't let the CUDA version fool you: Einstein uses complex and carefully optimized code. They're not using advanced library functions from nVidia; instead they're doing the complicated stuff on their own or with other libraries.

And currently they're streaming through their complete array(s) with each operation, in the way a GPU is classically supposed to work. This makes their code significantly dependent on GPU memory bandwidth (my GTX970 runs at >80% memory controller utilization at 3500 MHz), which means a bigger GPU doesn't scale as well as its GFlops suggest, but is limited by its memory bandwidth. There are other factors too: e.g. AMD Fury is not the home run at Einstein one would expect given its massive bandwidth, because a driver bug prevents it from running more than 1 task concurrently, which is not enough to saturate a fast GPU.

Pascals are OK at Einstein, especially with eco tuning, but are not the home runs their raw GFlops suggest.
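
To put rough numbers on the bandwidth argument, here is a Python sketch comparing how GFlops and nominal memory bandwidth grow from a GTX970 to the Pascal cards. The GFlops values are the reference-boost figures from skgiven's table. The bus widths and memory data rates are assumptions taken from reference specifications (3500 MHz GDDR5 is treated as 7 Gbit/s per pin), and real bandwidth-bound code will not track either ratio exactly.

```python
def bandwidth_gbs(bus_bits, data_rate_gbps):
    """Nominal memory bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_bits / 8 * data_rate_gbps


# name: (SP GFlops at reference boost, bus width in bits, data rate in Gbit/s per pin)
# Bus widths and data rates are assumed reference specs, not taken from the thread.
GPUS = {
    "GTX970": (3920, 256, 7.0),
    "GTX1080": (8873, 256, 10.0),
    "Titan X (Pascal)": (10974, 384, 10.0),
}


def scaling_vs(name, baseline="GTX970", gpus=GPUS):
    """Return (GFlops ratio, bandwidth ratio) of `name` relative to `baseline`."""
    gflops, bus, rate = gpus[name]
    base_gflops, base_bus, base_rate = gpus[baseline]
    return (gflops / base_gflops,
            bandwidth_gbs(bus, rate) / bandwidth_gbs(base_bus, base_rate))


if __name__ == "__main__":
    for name in GPUS:
        flops_ratio, bw_ratio = scaling_vs(name)
        print(f"{name}: {flops_ratio:.2f}x GFlops, {bw_ratio:.2f}x bandwidth")
```

If an app is mostly limited by memory bandwidth, the second ratio is the more realistic upper bound on the speed-up, which is why the bigger cards look less impressive at Einstein than their GFlops suggest.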

MrS
____________
Scanning for our furry friends since Jan 2002

Michael P. Gainor
Joined: 23 Jan 16
Posts: 15
Credit: 69,388,188
RAC: 0
Message 44100 - Posted: 8 Aug 2016 | 13:05:11 UTC - in response to Message 44097.

So far I have found scaling to be pretty good in Folding@home when they are using their newer Core 21: I get about 1.07 million points per day, about double what I got with my 980Ti. However, on their older Core 18 the scaling isn’t nearly as good. Although no one at F@H has ever confirmed this, I have always found that my PPD divided by 100k is, oddly enough, very close to the number of teraflops I should be getting.
