
Message boards : Graphics cards (GPUs) : Big Maxwell GM2*0

eXaPower
Message 38887 - Posted: 10 Nov 2014 | 11:50:09 UTC

http://wccftech.com/article/generation-graphics-prospects-nvidia-big-daddy-maxwell-16ff-ports/

3072 CUDA cores / 24 SMM / 1.1 GHz base reference clock? A larger L2 cache (3 MB) keeps the same core-to-L2 ratio as the GTX 980. A 1/4 FP64 ratio (8 per 32-core block, 32 per SMM) would give 768 FP64 cores; a 1/2 ratio (16 per block, 64 per SMM) would give a monster 1536. I'm guessing a 1/4 ratio. If clocks are high enough that would match GK110's 960 FP64 cores at roughly 1.1-1.7 teraflops of double precision.
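A back-of-envelope sketch of those figures (Python; the core counts, ratios and 1.1 GHz clock are the rumoured numbers above, not confirmed specs):

    # Peak FLOPS = cores * 2 (one fused multiply-add per clock) * clock rate.
    def peak_tflops(cores, clock_ghz):
        """Peak throughput in TFLOPS assuming 1 FMA (2 FLOPs) per core per clock."""
        return cores * 2 * clock_ghz / 1e3

    sp_cores = 3072  # 24 SMM * 128 cores (rumoured)
    for label, ratio in [("1/4", 0.25), ("1/2", 0.5)]:
        dp_cores = int(sp_cores * ratio)
        print(f"{label} ratio -> {dp_cores} FP64 cores, "
              f"{peak_tflops(dp_cores, 1.1):.2f} DP TFLOPS at 1.1 GHz")

    # GK110 Titan for comparison: 960 FP64 cores at roughly 0.88 GHz
    print(f"GK110 Titan: {peak_tflops(960, 0.88):.2f} DP TFLOPS")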

Here are some early numbers

http://www.sisoftware.eu/rank2011d/show_run.php?q=c2ffccfddbbadbe6deeadbeedbfd8fb282a4c1a499a98ffcc1f9&l=en

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 38890 - Posted: 10 Nov 2014 | 20:51:48 UTC

the 1st article wrote:
The GK110 is roughly 550mm^2 and the limit of TSMC is at roughly 600mm^2. So can the GM200 exist on a 28nm Node? Absolutely, yes. Will it? Well, the consumer samples taped out a long time back, and they are sure as hell not on 16nm FinFET.

Which makes sense to me. I edited the title, so that it doesn't imply GM2x0 would be made in 16nm FinFET tech.

MrS
____________
Scanning for our furry friends since Jan 2002

eXaPower
Message 38959 - Posted: 18 Nov 2014 | 21:45:55 UTC

http://devblogs.nvidia.com/parallelforall/increase-performance-gpu-boost-k80-autoboost/

For the dual-GPU Tesla: Compute Capability 3.7 [?] (larger register file / more shared memory, 128 KB per SMX). A third GK*10 revision >>> GK210, a dual Tesla board built on a 300 W TDP with 4992 total CUDA cores (2496 cores / 13 SMX / 150 W TDP per GPU).

Revamped GPU boost:
"Using NVML your CUDA application can choose the best GPU Boost setting without any user intervention. Even when the applications clocks permission setting prevents your app from changing the application clocks, NVML can help you inform application users about this so they can consult with their system administrator to enable GPU Boost. To achieve exactly this the popular GPU-accelerated Molecular Dynamic application GROMACS will use NVML in its next release to control GPUBoost on NVIDIA® Tesla® accelerators."

For Tesla > GRID > Quadro: GPU Deployment Kit includes NVIDIA Management Library (NVML)
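A minimal sketch of the application-clock control the blog post describes, using the pynvml Python bindings for NVML (illustrative only; on GeForce boards the call may simply be rejected, and on Tesla boards it may need admin permission, as the quote notes):

    from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetHandleByIndex,
                        nvmlDeviceGetSupportedMemoryClocks,
                        nvmlDeviceGetSupportedGraphicsClocks,
                        nvmlDeviceSetApplicationsClocks, NVMLError)

    nvmlInit()
    try:
        dev = nvmlDeviceGetHandleByIndex(0)
        # Pick the highest supported application (boost) clocks.
        mem_clock = max(nvmlDeviceGetSupportedMemoryClocks(dev))
        gfx_clock = max(nvmlDeviceGetSupportedGraphicsClocks(dev, mem_clock))
        try:
            nvmlDeviceSetApplicationsClocks(dev, mem_clock, gfx_clock)
            print(f"Application clocks set to {gfx_clock} MHz core / {mem_clock} MHz memory")
        except NVMLError as err:
            # e.g. the clock-change permission is restricted by the administrator
            print(f"Could not change application clocks: {err}")
    finally:
        nvmlShutdown()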

The K80 enables 832 FP64 cores per GPU [1664 total on the board].

From the look of the GK210 SMX: the PolyMorph Engine is non-functional (not on die).
16 TMUs per SMX (same ratio as C.C 3.0/3.5).

GK210: 4992 CUDA cores [300 W TDP / 26 SMX]
[13] 192-core SMX per GPU
[26] total SMX at 300 W total TDP @ ~11.5 watts per SMX @ ~0.060 W per core

Lower power operating points than GM204 or the GK110B revision, which do include the PolyMorph Engine inside the SMM/SMX. See the Maxwell now thread.

Future GM2*0: 3072 CUDA cores [250 W TDP(?) / 24 SMM]
[24] 128-core SMM @ ~10.4 watts per SMM @ ~0.081 W per core
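The per-SM / per-core split above, spelled out (Python; it just divides the TDP evenly across multiprocessors and cores, which real boards obviously don't do):

    boards = {
        "GK210 (K80, both GPUs)": {"tdp_w": 300, "sm_count": 26, "cores_per_sm": 192},
        "GM2*0 (speculated)":     {"tdp_w": 250, "sm_count": 24, "cores_per_sm": 128},
    }

    for name, b in boards.items():
        per_sm = b["tdp_w"] / b["sm_count"]        # W per SMX/SMM
        per_core = per_sm / b["cores_per_sm"]      # W per CUDA core
        print(f"{name}: {per_sm:.1f} W per SM, {per_core:.3f} W per core")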

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 38979 - Posted: 20 Nov 2014 | 21:43:03 UTC - in response to Message 38959.

Lower Power Operating Points than GM204 or GK110B revision.

If you ran GM204 at 550 - 850 MHz with appropriate voltage it would easily beat GK210's power efficiency.

MrS
____________
Scanning for our furry friends since Jan 2002

eXaPower
Message 38984 - Posted: 20 Nov 2014 | 23:40:45 UTC - in response to Message 38979.

If you ran GM204 at GK210 clocks: GM204 single is less than GK210 Double Flops. Once Big Maxwell specs are confirmed, this could be the only full compute-worthy Maxwell Titan (dual card as well) until the Pascal arch in late 2016 / early 2017.

I won't put a Maxwell GM204 SMM in the same category as the Kepler arch. Each compute Kepler SMX includes as many FP64 cores in one SMX as the whole GM204 die (64). The Titan brand sold a lot of enabled GK110. Excluding Tesla and Quadro for 64-bit, a Titan works in most areas. The Titan FP64 driver setting lowers core clocks to ~600 MHz or so, at a lower voltage than the single-precision clocks.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 38996 - Posted: 21 Nov 2014 | 21:31:11 UTC - in response to Message 38984.

If you ran GM204 at GK210 clocks: GM204 single is less than GK210 Double Flops.

What?! GK210 runs DP at 1/3 its SP performance, so it's equivalent to 2880/3 = 960 shaders. GM204 has 2048 shaders for SP, which actually perform more like 1.4*2048 = 2867 Kepler-Shaders. At similar clocks GM204 SP would be about 3 times as fast as GK210 DP.
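The same comparison as a quick sketch (Python; the 1.4x per-shader factor and the 1/3 DP rate are the assumptions from this post):

    # GK210 DP throughput vs GM204 SP throughput at equal clocks,
    # expressed in "Kepler-shader equivalents".
    gk210_sp_shaders = 2880
    gk210_dp_equivalent = gk210_sp_shaders / 3        # 1/3 DP rate -> 960

    gm204_sp_shaders = 2048
    gm204_kepler_equivalent = 1.4 * gm204_sp_shaders  # ~2867

    print(f"GK210 DP ~ {gk210_dp_equivalent:.0f} Kepler-shader equivalents")
    print(f"GM204 SP ~ {gm204_kepler_equivalent:.0f} Kepler-shader equivalents")
    print(f"Ratio    ~ {gm204_kepler_equivalent / gk210_dp_equivalent:.1f}x")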

And why are you talking about DP anyway? If you need DP and nVidia you're badly f*cked anyway, because nothing below an ultra-expensive Titan makes any sense. The amount of DP cores in gaming Maxwell doesn't matter, because performance would s*ck anyway. It's meant for compatibility, testing and debug, but not for pure number crunching. I know people are not aware of this and like to run Milkyway on nVidia cards, but that doesn't change the fact that DP-crunching on either Kepler or Maxwell, or mainstream Fermi for that matter, is a pretty bad idea.

MrS
____________
Scanning for our furry friends since Jan 2002

eXaPower
Message 39029 - Posted: 26 Nov 2014 | 19:34:39 UTC

CUDA 7 (pre-production release) will be here around the new year, with the production release near April. (NVidia's last couple of major CUDA releases have come early in the year.) The current CUDA driver is 6.5.30.

Ahead of the GM2*0 Titan and GTX 960 releases, here are the current list prices for Kepler and Maxwell on Newegg, with a per-core cost for each board. Factoring in power costs for DC FP32 (ACEMD): GM204/107 efficiency gives lower sustained operating costs, even though the initial purchase cost is higher. GM204/GK110/GK104/GM107/GK106 board power consumption varies [-30% to +20%] under a sustained operating point; each board has its own traits in managing complex currents.

250W TDP 2880 core Titan Black @ 1000usd @ 34.7 cents per core (960 total DP / 64 DP per SMX enabled)
375W TDP 5760 core Titan-Z @ 1500usd @ 26c per core (1920 total DP / 64 DP per SMX enabled)
300W TDP 3072 core [refurbished] GTX 690 @ 519 @ 16.9c per core (128 total DP / 8 DP cores per SMX)
250W TDP 2880 core [refurbished] GTX 780ti @ 419 @ 14.5c per core (disabled 64DP SMX - driver enables 8 per SMX for 120 total DP)
225W TDP 2304 core [refurbished] GTX 780 @ 319 @ 13.9c per core (disabled 64DP SMX - driver enables 8 per SMX for 96 total DP)
165W TDP 2048 core GTX 980 @ 549 @ 26.8c per core (64 total DP64 cores / 4 per SMM)
145W TDP 1664 core GTX 970 @ 329 @ 19.7c per core (52 total DP64 cores / 4 per SMM)
230W TDP 1536 core [refurbished] GTX 770 @ 239 @ 15.5c per core (64 total DP / 8 per SMX)
195W TDP 1536 core [refurbished] GTX 680 @ 229 @ 14.9c per core (64 total DP / 8 per SMX)
170W TDP 1344 core [refurbished] GTX 670 @ 179 @ 13.3c per core (56 total DP)
140W TDP 1344 core [refurbished] GTX 660ti @ 159 @ 11.8c per core (56 total DP)
170W TDP 1152 core [refurbished] GTX 760 @ 159 @ 13.8c per core (48 total DP)
140W TDP 960 core GTX 660 @ 134 @ 13.9c per core (40 total DP)
110W TDP 768 core [refurbished] GTX 650ti @ 89 @ 11.5c per core (32 total DP)
60W TDP 640 core GTX 750ti @ 129 @ 20.1c per core (20 total DP)
55W TDP 512 core GTX 750 @ 109 @ 21.2c per core (16 total DP)
25W TDP 384 core GT 630 @ 37 @ 9.4c per core (16 total DP)

C.C 5.2 Maxwell costs around 23 cents per core, while Kepler C.C 3.0 is much lower. The C.C 5.0 per-core cost is higher than the non-DP-enabled C.C 3.5 boards. Notice the GTX 980 per-core cost is about the same as the Titan-Z's.
A single GTX 970/980 offers GPUGRID the best cost/runtime/power usage. Multiple GM107 boards are also a good choice. (This could change with the GTX 960 release and the Maxwell Titan purchase price.)
Kepler GTX 660ti and GTX 780(ti) show decent cost/power/runtime, depending on the silicon lottery for eco-tuning. Numerous choices exist from the eco-tune point of view. Outside of GPUGRID, C.C 3.5 is still the more complete compute arch.
An OpenCL AMD Tahiti (2048/1792 total cores [512/448 DP64 cores]) is currently priced around 11 cents per core. CUDA's DP accuracy differs from OpenCL's. Have a look at the math libraries for each and see which you'd rather trust for long-runtime compute.
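For reference, the cents-per-core figures in the list are just list price divided by CUDA core count (Python sketch; prices are the Newegg snapshots quoted above and will drift):

    cards = [            # (card, list price USD, CUDA cores)
        ("Titan Black", 1000, 2880),
        ("GTX 980",      549, 2048),
        ("GTX 970",      329, 1664),
        ("GTX 780 Ti",   419, 2880),
    ]

    for name, price_usd, cores in cards:
        cents_per_core = price_usd / cores * 100
        print(f"{name:12s} {cents_per_core:5.1f} cents per CUDA core")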

Profile Retvari Zoltan
Message 39031 - Posted: 26 Nov 2014 | 22:53:12 UTC - in response to Message 39029.
Last modified: 26 Nov 2014 | 23:03:46 UTC

TDP  | Cores | Card        | Price (USD) | Per core | DP cores (notes)
250W | 2880  | Titan Black | 1000        | 34.7c    | 960 (64 DP per SMX enabled)
375W | 5760  | Titan-Z     | 1500        | 26c      | 1920 (64 DP per SMX enabled)
300W | 3072  | GTX 690*    | 519         | 16.9c    | 128 (8 DP cores per SMX)
250W | 2880  | GTX 780ti*  | 419         | 14.5c    | 120 (disabled 64DP SMX - driver enables 8 per SMX)
225W | 2304  | GTX 780*    | 319         | 13.9c    | 96 (disabled 64DP SMX - driver enables 8 per SMX)
165W | 2048  | GTX 980     | 549         | 26.8c    | 64 (4 per SMM)
145W | 1664  | GTX 970     | 329         | 19.7c    | 52 (4 per SMM)
230W | 1536  | GTX 770*    | 239         | 15.5c    | 64 (8 per SMX)
195W | 1536  | GTX 680*    | 229         | 14.9c    | 64 (8 per SMX)
170W | 1344  | GTX 670*    | 179         | 13.3c    | 56 (8 per SMX)
140W | 1344  | GTX 660ti*  | 159         | 11.8c    | 56 (8 per SMX)
170W | 1152  | GTX 760*    | 159         | 13.8c    | 48 (8 per SMX)
140W | 960   | GTX 660     | 134         | 13.9c    | 40 (8 per SMX)
110W | 768   | GTX 650ti*  | 89          | 11.5c    | 32 (8 per SMX)
60W  | 640   | GTX 750ti   | 129         | 20.1c    | 20
55W  | 512   | GTX 750     | 109         | 21.2c    | 16
25W  | 384   | GT 630      | 37          | 9.4c     | 16
* = refurbished

I've just made your spreadsheet more readable.

eXaPower
Message 39032 - Posted: 26 Nov 2014 | 23:23:27 UTC - in response to Message 39031.

I've just made your spreadsheet more readable.


Gentlemanly of you

For all those spreadsheet programs available: MS notepad doesn't organize!

Jozef J
Message 39308 - Posted: 25 Dec 2014 | 20:30:42 UTC

http://www.chiploco.com/nvidia-geforce-gtx-titan-ii-3072-cores-12gb-memory-36760/
8 GB of memory is more likely; 12 GB is too much for only 3072 CUDA cores.
The GTX 980 has 2048 CUDA cores and 4 GB of RAM.

So the new Titan Z will have about 6000 CUDA cores and maybe 16 GB of RAM..?

And more BAD news http://wccftech.com/amd-nvidia-20nm-16nm-delayed/#ixzz3MvZ5AzSg Again.. So the video card segment is also heading toward the "monopoly" scenario we have with CPUs..

Jozef J
Message 39313 - Posted: 26 Dec 2014 | 17:09:14 UTC

http://www.tweaktown.com/news/42269/amd-nvidias-next-gen-gpus-delayed-supply-constraints-blamed/index.html

I think it's a scam agreed between Apple and NVIDIA. Everyone has to help them make money ..
unfortunately, my group cannot harm these companies ..
and this project doesn't mind either; NVIDIA's stock will not fall ..

Profile skgiven
Volunteer moderator
Volunteer tester
Message 39314 - Posted: 26 Dec 2014 | 18:36:30 UTC - in response to Message 39313.

http://www.tweaktown.com/news/42269/amd-nvidias-next-gen-gpus-delayed-supply-constraints-blamed/index.html

I think it's a scam agreed between Apple and NVIDIA. Everyone has to help them make money ..
unfortunately, my group cannot harm these companies ..
and this project doesn't mind either; NVIDIA's stock will not fall ..

There is a typo in that report, they said Apple instead of AMD; "so that leaves Apple and NVIDIA with a very limited supply of 20nm dies".

Basically TSMC will be busy making 20nm Qualcomm SoC's, so NVidia and AMD will have to wait their turn. This isn't such a bad thing - AMD and NVidia need bigger dies. That means TSMC will have refined the process by the time it's AMD and NVidia's turn.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

eXaPower
Message 39350 - Posted: 1 Jan 2015 | 15:18:21 UTC

http://wccftech.com/nvidia-planning-ditch-maxwell-gpus-hpc-purposes-due-lack-dp-hardware-update-tesla-line-pascal-2016-volta-arriving-2017/

Once Big Maxwell specs are confirmed, this could be the only full compute-worthy Maxwell Titan (dual card as well) until the Pascal arch in late 2016 / early 2017.

Kepler's HPC boards will reign until late 2016 / early 2017 due to Maxwell's weak DP core structure (1 DP core in every 32-core subset block, and no 64-bit/8-byte banks like Kepler).
The GeForce Titan Maxwell (8 DP per 32-core subset / 32 DP per SMM) would be the full-feature compute Maxwell, with fewer DP cores (768) than Kepler's 960/896/832 FP64 line-up, unless GM2*0 gets 64 DP per SMM (a 1/2 ratio). This will keep Titan prices high unless AMD's 2015 offering is +20% over GM2*0.

The GTX 960(ti) will feature [3] different dies (8 SMM / 10 SMM / 12 SMM). The 8 SMM part could be 70-100 W TDP, with Ti variants at 100-130 W TDP.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 39354 - Posted: 1 Jan 2015 | 19:21:57 UTC

@eXaPower: let's wait and see, so far we've only got rumors (although lots of them). I'm sure the biggest Maxwell chip will be better at DP than GK210. The market and profit margin for such chips in the form of Teslas and Quadros is just too large for nVidia to ignore. For gaming they could just as well give us 2 GM204 on one card instead of creating a new flagship chip. And don't forget the higher efficiency of Maxwell per shader - some of this will also apply to DP once there are a lot of DP-capable shaders.

And you actually don't need separate shaders for SP and DP. I really don't understand why nVidia has been doing this since big Kepler. There's surely a slight improvement in power efficiency, which the gaming cards gladly take. The DP shaders don't cost them much area/transistors, because there are so few of them. But for the big chips, providing plenty of DP shaders really costs a lot of area/transistors. So before nVidia chooses to outfit their new very power- and area-efficient flagship with too few DP units, they should rather sacrifice a little power efficiency and use combined shaders, where 2 or 4 SP units work together as 1 DP unit.

@Jozef: this is not a scam. TSMC's planar 20 nm process provides little power consumption and transistor performance benefit over 28 nm, at hardly reduced cost per transistor. It only makes sense to use this process for chips with large profit margins (Apple, and other flagship chips to a lesser extent) and where every mW of power saved counts (mobile). Sure, power also counts for desktop GPUs, but it's not like Maxwell is doing badly.

And TSMCs 20 nm capacity is still very limited, so it's first used for who ever pays the most (probably Apple). Since it's a new process the yields are very probably also not good yet, which is OK for small chips (mobile SoCs), but prohibitive for monster GPUs.

It may look like a conspiracy, but it's really just economics - these guys need to make money, and they'll simply take the actions they think are best suited for this purpose. If they stagnate for too long someone else will take over their market.

MrS
____________
Scanning for our furry friends since Jan 2002

eXaPower
Message 39368 - Posted: 2 Jan 2015 | 15:08:16 UTC - in response to Message 39354.
Last modified: 2 Jan 2015 | 15:23:41 UTC

A GM200 Quadro has been spotted in the GPU-Z database: 3072 CUDA / 192? 256? TMU / 96 ROPs. If the "M6000" is indeed the replacement for the K6000 then DP performance will be near 1.7 teraflops. A forthcoming announcement is expected this month. (It could include the CUDA 7 toolkit.)

As for a dual GM204 dropping: this could happen after AMD releases their 390 series. (I hope AMD's top card is very strong, +20-30% over GM204, to force NVidia's hand.) A specialty dual GM204 might be released by Asus or someone else, similar to the ROG Mars GTX 760. Maybe the GTX 960ti is a dual-GPU candidate too?
Remember: the very successful GTX 690 (June 2012) was released while Tahiti and GK104 were battling for the top spot, until February 2013 when the Titan became the flagship with the GTX 780 as a stop-gap (May 2013). The hot and power-hungry reference Hawaii was released with weaker DP than the very successful Tahiti (AMD's best arch and an excellent card for OpenCL compute). Nvidia came back with the GTX 780ti.
Nvidia ruled the roost during 2014: long-in-the-tooth Kepler plus low/mid-range 512/640-core Maxwell in the first half, and the very strong gaming GM204 in the second half. (The AMD 295X2's success forced Nvidia to slash the Titan-Z price in half.) AMD cards are nowhere near their original release prices, unlike NVidia, whose premium price is worth it to some thanks to Nvidia's Developer Program, chock full of goodies for Linux and Windows.

Profile skgiven
Volunteer moderator
Volunteer tester
Message 39371 - Posted: 2 Jan 2015 | 20:47:51 UTC - in response to Message 39368.


0nm is smaller than expected ;p
Aside from the likely 'app can't read' oddities (which might suggest 20nm, though unlikely IMO), it looks reasonably plausible, though 6600MHz is a little strange (possibly eco-tuned 7GHz GDDR5). The bit that makes no sense is the PCIE2 board, but it didn't the last time I saw a GPU ES either. This raises the question of 20nm, 16nm and thus release dates; I really didn't expect a 20nm anything from NV any time soon, and I still don't...

As for a GTX990, while I might be wrong, I totally expect this. A 300W TDP is likely and it might match a GTX Titan Z in terms of throughput performance here, but be >20% more power-economical and a LOT cheaper to buy...
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 39376 - Posted: 2 Jan 2015 | 22:59:18 UTC - in response to Message 39371.

0nm is smaller than expected ;p

Ouch - there are going to be some serious short channel effects with such small structures! I heard the chip is going to be called Atom, but they're still fighting with Intel in court over it.

MrS
____________
Scanning for our furry friends since Jan 2002

eXaPower
Message 39377 - Posted: 2 Jan 2015 | 23:02:39 UTC - in response to Message 39371.

This raises the question of 20nm, 16nm and thus release dates; I really didn't expect a 20nm anything from NV any time soon, and I still don't...

You're right - Nvidia will go 16nm FinFET. Speculation: the first batch of GM200 is likely 28nm, unless TSMC has kept extremely quiet (amid spreading misinformation) and 16nm yield is way above initial expectations. 20nm is for low-power SoCs (phones) and maybe ~75 W TDP AMD APUs. AMD could stay at 28nm and then go to GlobalFoundries 14nm. We might be stuck at 28nm until next year. (28nm has worked out for Maxwell GPUs: running FP32 full bore on air cooling gives temps below 65C, and even 55C when properly tuned.)

As for a GTX990, while I might be wrong, I totally expect this. A 300W TDP is likely and it might match a GTX Titan Z in terms of throughput performance for here but be >20% more power economically and a LOT cheaper to buy...

For 24/7 compute (here) the Titan-Z temps that I've seen are a little high, as are some GTX 690s'. Maxwell's core structure is able to keep silicon temps lower than Kepler's - a dual GM204 at a 300 W TDP is well within engineering limits. I'm curious to see how Maxwell is affected once more DP cores are added.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 39385 - Posted: 3 Jan 2015 | 22:38:52 UTC

3DCenter also commented on "Big Maxwell has no DP units". They've got 2 rather convincing points:

- the statement comes straight from nVidia, aimed at professionals
- 3072 Maxwell Shaders with additional DP units may simply have been too large for 28 nm, where ~600 mm² is the feasible maximum

To this I'll add:
- the Maxwell design does not include the "combined" SP/DP shaders I mentioned above, so nVidia is not using this because they simply don't have them
- Maxwell was planned for 20 nm some 2+ years ago; there was not enough time for such a redesign once it was clear that the chips had to be built on 28 nm
- nVidia won't want the shader blocks to differ throughout chips; the more they can recycle outright, the easier (also for software optimization)

And previously I wrote:

don't forget the higher efficiency of Maxwell per shader - some of this will also apply to DP once there are a lot of DP-capable shaders.

I still stand by this statement. However, most of Maxwell's increased efficiency per shader comes from no longer having super-scalar shaders that sit unused most of the time. But in DP there are fewer shaders anyway, so Kepler has no extra tax to pay for unused super-scalar units. Maxwell couldn't improve on this, and the remaining benefits like better scheduling were probably not worth the cost for nVidia.

By that I mean the cost of outfitting GM210 with enough DP units to make it faster than GK210. This would probably have made the chip too large with 24 SMM, which means they would have needed to reduce the number of SMMs and sacrifice SP / gaming performance.

MrS
____________
Scanning for our furry friends since Jan 2002

eXaPower
Message 39500 - Posted: 16 Jan 2015 | 16:54:47 UTC

http://wccftech.com/nvidias-flagship-maxwell-gm200-gpu-core-pictured-reference-board-features-12-gb-vram-massive-die/

3072 Maxwell Shaders with additional DP units may simply have been too large for 28 nm, where ~600 mm² is the feasible maximum

TSMC stretched the limit: GM200 comes in at 600 mm². The Compute Capability for GM200 is unknown. It won't be 3.5/3.7, as those designations are for Kepler; C.C 5.5 is possible. Searching the CUDA 7 C/C++ headers for GM200 DP clues....

Maxwell does have a 20nm GPU: the Tegra X1 (2 SMM / [4] A-53 + [4] A-57 ARM cores). I wonder if Nvidia sneaks a 20nm high-performance GPU into the mix? The GM200-400-A1 Quadro is confirmed at 28nm. This raises the question of whether GM200 will make it into the GeForce line-up. A cut-down version at 2688 cores (21 SMM) is possibly the first GeForce to be released; the first GK110 Tesla and the original Titan were cut down too. Titans are becoming rarer by the minute, with prices all over the place in the US.

A side note: TSMC is having a lot of trouble with 16nm FinFET - losing Qualcomm to Samsung. 2H 2016 is now when high-performance wafers could be ready for full-scale production; 3Q 2015 was the initial estimate for 16nm.

Profile skgiven
Volunteer moderator
Volunteer tester
Message 39520 - Posted: 18 Jan 2015 | 12:40:33 UTC - in response to Message 39500.
Last modified: 18 Jan 2015 | 12:46:18 UTC

CC5.5 would make sense.

To equal Kepler for DP performance a 3072-shader GM200 card would need 1/4 DP-capable shaders, and then it would still likely have a ~250W TDP. There would be no purpose in doing that (possibly 1/3, but I doubt it). So I agree that GM200 is probably not going to be a high-end DP card to replace the GK Titans, and is going to be a lot more like a GM204 than you would expect from a big version of Maxwell.
I'm just expecting a slightly more refined architecture tweaked to use up to 12GB GDDR5 and not much else, unless it adds DirectX 12.1, OpenGL 4.6, or some updated port version.

So 50% bigger than a GTX980, a 384bit bus, 12GB GDDR5 (& likely a 980Ti version with 6GB) and a fat price tag.

NV could still launch a 990 and another dual Titan at a later date; performances would be well spaced out.

According to NV, the successor to Maxwell will be Pascal (previously called Volta) and this is still due in 2016, so I think GM200 is 28nm and a 16nm Pascal is more likely than what would be a fourth generation Maxwell by then. 1/2 to 1/4 DP might reappear on 16nm.

Off topic, Pascal is supposed to introduce 3D memory and Unified memory (CPU can access it) and NVlink for faster GPU to CPU and GPU to GPU communications. These abilities will widen possible research boundaries, making large multi-protein complex modelling (and similar) more accurate and whole organelle modelling possible.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

RaymondFO*
Message 39521 - Posted: 18 Jan 2015 | 15:16:37 UTC

EVGA has this "Kingpin" (GTX 980 classified version) that has three (3) power inputs (8pin + 8pin + 6pin) available for pre-order starting 01 Feb 2015 for existing EVGA customers. To qualify as an existing EVGA customer, you must have already registered at least one (1) or more EVGA products on their web site.

http://www.evga.com/articles/00896/EVGA-GeForce-GTX-980-KINGPIN-Classified-Coming-Soon/

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 39544 - Posted: 20 Jan 2015 | 22:00:00 UTC - in response to Message 39521.

That Kingpin is not big Maxwell. And pretty useless, unless you want to chase world records with deep sub-zero temperatures ;)

MrS
____________
Scanning for our furry friends since Jan 2002

eXaPower
Message 40152 - Posted: 12 Feb 2015 | 2:08:13 UTC - in response to Message 39385.

3DCenter also commented on "Big Maxwell has no DP units". They've got 2 rather convincing points:

- the statement comes straight from nVidia, aimed at professionals
- 3072 Maxwell Shaders with additional DP units may simply have been too large for 28 nm, where ~600 mm² is the feasible maximum

To this I'll add:
- the Maxwell design does not include the "combined" SP/DP shaders I mentioned above, so nVidia is not using this because they simply don't have them
- Maxwell was planned for 20 nm some 2+ years ago; there was not enough time for such a redesign once it was clear that the chips had to be built on 28 nm
- nVidia won't want the shader blocks to differ throughout chips; the more they can recycle outright, the easier (also for software optimization)

And previously I wrote:
don't forget the higher efficiency of Maxwell per shader - some of this will also apply to DP once there are a lot of DP-capable shaders.

I still stand by this statement. However, most of Maxwell's increased efficiency per shader comes from no longer having super-scalar shaders that sit unused most of the time. But in DP there are fewer shaders anyway, so Kepler has no extra tax to pay for unused super-scalar units. Maxwell couldn't improve on this, and the remaining benefits like better scheduling were probably not worth the cost for nVidia.

By that I mean the cost of outfitting GM210 with enough DP units to make it faster than GK210. This would probably have made the chip too large with 24 SMM, which means they would have needed to reduce the number of SMMs and sacrifice SP / gaming performance.

MrS

ETA: excellent post ---- I've been following this ever-evolving rumor more closely, and surprisingly 3DCenter has again "confirmed" that double precision compute is severely limited compared to an enabled GK110.

It's possible the DP rumor is purposeful misinformation spread by an insider paid to create uncertainty. Deceptive tactics are nothing new in this industry. I'm patiently awaiting trusted compute programs to PROVE weak DP performance. There will be a pile of intriguing click-bait rumors before the launch of AMD's or NVidia's 28nm "flagship".

The stacked-HBM 4096-bit bus hoopla around the AMD 300 series undercuts NVidia's secretive approach. Staying in a continuously defensive posture about products allows more rumors to crop up, even with a rising market share. (Never mind the GTX 970 internal memory issue, topping out at a 5% return rate.) Will NVidia quell GM200 rumors or wait for real-world performance results? The professional market is a different domain than gaming.

Even if Maxwell DP matches Kepler, C.C 3.5/3.7 will reign in HPC until 16nm Pascal, possibly with ARM cores. Which engineering or science sector will upgrade from GK110 to GM200? A handful? An upgrade from Fermi to Maxwell is reasonable - unless, of course, the DP is awful. At this point waiting for Pascal seems logical.

C.C 3.5's well-engineered structure is getting long in the tooth (higher clocks and 32-core sub-set blocks are among the reasons Maxwell is faster for certain paths). C.C 3.0 is still a decent float performer, while C.C 5.0/5.2 has an edge for integer workloads.

The GPU Technology Conference in March is supposedly where the Quadro/Titan2/980ti/990 will be announced.

If GM200's limited FP64 throughput is confirmed, what is NVidia thinking (besides profit) in building a supposedly professional-market GPU (DP CUDA accounts for over 60%) whose major compute component is suddenly missing or underwhelming?

As skgiven mentioned: what's the purpose of GM200 if DP is shunned? To be a +225 W TDP FP32/int32/gaming GPU that's +20% over a GTX 780ti or 980? A GeForce with weak DP is understandable: each successive GeForce generation has lowered DP performance. But it is concerning, and mad, to revise the Tesla/Quadro/Titan brand backwards by offering fewer compute options than prior generations. Without an impressive 1/2 ratio, the advantage for DP compute workloads is minimal at 1/4, or inherently less with Maxwell's 1/32 ratio. GK110 is plenty capable even if a 1/4-ratio GM200 supplied similar DP FLOPS. GK110's features became mainstream in all Maxwells - nary an upgrade.

Maxwell adds a few gaming enhancements and HEVC advances. GM206 became the first GPU to adopt HDMI 2.0 and full H.265 video decode. Kepler's video SIP block is the first generation; GM107 is the second revision, GM204 the third, and GM206 the fourth. GM200's SIP will most likely be similar to GM206's.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 40216 - Posted: 19 Feb 2015 | 20:01:42 UTC - in response to Message 40152.

Thanks, eXa. Regarding the question:"Why GM2*0 without DP?" I think it's actually rather simple:

Build a fat gaming GPU. GK110 Titan sold well, and so will this one.

Quadros are mainly used for graphics, so they can use such a big Maxwell just fine.

Only offer it on Teslas like the K10 with GK104 GPUs, which are explicitly sold for their SP performance. The remaining market can stick to GK210 based cards. This doesn't cover the entire market, but probably a good share of it.

And finally there's the rising market of GPU virtualization, where one may benefit from better load balancing using fewer fat GPUs.

MrS
____________
Scanning for our furry friends since Jan 2002

eXaPower
Message 40353 - Posted: 4 Mar 2015 | 22:43:53 UTC

http://anandtech.com/show/9049/nvidia-announces-geforce-gtx-titan-x

No confirmation regarding double precision performance. More information will be released during the NVIDIA GPU Technology Conference (possible launch) in a couple weeks.

Profile skgiven
Volunteer moderator
Volunteer tester
Message 40404 - Posted: 9 Mar 2015 | 14:31:02 UTC - in response to Message 40353.
Last modified: 17 Mar 2015 | 16:39:16 UTC

Titan X will have 8 billion transistors, 3,072 CUDA Cores, 12 GB GDDR5 memory and will be based on Maxwell.
As the GTX 980 has 5.2B transistors, the Titan X will likely be about 8/5.2 = 1.5 times faster than the GTX 980, not taking into account any architectural changes that might confer better performance (or otherwise).

The suggestion is that it will have a 384bit memory bus width, which is 50% bigger than the GTX980 and would sit well with the ~50% cuda core increase.

Power draw is likely to be similar to the Titan Blacks 225-300 watts,
http://www.tomshardware.com/news/nvidia-geforce-gtx-titan-x,28694.html

It has been branded as a gaming card so I don't expect Titan X to reintroduce genuine high performance double precision (1/2) at the expense of sp. That might come in the form of a new Tesla some way down the road, and there already is a K40 with 2880 Cuda Cores & 1/3rd dp performance, albeit Kepler based.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

TJ
Message 40410 - Posted: 9 Mar 2015 | 19:36:44 UTC

I thought the Big Maxwell was supposed to be really fast and get one or more of its own ARM chip(s) so that it needs less interaction with the CPU.

But if it is only 1.5 times faster than a GTX980, then I will wait until 2016 to buy new GPUs.
____________
Greetings from TJ

eXaPower
Message 40416 - Posted: 10 Mar 2015 | 15:09:28 UTC
Last modified: 10 Mar 2015 | 15:19:41 UTC

http://videocardz.com/55013/nvidia-geforce-gtx-titan-x-3dmark-performance

No SP/DP/integer compute benchmarks as of yet. The performance comparison mentioned is the 3DMark (Extreme) Fire Strike benchmark. An overclocked (1200MHz) Titan X is nearly on par with a 5760-CUDA Titan Z and the 5632-shader GCN AMD 295x. Compared to an overclocked GTX 980, the Fire Strike score is +20%.

The reported base clock is 1002MHz, which translates into ~6 teraflops for 32-bit - about 1.5 teraflops more than a 1126MHz reference GTX 980.
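Where those numbers come from, as a sketch (Python; peak FP32 = cores x 2 FLOPs per FMA x clock, so it's a theoretical ceiling, not an ACEMD figure):

    def fp32_tflops(cores, clock_mhz):
        # 2 FLOPs per core per clock (one fused multiply-add)
        return cores * 2 * clock_mhz / 1e6

    titan_x = fp32_tflops(3072, 1002)   # ~6.2 TFLOPS at the reported base clock
    gtx_980 = fp32_tflops(2048, 1126)   # ~4.6 TFLOPS at reference clock
    print(f"Titan X: {titan_x:.1f} TFLOPS, GTX 980: {gtx_980:.1f} TFLOPS, "
          f"difference: {titan_x - gtx_980:.1f} TFLOPS")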

Profile skgiven
Volunteer moderator
Volunteer tester
Message 40426 - Posted: 11 Mar 2015 | 16:17:38 UTC - in response to Message 40416.
Last modified: 11 Mar 2015 | 16:23:06 UTC

Based on those specs the Titan X should be at least 20% faster than a Titan Black which in turn would mean it's at least 25% faster than a GTX980.
However, it's much more likely that it will boost higher than the Titan Black and that GM200 will offer some new performance gain.
Even a conservative 6% boost & 6% design gain would mean the Titan X will be 40% faster than a GTX980.
Raise those anticipated gains to just 9% each and the performance would be ~50% better than a GTX980. That's more realistic and I wouldn't dismiss a 60% improvement.
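Spelling out that compounding estimate (Python; the 25% baseline over the GTX 980 and the boost/design gains are the assumptions from this post):

    baseline_vs_980 = 1.25   # Titan X assumed ~25% ahead of a GTX 980 on raw specs
    for boost_gain, arch_gain in [(1.06, 1.06), (1.09, 1.09)]:
        total = baseline_vs_980 * boost_gain * arch_gain
        print(f"+{boost_gain - 1:.0%} boost, +{arch_gain - 1:.0%} design -> "
              f"{total - 1:.0%} faster than a GTX 980")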

In terms of performance/Watt it's a clear winner over the Titan Black. Basically >25% extra performance for the same power draw.

I don't expect to see much performance/Watt gain compared to existing Maxwells.
12GB GDDR5 is a right lump and much more than what's needed for here. It might however allow 2 or possibly 3 tasks to run simultaneously.

At $999 I don't think there will be many takers, especially when two GTX970's will likely more than match performance while costing significantly less ($309), and that's today's prices against an as yet unreleased GPU.

Hopefully the release of the Titan X will drive down the prices of the GTX980 and GTX970.

Obviously if you want to have one super-system then these will likely be the cards to buy, but you would be looking at a 1200W to 1500W PSU and high-end hardware throughout.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

eXaPower
Message 40430 - Posted: 11 Mar 2015 | 18:48:01 UTC - in response to Message 40426.

Hopefully the release of the Titan X will drive down the prices of the GTX980 and GTX970.

GM204 prices have fluctuated from below MSRP to way above, and they have been selling steadily since launch. AMD will be releasing new high-performance GPUs within a few months. When AMD's future cards perform better than NVidia's, prices will move downward even further.

In terms of performance/Watt it's a clear winner over the Titan Black. Basically >25% extra performance for the same power draw.

The clock difference between the reference Kepler and Maxwell Titans (+-166 MHz, 836 vs 1002 MHz) works out to roughly +1 teraflop of 32-bit throughput:
~5 TFLOPS 32-bit for the reference 2688/2880-CUDA GK110.
~6.1 TFLOPS 32-bit for the 3072-CUDA GM200 at the 1002 MHz base clock.
~6 TFLOPS 32-bit is possible on an overclocked GK110 Titan.
~7 TFLOPS for a Titan X at 1200+ MHz.

For 32-bit, the Titan X has the advantage. If GM200's SP/DP ratio is the standard Maxwell one - 1 DP core in every 32-core subset (4 DP per 128-core SMM) - then the overall performance/watt picture is skewed. I would compare the Titan X's overall compute capabilities to a GTX 780ti (120 DP cores) rather than to an 896 or 960 FP64-core enabled GK110 Titan. Will GM200 be 64-bit worthy?

GM200's 64-bit core/memory structure (and performance) is unknown; the whitepaper analysis has yet to be revealed. GM107/204/206 Maxwell's 64-bit C.C 5.0/5.2 lacks the faster C.C 3.0/3.5 Kepler 64-bit warp/thread shared-memory pipeline, and (GK210) C.C 3.7 was upped to a 128-bit data path. See the CUDA performance guide.

Combining 32-bit and 64-bit: GM204/206/107, with fewer total DP cores (C.C 5.0/5.2), executes less and lags behind the 64-bit instruction output of C.C 3.0/3.5/3.7. The slower GK110 Kepler clocks (under the DP64 driver setting) point to the energy-management requirement of the 896/960 64-bit cores on a Titan; 32-bit cores operate at lower wattage, and silicon with fewer 64-bit cores brings circuit energy down. So far Maxwell trades fewer DP cores for higher 32-bit core clocks. The power draw of a 64-DP SMX shifts, so it's hard to pin down how much energy the cores are fed while computing DP.

GM200's raster-operation (96 ROP) performance for game graphics is doubled compared to Kepler's 48, with faster PolyMorph engines across the [24] SMM. GM200's revised texture mapping units (192) match the [GTX 780] count - 48/32 fewer than the first-generation 15-SMX Titan (Black) and 14-SMX Titan.

eXaPower
Message 40490 - Posted: 17 Mar 2015 | 15:37:39 UTC

GM200 officially launches today. GTC starts at 9amPST/12EST.

For those interested:

https://registration.gputechconf.com/form/session-listing

The reference GM200 Titan PCB (NVTTM) is similar to the GTX 690/770/780/780ti/Titan/Black/Titan Z: an 8-power-phase (6+2) design with 6 MOSFETs and an OnSemi NCP4206 voltage chip. The layout is slightly different from GK110 or GK104.

GM204/206 overclock really well; GM200 should be capable of 1400-1500 MHz.

Profile skgiven
Volunteer moderator
Volunteer tester
Message 40491 - Posted: 17 Mar 2015 | 16:13:54 UTC - in response to Message 40490.
Last modified: 17 Mar 2015 | 16:40:17 UTC

You can watch the presentation live now,

http://blogs.nvidia.com/blog/2015/03/16/live-gtc/

So $999 and dp cut down to 0.2TFlops - will probably make it better for here.
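That ~0.2 TFLOPS figure is consistent with GM200 keeping the gaming-Maxwell 1/32 FP64 ratio (Python sketch, assuming the 1002 MHz base clock):

    dp_cores = 3072 // 32                      # 1/32 ratio -> 96 FP64 cores
    dp_tflops = dp_cores * 2 * 1.002 / 1e3     # 2 FLOPs per FMA, 1002 MHz base
    print(f"{dp_cores} FP64 cores -> ~{dp_tflops:.2f} DP TFLOPS")   # ~0.19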
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

TJ
Message 40494 - Posted: 17 Mar 2015 | 17:08:44 UTC

Mmm, not a bad price, and it looks more promising than I thought. I will start saving money so I can buy one in the fall.
____________
Greetings from TJ

eXaPower
Message 40499 - Posted: 17 Mar 2015 | 20:40:14 UTC - in response to Message 40491.
Last modified: 17 Mar 2015 | 21:27:48 UTC

dp cut down to 0.2TFlops

Very disappointing DP performance for a flagship. Hat tip to 3DCenter in Germany for correctly predicting the horrid DP.

GK110/GK210 (C.C 3.5/3.7) is still NVidia's HPC compute flagship. GM200 is a compute downgrade when factoring in all aspects of what a GPU can do. For anyone who writes in something other than 32-bit, GM200 is not the GPU to buy.

GM200 at $1000 with Intel Broadwell-class DP performance is outrageous. Nvidia could have used an Ultra/985/1*** series moniker instead of Titan and priced it at $649. The Titan moniker is now tainted. The future ti version of GM200, at a reasonable $649 and with similar performance to the GM200 "Titan", would be the proper replacement for the overall 32-bit compute capabilities of the GTX 780ti, which launched at $699.

Marketing GM200 as the Titan replacement is nonsense because of the loss of GK110's DP compute features - more gaming revisions, fewer compute options. Many practical long-term GPU uses exist alongside games. (The GM200 ti release will be sometime in June, when AMD shows off their potent GPUs.) NVidia will also release cut GM dies near AMD's launch.

Maxwell's overall DP has been exposed as weak: [32] 32-bit memory banks rather than [32] 64-bit banks like Kepler. The GTX 480/580/780 all have similar DP performance to this Maxwell "flagship".

For here: GM200 (a larger GM204/same Compute set) --- FP32 ACEMD performance will be outstanding.

Whoever owns the DP-enabled GK110 compute Titan now has a GPU with long-standing value that will hold up until 16nm Pascal or beyond.
If AMD's 4096-core/64-CU 390X flagship offers the Hawaii 1/8 ratio (512 DP, 8 DP per 64-core CU), AMD's revised cores fall only a little short of GK110's complete C.C 3.5 compute arch. The 2048-core/32-CU Tahiti's 1/4 ratio, 512 DP cores (16 DP per 64-core CU), has already given NVidia trouble in some DP markets. Erosion of CUDA's DP market share by OpenACC/OpenMP is imminent if Pascal's DP is slow for a "flagship" GPU; a decline of CUDA DP would then be a foregone conclusion.

(Doubling the amount of ROPs.) For graphics this is an upgrade, given Maxwell's filtering and display advances, although Kepler's unified shaders (vertex/pixel/tessellation/geometry) work with the same Vulkan (OpenGL) and DirectX feature levels as Maxwell's. GM200's revised, higher-clocked texture mapping units (8 TMU per 128-core SMM) are faster than Kepler's: 192 for GM200's 3072 cores. GK110 (16 TMU per 192-core SMX) runs 2304 cores/192 TMU > 2688/224 > 2880/240.

http://anandtech.com/show/9059/the-nvidia-geforce-gtx-titan-x-review/15

http://www.guru3d.com/articles_pages/nvidia_geforce_gtx_titan_x_review,10.html

http://www.tomshardware.com/reviews/nvidia-geforce-gtx-titan-x-gm200-maxwell,4091-6.html

The dense 12GB of GDDR5 heats up, with temps higher than the core or VRM temps.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 40581 - Posted: 22 Mar 2015 | 17:03:16 UTC - in response to Message 40494.
Last modified: 22 Mar 2015 | 17:20:38 UTC

Mmm, not a bad price, and it looks more promising than I thought. I will start saving money so I can buy one in the fall.

TJ, this card is 50% more hardware than a GTX980 for about double the price. That's a rather bad value proposition! While I generally support the argument of "fewer but faster GPUs" for GPU-Grid, this is too extreme for my taste. Better to wait for AMD's next flagship, which should arrive within the next 3 months. If it's as good as expected and priced around 700$, nVidia might offer a GTX980Ti based on a slightly cut-down GM200 at a more sane ~700$. Like the original Titan and GTX780Ti - the latter came later but with far better value (for SP tasks).

@eXa: don't equate "compute" with DP. Of course GM200 is weak if you need serious DP - but that's no secret. NVidia knows this, of course, and has made GK210 for those people (until Pascal arrives). For any other workload GM200 is a monster and mostly a huge improvement over GK110. There are a lot of such computing workloads. Actually, even if you have a DP capable card you'd still be better off if you can use it in SP mode.

Whether this thing is worth the "Titan" name is really irrelevant from my point of view. Few people bought the original Titan for its DP capabilities. Fine.. they'll know enough to stick to the Titan and Titan Black until Pascal. But most are just using them for games & benchmarks. They may want DP, but they don't actually need it.

GM200 at 1000$ with Intel Broadwell DP performance is outrageous.

Only if you measure it by your expectations, formed by the 1st generation of Titans.

And don't forget: including DP units in GM200 would have made it far too big (and made it lose further clock speed & yield, if it was possible at all). Or, at the same size but with DP, the chip would have had significantly fewer SP shaders. The difference to the GTX980 would have been too small (such a chip would still lose clock speed compared to smaller Maxwells) and hence a tougher sell for any market that the current GM200 appeals to.

I admit I didn't believe the initial rumor of GM200 without DP. But now it does make a lot of sense. Especially since most Maxwell improvements (keeping the SP shaders busy) wouldn't help in DP, because Kepler was never limited here (in the same way) as it is in SP.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Retvari Zoltan
Message 40592 - Posted: 23 Mar 2015 | 13:27:59 UTC

When will the GPUGrid app support BigMaxwell?
Does anyone have such card to test the current client?
I'm planning to sell my old cards (GTX670s and GTX680s), and buy one BigMaxwell, as I want to reduce the heat (and the electricity bills) generated by my cards for the summer :)
I didn't buy more GTX980s because of this card (the Titan X), and now I don't plan to buy more than one until its price drops a little.

Profile skgiven
Volunteer moderator
Volunteer tester
Message 40593 - Posted: 23 Mar 2015 | 20:57:31 UTC - in response to Message 40592.
Last modified: 23 Mar 2015 | 21:47:20 UTC

I guess nobody has a GTX Titan X attached and working yet, otherwise it would appear in the Performance tab.

While Titan X would do as much work as two GTX780's using ~64% of the power, it is a very expensive unit:

In the UK you can get a Titan X for ~£900, a GTX980 for ~£400, or a GTX970 for ~£260.

So, you could buy 2 GTX970's, do ~24% more work and save £380,
or buy 2 GTX980's, save £100 and do ~33% more work,
or you could get 3 GTX970's, save £120 and do ~70% more work.

In theory, performance/Watt is about the same, and can be tweaked a lot by tuning. So you could reduce the voltage of each 970 to use 20W less power and still match the Titan X for performance.
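The same comparison as a price-per-throughput sketch (Python; the relative-work figures are the ones quoted in this post, the prices are the UK snapshots above):

    options = {
        "1x Titan X": {"price_gbp": 900, "rel_work": 1.00},   # baseline
        "2x GTX 970": {"price_gbp": 520, "rel_work": 1.24},   # ~24% more work
        "2x GTX 980": {"price_gbp": 800, "rel_work": 1.33},   # ~33% more work
        "3x GTX 970": {"price_gbp": 780, "rel_work": 1.70},   # ~70% more work
    }
    for name, o in options.items():
        print(f"{name}: £{o['price_gbp']}, {o['rel_work']:.2f}x work, "
              f"£{o['price_gbp'] / o['rel_work']:.0f} per unit of throughput")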

As the Titan X is a flagship GPU it costs a lot, but it has already driven down the price of the GTX980.

It used to be the case that performance/Watt scaled with die size. While I'm not sure that's still the case, the Titan X is carrying 12GB GDDR5. If that had scaled compared to the GTX980 it would only have been 6GB.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

eXaPower
Message 40603 - Posted: 24 Mar 2015 | 16:58:30 UTC - in response to Message 40592.

Does anyone have such card to test the current client?

Performance Tab: TitanX -- NOELIA_1mgx (short)

Profile MJH
Project administrator
Project developer
Project scientist
Message 40604 - Posted: 24 Mar 2015 | 17:47:28 UTC - in response to Message 40592.

When will the GPUGrid app support BigMaxwell?


Soon, but not imminently. AFAIK, no one's attached one yet.
(It's working here in the lab).


I'm planning to sell my old cards (GTX670s and GTX680s),


High roller! I'd have thought 980s would be more cost effective?

Profile Retvari Zoltan
Message 40608 - Posted: 25 Mar 2015 | 0:39:25 UTC - in response to Message 40604.
Last modified: 25 Mar 2015 | 0:41:21 UTC

When will the GPUGrid app support BigMaxwell?

Soon, but not imminently. AFAIK, no one's attached one yet.
(It's working here in the lab).

FYI, it's working now (on Windows 8.1).
Here is a short workunit processed successfully on a Titan X. (Thanks to eXaPower for pointing it out)

I'm planning to sell my old cards (GTX670s and GTX680s),

High roller! I'd have thought 980s would be more cost effective?

They are.
But as I will apparently lose my 2nd place on the overall toplist, I'd like to do it in a stylish manner.
So I shall continue to have the fastest GPUGrid host on the planet at least. ;)

[CSF] Thomas H.V. DUPONT
Message 40609 - Posted: 25 Mar 2015 | 7:11:15 UTC

https://twitter.com/TEAM_CSF/status/580627791298822144
____________
[CSF] Thomas H.V. Dupont
Founder of the team CRUNCHERS SANS FRONTIERES 2.0
www.crunchersansfrontieres

eXaPower
Message 40613 - Posted: 25 Mar 2015 | 15:05:10 UTC - in response to Message 40608.
Last modified: 25 Mar 2015 | 15:08:06 UTC

--- NOELIA_1mg --- (980) began work -- Titan X finished it. Host# 196801 resultid=14020924

TJ
Message 40617 - Posted: 25 Mar 2015 | 19:08:42 UTC - in response to Message 40613.
Last modified: 25 Mar 2015 | 19:08:54 UTC

--- NOELIA_1mg --- (980) began work -- Titan X finished it. Host# 196801 resultid=14020924

And we see the Windows limitation of four again...only 4GB of that awesome 12 is recognized and used.

I think I will wait for the "real" big Maxwell with its own CPU. At 1178 Euro for an EVGA one, it is too much for my wallet at the moment.
____________
Greetings from TJ

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 40618 - Posted: 25 Mar 2015 | 22:25:29 UTC - in response to Message 40604.

MJH wrote:
I'd have thought 980s would be more cost effective?

Yep, and even more so the GTX970's (see SKs post).

@TJ: don't worry about the memory, ~1 GB per GPU-Grid task is still fine.

And don't wait for any miraculous real big Maxwell. GM200 is about as big as they can go on 28 nm, and on the Titan X it's already fully enabled. I'd expect a cut-down version of GM200 at some point, but apart from that the next interesting chips from nVidia are very probably Pascals.

And about this "integrating CPU" talk: the rumor mill may have gotten Tegra K1 and X1 wrong. These are indeed Kepler and Maxwell combined with ARM CPUs.. as a complete mobile SoC. Anything else wouldn't make much sense in the consumer range (and AMD is not giving them any pressure anyway), so if they experiment with GPU + closely coupled CPU I'd expect this first to arrive together with NVlink and Open Power server for HPC. And priced accordingly.

MrS
____________
Scanning for our furry friends since Jan 2002

TJ
Message 40624 - Posted: 26 Mar 2015 | 1:01:42 UTC - in response to Message 40618.


@TJ: don't worry about the memory, ~1 GB per GPU-Grid task is still fine.

MrS

Thanks for the explanation. Haha I don't worry about the memory but crunchers on Windows pay quite a lot for 8GB that cannot be used with this new card.

It depends on my financial situation, but I think I'll wait for a GTX980Ti, and not before fall, as summer turns too warm for 24/7 crunching (without AC).


____________
Greetings from TJ

eXaPower
Message 40628 - Posted: 26 Mar 2015 | 20:36:50 UTC - in response to Message 40608.
Last modified: 26 Mar 2015 | 20:43:24 UTC

But as I will apparently lose my 2nd place on the overall toplist, I'd like to do it in a stylish manner. So I shall continue to have the fastest GPUGrid host on the planet at least. ;)

(ROBtheLionHeart's) overclocked 780ti (XP) NOELIAs are a hair faster than your mighty 980 (XP). It's a rare sight not to see you (RZ) with the fastest times! The Performance tab is an engaging comparative tool for crunchers and a new way to learn about work units and expected GPU performance.

Current Long run only Maxwell RAC per day (crunching 24/7) including OS factors:

1.15-1.3mil for [1] Titan X (future Ti GM200 version - unknown SMM count and release date)
750-850k [1] 980
600-700k [1] 970
350-450k [1] 960
225-300k [1] 750ti
175-250k [1] 750

skgiven's Throughput performances and Performances/Watt chart -- relative to a GK110 GTX Titan.
    Performance | GPU                     | Power | GPUGrid Performance/Watt
    211%        | GTX Titan Z (both GPUs) | 375W  | 141%
    116%        | GTX 690 (both GPUs)     | 300W  | 97%
    114%        | GTX Titan Black         | 250W  | 114%
    112%        | GTX 780Ti               | 250W  | 112%
    109%        | GTX 980                 | 165W  | 165%
    100%        | GTX Titan               | 250W  | 100%
    93%         | GTX 970                 | 145W  | 160%
    90%         | GTX 780                 | 250W  | 90%
    77%         | GTX 770                 | 230W  | 84%
    74%         | GTX 680                 | 195W  | 95%
    64%         | GTX 960                 | 120W  | 134%
    59%         | GTX 670                 | 170W  | 87%
    55%         | GTX 660Ti               | 150W  | 92%
    53%         | GTX 760                 | 130W  | 102%
    51%         | GTX 660                 | 140W  | 91%
    47%         | GTX 750Ti               | 60W   | 196%
    43%         | GTX 650TiBoost          | 134W  | 80%
    37%         | GTX 750                 | 55W   | 168%
    33%         | GTX 650Ti               | 110W  | 75%
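
The Performance/Watt column follows from the first two columns relative to the 250W GTX Titan baseline; a minimal sketch of that arithmetic (card figures taken from the chart above):

    # Recompute the GPUGrid Performance/Watt column relative to a 250W GTX Titan (= 100%).
    TITAN_PERF, TITAN_WATT = 100.0, 250.0

    cards = {                 # throughput %, board power in W (from the chart)
        "GTX 980":   (109, 165),
        "GTX 970":   ( 93, 145),
        "GTX 750Ti": ( 47,  60),
    }

    for name, (perf, watt) in cards.items():
        ppw = (perf / watt) / (TITAN_PERF / TITAN_WATT) * 100
        print(f"{name}: {ppw:.0f}% performance/Watt")   # -> ~165%, ~160%, ~196%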

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 40635 - Posted: 26 Mar 2015 | 23:05:13 UTC - in response to Message 40628.
Last modified: 26 Mar 2015 | 23:23:13 UTC

Roughly 56% faster than a Titan,

https://www.gpugrid.net/forum_thread.php?id=1150

What are the boost clocks?

http://www.tomshardware.com/reviews/nvidia-geforce-gtx-titan-x-gm200-maxwell,4091-6.html
Suggests it boosts to 1190MHz but quickly drops to 1163MHz (during stress tests) and that it's still dropping or increasing in steps of 13MHz, as expected.
1190 is ~15% shy of where I can get my GTX970 to boost, so I expect it will boost higher here (maybe ~1242 or 1255MHz).
Apparently the back gets very hot. I've dealt with this in the past by blowing air directly onto the back of a GPU and by using a CPU water cooler (as that reduces radiated heat).
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 40638 - Posted: 26 Mar 2015 | 23:43:56 UTC - in response to Message 40635.
Last modified: 27 Mar 2015 | 0:43:42 UTC

Roughly 56% faster than a Titan.

As you expected for GPUGRID ACEMD: an increase of ~60%

Suggests it boosts to 1190MHz but quickly drops to 1163MHz (during stress tests) and that it's still dropping or increasing in steps of 13MHz, as expected. 1190 is ~15% shy of where I can get my GTX970 to boost, so I expect it will boost higher here (maybe ~1242 or 1255MHz). Apparently the back gets very hot. I've dealt with this in the past by blowing a air directly onto the back of a GPU and by using a CPU water cooler (as that reduces radiating heat).

12GB of memory at over 100C is really hot for prolonged use. A fan blowing on the back helps, but will a back plate lower temps with such chip density, or hinder by holding more heat against the outer memory chips (opposite the heatsink)? A custom water-block back plate (full CPU/GPU water cooling loop)? With air cooling only, at 110C, downclocking the GDDR5 together with a voltage drop could help. Longevity concern: can GDDR5 sustain +100C temps? I think the TechPowerUp Titan X review shows the memory model number to reference. Is the GDDR5 rated at 70C/80C/90C? Out-of-spec temperatures will certainly impact long-term overclocking and boost prospects.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 40642 - Posted: 27 Mar 2015 | 8:40:49 UTC - in response to Message 40638.
Last modified: 27 Mar 2015 | 9:18:54 UTC

Cooling 12GB appears to be a big issue and it's likely hindering boost and power consumption, but without the 12GB it wouldn't be a Titan, seeing as it doesn't have good DP.
Hopefully a GTX980Ti with 6GB will appear soon - perhaps 2688 cuda cores, akin to the original Titan, but 35 to 40% faster for here; at <£800, if not <$700, it would make an attractive alternative to the Titan X.

For the Titan X, excellent system cooling would be essential for long term crunching, given the high memory temps. I would go for direct air cooling on the back of one or possibly 2 cards, but if I had 3+ cards I would want a different setup. The DEVBOX's just use air cooling, with 2 large front case fans, but I suspect the memory still runs a bit hot. Just using liquid wouldn't be enough, unless it included back cooling. I guess a refrigerated thin-oil system would be ideal, but that would be DIY and Cha-Ching!

In my experience high GDDR temps affect the performance of lesser cards too, and I found stability by cooling the back of some cards. While tools such as MSI Afterburner allow you to cool the GPU via the fans, they don't even report the memory temps. It's often the situation that the top GPU (closest to the CPU) in an air cooled case runs hot. While this is partially from heat radiation it's mostly because the airflow over the back of the card comes from the CPU, so it's already warm/hot. A basic CPU water cooler is sufficient to remove this issue, and at only about twice the cost of a good CPU heatsink and fan it's a lot cheaper than a GPU water cooler.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 40648 - Posted: 27 Mar 2015 | 12:21:09 UTC - in response to Message 40642.
Last modified: 27 Mar 2015 | 12:35:13 UTC


While tools such as MSI Afterburner allow you to cool the GPU via the fans they don't even report the memory temps.

Many voltage controllers are dumb, with VID lines only (the Titan X has no I2C support; it's driver read-only rather than software controllable). There are a few Maxwell PCBs with the ability to manually read temps or voltages on the back of the PCB: the 980 Strix and all gun-metal Zotacs. PNY also allows manual measurements on one of its dual fan OC models with I2C support. All Asus Strix cards have advanced I2C support, and a few Zotac models support I2C. Most others (MSI/EVGA/Gigabyte) don't. Advanced I2C support on a PCB is helpful.

http://i2c.info/i2c-bus-specification

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 40659 - Posted: 27 Mar 2015 | 23:42:48 UTC
Last modified: 27 Mar 2015 | 23:56:27 UTC

https://www.gpugrid.net/forum_thread.php?id=3551

For the Titan X, excellent system cooling would be essential for long term crunching, given the high memory temps. I would go for direct air cooling on the back of one or possibly 2 cards, but if I had 3+ cards I would want a different setup. The DEVBOX's just use air cooling, with 2 large front case fans, but I suspect the memory still runs a bit hot. Just using liquid wouldn't be enough, unless it included back cooling. I guess a refrigerated thin-oil system would be ideal, but that would be DIY and Cha-Ching!


Cooling 12GB appears to be a big issue and it's likely hindering boost and power consumption, but without the 12GB it wouldn't be a Titan, saying as it doesn't have good dp. Hopefully a GTX980Ti with 6GB will appear soon - perhaps 2688 cuda cores, akin to the original Titan, but 35 to 40% faster for here and <£800, if not <$700 would make it an attractive alternative to the Titan X.

Recent reports point to a full GM200 980Ti with 6GB - June/July launch.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 40661 - Posted: 28 Mar 2015 | 14:44:27 UTC

The easiest way to cool the memory on the back side would be small passive aluminum RAM coolers. They're obviously not as strong as full water cooling, but cost almost nothing and given some airflow could easily shave off 20 - 30°C. There's not enough space for this in tightly packed multi-GPU configurations, though.

And as far as I know the boost mode doesn't deal with memory at all. There's only a secondary interaction via the power draw. But the memory chips don't consume much power anyway, so there's going to be a negligible change with temperature.

Regarding longevity: 100°C is hot for regular chips, but fine for e.g. voltage regulators. Not sure about memory chips. One would guess nVidia has thought this through, as they can't afford Titans failing left and right after a short time. Did they consider continuous load? I'm not sure, but I hope they expect people paying $1000 for a GPU to use them for more than the occasional game.

MrS
____________
Scanning for our furry friends since Jan 2002

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 40670 - Posted: 28 Mar 2015 | 23:32:45 UTC - in response to Message 40661.

You would think so, but I have a colleague who has bought four (4) 30 inch screens only to play flight simulator!

So perhaps the Titan X builders aim mostly at gamers, who seem willing to invest heavily in hardware, and forget the 24/7 crunchers. They make Tesla boards for calculation (hence no monitors can be attached) for dedicated crunching. Those cards are a lot more expensive but are built for heavy use with the best possible parts.

But I guess the Titan X is heavily tested, and some brands use the best parts in their cards too. I am a big fan of EVGA stuff, but you know that of course. Their FOC's are more expensive but have extra high grade components.
____________
Greetings from TJ

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 851
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 40672 - Posted: 29 Mar 2015 | 1:13:33 UTC - in response to Message 40659.
Last modified: 29 Mar 2015 | 1:14:25 UTC

Cooling 12GB appears to be a big issue and it's likely hindering boost and power consumption, but without the 12GB it wouldn't be a Titan, saying as it doesn't have good dp. Hopefully a GTX980Ti with 6GB will appear soon - perhaps 2688 cuda cores, akin to the original Titan, but 35 to 40% faster for here and <£800, if not <$700 would make it an attractive alternative to the Titan X.

Recent reports point to a full GM200 980Ti with 6GB - June/July launch.

That is very good news indeed.
Perhaps I'll wait for that card then.
6GB is ten times more than a GPUGrid task needs, so it's overkill.
Then 12GB would be... a complete waste of money.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 851
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 40678 - Posted: 29 Mar 2015 | 14:05:56 UTC
Last modified: 29 Mar 2015 | 14:09:02 UTC

As more data aggregates, I get more confused.

e1s6_3-GERARD_FXCXCL12_LIG_11631322-0-1-RND5032_0        | Titan X   | Win 8.1 | 28591 sec
e2s13_e1s6f79-GERARD_FXCXCL12_LIG_10920801-0-1-RND3888_0 | GTX 980   | Win XP  | 25607 sec
e1s16_7-GERARD_FXCXCL12_LIG_14907632-0-1-RND4211_0       | GTX 780Ti | Win XP  | 29397 sec

e11s54_e4s195f119-NOELIA_27x3-1-2-RND1255_0 | GTX 980   | Win XP  | 11936 sec
e7s6_e4s62f37-NOELIA_27x3-1-2-RND3095_0     | GTX 780Ti | Win XP  | 12019 sec
e7s18_e4s62f162-NOELIA_27x3-1-2-RND1763_0   | GTX 980   | Win 8.1 | 14931 sec

e2s1_792f101-NOELIA_3mgx1-1-2-RND1677_0 | GTX 980 | Win 7   | 18160 sec (30.7% more time)
e3s12_47f95-NOELIA_3mgx1-1-2-RND3015_0  | Titan X | Win 8.1 | 13893 sec (23.5% faster)

e5s155_2x164f180-NOELIA_S1S4adapt4-2-4-RND5279_1 | GTX 780Ti | Win XP  | 12530 sec
e8s83_e1s20f75-NOELIA_S1S4adapt4-3-4-RND3849_0   | Titan X   | Win 8.1 | 12774 sec
e7s13_e5s186f149-NOELIA_S1S4adapt4-3-4-RND3162_0 | GTX 980   | Win XP  | 13822 sec
e4s4_e1s27f81-NOELIA_S1S4adapt4-2-4-RND4550_0    | GTX 980   | Win XP  | 14080 sec

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 40679 - Posted: 29 Mar 2015 | 14:38:27 UTC - in response to Message 40678.

There may be a pattern:

NOELIA_3mgx1: 75919 atoms, performs very well
NOELIA_S1S4adapt4: 60470 atoms, performs OK
GERARD_FXCXCL12: 31849 atoms, performs bad

The fewer atoms a WU contains, the less time each step takes and the more often CPU intervention is needed - and the more difficult it is to make good use of many shaders. Direct evidence for this is the low GPU usage of the GERARD WUs with "few" atoms.

Maybe the CPU support of that Titan X is not configured as well as for the other GPUs? Maybe too many other CPU tasks are running, and / or he's not using "swan_sync=0" in addition to the WDDM overhead.

MrS
____________
Scanning for our furry friends since Jan 2002

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 40686 - Posted: 29 Mar 2015 | 16:37:48 UTC - in response to Message 40679.
Last modified: 29 Mar 2015 | 16:49:57 UTC

There may be a pattern:

NOELIA_3mgx1: 75919 atoms, performs very well
NOELIA_S1S4adapt4: 60470 atoms, performs OK
GERARD_FXCXCL12: 31849 atoms, performs bad

The less atoms a WU contains, the less each time step take and the more often CPU intervention is needed. And the more difficult it is to make good use of many shaders. Direct evidence for this is the low GPU usage of the Gerard WUs with "few" atoms.

Maybe the CPU support of that Titan X is not configured as well as for the other GPUs? Maybe too many other CPU tasks are running, and / or he's not using "swan_sync=0" in addition to the WDDM overhead.

2% [4% total] hyper-thread usage for each ACEMD process; [2] physical cores for AVX DP at ~60% total CPU (the OS accounts for 1%). A GK107 exhibits this behaviour: 98% core usage for NOELIA > 94% for GERARD > 87% for the (13773 atom) SDOERR_villinpubKc. More powerful GPUs are affected even more.

Comparing the top two times for GERARD_FXCXCL12_LIG_11543841 --- (31843 Natoms):
- Titan X (Win8.1) time per step [1.616 ms] is 8.8% faster than the [1.771 ms] GTX780Ti (XP).

The GTX780Ti (XP) delivered 91% of the Titan X's (Win8.1) output for this particular unit.

Would a Titan X without the WDDM tax lower its time per step and total runtime by another 10%?

GERARD_FXCXCL12_LIG_11543841 --- (31843 Natoms) [28,216.69/30,980.94]: a 9% runtime difference, Titan X (Win8.1) > 780Ti (XP).

The Titan X is possibly ~20% faster with XP at similar power consumption. The Titan X's core clocks are unknown (overclocked?), further complicating the actual differences among the variables.
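
Those percentages come straight from the time-per-step figures; a minimal sketch of the arithmetic:

    # Percent faster, computed from time-per-step (ms) figures.
    def pct_faster(fast_ms, slow_ms):
        """Reduction in time per step of the faster card relative to the slower one."""
        return (slow_ms - fast_ms) / slow_ms * 100

    titan_x_ms, gtx780ti_ms = 1.616, 1.771
    print(f"Titan X: {pct_faster(titan_x_ms, gtx780ti_ms):.1f}% faster per step")  # ~8.8%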

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 40687 - Posted: 29 Mar 2015 | 16:56:45 UTC - in response to Message 40679.

Over the last couple of years the performance variation between task types has increased significantly. This has made it awkward to compare cards of different generation/even sub-generation. Task performances vary significantly because different tasks challenge the GPU in different ways, even exposing the limitations of more subtle design differences.

While DooKey's 5820K system (with the Titan X) may be using lots of CPU, it's a 6 cores/12 thread system, and while he has a high rac at Universe@home (a CPU project), he does not have that system hooked up there,
http://boincstats.com/signature/-1/user/21224/sig.png
http://boincstats.com/en/stats/-1/user/detail/21224/projectList
http://universeathome.pl/universe/hosts_user.php?userid=2059

The CPU could be dropping into a lower power state, he could be gaming or playing with the setup/config:

3855MHz GDDR seems like a heavy OC for 3500MHz/7GHz memory.
While SKHynix have 4GB memory chips rated at 8GHz, I don't think the Titan X uses them?

Several errors here,
https://www.gpugrid.net/results.php?hostid=196801
and looking at some of the tasks, there are a lot of recoveries, before the failures,
https://www.gpugrid.net/result.php?resultid=14040943

The WDDM overhead could well be greater with the bigger cards, and it might not be a constant; it could vary by task type (atom count/CPU demand). Last time we had a look there did seem to be some variation with GPU performance.

A significant overall performance gain might come from running more than one WU, as with the GTX970 and GTX980 on W7, Vista, W8 & W10 preview (WDDM2.0). Perhaps 3 WU's on a Titan X would be optimal?
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 40695 - Posted: 30 Mar 2015 | 11:31:57 UTC - in response to Message 40686.
Last modified: 30 Mar 2015 | 12:16:05 UTC

GERARD_FXCXCL12 update:

Zoltan's 980 reclaimed [25,606.92] fastest runtime and [1.462 ms] time per step.

RZ's 980 9% faster than DooKey's [1.616 ms.] Titan X and 11-19% better than GTX780ti [1.771 ms.]

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 41008 - Posted: 2 May 2015 | 12:18:09 UTC

http://videocardz.com/55419/nvidia-geforce-gtx-980-ti-confirmed-to-feature-6gb-memory

GM200-310 GPU will feature custom PCB cooling solutions. GM200-400 Titan X available cooling options are a water block or reference blower.

As of now: unknown if GM200-310 GTX980Ti is a (24SMM) full die or 20-23 SMM cut down.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41017 - Posted: 3 May 2015 | 11:15:45 UTC - in response to Message 41008.
Last modified: 4 May 2015 | 21:15:58 UTC

The 980Ti was inevitable and its design predictable.

A 24SMM would be a 6GB Titan X. It would be competing with itself.
Still think it will have 2688 cuda cores.
While a 225W TDP is possible that's more like the Quadro style - expect them to keep it at 250W.
Core clock will likely be somewhere between 1088 and 1127, but could be higher with a 250W TDP. Whatever, it's the actual boost that really matters for here. Given the cooling options, there might be an increased performance variation between manufacturers implementations.
Wonder if they will stick with 7GHz GDDR5; 5 months ago SKHynix announced 8GHz GDDR5, but these are no longer listed on their site.
Probably $750 to $800 based on current prices (the GTX980 is still selling at release price and the GTX Titan X has gone up from $999 to >$1170). That said, AMD/ATI releases will have a say on NV prices.
My only real concern is that the performance might not scale as well (at present) for the GM200's, though it might if you ran 2 WU's on the same card.

Update - Will be 7GHz and 250W.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 41128 - Posted: 23 May 2015 | 11:10:55 UTC
Last modified: 23 May 2015 | 11:12:54 UTC

http://videocardz.com/55566/nvidia-geforce-gtx-980-ti-performance-benchmarks

The GTX980Ti launches in ~two weeks (a new CUDA 7.5 toolkit will also be released). The 980Ti's compute capability is 5.2 -- the same as the Titan X and the 980/970/960. The GTX980Ti consists of 2816 CUDA cores (22 SMM) with 176 texture mapping units. ROPs are either 96 or 88 depending upon the GM200 disablement process.

Off topic, Pascal is supposed to introduce 3D memory and Unified memory (CPU can access it) and NVlink for faster GPU to CPU and GPU to GPU communications. These abilities will widen possible research boundaries, making large multi-protein complex modelling (and similar) more accurate and whole organelle modelling possible.

Now the question is: when will 16nm Pascal/Volta be released? 12-18 months? Maxwell overclocking abilities are impressive compared to Kepler. Will 1500-1600+MHz clocks be possible on 16nm?

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41193 - Posted: 29 May 2015 | 9:21:23 UTC - in response to Message 41128.
Last modified: 29 May 2015 | 9:37:10 UTC

Looks like NV added 128 cores more than I thought they would, possibly suggesting better yields or architecture (no 3.5/4GB issues here, as promised).
While the 980Ti has 37.5% more CUDA cores than the 980, the base clock (as reported) is relatively low (1000MHz vs 1127MHz), with boost being reported as 1076 (the 980's being 1215). However, reports also suggest the 980Ti will manage a 1200MHz base clock. On one of my 970's the base clock is 1111MHz, boost is rated at 1238 but the actual boost sensors report 1325, and I've had it over 1340 without issue. So it's likely that the boost will be around the same as, if not better than, the GTX980, especially if it has a 250W TDP to call on.
That said, the Titan X didn't appear to scale well, performance varied greatly by task type and we know that the 4GB GTX900's can run 2 tasks per card on recent Windows operating systems to obtain greater overall throughput. Haven't seen anyone running 2 or 3 WU's on a Titan X, but I would expect it might scale better in terms of overall throughput (credits) if they did.
As with all the Maxwell GPU's it's likely that you will be able to preferentially tune the GPU; for maximum throughput/performance (credit) or performance per Watt.
Running 1 WU I don't expect the 980Ti to be 37.5% faster than a GTX980, and that's despite having a 384bit bus (the 980's is 256bits), but it might still be one of the choicest top of the range card for here, if you are prepared to run multiple WU's or wait on a possible app update, which might be available with the CUDA 7.5 tool-kit (if there is anything in it for Matt)?
However I would expect it to be at least 90% as fast as a GTX Titan X for here, and possibly more than the core count would suggest (91.67%) and despite the clocks.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Jozef J
Send message
Joined: 7 Jun 12
Posts: 112
Credit: 1,118,845,172
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 41209 - Posted: 29 May 2015 | 20:11:36 UTC
Last modified: 29 May 2015 | 20:48:58 UTC

http://www.techpowerup.com/212981/asus-gigabyte-and-msi-geforce-gtx-980-ti-reference-graphics-cards-pictured.html#comments

As skgiven says, more CUDA cores but lower clocks... classic nVidia.

Just for you I will give one comparison and the story: in early October 2014 I bought this card
http://www.newegg.com/Product/Product.aspx?Item=N82E16814500360
http://www.zotac.com/products/graphics-cards/geforce-900-series/product/geforce-900-series/detail/geforce-gtx-980-amp-extreme-edition.html

In my opinion the best nVidia GeForce 980 based card, with its custom design PCB and cooling...

Why? This card stepped smoothly through the GPUGrid project at 1460-70 MHz on the core; at higher clocks work failed... Its speeds were always in second or third place behind Zoltan's 780Ti and other 780Ti cards on Linux. I had it as the main card under Win 8.1... sooo
And all this at 60-65 degrees!!!
Because nVidia limited the TDP to 110, and not 120 as on the reference card.
As I was reading here, the Titan X is a big fiasco regarding speed on GPUGrid tasks...
It only confirms my experiences. :-)))

Jozef J
Send message
Joined: 7 Jun 12
Posts: 112
Credit: 1,118,845,172
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 41210 - Posted: 29 May 2015 | 20:34:55 UTC - in response to Message 41209.
Last modified: 29 May 2015 | 21:29:05 UTC

The second part: there is one very interesting and informative web page about this card of mine
http://www.gamersnexus.net/hwreviews/1666-zotac-gtx-980-extreme-benchmark-review-overclocking

The question is: if those GeForce cards were not limited to 110 versus 120 watts TDP, should I then maybe get 1600MHz performance at 85-95 degrees with no problems...
But I think that there is some kind of engineering problem...
It no longer matters - I sold the card a few months ago. The man who owns it now has two or three... yes, we live in high society, and so what..?

The whole story I'm trying to explain is that the Titan X and 980 Ti are, for me, unsurprising but awaited nVidia products...

eXaPower and the others there are only Internet theorists who have never owned such a graphics card. All info assessed according to Google...

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 41219 - Posted: 31 May 2015 | 23:20:47 UTC
Last modified: 1 Jun 2015 | 0:18:43 UTC

http://anandtech.com/show/9306/the-nvidia-geforce-gtx-980-ti-review

http://www.guru3d.com/articles-pages/nvidia-geforce-gtx-980-ti-review,1.html

http://www.tomshardware.com/reviews/nvidia-geforce-gtx-980-ti,4164-8.html

The 980Ti MSRP is $649; the 980 MSRP is $499. Custom-PCB (and reference-PCB) hybrid models for both the Titan X and GTX980Ti will manage GM200 temps (35-60C) computing ACEMD 24/7 during the summer, unlike the typical +80C core of the reference blower model.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 851
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41222 - Posted: 1 Jun 2015 | 8:12:46 UTC - in response to Message 41219.

http://anandtech.com/show/9306/the-nvidia-geforce-gtx-980-ti-review
http://www.guru3d.com/articles-pages/nvidia-geforce-gtx-980-ti-review,1.html
http://www.tomshardware.com/reviews/nvidia-geforce-gtx-980-ti,4164-8.html

Profile Francois Normandin
Send message
Joined: 8 Mar 11
Posts: 71
Credit: 654,432,613
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 41225 - Posted: 1 Jun 2015 | 12:26:40 UTC

Over 1 million credit/day seems realistic.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 41228 - Posted: 1 Jun 2015 | 13:00:31 UTC - in response to Message 41225.
Last modified: 1 Jun 2015 | 13:01:45 UTC

Estimated current Long run only Maxwell RAC per day (crunching 24/7) including OS variability factors:

1.1-1.4mil for [1] Titan X
1.05-1.35mil [1] 980ti
700-850k [1] 980
550-700k [1] 970
325-450k [1] 960
200-300k [1] 750ti
150-250k [1] 750

GM200/GM204/GM206 Price per core (current MSRP):

-Titan X: 32.5 cents ($999)
-GTX980Ti: 23 cents ($649)
-GTX980: 24.3 cents ($499)
-GTX970: 19.7 cents ($329), or 18.5 if the MSRP is $309
-GTX960: 19.4 cents ($199)
-GTX750Ti: 23.2 cents ($149)
-GTX750: 25.1 cents ($129)

The GTX980Ti offers Maxwell's best price/RAC ratio computing ACEMD; the 980 and 970 also provide an excellent price/RAC. Possibly an app update will improve GM200/204 scaling and further increase RAC.
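
The price-per-core figures above are just MSRP divided by shader count; a minimal sketch (core counts are the usual published specifications):

    # Price per CUDA core in US cents: MSRP / core count * 100.
    cards = {                    # MSRP (USD), CUDA cores
        "Titan X":   (999, 3072),
        "GTX 980Ti": (649, 2816),
        "GTX 980":   (499, 2048),
        "GTX 970":   (329, 1664),
        "GTX 960":   (199, 1024),
        "GTX 750Ti": (149,  640),
        "GTX 750":   (129,  512),
    }

    for name, (usd, cores) in cards.items():
        print(f"{name}: {usd / cores * 100:.1f} cents per core")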

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41262 - Posted: 6 Jun 2015 | 1:08:09 UTC

So anyone have one of the 980ti's on here yet? Was just able to order one today, should have it running by sunday.

But I'm curious how close they are to the titan x here.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41263 - Posted: 6 Jun 2015 | 6:56:56 UTC - in response to Message 41262.
Last modified: 6 Jun 2015 | 7:00:31 UTC

Should be >90% as fast as a GTX Titan X.
Cuda core count suggests it will be 91.67% but the bus is the same width (384bit) giving the 980Ti a 9% wider bus to cuda core ratio. In theory that should nudge performance up a bit, probably over 93%. Having 6GB less GDDR5 might also free up a little bit of extra power for the GPU to call on. Bin variation could come into play (boosts) and some models might prove better than others.

The Titan X's performance running the GPUGrid app was all over the place; it didn't scale well/properly across tasks. Some runtimes were reasonable but a bit slower than would be expected given the CUDA core count, other performances were poor, and there was great variation in performance relative to the GM204 cards (not that we've had many results to go on) - suggesting the app didn't utilize the GM200 GPU well.
So it would be better to compare the 980Ti against the well established performances of the GTX980.

Compared to the GTX980 and going by cuda core count the 980Ti performance should be 137.5%, but GM200 vs GM204 isn't a direct comparison. The GM200's have a greater ROP ratio (small -ve for here I think) while the bus to shader ratio is slightly favourable. The big question is if the app/tasks will scale well, as is.
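
The ratios in this comparison are simple shader-count arithmetic; a minimal sketch:

    # Relative shader counts and the bus-per-core advantage mentioned above.
    titan_x, ti_980, gtx980 = 3072, 2816, 2048      # CUDA cores
    bus_bits = 384                                   # same bus width on both GM200 cards

    print(f"980Ti vs Titan X cores: {ti_980 / titan_x:.2%}")   # ~91.67%
    print(f"980Ti vs GTX980 cores:  {ti_980 / gtx980:.2%}")    # 137.5%
    # The same 384-bit bus shared over fewer shaders -> ~9% more bus per core than the Titan X.
    print(f"bus per core advantage: {(bus_bits / ti_980) / (bus_bits / titan_x) - 1:.1%}")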

At present the challenge will be to get the best out of the 980Ti in terms of credit/day. To do that you may have to run 2 or possibly 3 tasks simultaneously and not use much CPU for anything else. The bigger the GPU the more of an impact other factors have on performance.

Look forward to seeing your runtimes.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

[CSF] Thomas H.V. DUPONT
Send message
Joined: 20 Jul 14
Posts: 732
Credit: 100,630,366
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwat
Message 41272 - Posted: 7 Jun 2015 | 13:01:57 UTC - in response to Message 41263.

Look forward to seeing your runtimes.

Yes, absolutely.
Thanks in advance to 5pot for sharing ;)
____________
[CSF] Thomas H.V. Dupont
Founder of the team CRUNCHERS SANS FRONTIERES 2.0
www.crunchersansfrontieres

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41275 - Posted: 7 Jun 2015 | 20:37:39 UTC - in response to Message 41272.
Last modified: 7 Jun 2015 | 20:38:09 UTC

Looks like it didn't turn up...
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41276 - Posted: 8 Jun 2015 | 0:05:25 UTC

It did not. I should have read that next day air ships "next business day". Which could theoretically mean it will arrive on Monday or Tuesday. Either way, I'll have it up and running once it arrives. So at the absolute latest, expect results to be turning in by Wed.

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41280 - Posted: 10 Jun 2015 | 2:03:47 UTC

The card is up and running, with a boost clock (from factory) @ 1315MHz. Not bad at all, considering the boost clock listed is 1190. Results will start reporting over-night, and temps are sitting at 60C w/ a 51% fan setting @ 1187mV.

This is also with a 780 in slot2 (not running yet). Further, after looking at how fast it's currently crunching, puts a Gerard_FXCXCL in 25k seconds. It's moving at about .004/s.

localizer
Send message
Joined: 17 Apr 08
Posts: 113
Credit: 1,656,514,857
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41283 - Posted: 10 Jun 2015 | 16:06:28 UTC

My EVGA 980Ti SC is up and running - initial impressions are quick and quite cool in my setup; time & returned results will tell more.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 41284 - Posted: 10 Jun 2015 | 16:56:43 UTC - in response to Message 41280.

My EVGA 980Ti SC is up and running - initial impressions are quick and quite cool in my setup; time & returned results will tell more.

I would like to thank all GPUGRID crunchers - who are healing the world one molecule at a time. I appreciate the incredible dedication of resources, offered freely and willingly to the noble GPUGRID (BOINC) project. Has the GPUGRID project ever had an in-house cruncher appreciation event in its history? Or any type of cruncher event? Team events occur outside of the project. What type of event could we have to celebrate the project's crunchers? The performance tab could offer new category types for such an event.

The card is up and running, with a boost clock (from factory) @ 1315MHz. Not bad at all, considering the boost clock listed is 1190. Results will start reporting over-night, and temps are sitting at 60C w/ a 51% fan setting @ 1187mV.

This is also with a 780 in slot2 (not running yet). Further, after looking at how fast it's currently crunching, puts a Gerard_FXCXCL in 25k seconds. It's moving at about .004/s.

Thank you for reporting. GM200 at 60C is pretty good on air. 25K GERARD runtimes are similar to RZ's fastest 980. If time permits, could you comment on the task's core/MCU/BUS/power usage?

Current GM200 comparisons (NOELIA_ETQ_unbound/NOELIA_ETQ_bound performance tab): [8] Titan X builds -- 6 have 40-lane PCIe CPUs ([5] X99 chipset, [1] dual socket Haswell Xeon), plus [1] Z87/97 35W T series and [1] Z87/97 K series. The liquid-cooled systems' temperatures computing ACEMD: low 30s to mid 40s. GM200 ACEMD temps vary from 32C to 85C (liquid vs. reference air).

ACEMD on GM204 is stable at 1.5GHz. GM200 owner forums report >1400MHz(folding@home) and +1.5GHz for 3dmarks firestrike.

GM200's optimal capability is yet to be fully realized. (Once the project scientists' GERARD and NOELIA tasks are finalized, maybe an app update will appear.) GM200 WU utilization is less than optimal: current WDDM Titan X GERARD runtimes are only as fast as Petebe's XP 970. Linux and XP are clearly faster than WDDM, by >10%. Two tasks at a time on WDDM OSs = higher utilization, and total tasks crunched per day also rise by ~10%; see ETA's "multiple WU" thread in Number crunching.

GM200 upgrade path is another year or two away. (PCI4.0/NVlink/HBM2 are a part of Pascal (Volta) technology advancement.)

GM200 thermal energy (BTU output):
1 GM200 GPU (300W) at 24552 BTU/day > 1023 BTU/hr > 17.05 BTU/min > 0.2843 BTU/sec
--- 3/4 GPU's (900-1200W) set-up 75-100K BTU/day ---
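
Those figures are a straight watts-to-BTU conversion (1 Wh is about 3.412 BTU); a minimal sketch:

    # Heat output of a sustained electrical load, in BTU.
    BTU_PER_WH = 3.412          # 1 watt-hour ~ 3.412 BTU

    def btu(watts, hours):
        return watts * hours * BTU_PER_WH

    print(f"{btu(300, 1):.0f} BTU/hr for one 300W GM200")       # ~1024
    print(f"{btu(300, 24):.0f} BTU/day")                        # ~24,566
    print(f"{btu(1200, 24):.0f} BTU/day for a 1200W 4-GPU rig") # ~98,000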

Powerful heat dispersion for GM200's massive die is essential; efficient dispersion keeps the thermal energy from destroying the PCB. +300W PCBs are aided by the advantages of liquid thermodynamics or a very powerful air-cooled heatsink. The Gigabyte 980Ti cooler is rated for 600W. GM200's 6/12GB of GDDR5 and its controller account for 50+W.

An example of adaptive cooling: EVGA's hybrid GM200/204 120mm radiator (pump mounted near the PCB, GPU-only liquid cooling, with a fan and a metal plate cooling the VRM/memory circuitry). I've sold (bartered) off spare parts (M.2 SSD/RAM/HDD) for savings towards the 980Ti - nearly 5/8 of the 980Ti MSRP. Having the older pieces pay for new purchases certainly helps reduce the current out-of-pocket expense. I'm not sure if I want Galax's HoF, the 8+6 pin EVGA Hydro Copper, or the [2]8 pin Zotac block.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41285 - Posted: 10 Jun 2015 | 20:38:13 UTC - in response to Message 41284.
Last modified: 10 Jun 2015 | 20:45:29 UTC

Linux and XP is clearly faster by >10% than WDDM.

This is likely to be higher with bigger cards.

2 tasks at a time on WDDM OS's = higher utilization. Total tasks crunched per day also rise by 10%* See ETA's "multiple WU" thread at number crunching.

Likely to be higher with the GM200's, so long as you free up sufficient CPU resources. I would try 2 tasks at a time and then 3. If 3 better, then 4 (no CPU tasks).

Would definitely use SWAN_SYNC=1 (and restart).

Going by Localizer's returns -NOELIA_ETQunboundx1 tasks look like they are only 2 or 3% faster on a GTX980Ti than a GTX980, but SWAN_SYNC isn't on and who knows how much the CPU is being pushed?

On my GTX970's these NOELIA_ETQunboundx1 tasks are struggling to push the cards hard enough to keep them interested (GPU0: power 20%, usage 83%, 41C, 405MHz [downclocked], GPU1 power 76%, usage 67%, 64C, 1266MHz).
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41286 - Posted: 10 Jun 2015 | 20:42:52 UTC

Huh, I hadn't even realized swan_sync had to be started again. I remember it wasn't needed for awhile. I'll change that when I get back home.

Gerald utilization was around 75%, but the interesting tidbit for me was that the 980ti is doing noelia in around 10k seconds, with my older 680 rig doing them in 18k seconds. Lol

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41287 - Posted: 10 Jun 2015 | 22:06:10 UTC - in response to Message 41286.
Last modified: 10 Jun 2015 | 22:10:01 UTC

My guess is that if someone used a Titan X or 980Ti on XP or Linux it would perform much better, and that on Vista-W10, if SWAN_SYNC=1 was used, no CPU tasks were running and 2 or more tasks were run at the same time, the overall performance would be much better. I would even suggest that running 2 tasks on Linux/XP might be beneficial.

On W7 I had to alter the L1-P5 performance using NVidia inspector just to get my clocks normal. You really need to watch the performance details carefully - these tasks are prone to downclocking (but that might make them better for running 2 tasks simultaneously).
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 41288 - Posted: 11 Jun 2015 | 0:06:05 UTC - in response to Message 41285.
Last modified: 11 Jun 2015 | 0:26:04 UTC

Going by Localizer's returns -NOELIA_ETQunboundx1 tasks look like they are only 2 or 3% faster on a GTX980Ti than a GTX980, but SWAN_SYNC isn't on and who knows how much the CPU is being pushed?

On NOELIA_ETQunboundx (no SWAN_SYNC), GK107's show a 4% utilization drop (88/92%) with 2 CPU tasks vs. none. GM200/GM204 will see at least >4% if the CPU has other WU's, with or without SWAN. On GERARDs, a quad-core CPU + GM204 loses <5% GPU utilization with two CPU tasks; with 1 task it's 3%, while 4 CPU WU's cause a loss of 7%. (I haven't tested the utilization loss on the current NOELIAs with GM204/107.) I assume that, with or without SWAN and no matter how many PCIe 3.0 lanes are active, any GPU's core/MCU/BUS ACEMD usage will still drop when the CPU has other tasks computing.

On my GTX970's these NOELIA_ETQunboundx1 tasks are struggling to push the cards hard enough to keep them interested (GPU0: power 20%, usage 83%, 41C, 405MHz [downclocked], GPU1 power 76%, usage 67%, 64C, 1266MHz).

This batch looks to be working the GPU's cache harder. Runtime theory: the Titan X's 24 SMM vs. the 980Ti's 22 SMM, both sharing the same 3MB L2 cache, affect runtimes ever so slightly. Will the 980Ti truly be faster than the Titan X at ACEMD? A stalled cache can render Maxwell overclocks null and void.

Currently (10K vs. 30K runtimes): my 750's (4 SMM, 2MB L2 cache) NOELIA_ETQunboundx output is 33% of the 980Ti's. The 980Ti is a larger GPU by 5.5x (2816 vs. 512 CUDA cores); the 750's core count is 18.1% of the GTX980Ti's. From the look of it, the 980Ti's runtimes (10K) are about half of GK104's at 16-28K, and GTX980Ti ETQunboundx runtimes are 8 to 9x better than my GK107's at (77K).

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41289 - Posted: 11 Jun 2015 | 4:29:14 UTC

By the way, I thought it was SWAN_SYNC=0? Or did that change to 1?

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41290 - Posted: 11 Jun 2015 | 7:53:23 UTC - in response to Message 41289.
Last modified: 11 Jun 2015 | 8:39:52 UTC

It's no longer 'necessary' for a good performance, but in my experience it still speeds things up a few %. The 334 driver brought things back to normal re controlling the CUDA runtime to use a low-CPU mode without the performance dropping away (prior to that it was needed to get better performance and made more of a difference). On Windows it might just need to exist now, but on Linux I think it needs to be 1 (both used to be 0).
Basically SWAN_SYNC polls the CPU continuously, keeping the process fresh. When things didn't work properly, Process Lasso and changing the process priority were also/alternatively used to expedite the tasks, with some success, but all this totally depends on CPU usage: completely saturate the CPU and you're just not going to get good results. It also varies by CPU type/model.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41298 - Posted: 11 Jun 2015 | 15:45:12 UTC

Enabled swan_sync, took another task off WCG, got 7.26 on Gerard. Over clock is a tad over 1400MHz now with a .12mV voltage increase.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41300 - Posted: 11 Jun 2015 | 19:07:33 UTC - in response to Message 41298.
Last modified: 11 Jun 2015 | 19:10:08 UTC

That's around a 9.5% improvement over your previous setup and not far off the performance of a Titan X (on the chart; 150% vs 156%), not that I know the setup that was used there (optimizations, CPU usage).

You said the Gerald utilization was around 75%. What was the NOELIA utilization?
What's the memory run at by default? (3505MHz or 3005MHz)?

You should try to run 2 tasks at a time on the GTX980Ti to get some idea of what the gain is. I'm trying this now on two GTX970's and it looks promising, but these GM's are high maintenance; I have to use NVidia inspector to keep the clocks high.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41301 - Posted: 11 Jun 2015 | 19:19:02 UTC

With swan sync on, Gerald was actually 81%. Memory default I'll have to look at when I get back home, and I don't remember noelia usage.

I'm going to Japan for 2 weeks on monday, so I probably won't be attempting to run 2 tasks til I get back.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 41302 - Posted: 11 Jun 2015 | 23:42:55 UTC - in response to Message 41300.
Last modified: 11 Jun 2015 | 23:51:17 UTC

You should try to run 2 tasks at a time on the GTX980Ti to get some idea of what the gain is. I'm trying this now on two GTX970's and it looks promising, but these GM's are high maintenance; I have to use NVidia inspector to keep the clocks high.

A look at hostid=137780 (Win8.1) shows a 970 computing two at a time completes ~8 NOELIA_ETQunboundx WU per 24hr
(2 tasks finish in 22k sec). 970's running 2 WU at a time gain up to a 40% daily ETQunboundx improvement vs. 1-at-a-time 970's.
A Win7 GTX980Ti completes 10.1 ETQunboundx WU daily (1 WU per 2.36hr), >20% higher than a (2-at-a-time) 970.
For GPUs running one ETQ task at a time: RZ's 980 outputs 9 WU per day (my 17-SMM card also completes 9 per day, or 1 WU per 2.66hr).
Although a 980 has 27.3% fewer cores than the 980Ti, the 980Ti's present daily ETQ performance is only 10% higher than the fastest XP 980 and >15% higher than WDDM 980's. (The 10-15% figures concern 1 WU at a time.)

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41306 - Posted: 12 Jun 2015 | 6:24:02 UTC

Just to get back to you, the memory is at 3505MHz.

localizer
Send message
Joined: 17 Apr 08
Posts: 113
Credit: 1,656,514,857
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41307 - Posted: 12 Jun 2015 | 7:24:30 UTC
Last modified: 12 Jun 2015 | 7:36:05 UTC

Had a couple of days with the 980Ti now. In my setup it runs comfortably at 1400, any higher and I lose a few WUs.

I have enabled swan_sync and Noelia WUs come in at about 8.5K seconds, just finishing up my first Gerard and that looks to run sub 7 hours if the last few % don't throw any surprises.

Thanks for the tip about swan_sync - am now using that on all my hosts. With swan_sync on I am getting 75% usage on Noelia WUs & 79-80% usage on Gerard WUs.

Can anyone post a link to the 'How-to' thread on Multiple WUs on a single GPU & I'll give that a go....

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41308 - Posted: 12 Jun 2015 | 8:41:21 UTC - in response to Message 41307.
Last modified: 12 Jun 2015 | 9:09:30 UTC

the interesting tidbit for me was that the 980ti is doing noelia in around 10k seconds, with my older 680 rig doing them in 18k seconds.

and now you're returning in <9K:

Run time 8,985.24
CPU time 8,834.44
https://www.gpugrid.net/result.php?resultid=14259463

I ran 2 NOELIA tasks at a time on my GTX970's to see what I could do. These are from the faster card:

e3s32_e1s258f65-NOELIA_ETQunboundx2-1-2-RND7802_0 10992237 21,718.60 20,296.04 75,000.00
e5s76_e1s526f70-NOELIA_ETQunboundx2-0-2-RND6354_0 10992080 21,638.48 20,227.95 75,000.00

My fastest NOELIA by itself:
e1s451_1-NOELIA_ETQunboundx1-0-2-RND8377_0 10984428 13,889.76 13,889.76 75,000.00

Suggests a 28% overall improvement on a little GTX970.
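
That 28% is throughput per card: two concurrent tasks finishing together versus one task run alone. A minimal sketch of the comparison:

    # Throughput gain from running 2 WUs concurrently on one GPU.
    two_up_runtime = 21_718.60   # seconds for two tasks that finish together
    single_runtime = 13_889.76   # seconds for one task run alone

    effective_per_task = two_up_runtime / 2          # ~10,859 s of wall time per task
    gain = single_runtime / effective_per_task - 1
    print(f"~{gain:.0%} more work per day")          # ~28%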

Localizer,
75% usage on Noelia WUs & 79-80% usage on Gerard WUs.

Plenty of room there for 2 tasks :)

How To create and use an app_config.xml file in Windows
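
For anyone trying it, a minimal sketch of such an app_config.xml written out with a short script; the app name "acemdlong" and the Windows project path are assumptions, so use whatever names and paths your own BOINC event log reports:

    # Writes a BOINC app_config.xml that runs 2 GPUGrid tasks per GPU.
    # "acemdlong" and the project path are assumptions -- check your own install.
    import os

    APP_CONFIG = """<app_config>
      <app>
        <name>acemdlong</name>
        <gpu_versions>
          <gpu_usage>0.5</gpu_usage>
          <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>
    """

    project_dir = r"C:\ProgramData\BOINC\projects\www.gpugrid.net"
    with open(os.path.join(project_dir, "app_config.xml"), "w") as f:
        f.write(APP_CONFIG)
    # Afterwards use "Options > Read config files" in the BOINC Manager (or restart BOINC).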
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41310 - Posted: 12 Jun 2015 | 16:20:29 UTC

@localizer. I'm surprised you're losing WUs above 1400. I'm sitting at over 1450 now, and still think I may be able to push higher.

Do you have a reference card, or one with better cooling? So far, I'm rather impressed with the clocks these can achieve on a minimal to no voltage increase.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41313 - Posted: 12 Jun 2015 | 16:33:18 UTC - in response to Message 41310.

5pot, your GPU is under 60C whereas localizer's is reaching 71C.

____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41318 - Posted: 12 Jun 2015 | 23:01:02 UTC
Last modified: 12 Jun 2015 | 23:02:46 UTC

Well, I tried to hit 1500, and walked away for the night.

Whoops. Lol. Won't be able to stop it til tomorrow. I do think with an overvoltage I can make it stable. I had it working fine at 1460 without an overvolt.

C'est la vie

E: unless there's a way I can make it stop getting tasks through the website I suppose.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41320 - Posted: 13 Jun 2015 | 6:21:47 UTC - in response to Message 41318.
Last modified: 13 Jun 2015 | 6:26:46 UTC

E: unless there's a way I can make it stop getting tasks through the website I suppose.


You should be able to do that by editing your preferences and deselecting,
    Use NVIDIA GPU Enforced by version 6.10+
    the ACEMD GPU apps
    [If no work for selected applications is available, accept work from other applications?]
    Use Graphics Processing Unit (GPU) if available.


https://www.gpugrid.net/prefs.php?subset=project
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

localizer
Send message
Joined: 17 Apr 08
Posts: 113
Credit: 1,656,514,857
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41321 - Posted: 13 Jun 2015 | 7:06:17 UTC - in response to Message 41310.

........... my 980Ti is in a SFF case and is a reference card.

I'm confident I can stabilise it over 1400 - but have not had time to fine tune yet.

[CSF] Thomas H.V. DUPONT
Send message
Joined: 20 Jul 14
Posts: 732
Credit: 100,630,366
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwat
Message 41322 - Posted: 13 Jun 2015 | 7:13:41 UTC - in response to Message 41308.

the interesting tidbit for me was that the 980ti is doing noelia in around 10k seconds, with my older 680 rig doing them in 18k seconds.

and now your returning in <9K:

Run time 8,985.24
CPU time 8,834.44
https://www.gpugrid.net/result.php?resultid=14259463


https://twitter.com/TEAM_CSF/status/609620024215633920
____________
[CSF] Thomas H.V. Dupont
Founder of the team CRUNCHERS SANS FRONTIERES 2.0
www.crunchersansfrontieres

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41358 - Posted: 15 Jun 2015 | 12:51:28 UTC - in response to Message 41320.

E: unless there's a way I can make it stop getting tasks through the website I suppose.


You should be able to do that by editing your preferences and deselecting,
    Use NVIDIA GPU Enforced by version 6.10+
    the ACEMD GPU apps
    [If no work for selected applications is available, accept work from other applications?]
    Use Graphics Processing Unit (GPU) if available.


https://www.gpugrid.net/prefs.php?subset=project



Thank you for reminding me of this, changed it when you posted. I've got it clocked down to 1375 for the 2 weeks I'll be away. Want to make sure I don't lose that much crunching time for an OC.

@Localizer, these cards can do really well. Good luck with the fine tuning of your card.

@Dupont, thanks for the shout out. :)

When I get back, I'll try and hit 1500. But I'm beginning to think I'll need to overvolt a little more than I would like to run it at permanently. But it's always fun to see where the max is.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 41395 - Posted: 24 Jun 2015 | 16:17:43 UTC
Last modified: 24 Jun 2015 | 16:20:21 UTC

Running 1 WU I don't expect the 980Ti to be 37.5% faster than a GTX980

NOELIA_46 (467x) update:
- A single 980Ti WU (2.903 ms per step) performs +18.3 to 30% better than a (3.553 ms) GTX980.
- 980Ti per-step efficiency is +30 to 50% over the routine 970 (4.168 ms).
- A GTX750, at 10.878 ms per step, is -73.3% vs. a 980Ti.
- Surprisingly, GM107's have nearly the same NOELIA_46 runtimes as cut GK104's.

So, it would better to compare the 980Ti against the well established performances of the GTX980.

Compared to the GTX980 and going by cuda core count the 980Ti performance should be 137.5%, but GM200 vs GM204 isn't a direct comparison. The GM200's have a greater ROP ratio (small -ve for here I think) while the bus to shader ratio is slightly favorable. The big question is if the app/tasks will scale well, as is.

The 56K-atom NOELIA_46 WU's are realizing GM200's (and GM107/206/204's) superior ACEMD performance. NOELIA per-WU utilization tends to be higher than the 32K-atom GERARD design. GM200's single-WU-at-a-time throughput can steadily be +30% over the fastest GM204's at ACEMD; when configured in peak compute conditions, a 50% increase over GM204 is viable.

If the many ACEMD environment factors that can increase (Maxwell) WU runtime are active, a 50 to 70% output loss is incurred compared to running 2 WU's at a time. (I've purposely found the negative limit for my 970 at 1.5GHz; next up, achieving top performance. ACEMD is sensitive to slowdown factors.)

skgiven recently outlined some ACEMD performance factors:

That is a bit faster, albeit only ~2.5%. This is because the latest apps are quite good at utilizing the GPU without much CPU reliance and you have 4 real cores with 4 true threads. On CPU's with HT the difference tends to be higher.
Using SWAN_SYNC=1 might squeeze out a bit more GPU performance, but again only a few percent.
Increasing the GPU's GDDR5 up to 3500MHz (on the GM204's it typically drops to 3005MHz) also helps but only by 0.5% to 3% IIRC. However if you are running more than one task at a time on your GPU it's likely to be more important as the MCU tends to be higher (varies by task type).
You might also be able to OC slightly.
These improvements multiply and together can make a significant overall improvement:
Using 3/4 CPU's = 102.5%
Using SWAN_SYNC = 102.5%
3500MHz GDDR5 = 101%
Running 2 tasks = 128% (varies by task type)
GPU OC of =3% = 103%
Overall = 1.025*1.025*1.01*1.28*1.03*100% = 139.899% (~40% more work).

The climate models tax the CPU more heavily than other CPU tasks, but with 4cores/4threads it's not a big problem. On an i7 (4cores/8threads) it is a problem as each tasks competes for the same resources (cores). There is almost zero difference in running 7 climate models rather than 8 on an i7.
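
The overall figure in the quoted post is simply the product of the individual factors; a minimal sketch:

    # Combined effect of the (roughly independent) speed-ups listed in the quote.
    from math import prod

    factors = {
        "spare CPU core (3/4 used)": 1.025,
        "SWAN_SYNC":                 1.025,
        "3500MHz GDDR5":             1.01,
        "2 tasks at a time":         1.28,
        "~3% GPU OC":                1.03,
    }
    print(f"overall: {prod(factors.values()):.3%}")   # ~139.9% of baseline, i.e. ~40% more work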

For efficiency's sake, PCIe x8 is required for any Maxwell, big or small. Single GPU or multi, ACEMD is influenced by, and very sensitive to, the serial links and <3GHz CPU's. An underclocked CPU and a motherboard whose PCIe lanes are saturated wreck runtimes. Bus-demanding tasks on x4 are driven away from optimal performance; ACEMD is one of the BOINC tasks affected by ~20% on x4 compared to x8/x16 (Primegrid's N17 Genefer is another). A PCIe NVMe/SATAe SSD can hamper multi-GPU compute runtimes (Z97 chipset), while BOINC on a USB stick or HDD storage shows zero interference with the GPU, unlike a PCIe SSD. Skylake CPU's will improve upon the transfers.

FYI: the Maxwell BIOS Tweaker program shows the clocks involved with the GPU. For example, the GM107 L2 cache clock runs 26-39MHz below the GPC clock. GM204 or GM200 stock vBIOSes sometimes show a 150-200MHz difference between the four boost/clock domains (XBAR/SYS/GPC/L2C).

AMD's hybrid cooled flagship (Fury X) is available today - will GM200 MSRP drop in a couple of weeks?
http://wccftech.com/amd-radeon-r9-fury-launch-reviews-roundup/

Toms hardware:
Whereas GM200 measures 601mm², Fiji is almost as large at 596mm². AMD crams a claimed 8.9 billion transistors into that space, and then mounts the chip on a 1011mm² silicon interposer, flanking it with four stacks of High Bandwidth Memory.

HBM achieves its big bandwidth numbers by stacking DRAM vertically. Each die has a pair of 128-bit channels, so four create an aggregate 1024-bit path. This first generation of HBM runs at a relatively conservative 500MHz and transfers two bits per clock. GDDR5, in comparison, is currently up to 1750MHz at four bits per clock (call it quad-pumped, to borrow a term from the old Pentium 4 front-side bus days). That’s the difference between 1 Gb/s and 7 Gb/s. Ouch. But factor in the bus width and you have 128 GB/s per stack of HBM versus 28 GB/s from a 32-bit GDDR5 package. A card like GeForce GTX 980 Ti employs six 64-bit memory controllers. Multiply that all out and you get its 336 GB/s specification. Meanwhile, Radeon R9 Fury X employs four stacks of HBM, which is where we come up with 512 GB/s.

The water-cooling rule of thumb comes to mind right away: use one centimeter of radiator length per 10W of power. Almost 90 °C at the motherboard slot indicates that the VRM pins have passed 100 °C.
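
For what it's worth, that rule of thumb is trivial to apply; a one-line helper (the example wattage is my own illustrative figure, not a measurement from the review):

# Quoted rule of thumb: roughly 1 cm of radiator length per 10 W dissipated.
def radiator_length_cm(watts, watts_per_cm=10.0):
    return watts / watts_per_cm

print(radiator_length_cm(275))   # e.g. a ~275W card -> ~27.5 cm, roughly a 280mm radiator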


Thermal imagery:
http://www.tomshardware.com/reviews/amd-radeon-r9-fury-x,4196-8.html

Fiji's DP:SP ratio is 1/16 - the same as Tonga (GCN 1.2) rather than the 1/8 of the gaming Hawaii (GCN 1.1); the Hawaii FirePro series is 1/2. If a professional Fiji card is released, FP64 performance might move to a 1/2, 1/4 or 1/8 ratio, or stay at 1/16.
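
For a rough sense of what those ratios mean in FLOPS, peak throughput can be estimated as 2 * shaders * clock * ratio. The sketch below assumes Fiji's commonly cited 4096 shaders at ~1.05GHz - those two numbers are my assumption, not from the article:

# Peak-throughput estimate from shader count, clock and DP:SP ratio.
# 4096 shaders at 1.05 GHz are assumed Fury X figures (not from the quoted article).
def peak_tflops(shaders, clock_ghz, ratio=1.0):
    return 2 * shaders * clock_ghz * ratio / 1000   # 2 FLOPs per shader per clock (FMA)

print(f"FP32: {peak_tflops(4096, 1.05):.1f} TFLOPS")
for ratio in (1/2, 1/4, 1/8, 1/16):
    print(f"FP64 at 1/{int(1/ratio)}: {peak_tflops(4096, 1.05, ratio):.2f} TFLOPS")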

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 41472 - Posted: 4 Jul 2015 | 4:30:44 UTC

http://www.guru3d.com/articles_pages/zotac_geforce_gtx_980_ti_amp_extreme_review,8.html

GM200 custom PCBs (MSI/ASUS/EVGA/ZOTAC/GALAX/INNO/Gigabyte) have been released and a couple have been reviewed. Overclocked Ti's surpass reference Titan X and Fury X performance. Custom PCB GM200 cards carry a $50 or greater premium over the reference asking price.

The 10-phase (8+2), triple-slot, heavyset Zotac AMP Extreme (320W power limit) 'so far' has the highest out-of-box clocks (1203MHz base/1355MHz boost at stock TDP), rated higher than the 18-phase HoF. The not-yet-released 17-phase (14+3) Kingpin edition is rumored to be 4 bins (52MHz) better at stock TDP base/boost. EVGA's 14+3 Classified is rated 1 bin lower at stock TDP base clock than Zotac's. The Gigabyte G1 has shown itself to be an overclocking beast with its 600W heatsink and 10-phase (8+2) custom PCB. The reference 6+2 phase PCB limits water-cooled GM200 to maybe 1500MHz 24/7, if lucky, before the VRM starts breaking down; liquid-cooled or air-cooled custom PCB GM200 above 1600MHz is the forefront of 24/7 overclocks. (At these speeds, overclocks will either fail quickly or hold up, depending on the code - precision testing is mandatory for error-free long-term compute.) There is a 100MHz difference for my 970 between three BOINC projects: 1.5GHz for ACEMD up to 1.6GHz for Primegrid's CUDA PPSsieve, with POEM's OpenCL in the middle.

The Zotac 980Ti AMP Extreme seems not to include three features of the 13-phase (8+3+2) 980/970 AMP Omega/Extreme: the LN2 BIOS switch, manual voltage read-out points, and the Texas Instruments MSP430 USB microcontroller that can be custom programmed.

The EVGA Hybrid, Inno3D Black and Zotac ArcticStorm (full water block plus 3 fans) are all the standard 6+2 phase reference PCB with a VRM heatsink. Buying a reference 980Ti and an EK water block costs a touch more than a ready-made liquid-cooled GM200.

All these choices make for decision fatigue. My newly purchased MSI Z97 MPOWER board with three x16 PCIe 3.0 slots is waiting for an upgrade. The PCIe slot furthest from the CPU has true 8-lane circuitry, not 4-lane (the underside of the board shows 8 lanes wired, as do the pins in the physical x16 slot), so with 3 GPUs it should run 8x/8x/8x. The Ti on an x8 slot should be enough.

A 980Ti/970/750 box does around 1.7 million RAC/day at 500W (GPU). If I replace the 750 with a 980, RAC rises to about 2.2 million/day at 650W of GPU power and <750W system total. Is a Platinum 1000W PSU with an 85A single 12V rail enough to drive a 980Ti/980/970, all at 1.5GHz, 24/7/365? Or a 1200W/>100A-rail PSU?
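
A back-of-the-envelope check on the 12V rail question - the per-card wattages below are rough assumptions for ~1.5GHz cards under ACEMD, not measurements:

# Rough 12V-rail check for a 980Ti/980/970 host (assumed wattages, not measurements).
gpu_watts = {"GTX 980 Ti": 300, "GTX 980": 200, "GTX 970": 150}
cpu_and_misc_watts = 100   # CPU, motherboard, drives, fans (assumed)

load_w = sum(gpu_watts.values()) + cpu_and_misc_watts
for rail_amps in (85, 100):
    rail_w = rail_amps * 12
    print(f"{rail_amps}A rail = {rail_w}W capacity, ~{load_w}W load, {rail_w - load_w}W headroom")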

The consumption/RAC ratio of all Maxwells is much better than Kepler's. My 970 currently draws 125W for NOELIA_ETQ and 140W for GERARD with <73% core usage. The 970 pulled 175W at ~83% core usage on NOELIA_S4 during April. 212W is the highest I've seen (AIDA64 integer benchmark and SiSoftware). For DP code, the 52 FP64 cores in total draw about 130W. An overclocked GM200/204/206 will operate 30% above "rated" TDP if the heatsink can handle the extra heat. GM200 overclocks can add up to ~50% performance over reference speeds, depending on how the application scales with MHz.
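
Expressed as work per watt, the two GPU mixes from a couple of paragraphs up come out almost identical - a quick sketch (the RAC and wattage figures are the estimates given above, not measurements):

# RAC-per-GPU-watt for the two configurations discussed above (estimates, not measurements).
configs = {
    "980Ti + 970 + 750": (1_700_000, 500),   # (RAC/day, GPU watts)
    "980Ti + 980 + 970": (2_200_000, 650),
}
for name, (rac, watts) in configs.items():
    print(f"{name}: {rac / watts:,.0f} RAC per GPU-watt per day")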

A side note: a GM206-based GTX 950(Ti) is supposedly going to be released soon with 768 or 896 cores (the ROP/memory subsystem is unknown). If the 950(Ti)'s cache and ROPs match the 960's, it should be a reasonably fast GPU. Like the 750(Ti) before it, the 950(Ti) would be a choice economical GPU for ACEMD compute.

In my opinion the best NVIDIA GTX 980-based card, sold as a custom-design PCB and cooler..

Indeed - although some were experiencing problematic voltage settings that limited overclocks. Zotac's (broken) FireStorm program is supposedly designed to work with the USB microcontroller, but multiple GPUs on a board cause FireStorm to stop working. (RivaTuner-based overclocking programs also have an issue with the GM204 AMP Omega/Extreme voltage sliders, while NVIDIA Inspector works fine.) I only recommend NVIDIA Inspector for overclocks; its footprint is much smaller than MSI's or EVGA's tools.

Zotac produced 4000 Omega/Extreme cards in total: 1000 Omega 970s, 1000 Omega 980s, 1000 Extreme 970s and 1000 Extreme 980s. Zotac's customer service was informative and helpful to me, providing detailed GPU data, e.g. absolute values from their life-expectancy algorithm, suggesting ~1.6GHz should hold for years to come. So far the overclock has stayed stable through three months of non-stop BOINC projects.

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41486 - Posted: 6 Jul 2015 | 8:36:39 UTC - in response to Message 41472.

That was some good info. I personally always have a hard time swallowing how expensive some of those boards get. Though they're for benching, but still.

I'm really trying to hold out until Pascal before I do my next actual build.

I really should sell some parts though; I've got too many old ones just lying around. They become so inefficient here so quickly.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 41493 - Posted: 6 Jul 2015 | 15:28:36 UTC - in response to Message 41486.

Here's what one of the world's best-known overclockers (he has a GPU edition named after him) says about Maxwell:

(OCN's Classified/Kingpin owner thread) Originally Posted by k|ngp|n:

Honestly speaking, I think most end users don't even realize how Maxwell GPUs are voltage-capped on ambient-type cooling. I can tell by many of the comments at OC.net, elsewhere, and also here in these card XOC BIOS threads. Especially compared to Kepler: the KP 780Ti scaled great on voltage at air/water temps - basically, more voltage = more clocks no matter what temperature.
With the 980 and later GPUs, including the Titan X, the scaling on air/water has all but gone. I would say about 95% of all Maxwell 980, Titan X, and 980Ti GPUs, NO MATTER what brand of PCB they are on, DO NOT SCALE with more voltage than 1.25v-1.275v at temps warmer than 25C or so. There is no magical BIOS that can effectively remove this.

This is exactly why almost every moderate-to-good ASIC Titan X, 980, and yes 980Ti clocks around 1550MHz MAX AVERAGE at, say, 45-60C loading temps.
If you put 0C or colder on the card, you will see MUCH different behavior than what you see on air (green garbage all over the screen when raising volts over 1.23-1.25v or so).
Cards with a very good ASIC value (75% and up) will tend to have the most "overclocking", but just like about every other Maxwell GPU, they cannot overvolt past 1.23v-1.25v.
So the highest-ASIC cards, like 80%+, are almost always the ones that can do 1600+ on air/water, and again they do it pretty much WITHOUT overvolting over 1.23v-1.25v. Maxwell GPUs with a lower ASIC value, like 65%, will not be so great on air/water, because these low-ASIC GPUs need voltage to scale to match the overclock of the high-ASIC GPUs (using the same usable voltage of 1.23-1.25v).

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 41576 - Posted: 28 Jul 2015 | 14:28:48 UTC

Posted 30th of March:

GERARD_FXCXCL12 update:

Zoltan's 980 reclaimed the fastest runtime [25,606.92 s] and time per step [1.462 ms].

RZ's 980 is 9% faster than DooKey's Titan X [1.616 ms] and 11-19% better than a GTX 780Ti [1.771 ms].

Since the GM200's March release, numerous GM200s have attached to GPUGRID. Currently Petebe's ~22k-second 980Ti runtimes on GERARD_FXCXCL12 (the newly released batches) are 13.3% faster than the best 980s on XP. The 980Ti's time per step (1.289 ms) is 12.9% quicker than the 980's (1.462 ms).

Light-speed XP systems are leading the pack as always. A Maxwell upgrade from Fermi or Kepler on XP or Linux garners significant performance benefits. Running 2 WUs at a time, Maxwell GPUs on WDDM systems almost close the performance gap versus XP/Linux running one WU at a time.

Zoltan's test 980Ti has the fastest GERARD_FXCXCL12 WU ever completed.
https://www.gpugrid.net/forum_thread.php?id=4100&nowrap=true#41570

At 1.114 ms per step it is 23% faster than RZ's 980 XP system. The host RZ tested the 980Ti in also had a 970 completing GERARDs in ~30.5k seconds - the fastest single-WU-at-a-time 970 GERARD ever recorded, at 1.745 ms per step.
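
The percentage comparisons in these posts come straight from the time-per-step figures; a small sketch of the arithmetic (ms/step values as quoted above - the exact percentage depends on which card is taken as the baseline):

# Speedup arithmetic from the quoted time-per-step figures (ms per step).
ms_per_step = {
    "980Ti (Zoltan, XP)": 1.114,
    "980Ti (Petebe)": 1.289,
    "980 (RZ, XP)": 1.462,
    "Titan X (DooKey)": 1.616,
    "970 (RZ's host)": 1.745,
}
baseline = ms_per_step["980 (RZ, XP)"]
for card, ms in ms_per_step.items():
    print(f"{card}: {ms:.3f} ms/step, {(baseline - ms) / baseline * 100:+.1f}% vs the 980")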

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 851
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 41580 - Posted: 28 Jul 2015 | 15:51:19 UTC - in response to Message 41576.

... The host RZ tested the 980Ti in also had a 970 completing GERARDs in ~30.5k seconds - the fastest single-WU-at-a-time 970 GERARD ever recorded, at 1.745 ms per step.

That GTX 970 is an EVGA GeForce GTX 970 SSC ACX 2.0+, and I haven't pushed it more than the factory overclock.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 41586 - Posted: 28 Jul 2015 | 21:49:36 UTC
Last modified: 28 Jul 2015 | 22:19:48 UTC

https://xdevs.com/guide/maxwell_big_oc/
The 980Ti 'Kingpin' has an impressive feature set compared to the reference design and other custom PCBs. It costs $200 or more above the 980Ti MSRP.


Message boards : Graphics cards (GPUs) : Big Maxwell GM2*0
