Big Maxwell GM2*0

Message boards : Graphics cards (GPUs) : Big Maxwell GM2*0
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 41287 - Posted: 10 Jun 2015, 22:06:10 UTC - in response to Message 41286.  
Last modified: 10 Jun 2015, 22:10:01 UTC

My guess is that a Titan X or 980Ti would perform much better on XP or Linux. On Vista through W10, overall performance would be much better if SWAN_SYNC=1 was used, no CPU tasks were running, and 2 or more tasks were run at the same time. I would even suggest that running 2 tasks on Linux/XP might be beneficial.

On W7 I had to alter the L1-P5 performance using NVidia inspector just to get my clocks normal. You really need to watch the performance details carefully - these tasks are prone to downclocking (but that might make them better for running 2 tasks simultaneously).
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
eXaPower

Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Message 41288 - Posted: 11 Jun 2015, 0:06:05 UTC - in response to Message 41285.  
Last modified: 11 Jun 2015, 0:26:04 UTC

Going by Localizer's returns -NOELIA_ETQunboundx1 tasks look like they are only 2 or 3% faster on a GTX980Ti than a GTX980, but SWAN_SYNC isn't on and who knows how much the CPU is being pushed?

NOELIA_ETQunboundx (no SWAN_SYNC): GK107s show a 4% utilization drop (88% vs. 92%) with 2 CPU tasks vs. none. GM200/GM204 will see a drop of more than 4% if the CPU has other WUs, with or without SWAN_SYNC. On GERALDs, a quad-core CPU + GM204 will lose <5% GPU utilization with two CPU tasks; 1 CPU task costs 3%, while 4 CPU WUs cause a loss of 7%. (I haven't tested the utilization loss on current NOELIAs with GM204/107.) I assume that, with or without SWAN_SYNC and no matter how many PCIe 3.0 lanes are active, any GPU's core/MCU/bus usage under ACEMD will still drop when the CPU has other tasks computing.

On my GTX970's these NOELIA_ETQunboundx1 tasks are struggling to push the cards hard enough to keep them interested (GPU0: power 20%, usage 83%, 41C, 405MHz [downclocked], GPU1 power 76%, usage 67%, 64C, 1266MHz).

This batch looks to be working the GPU's cache harder. Runtime theory: the difference between the 1/1 Titan X (24 SMM) and the 1/1 980Ti (22 SMM), both with 3072KB of L2 cache, affects runtimes only ever so slightly. Will the 980Ti truly be faster than the Titan X at ACEMD? A stalled cache can render Maxwell overclocks null and void.

Currently (10K vs. 30K second runtimes), my 750's (4/1, 4 SMM, 2048KB L2 cache) NOELIA_ETQunboundx output is 33% of the 980Ti's. The 980Ti is a 5.5x larger GPU (2816 vs. 512 CUDA cores); the 750's core count is 18.2% of the GTX980Ti's. From the look of it, the 980Ti (10K) is about twice as fast as GK104s, which run at 16-28K. GTX980Ti ETQunboundx runtimes are 8 to 9x better than my GK107's (77K).
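
To put those ratios side by side, here is a minimal Python sketch using only the round figures quoted in this post (10K s for the 980Ti, 30K s for the 750, 2816 vs. 512 CUDA cores). The function names are purely illustrative, and the runtimes are rough forum numbers rather than measurements of any one task.

```python
def relative_output(runtime_fast_s, runtime_slow_s):
    """Output of the slower card as a fraction of the faster card's output."""
    return runtime_fast_s / runtime_slow_s


def core_fraction(cores_small, cores_big):
    """Core count of the smaller GPU as a fraction of the bigger GPU's."""
    return cores_small / cores_big


# Round figures quoted in the post (assumed, not re-measured).
output_750 = relative_output(10_000, 30_000)
cores_750 = core_fraction(512, 2816)

# How much more work each 750 core delivers relative to a 980Ti core.
per_core_advantage = output_750 / cores_750

print(f"750 output vs 980Ti: {output_750:.1%}")
print(f"750 cores vs 980Ti:  {cores_750:.1%}")
print(f"750 work per core vs 980Ti: {per_core_advantage:.2f}x")
```

On these rough numbers the 750 gets noticeably more work out of each CUDA core, which is another way of saying the 980Ti is not scaling linearly with core count on this batch.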
5pot

Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 41289 - Posted: 11 Jun 2015, 4:29:14 UTC

By the way, I thought it was SWAN_SYNC=0? Or did that change to 1?
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 41290 - Posted: 11 Jun 2015, 7:53:23 UTC - in response to Message 41289.  
Last modified: 11 Jun 2015, 8:39:52 UTC

It's no longer 'necessary' for good performance, but in my experience it still speeds things up by a few %. The 334 driver brought things back to normal with respect to letting the CUDA runtime use a low-CPU mode without performance dropping away (prior to that, SWAN_SYNC was needed to get better performance and made more of a difference). On Windows it might just need to exist now, but on Linux I think it needs to be 1 (both used to be 0).
Basically, SWAN_SYNC makes the app poll continuously on the CPU, keeping processes fresh. When things didn't work properly, Process Lasso and changing priority were also (or alternatively) sometimes used to expedite the tasks, with some success. All of this depends entirely on CPU usage: completely saturate the CPU and you're just not going to get good results. It also varies by CPU type/model.
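
Since SWAN_SYNC is an environment variable, a quick way to see what a process would inherit is to read it back. The sketch below is only a checker: the helper name `swan_sync_state` is made up for illustration, it reports the value visible to the Python process it runs in, and it does not change anything for ACEMD itself.

```python
import os


def swan_sync_state(env=None):
    """Describe the SWAN_SYNC setting found in an environment mapping.

    `env` defaults to the current process environment; passing a plain
    dict makes the helper easy to try out without touching real settings.
    """
    env = os.environ if env is None else env
    value = env.get("SWAN_SYNC")
    if value is None:
        return "unset"
    return f"set to {value!r}"


print("SWAN_SYNC is", swan_sync_state())
```

Whether a given value (0, 1, or merely existing) has any effect depends on the OS and app version, as described above, so treat the output as a sanity check only.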
5pot

Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 41298 - Posted: 11 Jun 2015, 15:45:12 UTC

Enabled swan_sync, took another task off WCG, and got 7.26 on Gerard. The overclock is a tad over 1400MHz now with a .12mV voltage increase.
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 41300 - Posted: 11 Jun 2015, 19:07:33 UTC - in response to Message 41298.  
Last modified: 11 Jun 2015, 19:10:08 UTC

That's an improvement of around 9.5% over your previous setup and not far off the performance of a Titan X (on the chart: 150% vs 156%), not that I know the setup that was used there (optimizations, CPU usage).

You said the Gerald utilization was around 75%. What was the NOELIA utilization?
What does the memory run at by default (3505MHz or 3005MHz)?

You should try to run 2 tasks at a time on the GTX980Ti to get some idea of what the gain is. I'm trying this now on two GTX970's and it looks promising, but these GM's are high maintenance; I have to use NVidia inspector to keep the clocks high.
5pot

Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 41301 - Posted: 11 Jun 2015, 19:19:02 UTC

With swan sync on, Gerald was actually 81%. Memory default I'll have to look at when I get back home, and I don't remember noelia usage.

I'm going to Japan for 2 weeks on Monday, so I probably won't be attempting to run 2 tasks until I get back.
eXaPower

Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Message 41302 - Posted: 11 Jun 2015, 23:42:55 UTC - in response to Message 41300.  
Last modified: 11 Jun 2015, 23:51:17 UTC

You should try to run 2 tasks at a time on the GTX980Ti to get some idea of what the gain is. I'm trying this now on two GTX970's and it looks promising, but these GM's are high maintenance; I have to use NVidia inspector to keep the clocks high.

A look at hostid=137780 (Win8.1), a 970 computing two at a time, reveals ~8 WUs per 24hr on NOELIA_ETQunboundx (2 tasks finish in 22K seconds).
Running 2 WUs at a time, the 970 gains up to 40% in daily ETQunboundx output vs. 970s running 1 WU at a time.
A Win7 GTX980Ti's daily ETQunboundx output is 10.1 WUs (1 WU per 2.36hr), >20% higher than the 970 running 2 WUs at a time.
GPUs running one ETQ task at a time: RZ's 980 outputs 9 WUs per day. (My 17SMM completes 9 per day, or 1 WU per 2.66hr.)
The 980Ti supplies 37.5% more cores than a 980 (2816 vs. 2048), yet its present daily ETQ performance is 10% higher than the fastest XP 980 and >15% higher than WDDM 980s. (The 10-15% figures concern 1 WU at a time.)
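
The WU-per-day figures above follow directly from the runtimes. A small Python sketch of that arithmetic, assuming the quoted runtimes (22K seconds for a pair of tasks on the 970, 2.36 hours per task on the 980Ti) and a card that crunches non-stop; `wus_per_day` is just an illustrative helper name:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def wus_per_day(runtime_s, concurrent=1):
    """Work units finished per day when `concurrent` tasks each take runtime_s."""
    return concurrent * SECONDS_PER_DAY / runtime_s


# Figures quoted in the post (assumed representative, not re-measured).
gtx970_two_at_a_time = wus_per_day(22_000, concurrent=2)
gtx980ti_single = wus_per_day(2.36 * 3600)

print(f"970, 2 WUs at a time: {gtx970_two_at_a_time:.2f} WU/day")
print(f"980Ti, 1 WU at a time: {gtx980ti_single:.2f} WU/day")
print(f"980Ti advantage: {gtx980ti_single / gtx970_two_at_a_time - 1:.1%}")
```

Both results line up with the ~8 and ~10.1 WU/day figures quoted; real hosts will come in a bit lower because of gaps between tasks.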
5pot

Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 41306 - Posted: 12 Jun 2015, 6:24:02 UTC

Just to get back to you, the memory is at 3505MHz.
localizer

Joined: 17 Apr 08
Posts: 113
Credit: 1,656,514,857
RAC: 0
Level
His
Message 41307 - Posted: 12 Jun 2015, 7:24:30 UTC
Last modified: 12 Jun 2015, 7:36:05 UTC

Had a couple of days with the 980Ti now. In my setup it runs comfortably at 1400, any higher and I lose a few WUs.

I have enabled swan_sync and Noelia WUs come in at about 8.5K seconds, just finishing up my first Gerard and that looks to run sub 7 hours if the last few % don't throw any surprises.

Thanks for the tip about swan_sync - am now using that on all my hosts. With swan_sync on I am getting 75% usage on Noelia WUs & 79-80% usage on Gerard WUs.

Can anyone post a link to the 'How-to' thread on Multiple WUs on a single GPU & I'll give that a go....
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 41308 - Posted: 12 Jun 2015, 8:41:21 UTC - in response to Message 41307.  
Last modified: 12 Jun 2015, 9:09:30 UTC

the interesting tidbit for me was that the 980ti is doing noelia in around 10k seconds, with my older 680 rig doing them in 18k seconds.

and now you're returning in <9K:

Run time 8,985.24
CPU time 8,834.44
https://www.gpugrid.net/result.php?resultid=14259463

I ran 2 NOELIA tasks at a time on my GTX970's to see what I could do. These are from the faster card:

Task name   Work unit   Run time (s)   CPU time (s)   Credit
e3s32_e1s258f65-NOELIA_ETQunboundx2-1-2-RND7802_0   10992237   21,718.60   20,296.04   75,000.00
e5s76_e1s526f70-NOELIA_ETQunboundx2-0-2-RND6354_0   10992080   21,638.48   20,227.95   75,000.00

My fastest NOELIA by itself:
e1s451_1-NOELIA_ETQunboundx1-0-2-RND8377_0   10984428   13,889.76   13,889.76   75,000.00

Suggests a 28% overall improvement on a little GTX970.
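
For anyone wanting to repeat that comparison on their own card, here is a minimal Python sketch of the arithmetic, using the run times listed above. The helper name `concurrent_gain` is illustrative, and it assumes the concurrent tasks are the same size as the single one (same credit, as here).

```python
def concurrent_gain(single_runtime_s, concurrent_runtime_s, n_tasks=2):
    """Throughput gain from running n_tasks at once vs. one at a time.

    Each concurrent task takes longer, but n_tasks finish in that time,
    so the effective time per task is concurrent_runtime_s / n_tasks.
    """
    effective_runtime_s = concurrent_runtime_s / n_tasks
    return single_runtime_s / effective_runtime_s - 1.0


# Run times in seconds, copied from the results listed above.
gain_a = concurrent_gain(13_889.76, 21_718.60)
gain_b = concurrent_gain(13_889.76, 21_638.48)

print(f"Gain from 2 tasks at a time: {gain_a:.1%} to {gain_b:.1%}")
```

Both tasks land at roughly 28%, which matches the figure quoted.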

Localizer,
75% usage on Noelia WUs & 79-80% usage on Gerard WUs.

Plenty of room there for 2 tasks :)

How To create and use an app_config.xml file in Windows
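
As a rough illustration of what such a file contains, the Python sketch below generates a minimal app_config.xml that asks BOINC to run two tasks per GPU (gpu_usage of 0.5). The app name "acemdlong" is an assumption here, so check the app names your own client reports before using it; the helper name `build_app_config` is also just illustrative.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, gpu_usage=0.5, cpu_usage=1.0):
    """Return app_config.xml text for one app.

    gpu_usage=0.5 means each task claims half a GPU, so two run at once.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# "acemdlong" is an assumed app name, used only for illustration.
xml_text = build_app_config("acemdlong")
print(xml_text)
```

The resulting text would be saved as app_config.xml in the project's folder inside the BOINC data directory, after which the client needs to re-read its config files (or be restarted) to pick it up.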
5pot

Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 41310 - Posted: 12 Jun 2015, 16:20:29 UTC

@localizer. I'm surprised you're losing WUs above 1400. I'm sitting at over 1450 now, and still think I may be able to push higher.

Do you have a reference card, or one with better cooling? So far, I'm rather impressed with the clocks these can achieve on a minimal to no voltage increase.
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 41313 - Posted: 12 Jun 2015, 16:33:18 UTC - in response to Message 41310.  

5pot, your GPU is under 60C whereas localizer's is reaching 71C.

5pot

Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 41318 - Posted: 12 Jun 2015, 23:01:02 UTC
Last modified: 12 Jun 2015, 23:02:46 UTC

Well, I tried to hit 1500, and walked away for the night.

Whoops. Lol. Won't be able to stop it til tomorrow. I do think with an overvoltage I can make it stable. I had it working fine at 1460 without an overvolt.

C'est la vie

E: unless there's a way I can make it stop getting tasks through the website I suppose.
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 41320 - Posted: 13 Jun 2015, 6:21:47 UTC - in response to Message 41318.  
Last modified: 13 Jun 2015, 6:26:46 UTC

E: unless there's a way I can make it stop getting tasks through the website I suppose.


You should be able to do that by editing your preferences and deselecting,
    Use NVIDIA GPU Enforced by version 6.10+
    the ACEMD GPU apps
    [If no work for selected applications is available, accept work from other applications?]
    Use Graphics Processing Unit (GPU) if available.


https://www.gpugrid.net/prefs.php?subset=project


localizer

Joined: 17 Apr 08
Posts: 113
Credit: 1,656,514,857
RAC: 0
Level
His
Message 41321 - Posted: 13 Jun 2015, 7:06:17 UTC - in response to Message 41310.  

........... my 980Ti is in a SFF case and is a reference card.

I'm confident I can stabilise it over 1400 - but have not had time to fine tune yet.
[CSF] Thomas H.V. DUPONT

Joined: 20 Jul 14
Posts: 732
Credit: 130,089,082
RAC: 0
Level
Cys
Message 41322 - Posted: 13 Jun 2015, 7:13:41 UTC - in response to Message 41308.  

the interesting tidbit for me was that the 980ti is doing noelia in around 10k seconds, with my older 680 rig doing them in 18k seconds.

and now you're returning in <9K:

Run time 8,985.24
CPU time 8,834.44
https://www.gpugrid.net/result.php?resultid=14259463


https://twitter.com/TEAM_CSF/status/609620024215633920
[CSF] Thomas H.V. Dupont
Founder of the team CRUNCHERS SANS FRONTIERES 2.0
www.crunchersansfrontieres
5pot

Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 41358 - Posted: 15 Jun 2015, 12:51:28 UTC - in response to Message 41320.  

E: unless there's a way I can make it stop getting tasks through the website I suppose.


You should be able to do that by editing your preferences and deselecting,
    Use NVIDIA GPU Enforced by version 6.10+
    the ACEMD GPU apps
    [If no work for selected applications is available, accept work from other applications?]
    Use Graphics Processing Unit (GPU) if available.


https://www.gpugrid.net/prefs.php?subset=project



Thank you for reminding me of this, changed it when you posted. I've got it clocked down to 1375 for the 2 weeks I'll be away. Want to make sure I don't lose that much crunching time for an OC.

@Localizer, these cards can do really well. Good luck with the fine tuning of your card.

@Dupont, thanks for the shout out. :)

When I get back, I'll try and hit 1500. But I'm beginning to think I'll need to overvolt a little more than I would like to run it at permanently. Still, it's always fun to see where the max is.
eXaPower

Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Message 41395 - Posted: 24 Jun 2015, 16:17:43 UTC
Last modified: 24 Jun 2015, 16:20:21 UTC

Running 1 WU I don't expect the 980Ti to be 37.5% faster than a GTX980

NOELIA_46 (467x) update:
- A single WU on the 980Ti (2.903 ms per step) performs 18.3 to 30% better than on a capable GTX980 (3.553 ms).
- The 980Ti's per-step efficiency is 30 to 50% better than the routine 970's (4.168 ms).
- The GTX750, at 10.878 ms per step, is 73.3% behind a 980Ti.
- Surprisingly, GM107s have nearly the same NOELIA_46 runtimes as cut-down GK104s.
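
Percentages like these can be read two ways (time saved per step vs. extra steps per second), so here is a small Python sketch that computes both from the per-step times quoted above. The dictionary and function names are illustrative only.

```python
# Per-step times in milliseconds, as quoted in the list above.
STEP_MS = {
    "GTX980Ti": 2.903,
    "GTX980": 3.553,
    "GTX970": 4.168,
    "GTX750": 10.878,
}


def time_saved(fast_ms, slow_ms):
    """Fraction of the slower card's step time that the faster card saves."""
    return 1.0 - fast_ms / slow_ms


def speedup(fast_ms, slow_ms):
    """Extra steps per second the faster card delivers, as a fraction."""
    return slow_ms / fast_ms - 1.0


ti = STEP_MS["GTX980Ti"]
for card in ("GTX980", "GTX970", "GTX750"):
    other = STEP_MS[card]
    print(f"980Ti vs {card}: {time_saved(ti, other):.1%} less time per step, "
          f"{speedup(ti, other):.1%} more steps per second")
```

The 18.3% and 73.3% figures correspond to the time-saved reading; the upper ends of the quoted ranges are not derivable from these three numbers alone.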

So, it would better to compare the 980Ti against the well established performances of the GTX980.

Compared to the GTX980 and going by cuda core count the 980Ti performance should be 137.5%, but GM200 vs GM204 isn't a direct comparison. The GM200's have a greater ROP ratio (small -ve for here I think) while the bus to shader ratio is slightly favorable. The big question is if the app/tasks will scale well, as is.

The 56K-atom NOELIA_46 WUs bring out the superior ACEMD performance of GM200 (and GM107/206/204). NOELIA per-WU utilization tends to be higher than with the 32K-atom GERALD design. GM200's throughput at a single WU at a time can steadily be 30% above the fastest GM204s at ACEMD. When configured for peak compute conditions, a 50% increase over GM204 is viable.

If the many environment factors that can increase (Maxwell) WU runtimes under ACEMD are all active, a 50 to 70% output loss is incurred compared to running 2 WUs at a time. (I've purposely found the negative limit for my 970 at 1.5GHz. Next up: achieving top performance. ACEMD is sensitive to slowdown factors.)

skgiven recently outlined some ACEMD performance factors:

That is a bit faster, albeit only ~2.5%. This is because the latest apps are quite good at utilizing the GPU without much CPU reliance and you have 4 real cores with 4 true threads. On CPU's with HT the difference tends to be higher.
Using SWAN_SYNC=1 might squeeze out a bit more GPU performance, but again only a few percent.
Increasing the GPU's GDDR5 up to 3500MHz (on the GM204's it typically drops to 3005MHz) also helps but only by 0.5% to 3% IIRC. However if you are running more than one task at a time on your GPU it's likely to be more important as the MCU tends to be higher (varies by task type).
You might also be able to OC slightly.
These improvements multiply and together can make a significant overall improvement:
Using 3/4 CPU's = 102.5%
Using SWAN_SYNC = 102.5%
3500MHz GDDR5 = 101%
Running 2 tasks = 128% (varies by task type)
GPU OC of 3% = 103%
Overall = 1.025*1.025*1.01*1.28*1.03*100% = 139.899% (~40% more work).
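
The same compounding can be reproduced in a few lines of Python, which makes it easy to swap in your own factors. The values are the ones listed above; the dictionary keys are just labels.

```python
from math import prod

# Individual improvement factors as listed above (1.025 means +2.5%).
FACTORS = {
    "3 of 4 CPU cores used": 1.025,
    "SWAN_SYNC": 1.025,
    "3500MHz GDDR5": 1.01,
    "2 tasks at a time": 1.28,
    "3% GPU overclock": 1.03,
}

# Independent speedups multiply rather than add.
overall = prod(FACTORS.values())
additive = 1.0 + sum(f - 1.0 for f in FACTORS.values())

print(f"Multiplied: {overall:.3%}")
print(f"Naively added: {additive:.1%}")
```

Simply adding the percentages would give 37%, so the multiplicative effect is worth nearly 3 extra points here. This assumes the factors are independent, which may not hold exactly in practice.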

The climate models tax the CPU more heavily than other CPU tasks, but with 4 cores/4 threads it's not a big problem. On an i7 (4 cores/8 threads) it is a problem, as each task competes for the same resources (cores). There is almost zero difference between running 7 climate models and running 8 on an i7.

For efficiency's sake, PCIe x8 is required for any Maxwell, big or small. Single GPU or multi, ACEMD is very sensitive to serial links and to CPUs under 3GHz. An underclocked CPU and a MB whose PCIe lanes are saturated wreck runtimes. Bus-demanding tasks on x4 are driven away from optimal performance. ACEMD is one of the BOINC tasks that is affected by 20% on x4 compared to x8/x16; Primegrid's N17 Genefer is another. A PCIe NVMe/SATAe SSD can hamper multi-GPU compute runtimes (Z97 chipset). BOINC on a USB stick or HDD storage shows zero interference with the GPU, unlike a PCIe SSD. Skylake CPUs will improve upon transfers.

FYI: the Maxwell BIOS Tweaker program shows the clocks involved with the GPU. For example, the GM107 L2 cache clock runs 26-39MHz below the GPC clock. GM204 or GM200 stock vBIOSes sometimes show a 150-200MHz difference between the four boost/clock states XBAR/SYS/GPC/L2C.

AMD's hybrid cooled flagship (Fury X) is available today - will GM200 MSRP drop in a couple of weeks?
http://wccftech.com/amd-radeon-r9-fury-launch-reviews-roundup/

Tom's Hardware:
Whereas GM200 measures 601mm², Fiji is almost as large at 596mm². AMD crams a claimed 8.9 billion transistors into that space, and then mounts the chip on a 1011mm² silicon interposer, flanking it with four stacks of High Bandwidth Memory.

HBM achieves its big bandwidth numbers by stacking DRAM vertically. Each die has a pair of 128-bit channels, so four create an aggregate 1024-bit path. This first generation of HBM runs at a relatively conservative 500MHz and transfers two bits per clock. GDDR5, in comparison, is currently up to 1750MHz at four bits per clock (call it quad-pumped, to borrow a term from the old Pentium 4 front-side bus days). That’s the difference between 1 Gb/s and 7 Gb/s. Ouch. But factor in the bus width and you have 128 GB/s per stack of HBM versus 28 GB/s from a 32-bit GDDR5 package. A card like GeForce GTX 980 Ti employs six 64-bit memory controllers. Multiply that all out and you get its 336 GB/s specification. Meanwhile, Radeon R9 Fury X employs four stacks of HBM, which is where we come up with 512 GB/s.
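
The bandwidth arithmetic in that paragraph is easy to check. The Python sketch below reproduces each figure from the clock, bits-per-clock, and bus-width numbers given in the quote; the function names are illustrative.

```python
def pin_rate_gbps(clock_mhz, bits_per_clock):
    """Per-pin data rate in Gb/s from memory clock and bits moved per clock."""
    return clock_mhz * bits_per_clock / 1000.0


def bandwidth_gb_s(bus_width_bits, rate_gbps):
    """Total bandwidth in GB/s: pins times per-pin rate, 8 bits per byte."""
    return bus_width_bits * rate_gbps / 8.0


hbm_rate = pin_rate_gbps(500, 2)       # first-generation HBM
gddr5_rate = pin_rate_gbps(1750, 4)    # GDDR5 as quoted

hbm_stack = bandwidth_gb_s(1024, hbm_rate)       # one HBM stack
gddr5_chip = bandwidth_gb_s(32, gddr5_rate)      # one 32-bit GDDR5 package
gtx980ti = bandwidth_gb_s(6 * 64, gddr5_rate)    # six 64-bit controllers
fury_x = 4 * hbm_stack                           # four HBM stacks

print(f"HBM: {hbm_rate} Gb/s per pin, {hbm_stack} GB/s per stack")
print(f"GDDR5: {gddr5_rate} Gb/s per pin, {gddr5_chip} GB/s per package")
print(f"GTX 980 Ti: {gtx980ti} GB/s, Fury X: {fury_x} GB/s")
```

So HBM's per-pin rate is a seventh of GDDR5's, and the wide bus more than makes up the difference.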

The water-cooling rule of thumb comes to mind right away: use one centimeter of radiator length per 10W of power. Almost 90 °C at the motherboard slot indicates that the VRM pins have passed 100 °C.


Thermal imagery:
http://www.tomshardware.com/reviews/amd-radeon-r9-fury-x,4196-8.html

Fiji's DP/SP ratio is 1/16, similar to Tonga's GCN 1.2 cores rather than the 1/8 of the GCN 1.1 gaming Hawaii (the Hawaii FirePro series is 1/2). If a professional Fiji part is released, its FP64 performance might be at a 1/2, 1/4, or 1/8 ratio, or stay at 1/16.
eXaPower

Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Message 41472 - Posted: 4 Jul 2015, 4:30:44 UTC

http://www.guru3d.com/articles_pages/zotac_geforce_gtx_980_ti_amp_extreme_review,8.html

GM200 custom PCBs (MSI/ASUS/EVGA/ZOTAC/GALAX/Inno/Giga) have been released and a couple have been reviewed. Overclocked Ti's surpass reference Titan X and Fury X performance. Custom-PCB GM200s carry a premium of $50 or more over reference.

The 10-phase (8+2), 3-slot, heavy-set Zotac AMP Extreme GM200 (320W OP) has 'so far' the highest out-of-the-box clocks (1203 TDP base/1355 boost), rated higher than the 18-phase HoF. The not yet released 17-phase (14+3) Kingpin edition is rumored to be 4 bins (52MHz) better at stock TDP core/boost. The 14+3 EVGA Classifieds are rated 1 bin lower at stock TDP base clock than the Zotacs. The Gigabyte G1 has shown itself to be an overclocking beast with its 600W heatsink and 10-phase (8+2) custom PCB. The reference 6+2 phase PCB limits water-cooled GM200s to 1500MHz 24/7, if lucky, before the VRMs start breaking down. Liquid- or air-cooled custom-PCB GM200s at >1600MHz are the forefront of 24/7 overclocks. (At these speeds, overclocks will either fail quickly or last, depending on the code. Precision testing is mandatory for error-free long-term compute.) There is a 100MHz difference for my 970 between three BOINC projects: 1.5GHz for ACEMD up to 1.6GHz for Primegrid's CUDA PPSsieve, with POEM's OpenCL in the middle.

The Zotac 980Ti EX seems not to include three features of the 13-phase (8+3+2) 980/970 AMP Omega/Extreme: the LN2 BIOS switch, the manual voltage read-out points, and the Texas Instruments MSP430 USB microcontroller that can be custom programmed.

The EVGA Hybrid, Inno Black, and ZOTAC ArcticStorm (with a full water block and 3 fans) use the standard 6+2 phase reference PCB with a VRM heatsink. Buying a reference 980Ti and an EK water block costs a touch more than a ready-made liquid-cooled GM200.

All these choices are making for decision zap. My newly purchased Z97 MSI MPOWER MB with 3 x16 PCIe 3.0 slots is waiting for an upgrade. The PCIe slot furthest from the CPU has true 8-lane circuitry, not 4-lane (the underside of the MB shows 8, as do the power slot pins in the 16-pin physical slot). So with 3 GPUs it should be an 8x/8x/8x MB. The Ti on an x8 slot should be enough.

A 980Ti/970/750 MB does around 1.7mil RAC/day at 500W (GPU). If I replace the 750 with a 980: 2.2mil RAC/day at 650W GPU and <750W system total. Is a platinum 1000W PSU with an 85A single 12V rail enough to drive a 980Ti/980/970, all at 1.5GHz, 24/7/365? Or do I need a 1200W PSU with a >100A rail?
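
A back-of-the-envelope check of those numbers, as a Python sketch. It assumes, pessimistically, that the whole 750W system load sits on the 12V rail, and it ignores transient spikes, PSU ageing, and efficiency, so it is only a first sanity check and not a sizing recommendation. The helper names are illustrative.

```python
def rail_capacity_w(amps, volts=12.0):
    """Continuous power a single rail can deliver: P = V * I."""
    return amps * volts


def load_fraction(load_w, capacity_w):
    """Share of the rail's capacity used by a given load."""
    return load_w / capacity_w


# Figures from the post; the 750W system total is treated as all-12V (assumption).
capacity_85a = rail_capacity_w(85)
fraction_at_750w = load_fraction(750, capacity_85a)

# RAC per GPU watt for the current and the planned card mix.
rac_per_watt_now = 1_700_000 / 500
rac_per_watt_planned = 2_200_000 / 650

print(f"85A rail: {capacity_85a:.0f}W, 750W load uses {fraction_at_750w:.1%}")
print(f"RAC per GPU watt: {rac_per_watt_now:.0f} now, {rac_per_watt_planned:.0f} planned")
```

On these figures the 85A rail would be loaded to roughly three quarters at steady state, and the efficiency per GPU watt barely changes with the upgrade; how much headroom overclocked cards need for spikes is a separate question.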

The consumption/RAC ratio of all Maxwells is much better than Kepler's. My 970 is currently at 125W for NOELIA_ETQ and 140W for GERALD with <73% core usage. The 970 pulled 175W at ~83% core usage on NOELIA_S4 during April. 212W is the highest I've seen (AIDA64 Integer benchmark and SiSoftware). For DP code: 52 total DP cores = 130W. An overclocked GM200/204/206 will operate 30% above its "rated" TDP if the heatsink can handle the extra heat. GM200 overclocks can add <50% extra performance vs. reference speeds, depending on how the application scales with MHz.

A side note: the GTX950(Ti) GM206 is supposedly going to be released soon, with 768 or 896 cores (the ROP/memory subsystem is unknown). If the 950(Ti)'s cache and ROPs match the 960's, it will be a reasonably fast GPU. Like the 750(Ti) before it, the 950(Ti) would be a choice GPU for an economical ACEMD compute system.

in my opinion the best nvidia GFC 980 based, while published as custom design pcb and cooling..

Indeed, although some were experiencing problematic voltage settings that limited overclocks. Zotac's (broken) FireStorm program is supposedly designed to work the USB MC, but multiple GPUs on the MB cause FireStorm to stop working. (RIVA-based overclocking programs also have an issue with the GM204 AMP Omega/Extreme voltage sliders, while NV Inspector works fine.) I only recommend NV Inspector for overclocks; its footprint is much smaller than that of the MSI or EVGA tools.

Zotac created 4000 Omega/Extreme cards in total: 1000 Omega 970, 1000 Omega 980, 1000 Extreme 970, and 1000 Extreme 980. Zotac's customer service was informative and helpful towards me, providing detailed GPU dynamics (e.g. the life expectancy algorithm's absolute values) and assuring ~1.6GHz for years to come. So far, stable OCs have held for three months of non-stop BOINC projects.

©2025 Universitat Pompeu Fabra