Message boards : Graphics cards (GPUs) : Tesla

Mumak
Joined: 7 Dec 12
Posts: 92
Credit: 225,897,225
RAC: 0
Message 40419 - Posted: 10 Mar 2015 | 21:11:00 UTC

Just in case somebody is interested in how some exotic GPUs perform, I did a run on a Tesla K20c:
Runtime: 24,097 s
GPU load: ~95%
Temperature: ~60 C
Power: ~120 W

For comparison, my 750 Ti does the same task in 40,300 s.
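To put the two runtimes side by side, here is a quick sketch of the speedup and the energy one task costs on the Tesla. It assumes the ~120 W reading is a constant average over the whole run. The 750 Ti's power draw isn't given in the post, so no energy figure is computed for it.

```python
# Numbers as reported in the first post (same GPUGRID task on both cards).
K20C_RUNTIME_S = 24_097
GTX750TI_RUNTIME_S = 40_300
K20C_POWER_W = 120  # approximate board power read during the run

def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate card finishes the same task."""
    return baseline_s / candidate_s

def energy_kwh(power_w: float, runtime_s: float) -> float:
    """Energy for one task, assuming a constant average power draw (E = P * t)."""
    return power_w * runtime_s / 3.6e6  # 1 kWh = 3.6e6 J

if __name__ == "__main__":
    print(f"K20c vs 750 Ti speedup: {speedup(GTX750TI_RUNTIME_S, K20C_RUNTIME_S):.2f}x")
    print(f"K20c energy per task: {energy_kwh(K20C_POWER_W, K20C_RUNTIME_S):.2f} kWh")
```

This prints a speedup of about 1.67x and roughly 0.80 kWh per task for the K20c.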

Sure, Teslas are better at DPFP. The only such app I tried was Milkyway, which uses OpenCL, so it's not ideal. The performance there was comparable to a Radeon HD 7970/R9 280X.

[CSF] Thomas H.V. DUPONT
Joined: 20 Jul 14
Posts: 732
Credit: 130,089,082
RAC: 106,877
Message 40420 - Posted: 11 Mar 2015 | 6:40:09 UTC - in response to Message 40419.

Thanks Mumak! Really interesting :)
I expected better performance from a GPU of this quality...
[CSF] Thomas H.V. Dupont
Founder of the team CRUNCHERS SANS FRONTIERES 2.0

Mumak
Joined: 7 Dec 12
Posts: 92
Credit: 225,897,225
RAC: 0
Message 40421 - Posted: 11 Mar 2015 | 9:45:36 UTC

Oh, I had ECC enabled on the Tesla. Switching it off and giving it another run...

Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 40422 - Posted: 11 Mar 2015 | 12:39:44 UTC

Excellent power/performance ratio for 13 SMX (2,496 CUDA cores) computing FP32. Your GK110 at 120 W works out to about 0.048 W per core, or 9.23 W per 192-core SMX. 120 W is GTX 960 (1,024 CUDA cores) territory. A 12 SMX GTX 780 can sip 145 W at 836 MHz or lower (reference base clock).
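The per-core and per-SMX figures are simple divisions of board power by the number of units. A minimal sketch, under the simplifying assumption that all 120 W is attributed to the shader cores (in reality memory, VRM losses and the fan take part of it). The GTX 780 line reuses the 145 W / 12 SMX figure quoted in this post.

```python
GK110_SMX_ENABLED = 13        # Tesla K20c
CUDA_CORES_PER_SMX = 192      # Kepler SMX
K20C_BOARD_POWER_W = 120      # from the first post

def watts_per_unit(board_power_w: float, units: int) -> float:
    """Share of board power per unit, attributing all board power to the cores (simplification)."""
    return board_power_w / units

cores = GK110_SMX_ENABLED * CUDA_CORES_PER_SMX                    # 2496
per_core = watts_per_unit(K20C_BOARD_POWER_W, cores)              # ~0.048 W
per_smx = watts_per_unit(K20C_BOARD_POWER_W, GK110_SMX_ENABLED)   # ~9.23 W
gtx780_per_smx = watts_per_unit(145, 12)                          # ~12.08 W

print(f"{cores} cores: {per_core:.3f} W/core, {per_smx:.2f} W/SMX "
      f"(GTX 780: {gtx780_per_smx:.2f} W/SMX)")
```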

Setting aside Maxwell's GM204 (GTX 970), cut-down GK110 chips are the best eco-tuners NVIDIA has produced. Cut-down GK104 chips are also capable eco-tuners, as the GTX 660 Ti and GTX 760 have proven.
Will a full GK110 reach ~150 W at 95% core load? The lowest is about 165 W or so. This is really good for an eco-tune, even though GK110 chips are capable of using every available ounce of power at 1.2 GHz/250 W.

There are a lot of DP64-enabled GK110 cards running FP32 ACEMD. Maybe an FP64 ACEMD app will be created for those specific high-performance DPFP GPUs?

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 40423 - Posted: 11 Mar 2015 | 13:50:11 UTC

You can use nvidia-smi to increase the application clocks. This will extract another 10% or so.
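For anyone wanting to try this, a sketch that only builds and prints the nvidia-smi command lines (dry run by default). `-q -d SUPPORTED_CLOCKS` lists the memory/graphics clock pairs the card accepts, `-ac MEM,GRAPHICS` sets the application clocks, and `nvidia-smi -rac` resets them. The 2600 MHz memory clock and GPU index 0 below are assumptions; pick a pair from the query output for your own card. Setting clocks usually needs administrator rights.

```python
import shlex
import subprocess
from typing import List, Optional

def supported_clocks_cmd(gpu_index: Optional[int] = None) -> List[str]:
    """Command that lists supported memory/graphics clock pairs."""
    cmd = ["nvidia-smi"]
    if gpu_index is not None:
        cmd += ["-i", str(gpu_index)]
    return cmd + ["-q", "-d", "SUPPORTED_CLOCKS"]

def app_clocks_cmd(mem_mhz: int, graphics_mhz: int,
                   gpu_index: Optional[int] = None) -> List[str]:
    """Command that sets application clocks: nvidia-smi -ac MEM,GRAPHICS."""
    cmd = ["nvidia-smi"]
    if gpu_index is not None:
        cmd += ["-i", str(gpu_index)]
    return cmd + ["-ac", f"{mem_mhz},{graphics_mhz}"]

def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run(supported_clocks_cmd(0))
    # Assumed memory clock; use a pair reported by the query above.
    run(app_clocks_cmd(2600, 758, gpu_index=0))
```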

Mumak
Joined: 7 Dec 12
Posts: 92
Credit: 225,897,225
RAC: 0
Message 40424 - Posted: 11 Mar 2015 | 14:15:35 UTC

Thanks for the hint. The current/default clock is 705 MHz; the max should be 758 MHz.
Will first finish the current WU to see the difference between ECC/Non-ECC, then will try some OC ;-)

Mumak
Joined: 7 Dec 12
Posts: 92
Credit: 225,897,225
RAC: 0
Message 40425 - Posted: 11 Mar 2015 | 16:05:31 UTC

Decided to try the max GPU clock (758 MHz); the WU has not finished yet.
Just for comparison (running NOELIA_PO now):
705 MHz - 133 W
758 MHz - 150 W
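Those two readings suggest power grows faster than clock. A rough sketch of what that means per task, under the assumption that runtime scales exactly as 1/clock (a fully core-bound task). That is a best case for the higher clock: if part of the work is memory-bound, runtime shrinks less and the energy penalty is larger.

```python
CLOCKS_MHZ = (705, 758)   # default vs max application clock (K20c)
POWER_W = (133, 150)      # power readings at those clocks, NOELIA_PO task

def pct_change(old: float, new: float) -> float:
    return (new / old - 1.0) * 100.0

clock_gain = pct_change(*CLOCKS_MHZ)  # ~ +7.5 %
power_gain = pct_change(*POWER_W)     # ~ +12.8 %

# Assumption: runtime scales as 1/clock (task fully bound by the GPU core).
# Energy per task = power * runtime, so the ratio is (P_new/P_old) * (f_old/f_new).
energy_ratio = (POWER_W[1] / POWER_W[0]) * (CLOCKS_MHZ[0] / CLOCKS_MHZ[1])

print(f"clock {clock_gain:+.1f} %, power {power_gain:+.1f} %, energy/task x{energy_ratio:.3f}")
```

Under that assumption the 758 MHz setting finishes about 7% sooner but costs roughly 5% more energy per task.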

Mumak
Joined: 7 Dec 12
Posts: 92
Credit: 225,897,225
RAC: 0
Message 40434 - Posted: 12 Mar 2015 | 9:13:17 UTC

NOELIA_ot, 758 MHz, ECC off: 27k seconds

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 40589 - Posted: 22 Mar 2015 | 23:19:16 UTC - in response to Message 40420.

"I expected better performance from a GPU of this quality..."

There's no reason to. It's the same GK110 chip as in the GTX 780/780 Ti and Titan/Titan Z. It gains energy efficiency by being run at very low clock speeds and voltages. To some extent this could easily be done on other cards as well, although most users will prefer a high performance state anyway.

A comparison to a stock GTX960 sounds impressive, but that GPU is driven quite hard up to maximum voltages around 1.20 V and has a lot of room to run more efficiently for a minor performance loss (down to 1.10 - 1.00 V).
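Why lowering voltage pays off so well: to first order, dynamic CMOS power scales as P ~ C * V^2 * f, so voltage enters squared. A minimal sketch of that model (capacitance cancels in the ratio; leakage power is ignored). The 5% clock reduction in the last line is a hypothetical value, since the clock a given chip can hold at a given voltage varies from card to card.

```python
def relative_dynamic_power(v_new: float, v_old: float,
                           f_new: float = 1.0, f_old: float = 1.0) -> float:
    """Ratio P_new / P_old from the first-order CMOS model P_dyn ~ C * V^2 * f.

    Capacitance C cancels out; leakage (static) power is ignored.
    """
    return (v_new / v_old) ** 2 * (f_new / f_old)

if __name__ == "__main__":
    for v in (1.10, 1.00):
        print(f"{v:.2f} V vs 1.20 V, same clock: "
              f"{relative_dynamic_power(v, 1.20):.0%} of dynamic power")
    # Hypothetical: suppose 1.10 V is only stable with a 5 % lower clock.
    print(f"1.10 V at -5 % clock: {relative_dynamic_power(1.10, 1.20, 0.95, 1.0):.0%}")
```

Going from 1.20 V to 1.10 V alone cuts dynamic power to about 84%, and to 1.00 V about 69%, which is why a small clock sacrifice buys a large efficiency gain.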

"Maybe a FP64 ACEMD app will be created for those specific high performance DPFP GPU's?"

Why? Even the super expensive Titan loses 2/3 of its maximum throughput in DP mode. If the app can get by with 32 bit, it's always better to use only 32 bit. That's why "mixed precision" with 16-bit FP enhancements will become a topic for nVidia with Pascal.
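Where the "2/3" comes from: each GK110 SMX has 64 FP64 units next to its 192 FP32 cores, so the theoretical DP peak is 1/3 of the SP peak. A sketch of the peak-throughput arithmetic using the K20c numbers from this thread (2,496 cores at 705 MHz), counting a fused multiply-add as 2 FLOPs per clock. These are theoretical peaks only; ACEMD or any real app will not reach them.

```python
K20C_CUDA_CORES = 2496
K20C_CLOCK_MHZ = 705
GK110_FP64_PER_FP32 = 64 / 192   # 64 FP64 units vs 192 FP32 cores per SMX

def peak_gflops(cores: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak: cores * clock * 2 (one fused multiply-add = 2 FLOPs per clock)."""
    return cores * clock_mhz * flops_per_clock / 1000.0

sp = peak_gflops(K20C_CUDA_CORES, K20C_CLOCK_MHZ)
dp = sp * GK110_FP64_PER_FP32

print(f"SP peak: {sp:.0f} GFLOPS, DP peak: {dp:.0f} GFLOPS "
      f"({1 - GK110_FP64_PER_FP32:.0%} lost)")
```

This gives about 3,519 GFLOPS SP and 1,173 GFLOPS DP, i.e. roughly two thirds of the throughput is given up when running in double precision.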

A valid reason would be to use new physical models which might not be possible in FP32. But I don't think it's the precision which limits, it's probably more often the flow control which makes these tasks better suited to CPUs.

Scanning for our furry friends since Jan 2002

