Ampere 10496 & 8704 & 5888 fp32 cores!

bozz4science

Joined: 22 May 20
Posts: 110
Credit: 115,525,136
RAC: 0
Message 55692 - Posted: 5 Nov 2020, 22:51:43 UTC - in response to Message 55691.  

Kudos to you! From what I understand so far, I have to support Zoltan's opinion. I really must stress that I am very keen on efficiency, as that is something everyone should factor into their hardware decisions.

Anyway, it still seems to offer a valid value proposition for me, especially at this price point. It is also still very early, and future support for RTX 30xx cards could definitely offer some potential for optimisation.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 4,772
Message 55693 - Posted: 5 Nov 2020, 23:04:32 UTC - in response to Message 55691.  
Last modified: 5 Nov 2020, 23:21:34 UTC

>> I got my EVGA 3070 black today and did some testing
>> ...
>> Testing results with the CUDA 11.1 special app:
>> about 75-85% speed vs. my 2080ti on my collection of WUs
>
> Thank you for all the effort you have put into this benchmark!
> Regrettably, your benchmarks confirmed my expectations.
> Performance-wise it's a bit better than I expected (67.6% + 10% ≈ 74.4% of the 2080 Ti).
> Power-consumption-wise, it seems as of yet that it's not worth investing in an upgrade from the RTX 2*** series for crunching.


Take it with a grain of salt so far, which is exactly why I made the disclaimer. As you yourself mentioned, the types of calculations are different, and if SETI is performing a large number of INT calculations then this result wouldn't be unexpected. Ampere should see the most benefit from a pure FP32 load, and according to your previous comments, GPUGRID should be mostly FP32. It could also be that source code changes are necessary to take full advantage of the new architecture. The new SETI app has ZERO source code changes from the older 10.2 app; I simply compiled it with the 11.1 CUDA library instead of 10.2.
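To illustrate what "zero source changes, just recompiled" means in practice, here's a minimal sketch (hypothetical file name and flags, not the actual SETI or GPUGRID build): the kernel source stays the same, and only the toolkit and the gencode targets change.

// kernel.cu - hypothetical stand-in for an unchanged application kernel.
//
// Old build (CUDA 10.2 toolkit, Turing target only):
//   nvcc -O3 -gencode arch=compute_75,code=sm_75 kernel.cu -o app
// New build (CUDA 11.1 toolkit, adds native Ampere sm_86 code for RTX 30xx):
//   nvcc -O3 -gencode arch=compute_75,code=sm_75 \
//            -gencode arch=compute_86,code=sm_86 kernel.cu -o app
#include <cstdio>
#include <cuda_runtime.h>

// Pure FP32 work - the kind of load that benefits most from Ampere's doubled FP32 units.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main()
{
    int major = 0, minor = 0;
    cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, 0);
    cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, 0);
    printf("compute capability %d.%d\n", major, minor);  // reports 8.6 on a 3070
    return 0;
}

The point is just that the same source picks up an sm_86 code path simply by being built against the newer toolkit.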

That's why I was attempting to build a FAHbench version with CUDA 11.1, but I hit a snag there and will have to wait.

I don't do FAH, but since users here have said that GPUGRID is similar in the work performed and software used, the 3070 should perform on par with the 2080ti.

Check this page for a comparison: https://folding.lar.systems/folding_data/gpu_ppd_overall

It shows the F@H PPD of a 3070 *just* behind the 2080ti.
At $500 and 220W, that makes sense. Not as power efficient as I'd like, but pushing it beyond the 2080ti nonetheless.

Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 662
Message 55694 - Posted: 6 Nov 2020, 0:53:58 UTC - in response to Message 55690.  
Last modified: 6 Nov 2020, 0:59:46 UTC

From the issue raised on the OpenMM github repo, it seems they let their SSL certificate expire back in September.

And no one has done anything about it.

https://github.com/openmm/openmm-org/issues/38

https://github.com/openmm/openmm

This OpenMM forum?

https://simtk.org/plugins/phpBB/indexPhpbb.php?group_id=161&pluginname=phpBB

Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 662
Message 55695 - Posted: 6 Nov 2020, 1:11:47 UTC

The compiler optimizations for Zen 3 haven't made it into the Linux toolchains yet either. GCC 11 and Clang 12 are supposed to get znver3 targets, arriving around the time of the upcoming 5.10 kernel next April with the 21.04 distro release.

https://www.phoronix.com/scan.php?page=news_item&px=AMD-Zen-3-Linux-Expectations

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 4,772
Message 55720 - Posted: 10 Nov 2020, 22:15:32 UTC
Last modified: 10 Nov 2020, 23:05:23 UTC

got the 3070 up and running on Einstein@home for some more testing.


As far as Einstein performance:

the 2080ti does the current batch GW tasks in about 4 minutes, using 225W (360 t/day)
the 3070 does the current batch GW tasks in about 5 minutes, using 170W (288 t/day)
the 2080ti does the FGRP tasks in about 6 minutes, using 225W (240 t/day)
the 3070 does the FGRP tasks in about 7:20 minutes, using 150W (196.4 t/day)

so for Einstein GW tasks, the 3070 is about 5% more efficient
and for the Einstein GR tasks, the 3070 is about 19% more efficient

Now this isn't to say you should buy Ampere (or even Nvidia cards) for Einstein, since some of the newer AMD cards perform much better there (faster and more efficient overall). But this serves as an additional data point with a different processing type. You can see the efficiency improvements here, especially on the GR tasks.
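For anyone who wants to reproduce those percentages, here's the arithmetic as a small host-only sketch (energy per task from the times and power draws above; "more efficient" means less energy per task than the 2080ti):

// efficiency.cu - back-of-the-envelope check of the Einstein numbers above.
#include <cstdio>

// Energy per task in Wh: board power (W) * run time (minutes) / 60.
static double wh_per_task(double watts, double minutes) { return watts * minutes / 60.0; }

int main()
{
    double gw_2080ti   = wh_per_task(225.0, 4.0);              // 15.0 Wh/task
    double gw_3070     = wh_per_task(170.0, 5.0);              // ~14.2 Wh/task
    double fgrp_2080ti = wh_per_task(225.0, 6.0);              // 22.5 Wh/task
    double fgrp_3070   = wh_per_task(150.0, 7.0 + 20.0 / 60.0); // ~18.3 Wh/task

    printf("GW:   3070 uses %.1f%% less energy per task\n",
           100.0 * (1.0 - gw_3070 / gw_2080ti));                // ~5.6%
    printf("FGRP: 3070 uses %.1f%% less energy per task\n",
           100.0 * (1.0 - fgrp_3070 / fgrp_2080ti));            // ~18.5%
    return 0;
}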

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 55731 - Posted: 12 Nov 2020, 23:54:31 UTC - in response to Message 55720.  
Last modified: 12 Nov 2020, 23:56:42 UTC

I wonder: if you slowed the 2080Ti down by 20% (reducing the GPU voltage accordingly as well) to match the speed of the 3070, would it be about as efficient? (Or even better, in the case of GW tasks?)

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 4,772
Message 55732 - Posted: 13 Nov 2020, 1:57:11 UTC - in response to Message 55731.  

The 2080ti was already power limited to 225W. It's not possible (under Linux) to reduce the voltage; there are just no tools for it. Reducing the power limit has the indirect effect of reducing voltage, but I don't have direct control of the voltage. I could power limit further, but that would only slow the card down more; you start to lose too much clock speed below 215-225W in my experience (across 6 different 2080tis). The 2080ti was also watercooled, with temps never exceeding 40C, so it had as much of an advantage as it could have had.

The 3070 was run at a 200W power limit, but it never came close to that in these Einstein loads, with speed probably limited only by temperature/clock boost bins. That's really par for the course for mid-level Nvidia cards running Einstein tasks: the GR tasks are lightweight and don't pull a lot of power, and the GW tasks are more CPU-bound than anything else, so you don't get full GPU utilization. Further efficiency gains could probably be made on the 3070 with more power limiting and overclocking, but I didn't bother for this test.
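For reference, the only knob exposed under Linux is the board power limit. Here's a rough sketch of what nvidia-smi -pl does under the hood via NVML (assumes the NVML header and library shipped with the driver; actually setting the limit needs root):

// powerlimit.cu - query and set a GPU board power limit via NVML.
// Build (library path may vary): nvcc powerlimit.cu -lnvidia-ml -o powerlimit
#include <cstdio>
#include <nvml.h>

int main()
{
    nvmlInit();
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);

    unsigned int minMw = 0, maxMw = 0, curMw = 0;
    nvmlDeviceGetPowerManagementLimitConstraints(dev, &minMw, &maxMw);
    nvmlDeviceGetPowerManagementLimit(dev, &curMw);
    printf("power limit: %u W (allowed %u-%u W)\n", curMw / 1000, minMw / 1000, maxMw / 1000);

    // Equivalent of `sudo nvidia-smi -i 0 -pl 225`; limit is given in milliwatts.
    if (nvmlDeviceSetPowerManagementLimit(dev, 225000) != NVML_SUCCESS)
        printf("could not set power limit (need root?)\n");

    nvmlShutdown();
    return 0;
}

There is no voltage entry point in this API at all, which is why power limiting is the only indirect way to pull voltage down on Linux.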

SolidAir79
Joined: 22 Aug 19
Posts: 7
Credit: 168,393,363
RAC: 0
Message 56525 - Posted: 15 Feb 2021, 21:18:14 UTC

Any apps I can use yet on my 30-series cards?

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 4,772
Message 56527 - Posted: 15 Feb 2021, 22:19:59 UTC - in response to Message 56525.  

> Any apps I can use yet on my 30-series cards?

Nope.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 4,772
Message 56590 - Posted: 17 Feb 2021, 0:25:32 UTC

https://www.gpugrid.net/workunit.php?wuid=27026028

With no Ampere-compatible app, and no mechanism to prevent work from being sent to incompatible systems, situations like this will only become more common as more and more users upgrade to these new cards.

3 out of the 6 systems that have handled this WU were using Ampere cards and failed because of it.

If we had an Ampere-compatible CUDA 11.1 app, this would have been completed by the first system.
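These failures are at least easy to detect on the client side. Here's a hedged sketch (standard CUDA runtime API only; the cutoff policy is made up for illustration) of the kind of check an app or wrapper could run before accepting work:

// cc_check.cu - refuse to start on devices the embedded binaries can't run on.
// A CUDA 10.x build without a PTX fallback carries no code for sm_86 (Ampere),
// so an RTX 30xx card would otherwise fail at the first kernel launch.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    printf("%s, compute capability %d.%d\n", prop.name, prop.major, prop.minor);

    // Hypothetical policy: this build only carries cubins up to sm_75 (Turing).
    if (prop.major > 7) {
        fprintf(stderr, "unsupported architecture for this app build, aborting\n");
        return 1;
    }
    return 0;
}

The same compute-capability information is reported by the BOINC client, so the scheduler could in principle filter on it as well.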

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 56601 - Posted: 17 Feb 2021, 12:29:28 UTC - in response to Message 56590.  
Last modified: 17 Feb 2021, 12:29:40 UTC

I "saved" a workunit earlier (my host was the 7th to crunch it, and the 1st successful one). It had been sent to two Ampere cards before.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 4,772
Message 56604 - Posted: 17 Feb 2021, 15:03:36 UTC - in response to Message 56601.  

I have a couple like that which I similarly saved, even some _7s (the 8th send).

On this one, 50% of the users had RTX 30-series cards: https://www.gpugrid.net/workunit.php?wuid=27025291

Asghan
Joined: 30 Oct 19
Posts: 7
Credit: 405,900
RAC: 0
Message 57019 - Posted: 25 Jun 2021, 6:51:57 UTC

Is there any update regarding Nvidia Ampere workunits?
My Ampere cards are getting bored -.-

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 4,772
Message 57355 - Posted: 21 Sep 2021, 12:31:24 UTC
Last modified: 21 Sep 2021, 12:32:54 UTC

My 3080Ti (limited to 300W, mind you) did this ADRIA task in under 9.5 hrs:

https://www.gpugrid.net/result.php?resultid=32642251

Has anyone with a high-power 3090 or 3080Ti run it faster? My 3080Ti model will only go up to 366W, but I know some 3080Tis and 3090s can reach into the 400-500W range.

Boca Raton Community HS
Joined: 27 Aug 21
Posts: 38
Credit: 7,254,068,306
RAC: 0
Message 57980 - Posted: 1 Dec 2021, 1:42:27 UTC - in response to Message 57355.  

I am not running 3090s or 3080ti cards, but I do have some times/comparisons for high-end Turing and Ampere GPUs for ADRIA tasks.

Nvidia Quadro RTX 6000 - 8.4 hours
Nvidia RTX A6000 - 6.7 hours

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 57995 - Posted: 1 Dec 2021, 16:59:57 UTC - in response to Message 57980.  

> I am not running 3090s or 3080ti cards, but I do have some times/comparisons for high-end Turing and Ampere GPUs for ADRIA tasks.
>
> Nvidia Quadro RTX 6000 - 8.4 hours
> Nvidia RTX A6000 - 6.7 hours

The Nvidia Quadro RTX 6000 is a "full chip" version of the RTX 2080Ti (4608 vs 4352 CUDA cores),
while the Nvidia RTX A6000 is the "full chip" version of the RTX 3090 (10752 vs 10496 CUDA cores).
The rumoured RTX 3090Ti will have the "full chip" as well.
The RTX 3080 Ti has 10240 CUDA cores.
(The real-world GPUGrid performance of the Ampere architecture cards scales with half of the advertised number of CUDA cores.)
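Taking that rule of thumb at face value, the ADRIA times above are at least in the right ballpark. A quick sketch of the arithmetic (clock speeds and power limits ignored):

// effective_cores.cu - rough check of the "half the advertised Ampere cores" rule
// against the Quadro RTX 6000 vs RTX A6000 ADRIA times quoted above.
#include <cstdio>

int main()
{
    double turing_cores = 4608.0;                        // Quadro RTX 6000, counts in full
    double ampere_cores = 10752.0 / 2.0;                 // RTX A6000, halved per the rule above
    double predicted    = ampere_cores / turing_cores;   // ~1.17x
    double observed     = 8.4 / 6.7;                     // ~1.25x from the quoted run times
    printf("predicted speedup %.2fx, observed %.2fx\n", predicted, observed);
    // The remaining gap is plausibly down to clocks and power limits; this is only
    // a sanity check of the scaling rule, not a benchmark.
    return 0;
}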

Boca Raton Community HS
Joined: 27 Aug 21
Posts: 38
Credit: 7,254,068,306
RAC: 0
Message 57996 - Posted: 1 Dec 2021, 17:18:51 UTC - in response to Message 57995.  


> (The real-world GPUGrid performance of the Ampere architecture cards scales with half of the advertised number of CUDA cores.)

Is that true for all Nvidia GPUs or just Ampere? Just out of curiosity, why is it this way?

Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 662
Message 58002 - Posted: 1 Dec 2021, 19:37:41 UTC - in response to Message 57996.  

Just guessing here, but every new generation of Nvidia cards has basically doubled, or at least increased, the CUDA core count, and since the GPUGrid app (along with very few other project apps) is really well coded for parallel computation, you can say that crunch time scales with the number of cores.

You can tell how well optimized an application is by how much sustained utilization it produces and how close to the max TDP of the card the app runs.

The GPUGrid apps and the Minecraft apps are the only two I know of that will run at 97-100% utilization through the entire computation at the full power capability of the card.
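For anyone who wants to check that on their own card, here's a small sketch that samples sustained utilization and power draw through NVML (the same interface nvidia-smi uses); nothing in it is specific to the GPUGRID app:

// monitor.cu - sample GPU utilization and power draw once per second via NVML.
// Build: nvcc monitor.cu -lnvidia-ml -o monitor
#include <cstdio>
#include <unistd.h>
#include <nvml.h>

int main()
{
    nvmlInit();
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);

    for (int s = 0; s < 60; s++) {          // one minute of samples
        nvmlUtilization_t util;
        unsigned int mw = 0;
        nvmlDeviceGetUtilizationRates(dev, &util);
        nvmlDeviceGetPowerUsage(dev, &mw);
        printf("util %3u%%  power %3u W\n", util.gpu, mw / 1000);
        sleep(1);
    }
    nvmlShutdown();
    return 0;
}

A well-optimized app keeps both numbers pinned near their maximums for the whole run.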

Kudos to the app developers of these projects. Job well done!

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 58003 - Posted: 1 Dec 2021, 19:43:18 UTC - in response to Message 57996.  
Last modified: 1 Dec 2021, 20:29:26 UTC

>> (The real-world GPUGrid performance of the Ampere architecture cards scales with half of the advertised number of CUDA cores.)
>
> Is that true for all Nvidia GPUs or just Ampere? Just out of curiosity, why is it this way?

It's true only for the Ampere architecture.

As described in the articles linked below, the number of FP32 units has been doubled in the Ampere architecture (the INT32 units have been "upgraded" to combined FP32/INT32 units), but they reside within (almost) the same streaming multiprocessor (SM), so the SM cannot feed that many more cores much better. From a cruncher's point of view, the number of SMs should have been doubled as well (by making "smaller" SMs).
The other limiting factor is power consumption, as the RTX 3080Ti (RTX 3090, etc.) easily hits the 350W power limit with this architecture.

https://www.reddit.com/r/hardware/comments/ikok1b/explaining_amperes_cuda_core_count/
https://www.tomshardware.com/features/nvidia-ampere-architecture-deep-dive
https://support.passware.com/hc/en-us/articles/1500000516221-The-new-NVIDIA-RTX-3080-has-double-the-number-of-CUDA-cores-but-is-there-a-2x-performance-gain-

Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 662
Message 58005 - Posted: 1 Dec 2021, 21:42:05 UTC

As long as you can keep INT32 operations out of the warp scheduler, the Ampere series can do two FP32 operations per clock.
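To make that concrete, here's a synthetic sketch (nothing from the actual GPUGRID code): the first kernel is pure FP32 FMA and can use both FP32 datapaths in each SM partition; the second interleaves INT32 work, which has to share one of those datapaths, so on Ampere its FP32 throughput drops back toward Turing-like levels.

// fp32_vs_int32.cu - synthetic illustration of Ampere's dual FP32 datapath.
// Build: nvcc -O3 -gencode arch=compute_86,code=sm_86 fp32_vs_int32.cu -o fp32_vs_int32
#include <cuda_runtime.h>

__global__ void pure_fp32(float *out, int iters)
{
    float v = threadIdx.x * 0.001f;
    for (int i = 0; i < iters; i++)
        v = v * 1.000001f + 0.5f;               // FP32 FMA only: both pipes usable
    out[blockIdx.x * blockDim.x + threadIdx.x] = v;
}

__global__ void mixed_fp32_int32(float *out, int iters)
{
    float v = threadIdx.x * 0.001f;
    unsigned int h = threadIdx.x;
    for (int i = 0; i < iters; i++) {
        v = v * 1.000001f + 0.5f;               // FP32 FMA
        h = h * 1664525u + 1013904223u;         // INT32 work competes for the shared pipe
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = v + (float)(h & 1u);
}

int main()
{
    float *out;
    cudaMalloc(&out, 1024 * 256 * sizeof(float));
    pure_fp32<<<1024, 256>>>(out, 1 << 16);         // time these two with Nsight Compute
    mixed_fp32_int32<<<1024, 256>>>(out, 1 << 16);  // or cudaEvent timers to see the gap
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}

On Turing, each SM partition had one FP32 pipe plus a separate dedicated INT32 pipe, so the mixed loop costs little extra there; on Ampere, the "extra" FP32 capacity sits on the pipe that INT32 now shares, which is why FP32-heavy workloads see the biggest gains.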