Message boards : Graphics cards (GPUs) : Ampere 10496 & 8704 & 5888 fp32 cores!
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
I think Zoltan is a bit vague, that's why I find his point hard to understand. What I think he means: let's ignore the INT32 for a moment and only focus on the additional FP32 units. Which data can they work on?

1) Do they work on the same data as the 1st set of FP32 units? If so, this would severely limit their usefulness to special code containing consecutive independent FP32 instructions.
2) Or do they work on additional data, making them universally usable?

By data I'm referring to a "warp" here, a group of 32 pixels or numbers which move together through the GPU pipeline and onto which the same instructions are executed.

If case 1) were true we'd have a superscalar architecture, like Fermi (apart from the biggest chips) and Kepler. There nVidia realized the utilization of the additional superscalar units was so bad it actually wasn't worth it, and fixed this in Maxwell. In Kepler each SM had 4 "data paths" for 4 warps, each with 32 "pixels". These can make good use of 128 FP32 units. In addition there were 50% more / 64 superscalar FP32 units, which could be used in case any of the 4 warps contained such independent instructions. Going to Maxwell nVidia changed this to just 4 warps and 128 FP32 units per SM. Performance per SM dropped by ~10%, but they were able to pack more of them into the same chip area, more than offsetting this loss.

I don't think they went back to a superscalar design. Instead I think the total number of warps through an SM has doubled from Turing to Ampere. This also matches the doubled L1 bandwidth to feed this beast. Without double the throughput per SM this wouldn't make sense.

MrS
Scanning for our furry friends since Jan 2002
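To put rough numbers on the two cases, here is a back-of-envelope sketch in Python. The SM counts and FP32-cores-per-SM figures are the published specs; treating the nominal boost clocks as sustained is my simplifying assumption, and real clocks will differ.

```python
# Back-of-envelope peak FP32 throughput under the two interpretations.
# SM counts / cores-per-SM are published specs; clocks are nominal
# boost figures assumed sustained (a simplification).

def peak_tflops(sms, fp32_per_sm, clock_ghz):
    # Each FP32 core can retire one FMA per cycle = 2 FLOPs.
    return sms * fp32_per_sm * clock_ghz * 2 / 1000

tflops_2080ti = peak_tflops(68, 64, 1.545)       # Turing: 64 FP32 cores/SM
tflops_3070_dual = peak_tflops(46, 128, 1.725)   # Ampere, both datapaths on FP32
tflops_3070_single = peak_tflops(46, 64, 1.725)  # Ampere, one FP32 datapath busy

print(f"2080 Ti peak:           {tflops_2080ti:.1f} TFLOPS")
print(f"3070, both FP32 paths:  {tflops_3070_dual:.1f} TFLOPS")
print(f"3070, one FP32 path:    {tflops_3070_single:.1f} TFLOPS")
```

On paper the 3070 wins handily if code can keep both datapaths busy with FP32, and loses clearly if only one path is usable, which is exactly the question being debated above.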
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
The RTX 3070 is here and should appear in shops in 2 days. Availability could be better than for the bigger cards, but demand will probably be very high.

MrS
Scanning for our furry friends since Jan 2002
Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 731
Waiting for the compute results for these cards. That would be the correct comparison of compute performance against the RTX 2080 Ti, since the wattages are comparable.
Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 5,269
We need a proper CUDA 11.1+ app from GPUGRID before we can compare anything.
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
Sure. I was just mentioning the new release as it may make it easier for Toni to get access to a card.

MrS
Scanning for our furry friends since Jan 2002
Joined: 25 Sep 13 · Posts: 293 · Credit: 1,897,601,978 · RAC: 0
Well, I tried and failed to get my hands on Ampere cards once again. I wanted to buy 2. No thanks, I am not going to pay $1,000 or $1,200 to an eBay scalper for a 3070 or 3080. If you haven't heard: Nvidia is saying there will be no ample 3080/3090 supply until 2021.

The RTX 3070 was sold out instantly this morning at the Microcenter store in Cambridge, MA (same place I purchased an open-box Asus ROG COD edition 2080 Ti for under $800). Amazon and Newegg are sold out too. Better luck next time. I'd like to test the 3080 and 3070 against the 2080 Ti.

Unbelievable how much demand there is for Ampere. I arrived 30 mins before opening. Crazy how people waited overnight. Funny thing is, among all the creators and gamers only a few had heard of the BOINC platform. Many knew about Folding@home and of course coin mining. Sadly none had heard about GPUGRID. All this hoopla reminds me of old Apple phone releases.
Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 5,269
You have to go in person (and usually at least a day in advance, waiting in line like Black Friday) to get the cards from Microcenter; you can't buy the 30-series online there.

One of the better methods for online shopping (in the US) is the queue system at EVGA's website. That's what I did this morning. They had a launch-day-specific SKU with a free t-shirt on their 3070 Black model for $499. They only had a limited number from what I can tell, since you could only be added to the queue for about 5 minutes, from about 6:05-6:10am PST. I'm in line for both the launch-day 3070 Black and the normal 3070 Black. I expect to get my email to buy the launch-day card either today or tomorrow. You can still get in the queue for the 3070, 3080, or 3090, but just know that you'll have thousands ahead of you at this point, so you might be waiting a long time before you are able to actually buy it.

Don't even bother trying to buy from Best Buy or Newegg. They don't have any bot protections like EVGA does; most of the cards sold there this morning went to the bots.

But even if you have an Ampere card in hand, you will be unable to test on GPUGRID until the admins/devs release a new application that supports the new cards. A few people have tried already and the tasks just error out immediately because the current app doesn't support them.
Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 5,269
Well, my email came and the order is placed. An EVGA RTX 3070 Black is on the way :) Now I just need to wait for GPUGRID to release a new app!
Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 731
Congrats, Ian. Now you get to be the first guinea pig for the new app.
Joined: 8 Aug 19 · Posts: 252 · Credit: 458,054,251 · RAC: 0
Ian, have you tried F@H scoring it yet? I guess you might be able to crunch for Greg Bowman until Toni's crew updates ACEMD. Folding@home is more generous with points; I don't know whether they are based on cobblestones or not. If so, the Bowman lab's app runs faster.
Joined: 8 Aug 19 · Posts: 252 · Credit: 458,054,251 · RAC: 0
Too bad that F@H is no longer on the BOINC platform, so that points would apply. I know Dr Bowman is a Stanford graduate, and they competed with Berkeley on this stuff during development. I think they need to develop the F@H user interface further to match the power that BOINC gives experienced crunchers. That said, I don't know of any other GPU-based research dealing directly with a COVID-19 vaccine out there, and in communicating with Greg I have found him to be quite approachable and conversational.
Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 5,269
I will receive the card next Thursday, 11/5. I don't participate in F@H, and from what I understand their application is not CUDA but rather OpenCL (at least FAHbench is), so it likely won't see the most benefit from the new architecture. Having a CUDA 11.1 app is crucial for this.

What I did do in order to try to test CUDA performance is recompile the SETI special CUDA application for CUDA 11.1. I have an offline benchmarking tool and some old SETI workunits, so I can check relative CUDA performance between a 2080 Ti and the 3070. This should give me a good baseline of relative performance to expect with Ampere when the GPUGRID app finally gets updated.
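For anyone wanting to run a similar offline comparison, the core of such a harness is simple. Here is a minimal sketch in Python; the app path, workunit files, and command-line shape are hypothetical placeholders, not the real SETI special app's interface.

```python
import statistics
import subprocess
import time

def time_workunit(app_path, wu_path):
    """Run one workunit through the app and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run([app_path, wu_path], check=True)  # hypothetical CLI shape
    return time.perf_counter() - start

def relative_speed(card_times, baseline_times):
    """Mean-runtime speedup of the card under test vs. the baseline card
    (values > 1.0 mean the tested card is faster)."""
    return statistics.mean(baseline_times) / statistics.mean(card_times)

# e.g. a card averaging 80 s per WU against a 100 s baseline is 1.25x faster:
# relative_speed([80.0, 80.0], [100.0, 100.0]) -> 1.25
```

The point of running the exact same WU set on both cards is that it removes task-mix variance, which is the variable you can't control when comparing live PPD numbers.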
Joined: 8 Aug 19 · Posts: 252 · Credit: 458,054,251 · RAC: 0
Thanks for pointing out the OpenCL vs CUDA factor, Ian. I didn't think about FAHcore using a different framework. That makes me wonder if PassMark GPU direct compute scores might be a better reference than F@H scores for predicting performance running ACEMD.

(edit) After doing some checking I see PassMark's bench is not CUDA either. Is there a CUDA 11.1 benchmark test anywhere?
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
> I don't participate in FaH. And from what I understand, their application is not CUDA, but rather OpenCL. At least FAHbench is.

Well, I do. The present FAHcore 22 v0.0.13 is CUDA, though the exact CUDA version number used is not disclosed. folding@home wrote:

> As of today, your folding GPUs just got a big powerup! Thanks to NVIDIA engineers, our Folding@home GPU cores — based on the open source OpenMM toolkit — are now CUDA-enabled, allowing you to run GPU projects significantly faster. Typical GPUs will see 15-30% speedups on most Folding@home projects, drastically increasing both science throughput and points per day (PPD) these GPUs will generate.

This post of mine earlier in this thread discussed the performance gain of switching FAHcore from OpenCL to CUDA. As NVidia helped them develop this new app, and NVidia gives the largest support for the folding@home project of all the corporations, I suppose they did their best (i.e. it's using CUDA 11 to get the most out of their brand new cards). Slightly supporting this supposition is that they asked us to upgrade our drivers, and that

> ... core22 0.0.13 should automatically enable CUDA support for Kepler and later NVIDIA GPU architectures ...

aligns with CUDA 11 being backwards compatible with Kepler.

Ian&Steve C. wrote:
> So it likely won't see the most benefit from the new architecture. Having a CUDA 11.1 app is crucial for this.

That's true; it's also crucial to make these cards usable for GPUGrid in any way.

> What I did do in order to try to test CUDA performance is I've recompiled the SETI special CUDA application for CUDA 11.1. And I have an offline benchmarking tool and some old SETI workunits so I can check relative CUDA performance between a 2080 Ti and the 3070.

I'm really interested in those results!
> This should give me a good baseline of relative performance to expect with Ampere when the GPUGRID app finally gets updated.

This is where our opinions diverge: SETI (and other analytical applications) use a lot of FFTs, which could benefit from the "extra" FP32 units, while MD simulations do not. So I don't consider SETI an adequate benchmark for GPUGrid.
Joined: 8 Aug 19 · Posts: 252 · Credit: 458,054,251 · RAC: 0
Thank you for sharing your insight, Zoltan. I much appreciate your perspective and experience in DC.
Joined: 22 May 20 · Posts: 110 · Credit: 115,525,136 · RAC: 0
After the recent rollout of CUDA support on F@H, which I shared with you in this very same thread initially, there now exists a CUDA-specific FAHcore that helped to increase the performance of NVIDIA cards tremendously on F@H. As Zoltan pointed out, there are still many unknown variables around the provided data, so take the performance charts provided on their website with a grain of salt.

Risking being off-topic in this thread... I like to diversify my contribution and occasionally run F@H for a couple of WUs before switching back to GPUGrid. What I like about their infrastructure is that you can check out a project description for all currently running WUs, in addition to a preference you can set directly in the software to specify what cause (disease) you want to focus your computational resources on. You can choose from various targets, such as high-priority projects, Parkinson's, cancer, Covid-19, Alzheimer's, Huntington's. Those projects change from time to time depending on what research is currently being conducted. In the past there have been other research projects around Dengue fever, Chagas disease, Zika virus, Hepatitis C, Ebola virus, Malaria, antibiotic resistance and Diabetes. If there is currently no work for the chosen preference, it immediately switches to the highest-priority projects. In that sense, I feel like F@H offers an informational advantage over GPUGrid, but it certainly lacks the flexibility and options for crunchers that BOINC offers.

My card (GTX 750 Ti) finishes most tasks at F@H ahead of the first deadline and thus receives the early-finish bonus points. At around 100k credits a day it matches my RAC on GPUGrid if running exclusively 24/7. In the end I believe both projects have their own place and complement each other.

Anyway, I am also looking forward to seeing those benchmark results. Awesome that you managed to grab one in spite of the current scarcity of supply!
Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 5,269
If they get the CUDA changes into the FAHbench application, I'll try it out. But last time I looked at it, the benchmarking app still only used OpenCL. When comparing something like two different cards you need to eliminate as many variables as possible. Using the standard F@H app, without the control to run the same exact workunits over and over, the best I could do is run each card for a few months to get average PPD numbers from each. I'm just not willing to put that level of effort into it when I can likely get the same results from a quicker benchmark.

From what I'm reading, to get the benefit of the new data pipeline you need CUDA 11.1, not just CUDA 11.

While SETI does different types of calculation, the comparison of architectures and models should be very comparable. When I moved from SETI to GPUGRID the hierarchy of GPUs remained the same. So the relative performance of the 3070 to the 2080 Ti observed with SETI should translate pretty closely to what we will probably observe on GPUGRID (provided they build an 11.1 app also).
Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 5,269
I got my EVGA 3070 Black today and did some testing.

PLEASE KEEP IN MIND: this is very preliminary testing with an application that does different calculation types than GPUGRID. This is merely an attempt to compare against another BOINC project using a CUDA app in a controlled, benchmark kind of way.

Testing platform:
- Asrock X570M Pro4 mATX (x16 slots in positions 1 and 4)
- AMD Ryzen 9 3900X @ 4.20GHz
- 16GB GSkill DDR4-3600 CL14
- Phanteks mATX case with only 4 PCI expansion slots
- SETI@home special application recompiled by me for CUDA 11.1 on Linux

Had a bit of trouble getting it to work in my Linux system at first. Thought it might be the drivers: the 3070 "needs" the 455.38 driver, but previous tests and info from others have shown that slightly older drivers usually work just fine for new cards when the architecture is the same, so 455.23.04 *should* work just fine, only reporting a generic card name like "Nvidia Graphics Device". So I went through the trouble of installing the nvidia drivers from the run file. Installed 455.38 with little issue, and it still was being very flaky: randomly dropping the GPU, extreme lag navigating the desktop, very low GPU utilization, or even failing to boot. Narrowed it down to the PCIe gen3 ribbon risers and the card constantly trying to run at PCIe gen4 no matter what settings I used in the BIOS (I set all slots to gen3 but nvidia-settings still reported running at gen4). Due to the motherboard and case layout I was unable to install the card directly into the bottom motherboard slot for lack of space, and I don't have any gen4-capable risers. So I carefully removed and set aside the 2080 Ti, which is custom watercooled and hooked in line with the CPU, to be able to plug the 3070 directly into the top slot. Finally working as it should, no random issues.

Testing results with the CUDA 11.1 special app:
- about 75-85% speed vs. my 2080 Ti on my collection of WUs
- 1980-1995 MHz core clock (same as the 2080 Ti), 14000 MHz mem clock on both cards
- 98-99% GPU utilization reported, but only ~190-200W power as reported by nvidia-smi, unrestrained at the 220W PL (the 2080 Ti used 225W, at its PL)

Keep in mind that Ampere went back to 128 CUDA cores per SM, like Pascal, but unlike Pascal the data paths are different. Also remember that the special app seems to heavily favor SM count, likely due to the way it is coded. It is my guess that either SETI calculations use more integer math, or both FP32 data paths are not being used. The 3070 has 46 SMs, the 2080 Ti has 68 SMs. So with 67% of the SMs (and CUDA cores), it's giving ~80% of the performance; not bad. But the power-used comparison isn't so kind: at 89% of the power used, it's giving ~80% of the performance. This might be improved upon with power limiting, but I didn't test that right now.

So I haven't lost hope yet. I'm hopeful that if GPUGRID does use mostly FP32, and they recode their app to take advantage of both pipelines in a CUDA 11.1 app, we could see better performance. I'm waiting for the new app to be ready.
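The scaling argument in that last comparison can be condensed into a few lines of arithmetic (Python). Taking ~80% as the midpoint of the observed 75-85% range and ~200 W against the 2080 Ti's 225 W is my reading of the figures above, not an exact measurement.

```python
# Relative scaling of the 3070 vs. the 2080 Ti from the numbers above.
sm_3070, sm_2080ti = 46, 68
perf_ratio = 0.80        # ~80% observed speed (midpoint of 75-85%)
power_ratio = 200 / 225  # ~200 W vs. the 2080 Ti's 225 W

sm_ratio = sm_3070 / sm_2080ti
perf_per_sm = perf_ratio / sm_ratio
perf_per_watt = perf_ratio / power_ratio

print(f"SM ratio:                 {sm_ratio:.1%}")       # ~67.6%
print(f"perf per SM vs 2080 Ti:   {perf_per_sm:.2f}x")   # ~1.18x
print(f"perf per watt vs 2080 Ti: {perf_per_watt:.2f}x") # ~0.90x
```

So each Ampere SM does noticeably more work than a Turing SM here, but the card as a whole delivers slightly less work per watt, which is exactly the mixed picture described above.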
Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 5,269
I attempted to build FAHbench with CUDA support (the information says it's possible), but I'm hitting a snag at configuring OpenMM:
- I get errors when trying to configure OpenMM with CMake (as outlined in the FAHbench instructions)
- I can't check the OpenMM docs since http://docs.openmm.org seems to be a dead link
- I can't ask about it on the forums unless I register
- I can't register unless they accept my request (says it'll take 72 hrs)
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
> I got my EVGA 3070 black today and did some testing

Thank you for all the effort you have put into this benchmark! Regrettably, your benchmarks confirmed my expectations. Performance-wise it's a bit better than I expected (67.6% × 110% ≈ 74.4% of the 2080 Ti). Power-consumption-wise it seems, as of yet, that it's not worth investing in an upgrade from the RTX 2*** series for crunching.
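That parenthetical estimate, spelled out in Python: the SM ratio scaled by the roughly 10% per-SM throughput advantage discussed earlier in the thread (the 10% figure is the assumption here).

```python
# Zoltan's back-of-envelope expectation for the 3070 vs. the 2080 Ti.
sm_ratio = 46 / 68    # 3070 SMs vs. 2080 Ti SMs, ~67.6%
per_sm_gain = 1.10    # assumed ~10% higher throughput per SM
expected = sm_ratio * per_sm_gain

print(f"expected 3070 vs 2080 Ti: {expected:.1%}")  # ~74.4%
```

The measured 75-85% lands just above this estimate, which is why the result reads as "a bit better than expected".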
©2025 Universitat Pompeu Fabra