Message boards : Graphics cards (GPUs) : NVidia GPU Card comparisons in GFLOPS peak
| Author | Message |
|---|---|
|
Send message Joined: 1 Mar 10 Posts: 147 Credit: 1,077,535,540 RAC: 0
|
Minor clarifications: 1st system, Win 7 Pro x64: Gigabyte GTX 460 OC 1GB (GeForce GTX 460 OC, driver version 270.61, CUDA version 4000, compute capability 2.1, 962MB (?), 641 GFLOPS peak). 2nd system, Xubuntu 11.04 AMD64: Gigabyte GTX 460 SO 1GB (GeForce GTX 460 SO, driver version 270.41.06, CUDA version 4000, compute capability 2.1, 1024MB, 730 GFLOPS peak). Lubuntu 16.04.1 LTS x64 |
robertmiles Send message Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0
|
I suspect that the 962 MB is 1 GB minus whatever Windows 7 has reserved for its own use. |
|
Send message Joined: 4 May 10 Posts: 3 Credit: 2,734,534 RAC: 0
|
I run GPUgrid on a GT 430 under Ubuntu 10.04.2. WUs take about 32 hrs, so there's no problem getting them returned in time. Thu 02 Jun 2011 07:48:48 AM CDT NVIDIA GPU 0: GeForce GT 430 (driver version unknown, CUDA version 4000, compute capability 2.1, 1023MB, 45 GFLOPS peak) |
skgiven Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Hi Kate, 45 GFlops peak for a GT430 does not sound right; that's more like your Ion ;) A GT430 should be reporting about 180 GFlops peak, and that's after allowing for the fact that it can't use all the CUDA cores; otherwise it would be about 270. Perhaps you are using a very old version of BOINC (for Linux systems we cannot see the BOINC version)? |
|
Send message Joined: 4 May 10 Posts: 3 Credit: 2,734,534 RAC: 0
|
Hi skgiven, I'm using BOINC 6.10.17, so not all that old. Maybe I'll upgrade to 6.10.58 one of these days and see whether that makes a difference in the GFLOPs calculation. I noticed on an earlier message in this thread that somebody was reporting 179 GFLOPs for this card. I had been startled when I first ran BOINC on it and saw the 45 GFLOPs (you're right -- not much different from my ION 2, which shows 39). The GT 430 runs GPUgrid WUs a little over 3 times as fast as the ION 2, so there is indeed something wrong with the GFLOPs calculation for the 430. It's slow -- but not THAT slow. In any case, I'm pretty happy with the GT 430 for my purposes. It's not going to set any speed records, but it runs cool (nvidia-smi is showing it at 54 C right now while working at 98% capacity on a GPUgrid WU), and doesn't draw much power. It's double-precision, so I can run Milky Way. And it's just fine for testing and debugging CUDA code for Fermi cards. Kate |
|
Send message Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 261
|
I'm using BOINC 6.10.17, so not all that old. Maybe I'll upgrade to 6.10.58 one of these days and see whether that makes a difference in the GFLOPs calculation. Yes, that explains it. The corrected GFLOPS calculation for Fermi-class cards (assuming 32 cores per multiprocessor, instead of 8) wasn't introduced until v6.10.45; your version 6.10.17 dates back to October 2009, long before details of the Fermi range were available. Note that your GT 430 actually has 48 cores per MP, so the calculation will still be a bit on the low side. But that only affects the cosmetic display of speed in the message log at startup; it doesn't affect the actual processing speed in any way. |
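The effect of the cores-per-multiprocessor assumption can be checked with a quick sketch. This is an illustration only: it assumes BOINC estimates peak speed as total cores × shader clock × 2 FLOPS per cycle (one fused multiply-add), and uses reference GT 430 specs (2 multiprocessors, 1.40 GHz shader clock) rather than anything reported by the client itself.

```python
def peak_gflops(multiprocessors, cores_per_mp, shader_clock_ghz, flops_per_cycle=2):
    """Theoretical peak: total CUDA cores x shader clock x FLOPS issued per cycle."""
    return multiprocessors * cores_per_mp * shader_clock_ghz * flops_per_cycle

# GT 430: 2 multiprocessors at a 1.40 GHz shader clock (reference specs)
print(peak_gflops(2, 8, 1.40))   # old BOINC assumption of 8 cores/MP -> 44.8
print(peak_gflops(2, 32, 1.40))  # BOINC >= 6.10.45 Fermi assumption  -> 179.2
print(peak_gflops(2, 48, 1.40))  # the GT 430's real 48 cores/MP      -> 268.8
```

The three values line up with the numbers quoted in this thread: the 45 GFLOPS shown by the old client, the 179 GFLOPS reported earlier for the same card, and the "about 270" figure for the full core count.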
|
Send message Joined: 4 May 10 Posts: 3 Credit: 2,734,534 RAC: 0
|
Thanks for the clear explanation. So BOINC v6.10.17 calculates correctly for the ION (which does have only 8 cores per MP) but not for the GT 430. Well, this gives me a reason (if only cosmetic) to upgrade to 6.10.58. |
|
Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
It's double-precision, so I can run Milky Way. You can - just don't do it, if you can help it ;) Your "not-so-fast" card only reaches 1/12th of its SP performance under DP. Any ATI is going to walk all over it. I think GPU-Grid and Einstein can make much better use of this (SP) power. MrS Scanning for our furry friends since Jan 2002 |
|
Send message Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
|
Fri 03 Jun 2011 02:34:07 PM MDT Starting BOINC client version 6.12.26 for x86_64-pc-linux-gnu . . . Fri 03 Jun 2011 02:34:07 PM MDT NVIDIA GPU 0: GeForce GTX 570 (driver version unknown, CUDA version 3020, compute capability 2.0, 1279MB, 1425 GFLOPS peak) |
skgiven Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
The update is just to add three cards (GTX 560 Ti 448, GTX 560 and GT 545). Relative Comparison of Recommended Cards, with approximate CC Correction Factor values (in brackets):

| Card | GPU | Process | Compute Capability | GFlops peak | Corrected |
|---|---|---|---|---|---|
| GTX 590 | GF110 | 40nm | 2.0 | 2488 | (3359) |
| GTX 580 | GF110 | 40nm | 2.0 | 1581 | (2134) |
| GTX 570 | GF110 | 40nm | 2.0 | 1405 | (1896) |
| GTX 560 Ti 448 | GF110 | 40nm | 2.0 | 1311 | (1770) |
| GTX 480 | GF100 | 40nm | 2.0 | 1345 | (1816) |
| GTX 295 | GT200b | 55nm | 1.3 | 1192 | (1669) |
| GTX 470 | GF100 | 40nm | 2.0 | 1089 | (1470) |
| GTX 465 | GF100 | 40nm | 2.0 | 855 | (1154) |
| GTX 560 Ti | GF114 | 40nm | 2.1 | 1263 | (1136) |
| GTX 285 | GT200b | 55nm | 1.3 | 695 | (973) |
| GTX 560 | GF114 | 40nm | 2.1 | 1075 | (967) |
| GTX 275 | GT200b | 55nm | 1.3 | 674 | (934) |
| GTX 260-216 | GT200b | 55nm | 1.3 | 596 | (834) |
| GTX 460 768MB | GF104 | 40nm | 2.1 | 907 | (816) |
| GTX 460 1GB | GF104 | 40nm | 2.1 | 907 | (816) |
| GTX 550 Ti | GF116 | 40nm | 2.1 | 691 | (622) |
| GTS 450 | GF106 | 40nm | 2.1 | 601 | (541) |
| GT 545 | GF116 | 40nm | 2.1 | 501 | (451) |

This update is based on the previous table, ordered by performance after considering compute capability, highest first. I have only included CC1.3, CC2.0 and CC2.1 cards, but the comparison was originally based on CC1.1 cards. Only reference clocks are listed, and only for cards optimized by the recommended methods. As usual, there are accuracy limitations, and the table's lifetime is limited by the apps/drivers in use. New cards were added based on GFlops peak and CC rather than by resurveying different cards (a method demonstrated to be reliable). Comparable but different systems (CPUs) were used, not all cards used the same drivers, and only one task type was looked at.
Some of these comparisons are adapted from when we used the 6.11 app, but the correction factors are still valid. When a new app is released I will review these cards/values for consistency; normally the cards' relative performances don't change too much, but on at least 2 occasions in the past things did change. For CC2.0 and CC2.1 it's less likely. When the next generation of NVidia turns up things will probably start changing, and obviously an ATI app would add a few cards. Correction Factors Used: CC1.1 = 1.00, CC1.2 = 1.30, CC1.3 = 1.40, CC2.0 = 1.35, CC2.1 = 0.90. Thanks for the posts. FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
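The bracketed values in the table appear to be the raw GFlops peak scaled by the correction factor for the card's compute capability. A minimal sketch of that arithmetic (my reading of the numbers, not a formula published in the thread):

```python
# Correction factors quoted in the post, relative to a CC1.1 baseline
CC_FACTOR = {"1.1": 1.00, "1.2": 1.30, "1.3": 1.40, "2.0": 1.35, "2.1": 0.90}

def corrected_gflops(peak_gflops, compute_capability):
    """Scale raw GFlops peak by the compute-capability correction factor."""
    return peak_gflops * CC_FACTOR[compute_capability]

print(corrected_gflops(2488, "2.0"))  # GTX 590: 3358.8, listed as (3359)
print(corrected_gflops(1263, "2.1"))  # GTX 560 Ti: 1136.7, listed as (1136)
```

Note the CC2.1 factor is below 1.0 because those cards cannot use all of their shaders here, so their raw peak overstates real throughput.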
oldDirty Send message Joined: 17 Jan 09 Posts: 22 Credit: 3,805,080 RAC: 0
|
Does ATI mean only GCN 1D cards? |
|
Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
ATI is not clear yet, since the development of the app is not finished. MrS Scanning for our furry friends since Jan 2002 |
nenym Send message Joined: 31 Mar 09 Posts: 137 Credit: 1,429,587,071 RAC: 0
|
I did a comparison of long-run tasks crunched by a GTX 560 Ti CC2.1 (host ID 31329, 875 MHz) and a GTX 570 (host ID 101638, 730 MHz). The host with the GTX 570 is running Linux, all cores free for GPUGRID, SWAN_SYNC set. The host with the GTX 560 Ti is running Win XP x64, no CPU core free, SWAN_SYNC not used.

| Task type | GTX 560 Ti run time | GTX 570 run time | GTX 560 Ti theoretical RAC | GTX 570 theoretical RAC | Ratio (run time) |
|---|---|---|---|---|---|
| PAOLA | 61,500 | 35,600 | 126,400 | 219,000 | 1.74 |
| NATHAN_CB1 | 25,700 | 14,700 | 120,400 | 210,500 | 1.75 |
| GIANNI | 54,200 | 30,000 | 116,200 | 201,000 | 1.8 |
| NATHAN_FAX4 | 74,800 | 34,000 | 82,470 | 181,440 | 2.2 |

What to think about it? Do PAOLA, NATHAN_CB1 and GIANNI tasks use all the CUDA cores of the GTX 560 Ti CC2.1, or is something different going on? |
skgiven Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
The GTX 560 Ti has 384 shaders, of which only 256 are usable by GPUGrid, due to the requirements of the project's app and the GPU architecture. A reference GTX 560 Ti has a GPU at 822MHz. A reference GTX 570 has a GPU at 732MHz and 480 shaders. (480*732)/(256*822) = 1.67. Your slightly higher ratio can be explained by SWAN_SYNC being used on Linux. All this has been known for some time, hence my table with correction factors:
CC1.1 = 1.00 CC1.2 = 1.30 CC1.3 = 1.40 CC2.0 = 1.35 CC2.1 = 0.90
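The 1.67 figure above is just usable shaders times GPU clock for each card; a minimal sketch, assuming throughput scales linearly with that product (the clocks are the reference values quoted in the post):

```python
def expected_speed_ratio(shaders_a, clock_mhz_a, shaders_b, clock_mhz_b):
    """Expected run-time ratio, assuming throughput ~ usable shaders x clock."""
    return (shaders_a * clock_mhz_a) / (shaders_b * clock_mhz_b)

# GTX 570 (480 usable shaders @ 732 MHz) vs GTX 560 Ti (256 usable of 384 @ 822 MHz)
print(round(expected_speed_ratio(480, 732, 256, 822), 2))  # 1.67
```

The measured ratios of 1.74-1.8 sit a little above this estimate, consistent with the GTX 570 host also benefiting from SWAN_SYNC and a free CPU core.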
FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
|
Send message Joined: 28 May 12 Posts: 63 Credit: 714,535,121 RAC: 0
|
Is it time for another update of this table, especially with a new batch of cards out to replace most of the 200 series cards (increasingly hard to find)? |
skgiven Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
The GeForce 600 series cards are about 60% more efficient in terms of performance per Watt compared to the GeForce 500 series. There are several issues. The range is quite limited: we only have 3 high-end cards (690, 680 and 670). These are all CC3.0 and all expensive. At the low end there is the GT 640 (a recent release) and some GK OEM cards. We are still lacking the mid-range cards, which tend to make up a large portion of cards here. When they turn up we might be in a position to start comparing performance and price. The present cards have a variety of clock rates, making relative comparisons somewhat vague, and I cannot see the GPU frequency, as this is not reported in the tasks' Stderr output file when run with the 4.2 app, and the 600 cards can't run tasks on the 3.1 app. The old issue of system setup is still there, but now has an additional dimension: the same GPU in a low-end system with a low-clocked CPU is going to underperform compared to a well-optimized system with a high-end CPU. Now we also have to consider PCIE3 vs PCIE2 performance. Thus far this has not been done here (anyone with a PCIE3 system, post up PCIE3 vs PCIE2 performances). There is also the issue of so many different task types. The performance of each varies when comparing the 500 to 600 series cards. I think it's likely that GPUGrid will continue to run many experiments, so there will continue to be a variety of tasks with different performances. This will make a single table more difficult and may necessitate maintenance for accuracy. FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
|
Send message Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
In the past there hasn't been much of a performance difference between PCIe 2 at 16x and 8x lanes, so I suspect 16 lanes at PCIe 2 or 3 won't matter much either. We'd "need" significantly faster GPUs (or smaller WUs) to make the difference count more. MrS Scanning for our furry friends since Jan 2002 |
skgiven Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Going back to the GTX 200 series, that was the unmeasured speculation - x16 vs x8 didn't matter. With the Fermi cards it was measured. Zoltan said he saw a ~10% drop in performance from PCIE2 x16 to x8 on a GTX 480, for some tasks. For x4 vs x16 the drop was more like 20%. ref. I think I eventually came to a similar conclusion and did some measurements. With a new app, new performances, and new cards it's worth a revisit. That said, present tasks utilize the GPU more on the new app. I think GPU memory might be better utilized for some tasks, and generally speaking the CPU is used less. While tasks will change, this suggests PCIE matters less at present. How much that offsets the increase in GF600 GPU performance remains to be seen/measured. What I'm not sure about is the controller. How much actual control does it have over the lanes? Can it allocate more than 8 lanes when other lanes are free, or is a 16-lane slot absolutely limited to 8 lanes when the first slot is occupied, for example (on an x16/x8 board where both slots have x16 connectors)? FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
Retvari Zoltan Send message Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
What I'm not sure about is the controller. How much actual control does it have over the lanes? Can it allocate more than 8 lanes when other lanes are free; is a 16 lane slot absolutely limited to 8 lanes when the first slot is occupied, for example (on an x16, x8 board but both having x16 connectors)? It could be limited by the motherboard as well, but according to ark.intel.com, both Ivy Bridge CPU lines, the socket 1155 CPUs and the socket 2011 CPUs, have only one set of 16 PCIe 3.0 lanes, configurable in three variations: 1x16, 2x8, or 1x8 & 2x4. There are simply not enough pins on the CPU socket to support more PCIe lanes. So if a MB had two (or more) real PCIe 3.0 x16 connectors, it would have to have one (or more) PCIe 3.0 x16 bridge chips (like the one the GTX 690 has on board). I don't know how the AMD CPUs support PCIe 3.0, but I think they are the same. |
skgiven Send message Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
I thought socket 2011 had 40 PCIE3 lanes? That should be enough for 2*16. I know 1155 can only support 1 GPU @x16 or two @x8, even if it is PCIE3 (by BIOS hack/upgrade), though the on-die IB effort should be faster. AMD CPUs don't support PCIE3 at all! Only their GPUs are PCIE3. FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
©2025 Universitat Pompeu Fabra