NVidia GPU Card comparisons in GFLOPS peak

jlhal

Message 21221 - Posted: 22 May 2011, 10:18:54 UTC - in response to Message 21158.  

1st system, Win 7 Pro x64:
Gigabyte GTX460 1GB
GeForce GTX 460 (driver version 27061, CUDA version 4000, compute capability 2.1, 962MB, 641 GFLOPS peak)

2nd system, Xubuntu 11.04 AMD64:
Gigabyte GTX460SO 1GB
GeForce GTX 460 (driver version unknown, CUDA version 4000, compute capability 2.1, 1024MB, 730 GFLOPS peak)


A few corrections:

1st system, Win 7 Pro x64: Gigabyte GTX460OC 1GB
GeForce GTX 460 OC (driver version 270.61, CUDA version 4000, compute capability 2.1, 962MB (?), 641 GFLOPS peak)

2nd system, Xubuntu 11.04 AMD64: Gigabyte GTX460SO 1GB
GeForce GTX 460 SO (driver version 270.41.06, CUDA version 4000, compute capability 2.1, 1024MB, 730 GFLOPS peak)
robertmiles

Message 21293 - Posted: 3 Jun 2011, 7:28:02 UTC

I suspect that the 962 MB is 1 GB minus whatever Windows 7 has reserved for its own use.
Kate Cowles

Message 21304 - Posted: 5 Jun 2011, 1:54:16 UTC - in response to Message 20534.  

I run GPUGrid on a GT 430 under Ubuntu 10.04.2. WUs take about 32 hrs, so there's no problem getting them returned in time.

Thu 02 Jun 2011 07:48:48 AM CDT NVIDIA GPU 0: GeForce GT 430 (driver version unknown, CUDA version 4000, compute capability 2.1, 1023MB, 45 GFLOPS peak)


skgiven (Volunteer moderator)

Message 21305 - Posted: 5 Jun 2011, 10:06:59 UTC - in response to Message 21304.  

Hi Kate,
45 GFlops peak for a GT430 does not sound right; that's more like your ION ;)
A GT430 should be reporting about 180 GFlops peak, and that's after allowing for the fact that it can't use all the CUDA cores; otherwise it would be about 270.
Perhaps you are using a very old version of BOINC (for Linux systems we cannot see the BOINC version)?
Kate Cowles

Message 21306 - Posted: 5 Jun 2011, 12:47:38 UTC - in response to Message 21305.  
Last modified: 5 Jun 2011, 12:48:21 UTC

Hi skgiven,

I'm using BOINC 6.10.17, so not all that old. Maybe I'll upgrade to 6.10.58 one of these days and see whether that makes a difference in the GFLOPs calculation.

I noticed in an earlier message in this thread that somebody was reporting 179 GFLOPs for this card. I had been startled when I first ran BOINC on it and saw the 45 GFLOPs (you're right -- not much different from my ION 2, which shows 39). The GT 430 runs GPUGrid WUs a little over 3 times as fast as the ION 2, so there is indeed something wrong with the GFLOPs calculation for the 430. It's slow -- but not THAT slow.

In any case, I'm pretty happy with the GT 430 for my purposes. It's not going to set any speed records, but it runs cool (nvidia-smi is showing it at 54 C right now while working at 98% capacity on a GPUgrid WU), and doesn't draw much power. It's double-precision, so I can run Milky Way. And it's just fine for testing and debugging CUDA code for Fermi cards.

Kate
Richard Haselgrove

Message 21307 - Posted: 5 Jun 2011, 13:00:54 UTC - in response to Message 21306.  

I'm using BOINC 6.10.17, so not all that old. Maybe I'll upgrade to 6.10.58 one of these days and see whether that makes a difference in the GFLOPs calculation.

Kate

Yes, that explains it. The corrected GFLOPs calculation for Fermi-class cards (assuming 32 cores per multiprocessor, instead of 8) wasn't introduced until v6.10.45 - your version 6.10.17 dates back to October 2009, long before details of the Fermi range were available.

Note that your GT 430 actually has 48 cores per MP, so the calculation will still be a bit on the low side. But that only affects the cosmetic display of speed in the message log at startup; it doesn't affect the actual processing speed in any way.
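
For anyone curious how the 45 / 179 / ~270 figures arise, here is a minimal sketch of that peak-GFLOPS estimate in Python, assuming the formula is multiprocessors x cores per MP x shader clock x 2 FLOPS per clock (one fused multiply-add per core per cycle), and assuming the GT 430 has 2 multiprocessors at a 1400 MHz shader clock; those assumptions are inferred from the numbers in this thread, not taken from BOINC's source code.

```python
# Hedged sketch of a BOINC-style peak-GFLOPS estimate for NVIDIA GPUs.
# Assumed formula: multiprocessors * cores_per_MP * shader_clock * 2 (FMA).

CORES_PER_MP = {"1.x": 8, "2.0": 32, "2.1": 48}  # cores per multiprocessor

def peak_gflops(multiprocessors, shader_clock_mhz, cores_per_mp):
    """Peak single-precision GFLOPS: cores x clock x 2 FLOPs per cycle."""
    return multiprocessors * cores_per_mp * shader_clock_mhz * 2 / 1000.0

# GT 430: assumed 2 MPs at a 1400 MHz shader clock (compute capability 2.1)
print(peak_gflops(2, 1400, CORES_PER_MP["2.1"]))  # 268.8 -> the card's true peak
print(peak_gflops(2, 1400, CORES_PER_MP["2.0"]))  # 179.2 -> BOINC >= 6.10.45
print(peak_gflops(2, 1400, CORES_PER_MP["1.x"]))  # 44.8  -> BOINC 6.10.17
```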
Kate Cowles

Message 21308 - Posted: 5 Jun 2011, 15:00:31 UTC - in response to Message 21307.  


Yes, that explains it. The corrected GFLOPs calculation for Fermi-class cards (assuming 32 cores per multiprocessor, instead of 8) wasn't introduced until v6.10.45 - your version 6.10.17 dates back to October 2009, long before details of the Fermi range were available.

Note that your GT 430 actually has 48 cores per MP, so the calculation will still be a bit on the low side. But that only affects the cosmetic display of speed in the message log at startup; it doesn't affect the actual processing speed in any way.


Thanks for the clear explanation. So BOINC v6.10.17 calculates correctly for the ION (which does have only 8 cores per MP) but not for the GT 430. Well, this gives me a reason (if only cosmetic) to upgrade to 6.10.58.
ExtraTerrestrial Apes (Volunteer moderator)

Message 21309 - Posted: 5 Jun 2011, 15:18:34 UTC - in response to Message 21306.  

It's double-precision, so I can run Milky Way.


You can - just don't, if you can help it ;)
Under double precision your "not-so-fast" card delivers only 1/12th of its single-precision performance, so any ATI card is going to walk all over it. I think GPUGrid and Einstein can make much better use of its (single-precision) power.

MrS
Dagorath

Message 21311 - Posted: 5 Jun 2011, 20:49:26 UTC

Fri 03 Jun 2011 02:34:07 PM MDT Starting BOINC client version 6.12.26 for x86_64-pc-linux-gnu
...
Fri 03 Jun 2011 02:34:07 PM MDT NVIDIA GPU 0: GeForce GTX 570 (driver version unknown, CUDA version 3020, compute capability 2.0, 1279MB, 1425 GFLOPS peak)
skgiven (Volunteer moderator)

Message 23179 - Posted: 29 Jan 2012, 20:45:27 UTC - in response to Message 21311.  
Last modified: 26 Mar 2012, 21:42:09 UTC

The update is just to add three cards (GTX 560 Ti 448, GTX 560 and GT 545).

Relative comparison of recommended cards, with approximate CC correction factor values in brackets:

Card             Chip    Process  CC   GFlops peak  (corrected)
GTX 590          GF110   40nm     2.0  2488         (3359)
GTX 580          GF110   40nm     2.0  1581         (2134)
GTX 570          GF110   40nm     2.0  1405         (1896)
GTX 560 Ti 448   GF110   40nm     2.0  1311         (1770)
GTX 480          GF100   40nm     2.0  1345         (1816)
GTX 295          GT200b  55nm     1.3  1192         (1669)
GTX 470          GF100   40nm     2.0  1089         (1470)
GTX 465          GF100   40nm     2.0  855          (1154)
GTX 560 Ti       GF114   40nm     2.1  1263         (1136)
GTX 285          GT200b  55nm     1.3  695          (973)
GTX 560          GF114   40nm     2.1  1075         (967)
GTX 275          GT200b  55nm     1.3  674          (934)
GTX 260-216      GT200b  55nm     1.3  596          (834)
GTX 460 768MB    GF104   40nm     2.1  907          (816)
GTX 460 1GB      GF104   40nm     2.1  907          (816)
GTX 550 Ti       GF116   40nm     2.1  691          (622)
GTS 450          GF106   40nm     2.1  601          (541)
GT 545           GF116   40nm     2.1  501          (451)

This update is based on the previous table, and is ordered by performance after considering compute capability, highest first. I have only included CC1.3, CC2.0 and CC2.1 cards, but the comparison was originally based on CC1.1 cards.

Only reference clocks are listed, and only for cards optimized by the recommended methods. As usual there are accuracy limitations, and the table's lifetime is limited by the apps/drivers in use. New cards were added based on GFlops peak and CC rather than by resurveying different cards (a method demonstrated to be reliable). Comparable but different systems (CPUs) were used, not all cards used the same drivers, and only one task type was looked at. Some of these comparisons are adapted from when we used the 6.11 app, but the correction factors are still valid.

When a new app is released I will review these cards/values for consistency; normally the cards' relative performances don't change too much, but on at least two occasions in the past things did change. For CC2.0 and CC2.1 it's less likely. When the next generation of NVidia turns up things will probably start changing, and obviously an ATI app would add a few cards.

Correction Factors Used

CC1.1 = 1.00
CC1.2 = 1.30
CC1.3 = 1.40
CC2.0 = 1.35
CC2.1 = 0.90
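
To make the bracketed values in the table concrete, here is a hedged sketch of how these factors are applied: the corrected comparison value is simply GFLOPS peak x the factor for the card's compute capability (small differences from the table are rounding).

```python
# Sketch: corrected value = GFLOPS peak x CC correction factor (list above).
CORRECTION = {"1.1": 1.00, "1.2": 1.30, "1.3": 1.40, "2.0": 1.35, "2.1": 0.90}

def corrected(gflops_peak, cc):
    """Cross-generation comparison value relative to a CC1.1 baseline."""
    return gflops_peak * CORRECTION[cc]

print(round(corrected(1581, "2.0")))  # GTX 580    -> 2134
print(round(corrected(1192, "1.3")))  # GTX 295    -> 1669
print(round(corrected(1263, "2.1")))  # GTX 560 Ti -> 1137 (table: 1136)
```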

Thanks for the posts,
oldDirty

Message 23211 - Posted: 1 Feb 2012, 22:44:23 UTC

Does "ATI" mean only GCN 1D cards?
ExtraTerrestrial Apes (Volunteer moderator)

Message 23222 - Posted: 2 Feb 2012, 20:12:54 UTC - in response to Message 23211.  

Which ATI cards will be supported is not clear yet, since development of the app is not finished.

MrS
nenym

Message 24258 - Posted: 5 Apr 2012, 12:48:00 UTC

I did a comparison of long-run tasks crunched by a GTX560Ti CC2.1 (host ID 31329, 875 MHz) and a GTX570 (host ID 101638, 730 MHz). The host with the GTX570 is running Linux, all CPU cores free for GPUGRID, SWAN_SYNC set. The host with the GTX560Ti is running Win XP x64, no CPU core free, SWAN_SYNC not used. Run times are in seconds; RAC is theoretical.

Task type     GTX560Ti run time   GTX570 run time   GTX560Ti RAC   GTX570 RAC   Ratio (run time)
PAOLA         61,500              35,600            126,400        219,000      1.74
NATHAN_CB1    25,700              14,700            120,400        210,500      1.75
GIANNI        54,200              30,000            116,200        201,000      1.8
NATHAN_FAX4   74,800              34,000            82,470         181,440      2.2

What should I make of this? Do PAOLA, NATHAN_CB1 and GIANNI tasks use all the CUDA cores of the GTX560Ti (CC2.1), or is something different going on?
skgiven (Volunteer moderator)

Message 24261 - Posted: 5 Apr 2012, 15:02:33 UTC - in response to Message 24258.  

The GTX560Ti has 384 shaders, of which 256 are usable by GPUGrid, due to the science requirements/project app and the GPU architecture.
A reference GTX560Ti has its GPU @ 822MHz.
A reference GTX570 has its GPU @ 732MHz and 480 shaders.
(480 × 732) / (256 × 822) = 1.67. Your slightly higher ratio can be explained by SWAN_SYNC being used on Linux.

All this has been known for some time, hence my table with Correction factors:
    Correction Factors Used

    CC1.1 = 1.00
    CC1.2 = 1.30
    CC1.3 = 1.40
    CC2.0 = 1.35
    CC2.1 = 0.90


1.35/0.9 = 1.5, which reflects the GTX560Ti only being able to use 2/3 of its shaders (256 of 384).
Take shader counts and GPU frequencies into the equation and you can see it all adds up:
(480/384) × (732/822) = 1.11, and 1.5 × 1.11 = 1.67.
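
As a quick check of that arithmetic, a small sketch assuming throughput is simply proportional to usable shaders x GPU clock (reference clocks as quoted above):

```python
# Sketch: predicted run-time ratio = usable shaders x clock, card vs card.
def throughput(usable_shaders, clock_mhz):
    return usable_shaders * clock_mhz

gtx570 = throughput(480, 732)    # all 480 shaders usable by GPUGrid
gtx560ti = throughput(256, 822)  # only 256 of 384 shaders usable (CC2.1)

print(round(gtx570 / gtx560ti, 2))  # -> 1.67, close to the observed 1.74-1.8
```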


Robert Gammon

Message 26209 - Posted: 7 Jul 2012, 17:21:36 UTC - in response to Message 23179.  

Is it time for another update of this table, especially with a new batch of cards out to replace most of the 200-series cards (which are increasingly hard to find)?
skgiven (Volunteer moderator)

Message 26232 - Posted: 8 Jul 2012, 20:36:09 UTC - in response to Message 26209.  

The GeForce 600 series cards are about 60% more efficient in terms of performance per Watt, compared to the GeForce 500 series.

There are several issues.
The range is quite limited - we only have 3 high end cards (690, 680 and 670). These are all CC3.0 and all expensive.
At the low end there is the GT 640 (a recent release), and some GK OEM cards.
We are still lacking the mid-range cards, which tend to make up a large portion of the cards here. When they turn up we might be in a position to start comparing performance and price.

The present cards have a variety of clock rates, making relative comparisons somewhat vague, and I cannot see the GPU frequency, as it is not reported in the task's 'Stderr output' file when run with the 4.2 app, and the 600 cards can't run tasks on the 3.1 app.

The old issue of system setup is still there, but it now has an additional dimension: the same GPU in a low-end system with a low-clocked CPU is going to underperform compared to a well-optimized system with a high-end CPU. Now we also have to consider PCIe 3 vs PCIe 2 performance. Thus far this has not been measured here (anyone with a PCIe 3 system, please post PCIe 3 vs PCIe 2 performance figures).

There is also the issue of so many different task types; the performance of each varies when comparing the 500-series to the 600-series cards. I think it's likely that GPUGrid will continue to run many experiments, so there will continue to be a variety of tasks with different performance characteristics. This will make a single table more difficult to compile and will require maintenance to stay accurate.
ExtraTerrestrial Apes (Volunteer moderator)

Message 26236 - Posted: 8 Jul 2012, 21:09:33 UTC - in response to Message 26232.  

In the past there hasn't been much of a performance difference between PCIe 2 at 16x and 8x lanes, so I suspect 16 lanes at PCIe 2 or 3 won't matter much either. We'd "need" significantly faster GPUs (or smaller WUs) to make the difference count more.

MrS
skgiven (Volunteer moderator)

Message 26238 - Posted: 8 Jul 2012, 22:02:50 UTC - in response to Message 26236.  

Going back to the GTX200 series, x16 vs x8 not mattering was unmeasured speculation. With the Fermi cards it was measured: Zoltan said he saw a ~10% drop in performance from PCIe 2 x16 to x8 on a GTX480 for some tasks, and for x16 vs x4 the drop was more like 20% (ref.). I think I eventually came to a similar conclusion and did some measurements.

With a new app, new performances, and new cards it's worth a revisit. That said, present tasks utilize the GPU more on the new app, GPU memory might be better utilized for some tasks, and generally speaking the CPU is used less. While tasks will change, this suggests PCIe matters less at present. How much that offsets the increase in GF600 GPU performance remains to be seen/measured.

What I'm not sure about is the controller. How much actual control does it have over the lanes? Can it allocate more than 8 lanes when other lanes are free, or is a 16-lane slot absolutely limited to 8 lanes when the first slot is occupied (on an x16/x8 board where both slots have x16 connectors, for example)?
Retvari Zoltan

Message 26239 - Posted: 8 Jul 2012, 22:41:02 UTC - in response to Message 26238.  

What I'm not sure about is the controller. How much actual control does it have over the lanes? Can it allocate more than 8 lanes when other lanes are free, or is a 16-lane slot absolutely limited to 8 lanes when the first slot is occupied (on an x16/x8 board where both slots have x16 connectors, for example)?

It could be limited by the motherboard as well, but according to ark.intel.com, both Ivy Bridge CPU lines, the socket 1155 CPUs and the socket 2011 CPUs, have only one PCIe 3.0 x16 link, configurable in three variations: 1x16, 2x8, or 1x8 & 2x4. There are simply not enough pins on the CPU socket to support more PCIe lanes. So if a motherboard had two (or more) real PCIe 3.0 x16 connectors, it would have to have one (or more) PCIe 3.0 x16 bridge chips (like the one the GTX 690 has on board). I don't know how the AMD CPUs support PCIe 3.0, but I think they are the same.
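
To illustrate that bifurcation constraint, a minimal sketch (the three configurations are the ones quoted from ark.intel.com; the point is that every split must fit within the same 16 CPU lanes, which is why a second real x16 connector needs a bridge chip):

```python
# Sketch: a single x16 link can only be split into widths summing to 16 lanes.
CONFIGS = {"1x16": (16,), "2x8": (8, 8), "1x8 & 2x4": (8, 4, 4)}

for name, widths in CONFIGS.items():
    assert sum(widths) == 16  # every configuration uses exactly 16 lanes
    print(name, "=", " + ".join(f"x{w}" for w in widths))
```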
skgiven (Volunteer moderator)

Message 26241 - Posted: 8 Jul 2012, 23:10:19 UTC - in response to Message 26239.  
Last modified: 8 Jul 2012, 23:23:27 UTC

I thought socket 2011 had 40 PCIe 3 lanes? That should be enough for 2x16. I know 1155 can only support one GPU @ x16 or two @ x8, even if it is PCIe 3 (via a BIOS hack/upgrade), though the on-die Ivy Bridge implementation should be faster.

AMD CPUs don't support PCIe 3 at all! Only their GPUs are PCIe 3.