NVidia GPU Card comparisons in GFLOPS peak


John C MacAlister

Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Message 40860 - Posted: 13 Apr 2015, 13:08:08 UTC

Thanks: which is the correct thread?
Betting Slip

Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 40861 - Posted: 13 Apr 2015, 13:51:52 UTC - in response to Message 40860.  

"Number Crunching"
John C MacAlister

Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Message 40862 - Posted: 13 Apr 2015, 14:11:36 UTC

Many thanks, Betting Slip! Boy, did you ever stir some memories with the team name "Radio Caroline"!! :-)
Robert Gammon

Joined: 28 May 12
Posts: 63
Credit: 714,535,121
RAC: 0
Message 41179 - Posted: 28 May 2015, 13:22:50 UTC

I wish the website would let us mine the dataset, as some apparently have been able to do.

I also wish we could see the differences in runtimes over a long period, between the GPUs we have and the ones we may buy at some point.

For example, I recently added an EVGA GTX 960 (4 GB RAM, about USD $240), and I would like to compare its performance against my GTX 760s on GPU work units directly.

At this point my hosts have only the GTX 960, two GTX 760s and one GTX 550.
skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 41181 - Posted: 28 May 2015, 16:32:53 UTC - in response to Message 41179.  
Last modified: 28 May 2015, 16:48:25 UTC

Robert, while running NOELIA_ETQ_bound tasks your GTX 960 is ~11.7% faster than your GTX 760.
There isn't a GTX 550, but there is a GTX 550 Ti: 192 CUDA cores on a 40nm GF116 chip, Compute Capability 2.1, 691 GFLOPS peak (622 with the correction factor applied, albeit back in 2012).
A GTX 550 Ti performs at around 15% of the original GTX Titan; to put it another way, your GTX 960 would be roughly four times as fast as the GTX 550 Ti.
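For anyone checking the arithmetic: the GFLOPS peak figures quoted here are presumably just the usual single-precision rule of thumb, CUDA cores x shader clock x 2 (an FMA counts as two floating-point operations). A minimal sketch; the ~1.8GHz shader clock is the GTX 550 Ti's reference spec, and the ~0.9 correction factor is inferred from the 691/622 figures above:

    #include <cstdio>

    // Rule-of-thumb SP peak: cores * shader clock (GHz) * 2 FLOP per clock (FMA).
    double peak_gflops(int cuda_cores, double shader_clock_ghz) {
        return cuda_cores * shader_clock_ghz * 2.0;
    }

    int main() {
        double peak = peak_gflops(192, 1.8);  // GTX 550 Ti: ~691 GFLOPS
        double corrected = peak * 0.9;        // ~622 with the 2012-era correction factor
        printf("GTX 550 Ti: %.0f GFLOPS peak, %.0f corrected\n", peak, corrected);
        return 0;
    }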
xixou

Joined: 8 Jun 14
Posts: 18
Credit: 19,804,091
RAC: 0
Message 41281 - Posted: 10 Jun 2015, 8:17:46 UTC

10-06-15 09:59:20 | | CUDA: NVIDIA GPU 0: GeForce GTX TITAN X (driver version 353.12, CUDA version 7.5, compute capability 5.2, 4096MB, 3065MB available, 6611 GFLOPS peak)

10-06-15 09:59:20 | | OpenCL: NVIDIA GPU 0: GeForce GTX TITAN X (driver version 353.12, device version OpenCL 1.2 CUDA, 12288MB, 3065MB available, 6611 GFLOPS peak)

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 41294 - Posted: 11 Jun 2015, 9:49:20 UTC - in response to Message 41282.  
Last modified: 26 Aug 2016, 12:32:08 UTC

This was updated with the GTX 980Ti using limited results from one task type only (subsequent observations show that the performance of different cards varies by task type, with some jobs scaling better than others):

    Performance   GPU                       Power   GPUGrid Performance/Watt
    211%          GTX Titan Z (both GPUs)   375W    141%
    156%          GTX Titan X               250W    156%
    143%          GTX 980Ti                 250W    143%
    116%          GTX 690 (both GPUs)       300W    97%
    114%          GTX Titan Black           250W    114%
    112%          GTX 780Ti                 250W    112%
    109%          GTX 980                   165W    165%
    100%          GTX Titan                 250W    100%
    93%           GTX 970                   145W    160%
    90%           GTX 780                   250W    90%
    77%           GTX 770                   230W    84%
    74%           GTX 680                   195W    95%
    64%           GTX 960                   120W    134%
    59%           GTX 670                   170W    87%
    55%           GTX 660Ti                 150W    92%
    53%           GTX 760                   130W    102%
    51%           GTX 660                   140W    91%
    47%           GTX 750Ti                 60W     196%
    43%           GTX 650Ti Boost           134W    80%
    37%           GTX 750                   55W     168%
    33%           GTX 650Ti                 110W    75%

Throughput performance and performance/Watt are relative to a GTX Titan.
Note that these are estimates, and that I have presumed Power to be the TDP, as most cards boost to around that for at least some tasks here. That is probably not the case for GM200, though (post up your findings).
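If you want to reproduce the Performance/Watt column, it is just the relative performance scaled by power, normalised to the Titan's 250W (an assumption read off the table, where the Titan is 100% at 250W). A minimal sketch checking a few rows:

    #include <cstdio>

    // Perf/Watt relative to a GTX Titan (100% at 250W):
    // rel_perf_per_watt = perf% * (250W / card TDP).
    double rel_perf_per_watt(double perf_pct, double tdp_w) {
        return perf_pct * 250.0 / tdp_w;
    }

    int main() {
        printf("GTX 970:     %.0f%%\n", rel_perf_per_watt(93.0, 145.0));   // ~160%
        printf("GTX 750Ti:   %.0f%%\n", rel_perf_per_watt(47.0, 60.0));    // ~196%
        printf("GTX Titan Z: %.0f%%\n", rel_perf_per_watt(211.0, 375.0));  // ~141%
        return 0;
    }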

When doing this I didn't have a full range of cards to test against every app version or OS, so some of this rests on presumptions drawn from consistent observations of other cards in the range. I have never had a GTX750, 690, 780, 780Ti, 970Ti or any of the Titan range to compare, but I have read what others report. While I could simply have listed the GFLOPS/Watt for each card, that would only be theoretical and would ignore the bottlenecks discussed here, such as MCU load, which differs by series.

The GTX 900 series cards can be tuned A LOT, either for maximum throughput or for lower power use / cooler running / better performance per Watt:
For example, with a GTX 970 at ~108% TDP (157W) I can run at 1342MHz GPU and 3600MHz GDDR, or at ~60% TDP (87W) I can run at ~1050MHz and 3000MHz GDDR at 1.006V (175W at the wall, with an i7 crunching CPU work on 6 cores).
The former does more work; it is ~9% faster than stock.
The latter is more energy efficient; it uses 60% of stock power but does ~16% less work than stock, or ~25% less than with the overclocked settings.
At 60% power but ~84% performance, the 970 would be ~40% more efficient in terms of performance/Watt; against the table above, that is roughly 224% of a Titan's performance/Watt.

I expected that the Maxwell 750 Ti and 750 would also boost further and use more power than their reference specs suggest, but Beyond pointed out that although they do auto-boost, they don't use any more power here (60W). They can likely also be underclocked for better performance/Watt, cooler running, or lower power use.

The GTX 960 should also be very adaptable towards either throughput or performance/Watt, but it may not be the choicest card in that respect.

Note that system setup and configuration can greatly influence performance, and that performance varies with task types/runs.

PM me with errors/corrections.


[CSF] Thomas H.V. DUPONT

Joined: 20 Jul 14
Posts: 732
Credit: 130,089,082
RAC: 0
Message 41304 - Posted: 12 Jun 2015, 5:37:43 UTC - in response to Message 41294.  

Thanks for this new update, skgiven :)
[AF>Amis des Lapins] Oncle Bob

Joined: 21 Apr 13
Posts: 3
Credit: 54,953,606
RAC: 0
Message 41327 - Posted: 13 Jun 2015, 21:33:55 UTC

Hi,

Just a question: is GPUGrid SP or DP?

I thought it was SP, but I did some math and it seems to be DP.

I ran a long task of ~5,000,000 GFLOPs on my overclocked GTX 750 Ti in 108,000 seconds, which works out to ~46 GFLOPS, the DP peak of my card.

Am I right? If so, why do the Titan and Titan Black, which have strong DP, seem weaker on GPUGrid than high-end GTX 900 cards with ten times less DP throughput?

As I was planning to buy a Titan Black for GPUGrid, I would be very interested in an explanation.

Thanks!
Matt
Joined: 11 Jan 13
Posts: 216
Credit: 846,538,252
RAC: 0
Message 41328 - Posted: 13 Jun 2015, 21:36:55 UTC - in response to Message 41327.  

I can confirm that this project is indeed SP.
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 41329 - Posted: 13 Jun 2015, 22:41:06 UTC - in response to Message 41327.  
Last modified: 13 Jun 2015, 22:46:05 UTC

    Just a question: is GPUGrid SP or DP?

The GPUGrid app does most of its calculations in SP, on the GPU. The rest (in DP) is done on the CPU.

    I thought it was SP, but I did some math and it seems to be DP.

    I ran a long task of ~5,000,000 GFLOPs on my overclocked GTX 750 Ti in 108,000 seconds, which works out to ~46 GFLOPS, the DP peak of my card.

    Am I right?

No. For a more elaborate answer, perhaps you should share your performance calculation, and we could try to figure out what's wrong with it.

    ... why do the Titan and Titan Black, which have strong DP, seem weaker on GPUGrid than high-end GTX 900 cards with ten times less DP throughput?

A Titan Black nearly equals a GTX 780Ti from GPUGrid's point of view.
DP cards like the Titan Black usually have lower clocks than their gaming equivalents, and/or the ECC makes the card's RAM run slower.

    As I was planning to buy a Titan Black for GPUGrid, ...

Don't buy a DP card for GPUGrid. A Titan X is much better for this project, or a GTX 980Ti, which has a higher performance/price ratio than a Titan X.

    I would be very interested in an explanation.

Now you have it too. :)

    Thanks!

You're welcome.
[AF>Amis des Lapins] Oncle Bob

Joined: 21 Apr 13
Posts: 3
Credit: 54,953,606
RAC: 0
Message 41330 - Posted: 14 Jun 2015, 2:09:21 UTC - in response to Message 41329.  

    For a more elaborate answer, perhaps you should share your performance calculation, and we could try to figure out what's wrong with it.

I am currently running a long task ( https://www.gpugrid.net/result.php?resultid=14262417 ). BOINC estimates its size at 5,000,000 GFLOPs, and the ETA is a bit more than 20 hours.

The GPU is a GTX 750 Ti at 90-91% load. I am not running any tasks on the CPU (an i7 2600K), in order to evaluate the speed of the card on this task alone.

If I am right, 5,000,000 GFLOPs divided by 72,000 seconds (20 hours) is about 69 GFLOPS, yet the GTX 750 Ti is rated at 1,300 GFLOPS (a bit more, in fact, as mine is slightly overclocked).

I expected 5,000,000 / 1,300 = ~3,850 seconds (plus a few, because I suppose the CPU, which runs slower than the GPU, is a bottleneck). So what is wrong with my understanding of GPUGrid?
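In code form, the naive estimate being made here is simply task size divided by peak speed. A minimal sketch with the figures above (a later reply explains why the 5,000,000 GFLOPs input is itself wrong):

    #include <cstdio>

    int main() {
        double task_gflop   = 5000000.0;  // BOINC's size estimate for the task
        double peak_gflops  = 1300.0;     // GTX 750 Ti rated SP peak
        double observed_sec = 72000.0;    // ~20 hour ETA

        // Naive expectation: the card runs at peak speed for the whole task.
        printf("expected runtime: %.0f s\n", task_gflop / peak_gflops);       // ~3846 s
        // Effective speed implied by the actual ETA.
        printf("effective speed:  %.0f GFLOPS\n", task_gflop / observed_sec); // ~69 GFLOPS
        return 0;
    }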
robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 769,991,668
RAC: 0
Message 41331 - Posted: 14 Jun 2015, 3:18:23 UTC - in response to Message 41330.  

    For a more elaborate answer, perhaps you should share your performance calculation, and we could try to figure out what's wrong with it.

    I am currently running a long task ( https://www.gpugrid.net/result.php?resultid=14262417 ). BOINC estimates its size at 5,000,000 GFLOPs, and the ETA is a bit more than 20 hours.

    The GPU is a GTX 750 Ti at 90-91% load. I am not running any tasks on the CPU (an i7 2600K), in order to evaluate the speed of the card on this task alone.

    If I am right, 5,000,000 GFLOPs divided by 72,000 seconds (20 hours) is about 69 GFLOPS, yet the GTX 750 Ti is rated at 1,300 GFLOPS (a bit more, in fact, as mine is slightly overclocked).

    I expected 5,000,000 / 1,300 = ~3,850 seconds (plus a few, because I suppose the CPU, which runs slower than the GPU, is a bottleneck). So what is wrong with my understanding of GPUGrid?


For a typical computer, the CPU runs at about four times the clock speed of the GPU. However, typical GPU programs can use many of the GPU's cores at once. GPUs have a varying number of cores, usually more on the more expensive models, currently up to about 3,000 cores per GPU. A GTX 750 Ti has 640 GPU cores; a GTX Titan Z board has 5,760, but only because it carries two GPUs.

A CPU can have multiple cores, as many as 12 on the most expensive models, but BOINC CPU workunits usually use only one CPU core each.
Vagelis Giannadakis

Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Message 41332 - Posted: 14 Jun 2015, 11:30:18 UTC - in response to Message 41330.  

    I am currently running a long task ( https://www.gpugrid.net/result.php?resultid=14262417 ). BOINC estimates its size at 5,000,000 GFLOPs, and the ETA is a bit more than 20 hours.

    The GPU is a GTX 750 Ti at 90-91% load. I am not running any tasks on the CPU (an i7 2600K), in order to evaluate the speed of the card on this task alone.

    If I am right, 5,000,000 GFLOPs divided by 72,000 seconds (20 hours) is about 69 GFLOPS, yet the GTX 750 Ti is rated at 1,300 GFLOPS (a bit more, in fact, as mine is slightly overclocked).

    I expected 5,000,000 / 1,300 = ~3,850 seconds (plus a few, because I suppose the CPU, which runs slower than the GPU, is a bottleneck). So what is wrong with my understanding of GPUGrid?


If the only factors playing a role were the task's computational load and the card's peak performance, then the computation time would be roughly [task GFLOPs] / [card GFLOPS]. But of course, this is not an ideal world :)

Off the top of my head I can think of the following factors:

    Each task type's degree of parallelism. The power of GPUs lies in their sheer number of computing cores; the more cores a task can utilize, the closer its performance gets to the theoretical level.

    Each task type's need for main-memory accesses. The GPU can perform its operations much more quickly when it does not need to access the card's main memory.

    It follows from the above that the size and speed of the in-GPU cache can have a major effect on actual performance.

    The PCI-Express performance. The card eventually finishes the computational work it has been assigned, and then a) the results of the computation must be transferred back to the host and b) new computational work must be transferred from the host to the card. During these times the card sits idle, or at least is not doing much real work.

    At least for GPUGrid, the CPU has to do some significant work too. For my (new!) GTX 970, the acemd process consumes ~25% of a logical core. This means a) some CPU resources must be available, and b) the CPU has to provide an "acceptable" level of performance. Doing other CPU-bound computational work (e.g. CPU BOINC tasks) can have a significant effect.

    System memory. Especially if other CPU-bound work is running, all these tasks need to access main memory frequently, and that access has limited bandwidth. Many tasks needing access simultaneously effectively lower the memory bandwidth available to each, and the increased load on the memory controller also increases effective memory latency.



These are just the factors I could readily think of; others may come into play too (e.g. CPU cache size and speed). Computational performance is a difficult, complex topic, and High Performance Computing even more so! I say this as a professional software engineer who has spent a lot of time improving the performance of software...


[AF>Amis des Lapins] Oncle Bob

Joined: 21 Apr 13
Posts: 3
Credit: 54,953,606
RAC: 0
Message 41333 - Posted: 14 Jun 2015, 13:00:21 UTC

Thank you, so my idea of a bottleneck was not so far from the truth (it seems there are MANY bottlenecks instead!).

It would be great to know exactly what slows down the GPU computation and how we can improve the card's speed.

I run BOINC on an SSD, which is far faster than a classic HDD. Would using a RAM disk improve the total speed of the task a bit more?

For the RAM, would non-ECC, high-frequency, low-latency RAM help?

And for the GPU itself, does memory bandwidth heavily affect the computation? High-end GPUs use a 384-bit bus, and past high-end cards used up to 512 bits (GTX 280...).

Does compute capability affect the speed?

It is very sad to see average computation speed so far from the theoretical GFLOPS peak. This is a waste of computational time and energy.
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 41339 - Posted: 14 Jun 2015, 17:46:24 UTC - in response to Message 41333.  
Last modified: 14 Jun 2015, 18:03:45 UTC

    I run BOINC on an SSD, which is far faster than a classic HDD. Would using a RAM disk improve the total speed of the task a bit more?

It won't be noticeable, as the speed of the HDD/SSD subsystem matters only when a task is starting, ending, or writing a checkpoint.

    For the RAM, would non-ECC, high-frequency, low-latency RAM help?

Yes.

    And for the GPU itself, does memory bandwidth heavily affect the computation? High-end GPUs use a 384-bit bus, and past high-end cards used up to 512 bits (GTX 280...).

Not heavily, but it's noticeable (up to ~10%). The WDDM overhead is a much bigger bottleneck, though, especially for high-end cards: my GTX 980 in a Windows XP host is only 10% slower than a GTX 980Ti in a Windows 8 host.

    Does compute capability affect the speed?

What matters is the compute capability the client is written to use. The GPUGrid client is the most up to date among BOINC projects, as it is written for CUDA 6.5.

    It is very sad to see average computation speed so far from the theoretical GFLOPS peak. This is a waste of computational time and energy.

It's not as far off as you think.
In your calculation you took the 5,000,000 GFLOPs from BOINC Manager's task properties, but that value is incorrect, and so is the result.
Judging from the task's runtime, you had a GERARD_FXCXCL12 workunit. It grants 255,000 credits, including the 50% bonus for fast return, so the base credit given for the FLOPs is 170,000. One BOINC credit is granted per 432 GFLOPs, so the task actually needed about 73,440,000 GFLOPs. Let's redo your calculation with this number (which is still an estimate):
73,440,000 GFLOPs / 1,300 GFLOPS = 56,492.3 seconds = 15h 41m 32.3s
73,440,000 GFLOPs / 72,000 sec = 1,020 GFLOPS

FLOPS stands for FLOating Point Operations per Second (a speed of computation).
FLOPs stands for FLOating Point OPerations (a total count of operations).

The 432 GFLOPs per credit comes from the definition of the BOINC credit, which says that 200 cobblestones (credits) are granted for one day of work on a 1,000 MFLOPS computer:
1,000 MFLOPS = 1 GFLOPS
200 credits for 24h at 1 GFLOPS
200 credits for 86,400 sec at 1 GFLOPS
200 credits for 86,400 GFLOPs
1 credit for 432 GFLOPs
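A minimal sketch of this credit-based estimate, putting the numbers above together:

    #include <cstdio>

    int main() {
        const double gflop_per_credit = 86400.0 / 200.0; // 432, from the cobblestone definition

        double awarded = 255000.0;                // credits, incl. the 50% fast-return bonus
        double base    = awarded / 1.5;           // 170,000 base credits
        double gflop   = base * gflop_per_credit; // ~73,440,000 GFLOPs for the task

        double peak_gflops  = 1300.0;             // GTX 750 Ti rated SP peak
        double observed_sec = 72000.0;            // actual runtime

        printf("task size:       %.0f GFLOPs\n", gflop);
        printf("ideal runtime:   %.1f s\n", gflop / peak_gflops);       // ~56,492 s
        printf("effective speed: %.0f GFLOPS\n", gflop / observed_sec); // ~1,020 GFLOPS
        return 0;
    }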
robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 769,991,668
RAC: 0
Message 41345 - Posted: 15 Jun 2015, 1:31:28 UTC - in response to Message 41339.  

    For the RAM, would non-ECC, high-frequency, low-latency RAM help?

    Yes.

Especially for the graphics board's RAM; much less so for the CPU's RAM (they are separate, except on some low-end graphics boards).

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 769,991,668
RAC: 0
Message 41346 - Posted: 15 Jun 2015, 1:44:16 UTC - in response to Message 41332.  

    The PCI-Express performance. The card eventually finishes the computational work it has been assigned, and then a) the results of the computation must be transferred back to the host and b) new computational work must be transferred from the host to the card. During these times the card sits idle, or at least is not doing much real work.

Not always true for CUDA workunits. With recent NVIDIA GPUs, a CUDA application can transfer data to and from graphics memory at the same time the GPU is performing calculations on data already in graphics memory, though this requires launching some of the kernels asynchronously. I don't know whether GPUGRID offers any workunits that do this, or whether the same is possible for OpenCL workunits.
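The overlap described here is typically achieved with CUDA streams and pinned host memory. A minimal, self-contained sketch (an illustration only, not GPUGrid's actual code): each chunk's transfers and kernel are queued in a separate stream, so one stream's copies can overlap another stream's compute.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Hypothetical kernel standing in for real work: scales a chunk in place.
    __global__ void scale(float *d, int n, float f) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= f;
    }

    int main() {
        const int N = 1 << 20, CHUNKS = 4, CHUNK = N / CHUNKS;

        float *h, *d;
        cudaMallocHost(&h, N * sizeof(float)); // pinned host memory, required for async copies
        cudaMalloc(&d, N * sizeof(float));
        for (int i = 0; i < N; ++i) h[i] = 1.0f;

        cudaStream_t s[CHUNKS];
        for (int c = 0; c < CHUNKS; ++c) cudaStreamCreate(&s[c]);

        // Queue each chunk's upload, kernel and download in its own stream;
        // the copy engines can then overlap another stream's kernel execution.
        for (int c = 0; c < CHUNKS; ++c) {
            size_t off = (size_t)c * CHUNK;
            cudaMemcpyAsync(d + off, h + off, CHUNK * sizeof(float),
                            cudaMemcpyHostToDevice, s[c]);
            scale<<<(CHUNK + 255) / 256, 256, 0, s[c]>>>(d + off, CHUNK, 2.0f);
            cudaMemcpyAsync(h + off, d + off, CHUNK * sizeof(float),
                            cudaMemcpyDeviceToHost, s[c]);
        }
        cudaDeviceSynchronize();

        printf("h[0] = %.1f (expected 2.0)\n", h[0]);
        for (int c = 0; c < CHUNKS; ++c) cudaStreamDestroy(s[c]);
        cudaFreeHost(h); cudaFree(d);
        return 0;
    }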

Vagelis Giannadakis

Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Message 41351 - Posted: 15 Jun 2015, 8:18:18 UTC - in response to Message 41346.  

    Not always true for CUDA workunits. With recent NVIDIA GPUs, a CUDA application can transfer data to and from graphics memory at the same time the GPU is performing calculations on data already in graphics memory, though this requires launching some of the kernels asynchronously. I don't know whether GPUGRID offers any workunits that do this, or whether the same is possible for OpenCL workunits.


Indeed, I would think that not all of the card's memory is accessed by the GPU at the same time, so parts of it could be updated without stopping the GPU. But to avoid data corruption you would need exclusive locks at some level (address ranges, banks, whatever). Depending mostly on timing, and I would guess on the cleverness of the algorithm deciding which parts of memory to make available for external changes, these updates could happen without the GPU stopping at all. With such schemes, however, you generally get better latency (in our case, the CPU applying its changes with a shorter delay) but lower overall throughput (both the CPU and the GPU access fewer memory addresses over time).