Nvidia GT300

Message boards : Graphics cards (GPUs) : Nvidia GT300
Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 13122 - Posted: 10 Oct 2009, 20:27:15 UTC - in response to Message 13119.  

Well, I would think that none of your ATI cards could actually run faster than a GTX285 for our application. If you mean that you get more credits, this is just due to overcrediting by the projects. Something that is bound to change in the long term.

gdf
ID: 13122
Profile Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Message 13128 - Posted: 10 Oct 2009, 22:57:29 UTC - in response to Message 13119.  
Last modified: 10 Oct 2009, 22:58:30 UTC

Hi Paul, you wrote:

why is waving around a "real" card that much more impressive than waving about a "fake" one


If I show my Alcoholics Anonymous membership card and proclaim 'Here is my real American Express Gold card', that may not make such a good impression in my local Gucci store.

If I pass my real American Express Gold card to the cashier and say 'could you please wrap the purse as a gift' in the same Gucci store, I may have more credibility.

If you claim to have something, it can be wise to actually show it. Especially if your audience consists of investors and important business relations. If you do not have it, it may be unwise to show a mockup and pretend it is the real thing. I am not against NVidia, mind you, but this was not a wise move.
ID: 13128
DJStarfox

Joined: 14 Aug 08
Posts: 18
Credit: 16,944
RAC: 0
Message 13132 - Posted: 11 Oct 2009, 3:59:29 UTC - in response to Message 12961.  

Wow, did you read about their Nexus plugin for Visual Studio? Now you can have a machine/GPU state debugger for CUDA applications in an integrated development environment. Heck, I might even take on GPU coding; this stuff may be a good job skill to have in the coming years.
ID: 13132
Profile Paul D. Buck

Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 13136 - Posted: 11 Oct 2009, 5:45:02 UTC - in response to Message 13128.  

If you claim to have something, it can be wise to actually show it. Especially if your audience consists of investors and important business relations. If you do not have it, it may be unwise to show a mockup and pretend it is the real thing. I am not against NVidia, mind you, but this was not a wise move.

Having worked with electronics for years I can also assure you that waving a "working" version of a card around with no anti-static protection can turn that "working" card into a dead one in moments. The cases you used are interesting but not parallel. There is no risk in waving a credit card about in a crowd, other than having it remotely scanned; there is a huge risk to a chunk of electronics.

You see value in the exercise and I do not ... simple as that, and we are never going to agree ... :)
ID: 13136
Profile Paul D. Buck

Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 13137 - Posted: 11 Oct 2009, 5:52:10 UTC - in response to Message 13122.  

Well, I would think that none of your ATI cards could actually run faster than a GTX285 for our application. If you mean that you get more credits, this is just due to overcrediting by the projects. Something that is bound to change in the long term.

I have not yet tried to compare something that is more apples to apples, but the cards' relative speeds can be guesstimated from a comparison with Collatz on the two cards, which would be more SP to SP ...

I know my 4870s are 3-4 times faster than the 260 cards for MW, in part because of the weakness of the 260 cards in DP capability ... but my understanding is that the same carries through, to a lesser extent, with the 260 cards and the 4870s on Collatz. The new 5870 is about 2x faster than the 4870 ... and is shipping now ...

Yes, the GTX300 will redress this balance somewhat; how much is still not known as the card is not shipping yet ... then we also get into that whole price-to-performance thing ...
ID: 13137
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 13143 - Posted: 11 Oct 2009, 12:16:11 UTC - in response to Message 13137.  
Last modified: 11 Oct 2009, 12:55:27 UTC

There is a fundamental flaw with NVidia. They sell GPUs that are excellent for gamers and for crunchers, but they are expensive due to the research and design behind cutting-edge technology. Partially as a result of this, they don't sell well against Intel in the low-end market, where most of the sales occur. Unfortunately for NVidia, Intel have proprietary rights on chip design and can therefore hold NVidia to ransom, and have been doing so for some time! After a year of financial instability it is little wonder that the manufacturers of a pinnacle technological design are struggling. When you are faced with competitors capable of flexing considerable muscle (buy our GPUs or you won't get our CPUs) and governments who are scared to help NVidia, you are in a difficult position!
It is likely that in Europe Intel could be fined for their present actions, but by the time that happens there might be two CPU manufacturers and the same two GPU manufacturers.
By most accounts the G200 range was difficult and expensive to manufacture, so it would be an unnecessary financial burden on the company to try to keep these production lines running. Now is not the time to produce a card that will not sell at a profit! It seems sensible that they are cutting manufacturing back now, several months before the release of the G300-based GPUs. I expect the new G300 line will require the full attention of NVidia; it could make or break the company. To keep manufacturing lots of pointless old lines of technology, as Intel do with their CPUs, would definitely spell the end for NVidia.
ID: 13143
Profile Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Message 13162 - Posted: 13 Oct 2009, 14:33:50 UTC - in response to Message 13128.  

This is just an FYI, I do not want to start a new discussion.
Paul, I agree we disagree and join you in crunching :)

http://www.nordichardware.com/news,10006.html
Which eventually links to:
http://www.xbitlabs.com/news/video/display/20091002130844_Nvidia_Admits_Showing_Dummy_Fermi_Card_at_GTC_Claims_First_Graphics_Cards_on_Track_for_Q4_2009.html

ID: 13162
Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 13165 - Posted: 13 Oct 2009, 17:50:50 UTC - in response to Message 13162.  

So by the end of the year there will be some G300s out (I hope more than a few).

gdf
ID: 13165
chumbucket843

Joined: 22 Jul 09
Posts: 21
Credit: 195
RAC: 0
Message 13259 - Posted: 22 Oct 2009, 22:10:03 UTC - in response to Message 13143.  

There is a fundamental flaw with NVidia. They sell GPUs that are excellent for gamers and for crunchers, but they are expensive due to the research and design behind cutting-edge technology. Partially as a result of this, they don't sell well against Intel in the low-end market, where most of the sales occur. Unfortunately for NVidia, Intel have proprietary rights on chip design and can therefore hold NVidia to ransom, and have been doing so for some time! After a year of financial instability it is little wonder that the manufacturers of a pinnacle technological design are struggling. When you are faced with competitors capable of flexing considerable muscle (buy our GPUs or you won't get our CPUs) and governments who are scared to help NVidia, you are in a difficult position!
It is likely that in Europe Intel could be fined for their present actions, but by the time that happens there might be two CPU manufacturers and the same two GPU manufacturers.
By most accounts the G200 range was difficult and expensive to manufacture, so it would be an unnecessary financial burden on the company to try to keep these production lines running. Now is not the time to produce a card that will not sell at a profit! It seems sensible that they are cutting manufacturing back now, several months before the release of the G300-based GPUs. I expect the new G300 line will require the full attention of NVidia; it could make or break the company. To keep manufacturing lots of pointless old lines of technology, as Intel do with their CPUs, would definitely spell the end for NVidia.

That's a little out there, to think a company will fail from one generation. AMD is still with us. The 5870 is not 2x faster; it is bottlenecked by bandwidth, specifically L1 cache. On Milky Way the card gets 2 TFlops, which is a perfect match for the 2 TB/s of bandwidth. Nvidia's L1 bandwidth on Fermi should be 3 TB/s, so the card will be very fast. Something that should be noted about ATI's architecture is that it was designed for DX10, not GPGPU. Not all applications can take full advantage of VLIW or vectors. Nvidia has spent very little on R&D lately; GT200 and G92 were very small tweaks, and Fermi is the same basic architecture with more programmability and cache. How GPUGRID performs with ATI is a mystery; it's going to come down to bandwidth and ILP.
ID: 13259
zpm
Joined: 2 Mar 09
Posts: 159
Credit: 13,639,818
RAC: 0
Message 13262 - Posted: 23 Oct 2009, 2:03:04 UTC - in response to Message 13259.  

Well said.
ID: 13262
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 13270 - Posted: 24 Oct 2009, 18:50:53 UTC - in response to Message 13259.  

I did not say anyone would fail. I suggested that a plan to save the company was in place, and this does not include selling old lines of GPUs! Nice technical insight, but businesses go bust because they can't generate enough business to pay off their debts in time, not because of the technical details of a future architecture.
ID: 13270
Gipsel

Joined: 17 Mar 09
Posts: 12
Credit: 0
RAC: 0
Message 13429 - Posted: 9 Nov 2009, 19:27:49 UTC - in response to Message 13052.  

We will take advantage of the 2.5 Tflops single precision.

gdf

You expect Fermi to hit more than a 2.4GHz shader clock? The extra MUL is gone with Fermi. The theoretical peak throughput for single precision is just:
number of SPs * 2 * shader clock
That means for the top model with 512 SPs and a clock of 1.7GHz (if nv reaches that) we are speaking about 1.74 TFlop/s theoretical peak in single precision.
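
A quick sanity check of those numbers as a minimal Python sketch (the 512 SPs and 1.7GHz shader clock are the rumoured figures quoted above, not confirmed specs):

# Theoretical peak single precision: SPs * 2 flops per clock (FMA) * shader clock
def peak_sp_tflops(num_sp, shader_clock_ghz):
    return num_sp * 2 * shader_clock_ghz / 1000.0

print(peak_sp_tflops(512, 1.7))   # ~1.74 TFlop/s for the rumoured top Fermi part
print(2500 / (512 * 2))           # shader clock in GHz needed for 2.5 TFlop/s: ~2.44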
ID: 13429
Gipsel

Joined: 17 Mar 09
Posts: 12
Credit: 0
RAC: 0
Message 13430 - Posted: 9 Nov 2009, 19:48:24 UTC - in response to Message 13137.  

I have not yet tried to compare something that is more apples to apples, but the cards' relative speeds can be guesstimated from a comparison with Collatz on the two cards, which would be more SP to SP ...

Collatz does not run a single floating point instruction on the GPU. It's pure integer math. Nvidia cards are currently slower there because 32-bit integer math is not exactly one of the strengths of nvidia GPUs, and their memory controller and cache system have more difficulties with the random accesses necessary there (coalescing memory accesses is simply not possible). But integer operations get a significant speedup with Fermi, and the new cache system as well as the new memory controller should be able to handle their tasks much better than on current nvidia GPUs. With Fermi I would expect a significant speedup (definitely more than a factor of 2 compared to a GTX285) for Collatz.
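
For readers who haven't seen it, a minimal sketch of why a Collatz search is pure integer work; this is just the textbook 3n+1 iteration in Python, not the heavily optimised, table-driven kernel the actual project runs:

def collatz_steps(n: int) -> int:
    # Count iterations of the 3n+1 rule until n reaches 1; only integer
    # divisions, multiplications and additions, no floating point at all.
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(collatz_steps(27))   # 111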

How badly nvidia currently does there (originally I expected them to be faster than ATI GPUs) becomes clearer when I say that the average utilization of the 5 slots of each VLIW unit of the ATI GPUs is actually only ~50% for Collatz (MW arrives at ~4.3/5, i.e. 86% on average). That is also the reason the GPUs consume less power on Collatz than with MW.
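
As a back-of-the-envelope illustration of what those slot-utilisation figures mean for the issue rate (the 1600 ALU slots and 850MHz clock are the HD5870's published specs, assumed here; only the utilisation values come from the paragraph above):

# Cypress has 320 five-wide VLIW units = 1600 ALU slots, one op per slot per clock
alus, clock_ghz = 1600, 0.85
peak_ops = alus * clock_ghz * 1e9            # peak issue rate in ops/s
print(peak_ops * (4.3 / 5) / 1e12)           # MW-like occupancy (~86%): ~1.17 Tera-ops/s issued
print(peak_ops * 0.5 / 1e12)                 # Collatz-like occupancy (~50%): ~0.68 Tera-ops/s issued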
ID: 13430
Gipsel

Joined: 17 Mar 09
Posts: 12
Credit: 0
RAC: 0
Message 13431 - Posted: 9 Nov 2009, 20:55:36 UTC - in response to Message 13259.  

The 5870 is not 2x faster; it is bottlenecked by bandwidth, specifically L1 cache. [..] Nvidia's L1 bandwidth on Fermi should be 3 TB/s, so the card will be very fast.
For some problems it is 2x as fast (or even more), just think of MW and Collatz. And the L1 bandwidth didn't change per unit and clock (each SIMD engine can fetch 64 bytes per clock from L1, and there are twenty of them ;). That means it isn't any more of a bottleneck than it was with the HD4800 series. What didn't scale in the same way are the L2 cache bandwidth (only the size doubled) and the memory bandwidth.

I don't know where you got the L1 bandwidth figure for Fermi from, but it is a bit speculative to assume every L/S unit can fetch 8 bytes per clock. Another estimate would be 16 SMs * 16 L/S units per SM * 4 bytes per clock = 1024 bytes per clock (Cypress stands at 1280 bytes/clock, but at a significantly lower clock speed and with more units). With a clock of about 1.5GHz one would arrive at roughly half of your figure.
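
The two estimates in numbers (the SIMD/SM counts are from the paragraph above; the 850MHz Cypress core clock is its published spec, and the 1.5GHz Fermi clock is the guess above):

# Cypress (HD5870): 20 SIMD engines * 64 bytes per clock from L1, at 850MHz
print(20 * 64 * 0.85e9 / 1e12)      # ~1.09 TB/s aggregate L1 bandwidth
# Fermi guess: 16 SMs * 16 load/store units * 4 bytes per clock, at 1.5GHz
print(16 * 16 * 4 * 1.5e9 / 1e12)   # ~1.54 TB/s, roughly half of the claimed 3 TB/s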

On Milky Way the card gets 2 TFlops, which is a perfect match for the 2 TB/s of bandwidth.
As said, a HD5870 has only about 1.1 TB/s L1 cache bandwidth. The 2 TB/s figure someone came up with is actually adding the L1 cache and shared memory bandwidth. And I can tell you that it is nothing to consider at all for the MW application.
First, the MW ATI application doesn't use the shared memory (the CUDA version does, but I didn't find it useful for ATIs) and second, the MW applications are so severely compute bound that the bandwidth figures don't matter at all. Depending on the problem one has between 5 and 12 floating point operations per fetched byte (not per fetched value), and we are speaking about double precision operations. A HD5870 is coming close to about 400 GFlop/s (double precision) over at MW, which means with the longer WUs (consuming less bandwidth than the shorter ones) one needs only about 33GB/s of L1 bandwidth. Really nothing to write home about.
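
In numbers, using only the figures from that paragraph:

# Compute-bound check: bytes/s needed = achieved flops / (flops per fetched byte)
dp_gflops = 400                    # HD5870 double-precision rate at MW, as quoted
flops_per_byte = 12                # long WUs, upper end of the quoted 5-12 range
print(dp_gflops / flops_per_byte)  # ~33 GB/s of L1 bandwidth actually needed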

That is a bit different with Collatz, which is quite bandwidth hungry, not so much cache bandwidth hungry as memory bandwidth hungry. A HD5870 peaks somewhere just below 100GB/s of used bandwidth. And that with virtually random accesses (16 bytes are fetched by each access) to a 16 MB buffer (larger than all caches), which is actually a huge lookup table with 2^20 entries. It is quite amazing that a HD5870 is able to pull that off (from some memory bandwidth scaling experiments with a HD4870 I first thought it would be a bottleneck). Obviously the on-chip buffers are quite deep, so they can find some consecutive accesses to raise the efficiency of the memory controllers. Contrary to nvidia, coalescing of accesses is apparently not that important for ATI cards.
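
The lookup-table numbers from that paragraph, spelled out (the entry count and fetch size are as quoted; the access rate is simply the quoted ~100GB/s divided by the fetch size):

entries, bytes_per_entry = 2**20, 16
print(entries * bytes_per_entry / 2**20)       # table size in MiB: 16.0
used_bandwidth = 100e9                         # ~100 GB/s used bandwidth, as quoted
print(used_bandwidth / bytes_per_entry / 1e9)  # ~6.25 billion random 16-byte fetches per second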

Something that should be noted about ATI's architecture is that it was designed for DX10, not GPGPU.
Actually Cypress was designed for DX11 with a bit of GPGPU in mind, which is now even part of the DirectX11 specification (DX compute shader). In fact, DX11 requires that the shared memory be doubled compared to what is available on the latest DX10.x compatible cards.

GT200 and G92 were very small tweaks, and Fermi is the same basic architecture with more programmability and cache.
I would really oppose that statement, as Fermi is going to be too large a step to be considered the same architecture.

How GPUGRID performs with ATI is a mystery; it's going to come down to bandwidth and ILP.
I guess GDF will know best what GPUGrid stresses most and whether there are particular weaknesses of the architectures. Generally GPUGrid does some kind of molecular dynamics, which has the potential to run fast on ATI hardware if some conditions are met. At the moment ATI's OpenCL implementation is lacking the image extension, which really helps the available bandwidth in quite a few usage scenarios. And OpenCL is far from being mature right now. That means the first ATI applications may very well not show the true potential the hardware is capable of. And whether ATI cards can match nvidia's offerings here is of course dependent on the details of the actual code and the employed algorithms to the same extent as it is dependent on the hardware itself ;)
ID: 13431
Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 13432 - Posted: 9 Nov 2009, 22:25:07 UTC - in response to Message 13431.  

As soon as we get an HD5870 I will be able to tell you.
Maybe we can have a chat, in case it is not fast, to see where the problem is.

gdf
ID: 13432
Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 13454 - Posted: 10 Nov 2009, 14:29:46 UTC - in response to Message 13432.  

We got access to a 4850, and you were right: shared memory is still emulated via global memory, so it is of no use.

gdf
ID: 13454
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 13688 - Posted: 24 Nov 2009, 18:31:46 UTC - in response to Message 13454.  

We got access to a 4850, and you were right: shared memory is still emulated via global memory, so it is of no use.
gdf


Not sure I am picking this up right.
Am I right in thinking that the HD4850's bottleneck is due to using System memory?
ID: 13688
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 13861 - Posted: 10 Dec 2009, 12:52:53 UTC - in response to Message 13688.  

NVIDIA has released another GT300 card.

They did it in the only way they know how: they re-branded the GT220 and called it a GeForce 315!

Fortunately, it is OEM only, so shoppers need only look out for the existing range of p0rkies.

The first effort was particularly special: the GeForce 310 uses DDR2. At this rate, don't count out an AGP comeback.
ID: 13861
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 13872 - Posted: 10 Dec 2009, 21:03:05 UTC - in response to Message 13861.  

LOL!!!

I guess this puts nVidia's comment "You'll be surprised once the Geforce 300 lineup is complete" into a whole new light...

MrS
Scanning for our furry friends since Jan 2002
ID: 13872
Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 13874 - Posted: 11 Dec 2009, 9:05:28 UTC - in response to Message 13872.  

Please nobody buy a G315 ever.

gdf
ID: 13874