
Message boards : Graphics cards (GPUs) : GTX 590 coming?

Author Message
Hypernova
Send message
Joined: 16 Nov 10
Posts: 22
Credit: 24,712,746
RAC: 0
Level
Pro
Scientific publications
watwatwatwatwatwat
Message 20327 - Posted: 1 Feb 2011 | 12:07:17 UTC
Last modified: 1 Feb 2011 | 12:07:46 UTC

It seems we may get a new card in February, the GTX 590, which will be a dual GTX 580 card. All cores (1024 in total) will be active, but frequencies will be a little lower to cut power consumption and heat. It will be interesting to see how they behave on GPUGrid.

Profile Carlesa25
Avatar
Send message
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 0
Level
Thr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20329 - Posted: 1 Feb 2011 | 13:19:38 UTC - in response to Message 20327.

Hello: It will be very interesting, but keep in mind that it isn't really 1024 cores; it's 16 + 16 SMs = 512 + 512 shaders, just like the earlier GTX295 (240 + 240, which is what I have), which allows it to run TWO tasks at the same time.

I suppose that in practice it will double the performance, but the internal structure is very different, especially the layout of the compute units. Best regards

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20331 - Posted: 1 Feb 2011 | 14:40:13 UTC - in response to Message 20329.

It will allow some single PCIE slot users to basically have 2 GPUs, and dual slot users to have more than 2 GPUs. Good news. My guess is that they will be slightly more energy efficient than the GTX580 cards (performance per Watt). Hopefully they will drive the price of other cards down too; the GTX580 is still far too rich for many, and if the GTX570 is only the 3rd fastest Fermi then those prices may see an early fall too.

Hypernova
Send message
Joined: 16 Nov 10
Posts: 22
Credit: 24,712,746
RAC: 0
Level
Pro
Scientific publications
watwatwatwatwatwat
Message 20332 - Posted: 1 Feb 2011 | 21:23:28 UTC - in response to Message 20329.

Hello: It will be very interesting, but keep in mind that it isn't really 1024 cores; it's 16 + 16 SMs = 512 + 512 shaders, just like the earlier GTX295 (240 + 240, which is what I have), which allows it to run TWO tasks at the same time.

I suppose that in practice it will double the performance, but the internal structure is very different, especially the layout of the compute units. Best regards


You are right. You won't have 1024 cores available for a single task. It will be two GPUs with 512 cores each, each crunching a separate task.

The reason I mentioned 1024 cores was to say that, for power-consumption and thermal reasons, Nvidia had to choose between reducing the number of active cores per GPU and reducing the frequencies. In my opinion, keeping all cores active is the better option for crunching.

For gaming, where frame rates (fast cycle times) are paramount, keeping frequencies high would probably have been the better choice.

But maybe I am wrong.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20351 - Posted: 6 Feb 2011 | 15:21:41 UTC - in response to Message 20332.

AMD will be showcasing Bulldozer and Antilles at CeBit, which will be in Hannover from 1st to 5th March this year.

NVidia are now expected to release the GTX595 sometime after this; no doubt they want to test Antilles so that they can fine tune their GTX595 to outperform Antilles, at least in some promotional way.

So my guess is that a GTX595 will turn up by the middle of the year.

With high-end Sandy Bridge CPUs, Bulldozer, Antilles and a GTX595 en route, this will be a big year for big CPUs and very big GPUs.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20488 - Posted: 21 Feb 2011 | 9:40:35 UTC

For games, and regarding shaders, it doesn't matter whether you increase shader count or frequency. The task is "embarrassingly parallel", meaning that even at just 1024x768 we've got about 0.8 million pixels per frame and could, to a first approximation, make use of just as many shaders in parallel (this number goes down by a factor of 20 or so if you consider pipeline depth, but it's still safe). And regarding frequency: we need frames at rates in the Hz range, whereas GPU frequencies are in the MHz range.
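A quick back-of-envelope version of that argument, sketched in Python (the resolution, the GF110 shader count and the factor-of-20 pipeline correction are the rough figures from the post, not measurements):

# Rough check of the "embarrassingly parallel" argument: even after a big
# haircut for pipeline depth there is far more independent per-pixel work
# per frame than there are shaders on a full GF110.
width, height = 1024, 768
pixels_per_frame = width * height          # ~0.79 million independent pixels

shaders = 512                              # CUDA cores on a full GF110 (GTX 580)
pipeline_factor = 20                       # the post's rough correction for pipeline depth

effective_parallel_work = pixels_per_frame / pipeline_factor
print(f"pixels per frame:      {pixels_per_frame:,}")
print(f"after pipeline factor: {effective_parallel_work:,.0f}")
print(f"work items per shader: {effective_parallel_work / shaders:.0f}")   # ~77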

There are some parts of the GPU, however, whose count is not reduced when shaders are cut. These parts will work faster in a card with fewer, higher-clocked shaders, as they get the higher frequencies too.

And regarding the GTX595 rumours: even a dual GTX570 exceeds the 300 W power wall set by the PCIe specification by quite a bit. And they're already down to ~0.95 V, so there's not much room left at the bottom for TSMC's 40 nm process. I could imagine two full GF110s at no more than 0.90 V and frequencies below the GTX570, making them about as fast as GTX570 SLI. It's going to be a powerful and interesting card, but don't expect miracles ;)

MrS
____________
Scanning for our furry friends since Jan 2002

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20648 - Posted: 11 Mar 2011 | 18:32:46 UTC - in response to Message 20488.
Last modified: 11 Mar 2011 | 18:59:24 UTC

The anticipated release date is the 22nd of March, according to several reports - only 11 days away.
It will be interesting to see what the performance is, as we have not seen a dual Fermi of any kind. One thing's for sure: at the suggested entry price I won't be the first to fork out. Hopefully the other Fermis will start to be more reasonably priced; I fancy a GTX570, but only when the price is right.

I can't see NVidia releasing a sub 300W GF110 dual-Fermi; the Radeon HD 6990 uses 350W (or 450W if you flick the performance switch).

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20649 - Posted: 11 Mar 2011 | 21:57:34 UTC - in response to Message 20648.

Agreed - if they follow the HD6990 they can probably put out a decent dual GF110, i.e. without castrating it too much to stay within 300 W.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20683 - Posted: 17 Mar 2011 | 17:59:31 UTC - in response to Message 20649.
Last modified: 17 Mar 2011 | 18:02:24 UTC

This one is funny.

Perhaps a bit closer to the real thing.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20685 - Posted: 17 Mar 2011 | 20:11:54 UTC

LOL!

MrS
____________
Scanning for our furry friends since Jan 2002

alephnull
Send message
Joined: 8 Jul 09
Posts: 13
Credit: 306,850,267
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 20691 - Posted: 18 Mar 2011 | 4:07:34 UTC - in response to Message 20683.

Hopefully these come out soon. I was waiting for them, but when they got delayed again my patience ran out and I just got the 580s. Anyone have guesstimates on what the 590s may go for? I feel a second mortgage coming in the near future...

Any speculation on the CPU usage for GPUGrid with these cards? Since they will be able to crunch 2 WUs each, will that mean 2 CPU cores per card?

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20692 - Posted: 18 Mar 2011 | 7:08:51 UTC - in response to Message 20691.

You're about right on the cost - mortgage territory.

Yeah, with 2 GPUs you would want to keep two CPU cores/threads free (not running CPU tasks) if you use swan_sync (which you would do, of course).

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20705 - Posted: 18 Mar 2011 | 18:54:55 UTC - in response to Message 20692.
Last modified: 23 Mar 2011 | 14:15:37 UTC

Hexus suggests a very reasonable TDP of 365W. This might mean the 622MHz core clock I read elsewhere (607MHz in other reports) is real, and it casts some doubt on whether it will outperform a 6990. That said, I still think it will be 50% faster than a single GTX580.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20709 - Posted: 18 Mar 2011 | 21:50:44 UTC

Well, if you'd need a mortgage to afford one.. you probably shouldn't ;)

Anyway, I fail to be impressed by these cards (rumored GTX590 and HD6990). In my opinion it's pushing single cards too far, especially for serious 24/7 crunching. I'd rather see more flexible use of more cards of the GTX580 caliber (card arrangement & cases, PCIe slots & their spacing).

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Carlesa25
Avatar
Send message
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 0
Level
Thr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20711 - Posted: 18 Mar 2011 | 23:38:51 UTC - in response to Message 20709.

Hi: My experience with a GTX295 has been very good (on Linux and Windows), better than its equivalent in SLI: lower power consumption, easy installation and plenty of overclocking headroom; it also lets you mount 4 GPUs in one box, which is not nonsense.
When I change it will almost certainly be to a GTX590 (if I can get credit...). Greetings.

Jeremy
Send message
Joined: 15 Feb 09
Posts: 55
Credit: 3,542,733
RAC: 0
Level
Ala
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 20715 - Posted: 19 Mar 2011 | 3:37:41 UTC
Last modified: 19 Mar 2011 | 3:38:50 UTC

Dual-GPU single-card configurations have certain advantages. The ability to install in a motherboard with a single PCIe slot is one, but another (the one that interests me, frankly) is water cooling. A single water-cooling block is typically about $110-$120 or so. With proper cooling, there's little reason a dual-GF110 card won't be able to achieve full GTX580 clock speeds, if your power supply can handle it. Once you factor in the cost of the waterblock(s), it might even be less expensive than two 570s/580s.

Time will tell, but I'm interested to see what nVidia actually brings to the table.
____________
C2Q, GTX 660ti

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20718 - Posted: 19 Mar 2011 | 15:21:58 UTC - in response to Message 20715.
Last modified: 24 Mar 2011 | 13:08:33 UTC

Most people that buy these cards will probably remove the heatsink and fans and go straight to water cooling. If its TDP is only 365W and the user has a PCIE2 slot, then the system can supply up to 450W to the card - plenty of headroom to overclock to at least the GTX580 reference 772MHz, probably more, seeing as these will be the sweetest of cores.
My concern would be a potential lack of capacitors on the card (to save on power). I hope NVidia did not scrimp in this area, but you never know. I doubt it, but if there is a switch such as on the 6990, then perhaps it could be a 365W/440W card.
I also read a suggestion that only 1000 of these would be made available at the outset. A very low number, suggesting a very high price tag, but I think in the long run many more will be released.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20778 - Posted: 24 Mar 2011 | 12:52:28 UTC - in response to Message 20718.
Last modified: 27 Mar 2011 | 21:56:03 UTC

Ref specs:

512 shaders per GPU (1024 total)
Memory: 3072 MB GDDR5 (1536 MB per GPU)
Core clock: 607 MHz
Shader clock: 1215 MHz
Memory clock: 3414 MHz
Memory interface: 768-bit (384-bit per GPU)
DirectCompute 5.0 and OpenCL support
Microsoft DirectX 11 support
NVIDIA PhysX ready
Quad SLI ready
Three dual-link DVI + mini-DisplayPort connectors
Power consumption: 365 W

It's a GF110 (Rev 1A), so it should work here straight out of the box (using the 267.71 driver). Of course, ASUS already has one listed at 612 MHz, so expect some variation. GPUz image
Prices from £550 (~$700, ~600 Euro)

Review by Ryan Smith at AnandTech.
Going by this Folding graph, it should outperform a GTX580 by around 55% for crunching here or at Folding, and if overclocked it might get to about 190% of a reference GTX580.
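A rough peak-throughput comparison behind that ~55% figure, assuming the usual rule of thumb for CC2.0 Fermi of 2 FLOPs (one FMA) per shader per shader-clock cycle, and a GTX580 reference shader clock of 1544 MHz (not listed above):

# Peak single-precision GFLOPS from shader count x shader clock x 2 (FMA).
def gflops(shaders, shader_clock_mhz):
    return shaders * shader_clock_mhz * 2 / 1000.0

gtx590_per_gpu = gflops(512, 1215)     # one GF110 at GTX590 reference clocks
gtx590_total = 2 * gtx590_per_gpu
gtx580 = gflops(512, 1544)             # GTX580 at reference clocks

print(f"GTX590 (both GPUs): {gtx590_total:.0f} GFLOPS")    # ~2488
print(f"GTX580:             {gtx580:.0f} GFLOPS")          # ~1581
print(f"ratio:              {gtx590_total / gtx580:.2f}x")  # ~1.57

This is peak arithmetic only; real task throughput also depends on memory bandwidth and on how well the application uses the two GPUs.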

Profile liveonc
Avatar
Send message
Joined: 1 Jan 10
Posts: 292
Credit: 41,567,650
RAC: 0
Level
Val
Scientific publications
watwatwatwatwatwat
Message 20780 - Posted: 24 Mar 2011 | 20:28:14 UTC

There's also this review from benchmarkreviews.com

At first it seemed like pulling rabbits out of a hat, sold as a "miracle", since the specs were (copy/paste from benchmarkreviews):

Graphics Clock: 607 MHz
Processor Clock: 1215 MHz
Memory Clock: 854/3414 MHz
Thermal Design Power: 365 Watts

But even though the benchmarks point to roughly 1.5x a single GTX580 (about three quarters of GTX580 SLI), the TDP of a GTX580 SLI setup is 492 Watts (2 x 246 W), and 1.5x a single GTX580's TDP is 369 Watts.
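Writing that arithmetic out (the 246 W per GTX580 and 365 W figures are the ones quoted above, and 1.5x a single GTX580 is taken as the performance estimate):

# If the GTX590 performs like ~1.5x a single GTX580, its performance per
# watt is essentially the same - hence "no miracle".
gtx580_tdp = 246.0      # W per card, figure used in the post
gtx590_tdp = 365.0      # W, GTX590 reference TDP

scaled_tdp = 1.5 * gtx580_tdp                       # "1.5 GTX580s" worth of TDP
perf_per_watt_ratio = (1.5 / gtx590_tdp) / (1.0 / gtx580_tdp)

print(f"1.5x GTX580 TDP:  {scaled_tdp:.0f} W")          # 369 W vs the GTX590's 365 W
print(f"perf/W vs GTX580: {perf_per_watt_ratio:.2f}x")  # ~1.01x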

So where is this "miracle"???
____________

Profile Zydor
Send message
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 20787 - Posted: 26 Mar 2011 | 23:06:35 UTC - in response to Message 20780.

There is a major factor lurking outside the marketing/fanboy hype around both the 590 and the 6990, and those considering a long-term purchase of high-end cards may want to bear it in mind if they weren't already aware of it. Both the 590 and the 6990 are cobbled-together designs patching over the cracks left when the 32nm fabrication node was cancelled - they had no choice but to release new designs on 40nm. AMD was in a better position to slip the Northern Islands design to 40nm; NVidia had a bigger hassle, as it was still getting over the Fermi debacle. The result is of course two fast cards, but both are hobbled by the fact that they had to stay on 40nm.

Both AMD and NVidia are due to release 28nm designs this year (allegedly late second quarter - which probably means by Xmas in reality). The designs are already proven and not vaporware. In September last year NVidia claimed they would release the Kepler card - 28nm - at the end of the second quarter this year, but given NVidia are always late on marketing promises, Xmas is probably it. AMD will be ready to go by then with 28nm Northern Islands, as they usually do a product refresh in the mid-to-late fourth quarter.

The import of this is a step change in performance of 3-6 times current levels (NVidia even claim Maxwell, due in 2013 on 28nm, will be 15x current levels). Normally I personally would go for the cards that meet the need now - there is always something better round the corner, and you could end up waiting forever. In this instance, however, the releases later this year come with a massive step change in performance. That would also explain NVidia's claim of only releasing 1000 590s.

Both companies were caught out by the 32nm fabrication failure; AMD arguably came out better on balance in that scramble over the last 2-3 years, but the real change, where ears need to prick up, is the 28nm fabrication process. It will put AMD and NVidia on a level playing field for the first time in three years, and will deliver massive performance increases along with reduced power and heat.

If individuals need a high-end card now, that's life, but if the need is not critical, waiting to see how the 28nm options develop makes sense. There is a lot of hype flying around the 590/6990 at present, all driven by the respective marketing - but that's the wrong war. 40nm is dead now; the 6990 and 590 are the last of the 40nm line, cobbled-together designs that were originally intended for 32nm. The real war starts again at 28nm, and the first battle is about to start in a few months. So unless a purchase of a high-end card is urgent, there is, for once, a real case for waiting until the 28nm picture is clarified.

Horses for courses - we all have our individual needs and drivers - but if individuals were not fully aware of the 40/32/28nm fabrication saga, it's worth pausing to see what it means for them; 28nm is only a few months away...

Regards
Zy

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20788 - Posted: 27 Mar 2011 | 0:57:07 UTC - in response to Message 20787.

Perspective,

AMD cards do not work here, so any discussion of them is not yet relevant - hopefully some day, but not yet. Anyway, whoever the manufacturer, I'm not forking out £500 for a GPU.

I very much doubt that NVidia will release a 28nm GPU in the 2nd quarter of this year. They may have a prototype ready in the 3rd quarter, but I can't see a 28nm release any time this year.

While I would expect a speed bump with a move to 28nm, I seriously doubt that any 28nm NVidia card will ever outperform a GTX590 by fifteen times.

Still, I'm in no hurry to replace my GTX470s with 500-series cards; the GTX470 is still the best GPU in terms of performance per purchase price, if you can still get one. Maybe next year.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20790 - Posted: 27 Mar 2011 | 13:30:44 UTC

28 nm cards should be a lot more attractive.. but don't expect miracles!

For Cayman ATI changed the fundamental shader architecture (-> different software optimization -> expensive!) for 10% more efficiency per transistor. Sure, more improvements will be made, but we've already reached a quite mature state.

And without architectural revolutions (which would mostly need software to catch up to be useful) there's brute-force "more transistors, more shaders" left. They will gain some more headroom at 28 nm, but unless a power-efficiency miracle happens (which didn't happen at TSMC 40 nm and didn't happen for Intel at 32 nm, just regular solid improvements), I don't see how they could suddenly use 2 to 3 times as many transistors. Sure, at 28 nm they can pack twice as many transistors into the same area as before, but the power savings from the new process alone will not be 2x, i.e. twice as many transistors will require a lot more power than the old design.

You can't have both, more transistors and significantly reduced power consumption. You have to choose one of them. And I think we all know how hard nVidia was pushing Fermi already.. so upping power consumption further is not really a promising option.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,377,532,759
RAC: 3,459,749
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20791 - Posted: 27 Mar 2011 | 15:23:20 UTC - in response to Message 20787.

The transition from 55nm to 40nm, and creating a working brand-new Fermi architecture, took NVidia more than a year (that was the GTX 480-470-465 line), and then another 9 months to reach the original goal of 512 shaders (GTX 580-570). Then it took them 3 months to select the chips for a twin-chip "world's fastest GPU", which is clearly made for one purpose: to boost NVidia's prestige. If a new GPU made on the 28nm fabrication process were about to be released, why would NVidia (and AMD) waste their time and resources coming up with something like the GTX 590? (Which is clearly more expensive to produce than they sell it for; that's why they limit its production.)
Changing from 40nm lithography to 28nm means a doubled transistor count on the same die area (or half-sized chips, which results in higher yield = cheaper dies), but twice as many transistors at the same clock speed and voltage dissipate twice as much heat, which would be impossible to handle below 90°C. The only way to maintain the heat dissipation without reducing the speed is to reduce the voltage to 71% of the 40nm value, but achieving that requires more technological development than just shrinking the transistors. The real basis of the performance gain is how far the chip's voltage can be reduced. I guess it will be around 90%, which means 81% of the 40nm chip's TDP per transistor. If so, then the overall performance gain of a single (GPU) chip could be around 23.5% (=1/0.81) over the 40nm chips, because heat dissipation has already reached its peak. If NVidia is about to create a die as large as Fermi's, they will face similar difficulties in releasing a working product. If they go for a smaller chip, it would not be much faster, but would cost less. The latter would not be much of a motivation for enthusiasts to buy the new GPU.
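Working those numbers through, assuming dynamic power scales roughly as transistor count times V^2 at a fixed clock (the 71%, 81% and ~23.5% figures above follow directly from that assumption):

import math

# Twice the transistors in the same heat budget at the same clock would need
# the voltage to drop to sqrt(1/2) ~ 71% of the 40nm value.
v_equal_power = math.sqrt(1 / 2)

# The post guesses the voltage will realistically only drop to ~90%,
# giving 0.9^2 ~ 81% of the old TDP per transistor.
v_realistic = 0.90
power_per_transistor = v_realistic ** 2

# At a fixed total TDP, usable performance then scales as 1 / 0.81 ~ 1.235.
perf_gain = 1 / power_per_transistor

print(f"voltage for equal power at 2x transistors: {v_equal_power:.2f}")
print(f"relative power per transistor at 90% V:    {power_per_transistor:.2f}")
print(f"performance gain at fixed TDP:             {perf_gain:.3f}x")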

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20793 - Posted: 27 Mar 2011 | 21:05:51 UTC - in response to Message 20791.

A GPU with many small efficient cores could be the answer to significantly increase crunching and server farm performance per Watt, but such cards would not sell to gamers, or be of much use as entry level GPUs. Perhaps market changes will open gateways to such cards and the development of software such as OpenCL/OpenGL will drive either NVidia and/or ATI towards such cards, in a similar way to Intel being driven towards developing a range of small efficient CPU cores for server farms.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20794 - Posted: 27 Mar 2011 | 21:45:06 UTC

Well said, Retvari. I've just got to add one detail: in a new process node the reduction in power consumption comes not only from reduced voltage. The smaller transistors usually also require less power for switching because, well, they're smaller (simply said). This adds to the voltage scaling but is difficult to predict (and TSMC's 40 nm was rather disappointing in this regard).

@SK: well.. that's actually what a modern GPU is. You can view the individual "shader multiprocessors" in Fermi as individual "cores". Making them even more fine-grained (i.e. each one smaller) would actually cause more overhead for instruction decoding, distribution etc. Utilization might be better, but you'd get fewer shaders / less computation power per transistor. That's why the SMs actually grew even bigger in CC 2.1.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20795 - Posted: 27 Mar 2011 | 22:03:09 UTC - in response to Message 20794.

What I meant is having more cores, in the sense that the GTX590 has two. A card with 4, 8 or 16 might well be a better performer in terms of computational power per Watt. I'm not talking about 3 billion transistors per core though; perhaps 1B or 1.5B at 28/22nm for 8 cores.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,377,532,759
RAC: 3,459,749
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20797 - Posted: 28 Mar 2011 | 10:07:39 UTC - in response to Message 20794.
Last modified: 28 Mar 2011 | 10:31:01 UTC

The smaller transistors usually also require less power for switching because, well, they're smaller (simply said).

Well, this is *almost* true, and a transistor is only one part of the chip. It's quite logical that smaller transistors require less power to switch, but a chip has parasitic capacitances everywhere, across its entire area. A capacitor's capacitance (practical or parasitic in our case; the physics is the same) is directly proportional to its electrodes' surface area, but inversely proportional to the distance between the electrodes. So if a transistor's area is reduced, that is good. But with this shrink the transistor's semiconductor layers also get thinner, which is bad (and also good, because it lowers the voltage needed to switch between logic levels). Also, the wiring gets closer together, which is bad. Moreover, a chip built on a finer lithography can contain more components connected by (overall) longer wires, which is also bad.
The only way to overcome these bad factors is to create better insulation between the conductors (the dielectric, in capacitor terms).
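The relations behind that argument, written out (standard parallel-plate and dynamic-power formulas, not taken from the post):

C \approx \varepsilon_r \varepsilon_0 \frac{A}{d},
\qquad
P_{\mathrm{dyn}} \approx \alpha \, C_{\mathrm{total}} \, V^{2} f

Shrinking the electrode area A lowers the parasitic capacitance, shrinking the spacing d raises it again, and lowering \varepsilon_r (a low-k dielectric, i.e. "better insulation between the conductors") brings it back down; the switching power then scales with the total capacitance and the square of the supply voltage.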

This adds to the voltage scaling but is difficult to predict (and TSMC's 40 nm was rather disappointing in this regard).

I agree with that.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,377,532,759
RAC: 3,459,749
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20798 - Posted: 28 Mar 2011 | 10:13:43 UTC

BTW I have ordered a Gainward GTX 590 yesterday, and it will be delivered tomorrow.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Send message
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Scientific publications
watwatwatwatwat
Message 20799 - Posted: 28 Mar 2011 | 12:09:52 UTC - in response to Message 20798.

We, as a lab, are waiting to buy the new 28nm cards and PCIe 3 motherboards (for parallel runs). We are expecting a factor of at least 2 in performance for Kepler, because the Fermi core design is now mature (compare G80 to G200).

gdf

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20801 - Posted: 28 Mar 2011 | 21:01:49 UTC

Well said again, Retvari. I think I could add some more, but don't see a need to do, especially since we'd be getting OT.

@SK: oh you mean distributing the shaders among several smaller chips (I don't think we should use the term "core" for GPUs, as it's too unspecific).

That could give you several benefits regarding power efficiency:
- You could physically spread the chips out, so you've got more space for cooling. This generally means higher performance cooling is possible. Reduced chip temperatures lead to higher power efficiency.

- You could spread the memory bandwidth demands among several narrower busses, which are themselves easier to implement and can provide more aggregate bandwidth than the current solutions, allowing for lower memory clocks.

- You could bin the chips more carefully (less variation in 200 mm² than in 500 mm²) and drive them at optimal working points (voltage & clock).

- Yield would improve (which has no direct influence on power efficiency).

I do see some problems, though:
- You already said such cards would be worse at gaming due to non-ideal scaling in SLI / Crossfire. Well, screw gaming.. but there's more to it. The same principle limits performance at HPC applications. For BOINC we'd be fine with that, since each chip could run its own code without any need for communication with the other ones. However, such GPUs are not built for this market alone, but rather for the entire HPC crowd. And here you've got many problems which you want to solve as fast as possible. Think of a single time step at GPU-Grid - you can not run this across different GPUs efficiently because the communication would be too slow. Far slower than on-chip. Even if we could get the bandwidth between chips (e.g. using optical chip-to-chip communication) this would still add latency, which would make the multi-chip solution slower than the fat chip (everything else being equal). High-bandwidth long-range communication can also quickly become a power hog, reducing efficiency.

- In a multi-chip solution you still want to produce only a single type of chip (lithography mask sets are really expensive). Therefore you'd duplicate units, which you need only once for a single card. You'd need more transistors to get the same job done, which usually reduces power efficiency.

- If your individual shader multiprocessors in a Fermi execute different code (or just different stages of the same code), the requirements regarding cache size and bandwidth and memory bandwidth may differ. In the fat chip you get automatic load balancing across all shared resources. In the multi-chip solution you lose this advantage, potentially reducing power efficiency.

All in all I think it's probably not worth it, as long as the fat chips can be handled. Anyway, worth thinking about :)

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,377,532,759
RAC: 3,459,749
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20802 - Posted: 28 Mar 2011 | 21:20:07 UTC - in response to Message 20799.

We, as a lab, are waiting to buy the new 28nm cards and PCIe 3 motherboards (for parallel runs). We are expecting a factor of at least 2 in performance for Kepler, because the Fermi core design is now mature (compare G80 to G200).

gdf

This is a realistic expectation, but fifteen times, as Zydor said, is unrealistic. I expected that before NVidia releases Kepler they would transfer the GF110 to 28nm, as they did with the GT200 (65nm to 55nm), and release a dual-GPU card built on this smaller, faster, cooler, cheaper chip. But neither NVidia nor ATI released their new dual-GPU cards on 28nm, so there must be some reason behind it (I mean technological difficulties at TSMC).

I've found the corresponding technological articles on wikipedia (in my previous post I *almost* recalled everything right... :)
http://en.wikipedia.org/wiki/Low-k_dielectric
http://en.wikipedia.org/wiki/High-k_dielectric
http://en.wikipedia.org/wiki/Silicon_on_insulator
Light reading :)

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20803 - Posted: 28 Mar 2011 | 22:54:03 UTC - in response to Message 20802.

Easy: TSMC 28 nm is not ready yet :p

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Zydor
Send message
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 20804 - Posted: 29 Mar 2011 | 2:43:06 UTC

The 15x figure referred specifically to the Maxwell chip in 2013, which will be on 22nm. NVidia's claim, not mine. I share a healthy scepticism, particularly as it's 2+ years away and refers to a 2013 release, not to Kepler. I don't believe it's impossible; I guess time will tell in 2+ years. It serves to illustrate NVidia's claims and direction, but is not relevant to a 590 successor.

2x for Kepler is virtually a given, in that such an increase is almost automatic in a move from 40nm to 28nm. How much more than that remains to be seen. I believe it's almost certainly more, but let's wait for reality, despite NVidia claims which centre either side of 3x the Fermi series. However, the minimum 2x is enough to at least give pause for thought over a 590 purchase, given the time frames. How much thought is an individual matter; we all have different pressures, needs and wants.

Regards
Zy

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,377,532,759
RAC: 3,459,749
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20806 - Posted: 29 Mar 2011 | 14:21:32 UTC

My GTX590 has arrived, and I've successfully installed it in this host. More exactly, I replaced the GTX480 in this host with the new GTX590. I will not overclock it for the time being. So this host now has an overclocked GTX580 and a GTX590.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20807 - Posted: 29 Mar 2011 | 16:01:19 UTC - in response to Message 20806.
Last modified: 29 Mar 2011 | 16:52:55 UTC

Zoltan, could you post up the details here - thanks

In theory the GTX590 (at reference clocks) should be about 50% more productive than the GTX580. I noticed that the GTX580 is about 50% faster than the GTX470, which is about 27% faster than a GTX465. Anyway, we are now in the position that the top CC2.0 Fermi has about 290% of the performance of the lowest at GPUGrid.
Should the GTX590 clock to the same frequency as a reference GTX580 (772MHz), a GTX590 could do about 369% of the work of a GTX465.
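The chain of rounded factors behind those percentages, as a small Python sketch (the 27%, 50% and 50% steps and the 772/607 MHz clock ratio are the figures from this thread, so the end results are approximate):

gtx465 = 1.00
gtx470 = gtx465 * 1.27            # GTX470 ~27% faster than GTX465
gtx580 = gtx470 * 1.50            # GTX580 ~50% faster than GTX470
gtx590 = gtx580 * 1.50            # GTX590 (ref clocks) ~50% more productive than GTX580
gtx590_oc = gtx590 * (772 / 607)  # if both GPUs ran at the GTX580 reference 772 MHz

print(f"GTX580 vs GTX465:        {gtx580:.0%}")     # ~190%
print(f"GTX590 vs GTX465:        {gtx590:.0%}")     # ~286%, the ~290% above
print(f"GTX590@772MHz vs GTX465: {gtx590_oc:.0%}")  # ~363%, the ~369% above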

In terms of maturity the GTX500 series cards are definitely more energy efficient than their GTX400 counterparts; in terms of performance per Watt a GT240/GT340 could outperform a GTX470, but this is not the case with the CC2.0 GTX500 series cards. I still think that more refinements are possible and these would be augmented with a move to 28nm. I just hope they do not take the architectural frailties of CC2.1 to 28nm.

Profile Carlesa25
Avatar
Send message
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 0
Level
Thr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20809 - Posted: 29 Mar 2011 | 17:43:50 UTC - in response to Message 20806.


Hello: Congratulations Zoltan, it pays 54% more than my GTX295.

One thing I haven't been clear on for some time: why do the Fermi-series cards report, in the task details, "Number of cores: 128" instead of the 512 they actually have...? Best regards.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,377,532,759
RAC: 3,459,749
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20811 - Posted: 29 Mar 2011 | 21:21:00 UTC - in response to Message 20809.


Hello: Congratulations Zoltan, it pays 54% more than my GTX295.

One thing I haven't been clear on for some time: why do the Fermi-series cards report, in the task details, "Number of cores: 128" instead of the 512 they actually have...? Best regards.

Thank you!

This is a reporting bug in BOINC. BOINC assumes that a streaming multiprocessor has 8 CUDA cores, which is true for G80 and G200 GPUs, but this is changed to 32 (and to 48 in CC2.1 GPUs) in the Fermi architecture.
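The mapping being described, as a short sketch (the per-SM core counts are the ones given in the post; the helper is illustrative, not BOINC's actual code):

# CUDA cores per streaming multiprocessor, by compute capability.
CORES_PER_SM = {
    (1, 0): 8, (1, 1): 8, (1, 2): 8, (1, 3): 8,  # G80 / G200 generations
    (2, 0): 32,                                  # Fermi GF100/GF110 (GTX 480/580/590)
    (2, 1): 48,                                  # Fermi GF104/GF114 etc.
}

def cuda_cores(sm_count, compute_capability):
    return sm_count * CORES_PER_SM[compute_capability]

print(cuda_cores(16, (2, 0)))  # a GF110 (GTX580, or one GTX590 GPU): 512
print(16 * 8)                  # what BOINC reported by always assuming 8 per SM: 128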

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,377,532,759
RAC: 3,459,749
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20812 - Posted: 30 Mar 2011 | 13:43:05 UTC - in response to Message 20799.

We, as a lab, are waiting to buy the new 28nm cards and PCIe 3 motherboards (for parallel runs). We are expecting a factor of at least 2 in performance for Kepler, because the Fermi core design is now mature (compare G80 to G200).

gdf

The ACEMD client is having trouble keeping the Fermi GPUs busy. I wonder how the client will keep those faster GPUs busy then?

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20813 - Posted: 30 Mar 2011 | 15:41:20 UTC - in response to Message 20812.

The ACEMD client is having trouble keeping the Fermi GPUs busy.

I take it you mean the GPU utilization percentage is low?

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,377,532,759
RAC: 3,459,749
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20814 - Posted: 30 Mar 2011 | 18:15:02 UTC - in response to Message 20813.
Last modified: 30 Mar 2011 | 18:28:49 UTC

The ACEMD client is having trouble keeping the Fermi GPUs busy.

I take it you mean the GPU utilization percentage is low?

I mean the Fermi GPU utilization is pretty much CPU-speed dependent and WU-type dependent. If these factors don't change for the better in the future, the ACEMD client will need 8GHz+ CPU cores to feed the new GPUs.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20817 - Posted: 30 Mar 2011 | 19:09:20 UTC - in response to Message 20814.

I mean the Fermi GPU utilization is pretty much CPU-speed dependent and WU-type dependent. If these factors don't change for the better in the future, the ACEMD client will need 8GHz+ CPU cores to feed the new GPUs.


They could always go to larger molecules, increasing the amount of work per time step for the GPU. But lower-end cards would get choked by this.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Send message
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Scientific publications
watwatwatwatwat
Message 20818 - Posted: 30 Mar 2011 | 19:38:45 UTC - in response to Message 20817.

GPU utilization is 96-98% when using the GPU alone, and there will be no problem for at least a couple of generations. GPU utilization is low if you don't use SWAN_SYNC, or for some workunits which use a little CPU. The latter case will become less and less important, as the new application moves even this tiny bit onto the GPU.
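The trade-off behind SWAN_SYNC, in a nutshell: the CPU thread that feeds the GPU can either spin-poll (lowest latency, but it occupies a whole CPU core) or block until woken (frees the core, but adds wake-up latency on every step). A minimal illustrative sketch of the two waiting styles in Python; this is not the project's actual code:

import threading

done = threading.Event()

def wait_spinning():
    # Busy-wait: reacts to GPU completion almost immediately, but keeps one
    # CPU core at 100% the whole time (the SWAN_SYNC behaviour).
    while not done.is_set():
        pass

def wait_blocking():
    # Sleep until signalled: frees the CPU core for other work, but the OS
    # wake-up adds latency to every one of the thousands of steps per second.
    done.wait()

In a simulation that launches many short kernels per second, that per-step wake-up latency is what drags GPU utilization down when the feeding thread is not spinning.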

gdf

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20819 - Posted: 30 Mar 2011 | 19:45:33 UTC - in response to Message 20814.

It is the case that GPUs are progressing faster than CPUs, so any app that depends partially on a CPU will increasingly hinder GPU performance in the long run. However, there is a bit more to it than brute-force speed. Take for example your 2.8GHz Pentium D: this three-generation-old processor is not actually as quick as a 2.4GHz Intel Core 2 Duo (E6600). In turn, a Wolfdale is about 10 to 15% faster than an equally clocked Conroe. Move on to an i3 and you see another increase in performance, and then again when you move to Sandy Bridge. These 32nm SB processors clock well, and 5GHz is not too hard to reach. By the time we move to 22nm GPUs it will be old hat for CPUs, and 5GHz will be common for high-end CPUs. By then I would expect to see many app improvements, but even if there were none it would not be all doom and gloom, so long as you upgrade your CPU when you upgrade your GPU.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,377,532,759
RAC: 3,459,749
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20855 - Posted: 4 Apr 2011 | 23:46:05 UTC - in response to Message 20819.

I did a couple of experiments over the weekend, and I've concluded that pairing my GTX580@900MHz with an i3-560@3.33GHz shortens the processing time compared to my overclocked Core2Quad9560@4GHz (no wonder, the i3 has an integrated northbridge), but not by as much as the Core2Quad9560 does compared to the Pentium D 820 (no wonder, it's quite old). But I still think the FPU of the Core i3 (i5, i7) is no faster than the FPU of the Core2; rather, the integrated northbridge boosts the performance (the lower the GPU utilization is, the higher the boost will be). Another side effect of this boost: I had to raise the voltage of the GTX580@900 to 1.083V when it was in the i3 motherboard, while it runs fine at 1.062V in the Core2Quad motherboard.
To be on topic: I've overclocked both GPUs of the GTX590 to 700MHz (fan at maximum, 72°C, noise is almost intolerable), and it seems to be running fine (in the Core2Quad MB, so I'm wondering how much overclocking it could take in the i3)

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20888 - Posted: 9 Apr 2011 | 8:59:49 UTC - in response to Message 20855.

Yes, the FP execution units in 1st generation Core i CPUs are the same as in Core 2 CPUs. But due to numerous tweaks actual hardware utilization is better (depending on load, of course).

The higher voltage with the i3 is interesting. It could be increased GPU temperatures due to the higher GPU utilization, which slightly reduces the maximum frequency at a given voltage, in turn requiring more voltage for the same clock.
Or the PSU could be worse (e.g. stronger ripple).

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,377,532,759
RAC: 3,459,749
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 20889 - Posted: 9 Apr 2011 | 9:51:06 UTC - in response to Message 20888.
Last modified: 9 Apr 2011 | 10:10:35 UTC

I'm continuing my experiments this weekend. This time I'm using a Core i5-2400 (3.1GHz/3.4GHz) in an Intel DH67BL motherboard. The GPU is a GTX580@900MHz. My experimental host has processed two TONI_AB WUs since I changed the CPU and the motherboard; they took about 16,192 seconds (about 4h30m) to finish. Now it's processing an IBUCH_1_mutEGFR, so I'll be able to compare the performance of the i3-560 and the i5-2400 when it finishes.
