Fermi

Message boards : Graphics cards (GPUs) : Fermi

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 15442 - Posted: 25 Feb 2010, 19:46:41 UTC - in response to Message 15439.  

SemiAccurate reported the following, with some caution:
GTX 480: 512 shaders @ 600 MHz core, 1200 MHz hot (shader) clock
GTX 470: 448 shaders @ 625 MHz core, 1250 MHz hot (shader) clock
If correct, the GTX 470 will be about 10% slower than the GTX 480 but cost 40% less (going by another speculative report)!
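
A quick sanity check of that 10% figure, as a rough sketch in Python. It assumes crunching throughput simply scales with shader count times hot clock, which ignores memory and everything else, so treat it as speculative as the specs themselves:

    # Rumoured specs from the SemiAccurate report quoted above
    gtx480_shaders, gtx480_hot_mhz = 512, 1200
    gtx470_shaders, gtx470_hot_mhz = 448, 1250

    # Toy model: relative throughput ~ shaders * hot clock
    gtx480 = gtx480_shaders * gtx480_hot_mhz
    gtx470 = gtx470_shaders * gtx470_hot_mhz
    print(f"GTX470 / GTX480 = {gtx470 / gtx480:.3f}")  # ~0.911, i.e. roughly 10% slower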

I am not interested in how it compares to ATI cards playing games, just the cost and that they don't crash tasks!

Don't like the sound of only 5,000 to 8,000 cards being made. You could sell that many in one country alone, and our wee country might be down the list a bit ;)
ID: 15442
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 15446 - Posted: 25 Feb 2010, 22:27:16 UTC

We started another discussion on Fermi in the ATI thread, so I'm going to reply here:

me wrote:
They desperately wanted to become the clear number 1 again, but currently it looks more like "one bird in the hand would have been better than two on the roof".

SK wrote:
If a bird in the hand is worth two in the bush, and NVidia are releasing two cards, the GTX 480 and GTX 470, costing $679.99 and $479.99, how much is the bush worth?


I was thinking along the lines of
- Fermi is just too large for TSMC's 40 nm process, so even if they can get the yield up and get the via reliability under control, they can't fix the power consumption. So the chip will always be expensive and power constrained, i.e. it will be slowed down by low clock speeds due to the power limit.
- Charlie reports 448 shaders at 1.2 GHz for the current top bin of the chip.
- Had they gone for ~2.2 billion transistors instead of 3.2 billion, they'd have ~352 shaders and a much smaller chip, which is (a) easier to get out of the door at all and (b) yields considerably better. Furthermore they wouldn't be as limited by power consumption (they'd end up at ~200 W at the same voltage and clock speed as the current Fermi), so they could drive clock speed and voltage up a bit. At a very realistic 1.5 GHz they'd achieve 98% of the performance of the current Fermi chips (see the quick check after this list).
- However, at this performance level they'd probably trade blows with ATI instead of dominating them, as a full 512-shader Fermi at 1.5 GHz probably would have been able to.
- The full 512-shader Fermi is the two birds on the roof. A 2.2 billion transistor chip is the one in the hand... or at least within catching range.
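
A quick check of the 98% figure from the list above (a sketch that makes the same assumption MrS does, namely that performance is proportional to shaders times clock):

    # Charlie's reported top bin vs. the hypothetical ~2.2 billion transistor chip
    current_top_bin = 448 * 1.2   # 448 shaders at 1.2 GHz
    smaller_chip    = 352 * 1.5   # ~352 shaders at a "very realistic" 1.5 GHz
    print(f"relative performance: {smaller_chip / current_top_bin:.1%}")  # ~98%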

I didn't read the recent posts and haven't had the time yet to reply to the ones I did read. Will probably find the time over the weekend :)

MrS
Scanning for our furry friends since Jan 2002
ID: 15446
Profile robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 769,991,668
RAC: 0
Message 15453 - Posted: 26 Feb 2010, 3:00:53 UTC

Looks like Nvidia has planned another type of Fermi card - the Tesla 20 series.

http://www.nvidia.com/object/cuda_home_new.html
ID: 15453
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 15457 - Posted: 26 Feb 2010, 9:04:48 UTC - in response to Message 15453.  

You've just got to love their consistent naming schemes. But there's still hope that the actual products will be called 20x0, isn't there?

MrS
Scanning for our furry friends since Jan 2002
ID: 15457
Profile liveonc
Joined: 1 Jan 10
Posts: 292
Credit: 41,567,650
RAC: 0
Message 15463 - Posted: 26 Feb 2010, 18:12:54 UTC - in response to Message 15428.  

Thanks for the interesting post on the politics of chip manufacturers, SKGiven. I make no claims of better knowledge; just reading my other posts will reveal that I'm a n00b with his own POV on things.

Some might point out that the largest profit comes from high-end GPU sales, but even a slight profit on the vast low-to-mid range would result in a higher overall gross profit, and niche markets can even be expanded, as Apple has done. What the future brings is what the future brings.

AMD and Intel sell off their less perfect chips as cheaper models and their near-perfect ones as extreme editions. Even if Nvidia does turn out crappier-than-desired Fermis, surely they can still find a use for them...

I'm an Nvidia fan and don't feel the need to apologize for being one, but I'm thankful that ATI is around to deal out blows on behalf of Nvidia consumers.
ID: 15463
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 15469 - Posted: 27 Feb 2010, 9:26:13 UTC - in response to Message 15463.  

NVidia stopped producing their 55nm GTX 285 chips. Let's hope they simply use existing improvements in technology, tweak the design a bit, and move it to 40nm with GDDR5. If Asus can stick two GTX 285 chips onto the one card, I think everyone would be able to if the chips were 40nm instead of 55nm. The cards would run cooler and clock better. A 20% improvement on two 1081 GFLOPS chips (MADD+MUL, not BOINC) might not make it the fastest graphics card out there, but it would still be a top-end card and, more importantly, sellable and therefore profitable.
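
For what it's worth, the arithmetic behind that last sentence as a small sketch (the 1081 GFLOPS per chip is the MADD+MUL figure quoted above, and perfect scaling across two chips is assumed, which real cards never quite reach):

    per_chip_gflops = 1081   # GTX 285 theoretical single precision (MADD+MUL)
    chips = 2                # dual-GPU board
    clock_gain = 1.20        # hoped-for 20% improvement from a 40nm shrink
    print(f"hypothetical dual 40nm card: ~{per_chip_gflops * chips * clock_gain:.0f} GFLOPS")  # ~2594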

Fermi is too fat, and NVidia made the mistake of thinking they could redesign and rebuild fabrication equipment as if it were a GPU core design. It was bad management to let the project continue as it did. Fermi should have been shelved, and NVidia should have been trying to make competitive and profitable cards rather than an implausible monster card.

The concept of expanding the die to incorporate 3.2 billion transistors is in itself daft. Perhaps for 28nm technology, but not 40nm. It is about scalability. The wider the chip, the greater the heat and the more difficult it is to manufacture. Hence the manufacturing problems and the low clocks (600-625MHz). Let's face it, one of my GT240s is factory clocked to 600MHz and it is just about a mid-range card. A GTX 280 built on 65nm technology clocks in at 602MHz, from June 2008! A GTS 250 clocks at 738MHz. If you go forward in one way but backwards in another, there is no overall progress.

NVidia should have gone for something less bulky; scale could have been achieved through the number of GPU chips on the card. You don't see Intel or AMD trying to make such jumps between two levels of chip fab technology. Multi-chip designs are the way of the future, not fat chips.

Why not put 3 or 4 GTX 285 chips at 40nm onto one card with GDDR5? It's not as if they would have to rebuild the fabrication works.
Fermi won't be anything until it can be made at 28nm.
ID: 15469
Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Message 15471 - Posted: 27 Feb 2010, 13:29:06 UTC - in response to Message 15469.  

I also want faster cards to crunch GPUGRID with, but just because Nvidia has had major delays doesn't mean the sky is falling :-)

Nvidia does not own the fab equipment, nor is it the one making changes to it. TSMC, the fab, is currently producing 40 nm wafers for ATI, so while on one hand I agree that making a very large die is more difficult, the fab plant and equipment are the same.

Was Nvidia aggressive in pursuing their vision? Yes.
Did they encounter delays because they were so aggressive? Yes.
They already shrunk the 200 arch and they already produced a two-chip card; I think it is time for a fresh start.

Your comparison of core speed leaves me scratching my head, because I know that a GTX 285 at 648 MHz crunches faster than a GTS 250 at 702 MHz.

Which manufacturers are pursuing a multi-chip approach?

Are you suggesting that no one buy an Nvidia card based on the Fermi arch until they shrink it to 28 nm?
Thanks - Steve
ID: 15471
Profile Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 15472 - Posted: 27 Feb 2010, 13:55:32 UTC - in response to Message 15469.  

You don’t see Intel or AMD trying to make such jumps between two levels of chip fab technology.

Intel tried it with the Itanium and ended up losing technology leadership to AMD for a number of years. If they hadn't been able to manipulate, and in some cases buy, the benchmark companies in order to make the P4 look decent, they would have lost much more.

Why not put 3 or 4 GTX 285 chips at 40nm onto one card with GDDR5? It's not as if they would have to rebuild the fabrication works.
Fermi won't be anything until it can be made at 28nm.

A 285 shrink would still need a major redesign to support new technologies, and even at 40nm you'd probably be hard pressed to get more than two chips on a card without pushing power and heat limits. Then, would it even be competitive with the ATI 5970 for most apps? By the time NVidia got this shrink and update done, where would AMD/ATI be? I think you have a good idea, but the time for it has probably passed. It will be interesting to see if Fermi is as bad as SemiAccurate says. We should know about speed, power, heat and availability within the next few months.

ID: 15472
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 15473 - Posted: 27 Feb 2010, 14:25:56 UTC - in response to Message 15471.  
Last modified: 27 Feb 2010, 14:28:44 UTC

I read that the fabrication plant's equipment is actually different; they had to redesign some of it.
NVidia has produced at least two dual-GPU cards, so why stop or leave it at that? The CPU manufacturers have demonstrated the way forward by moving to quad-, hex- and even 8-core CPUs. This is scaling done correctly! Trying to stick everything onto one massive chip is backwards, more so for GPUs.
http://www.pcgameshardware.com/aid,705523/Intel-Core-i7-980X-Extreme-Edition-6-core-Gulftown-priced/News/
http://www.pcgameshardware.com/aid,705032/AMD-12-core-CPUs-already-on-sale/News/
I was not comparing a GTX 285 to a GTS 250. I was highlighting that a card almost two years old has the same clock speed as Fermi, to show that they have not made any progress on this front; compared to a GTS 250 at 702 MHz, you could argue that Fermi has lost ground in this area, which deflates the other advantages of the Fermi cards.

multi-chip approach?
    ATI:
    Gecube X1650 XT Gemini, HD 3850 X2, HD 3870 X2, HD 2850 X2, 4870 X2...
    NVidia:
    Geforce 6800 GT Dual, Geforce 6600 dual versions, Geforce 7800 GT Dual, Geforce 7900 GX2, Geforce 9800 GX2, GTX 295 and Asus's limited ed. variant with 2 GTX 285 cores... None?!?
    Quantum3D also made a single-board SLI 3dfx Voodoo2.



Are you suggesting that no one buy an Nvidia card based on the Fermi arch until they shrink it to 28 nm?

No, but I would suggest that the best cannot be achieved from Fermi technology if it is made at 40nm; it could (and perhaps will) be seen at 28nm (as 28nm research is in progress).
ID: 15473
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 15474 - Posted: 27 Feb 2010, 14:47:31 UTC - in response to Message 15472.  

Nvidia could have at least produced 40nm GTX 285-type chips for a dual-GPU board, but they chose to concentrate on low-to-mid-range cards for mass production and keep researching Fermi technologies. To some extent this all makes sense. There were no issues with the lesser cards, so they wasted no time in pushing them out, especially to OEMs. The GT200 and GT200b cards, on the other hand, did have manufacturing issues, enough to deter them from expanding the range. It would have been risky to go to 40nm. Who would really buy a slightly tweaked 40nm version of a GTX 260 (i.e. a GTX 285 with shaders switched off due to production issues)?

But perhaps now that they have familiarised themselves with 40nm chip production (GT 210, 220 and 240), more powerful versions will start to appear, if only to fill NVidia's future mid-range card line-up. Let's face it, a lot of their technologies are reproductions of older ones. So the better-late-than-never, if-it's-profitable approach may be applied.

The last thing gamers and crunchers need is for a big GPU manufacturer to diminish into insignificance. Even to move completely out of one arena would be bad for competition.
ID: 15474
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 15475 - Posted: 27 Feb 2010, 14:58:19 UTC

Guys.. you need to differentiate between the GF100 chip and the Fermi architecture!

The GF100 chip is clearly:
- too large for 40 nm
- thereby very difficult to manufacture
- thereby eats so much power that its clock speed has to be throttled, making it slower than expected

However, the design itself is a huge improvement over GT200! Besides DX 11 and various smaller or larger tweaks, it brings a huge increase in geometry power and a vastly improved cache system, and it doesn't need to waste extra transistors on special double precision hardware running at 1/8th the single precision speed. Instead it uses the regular execution units in a clever way and thereby achieves 1/2 the single precision performance in DP, even better than ATI at 2/5th. Fermi also supports IEEE-standard DP, the enhanced programming models (in C) etc.
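
To put those double precision ratios into numbers, a small sketch (the single precision figure is made up purely for illustration; the DP-to-SP ratios are the ones given above):

    # DP throughput as a fraction of SP throughput, per the ratios above
    dp_ratio = {"GT200": 1 / 8, "Cypress (ATI)": 2 / 5, "Fermi (GF100)": 1 / 2}
    sp_gflops = 1000  # hypothetical SP throughput, for illustration only
    for arch, ratio in dp_ratio.items():
        print(f"{arch}: {sp_gflops * ratio:.0f} DP GFLOPS per {sp_gflops} SP GFLOPS")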

All of this is very handy, and you don't need 512 shader processors to use it! All they need to do is use these features in a chip of a sane size. And since they have already worked all of this out (the first batch of GF100 chips returned last autumn, so the design must have been ready since last summer), it would be downright stupid not to use it in new chips, and to instead waste engineering resources adding features to GT200 and shifting it to 40 nm (which was the starting point of the Fermi design anyway).

Like ATI, they must have first been informed about the problems with TSMC's 40 nm process back in the beginning of 2008. At that time they could have scaled Fermi down a little: just remove a few shader clusters, set the transistor budget to not much more than 2 billion, and get a chip which would actually have been manufacturable. But apparently they chose to ignore the warning, whereas ATI chose to be careful.

But now that "the child has already fallen into the well", as the Germans say, it's moot discussing these issues. What nVidia really has to do is get a mainstream version of Fermi out of the door.

Oh, and forget about 28 nm: if TSMC has this much trouble at 40 nm, why would they fare any better at 28 nm, which is a much more demanding target? Sure, they'll try hard not to make the same mistakes again... but this is far more complicated than rocket science.

MrS
Scanning for our furry friends since Jan 2002
ID: 15475
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 15477 - Posted: 27 Feb 2010, 15:34:05 UTC

A few more words on why GF100 is too large:

Ever since G80, nVidia has designed its comparatively simple shader processors to run at high clock speeds. That is a smart move, as it gives you more performance out of the same number of transistors. The G92 chip is a very good example of this: on the 65 nm process it reached 1.83 GHz at the high end at ~150 W, whereas it could also be used in the 9800GT Green at 1.38 GHz at 75 W. These numbers are actually from the later 55 nm versions, but those provided just a slight improvement.

This was a good chip, because its power requirements allowed the same silicon to be run as a high-performance chip and as an energy-efficient one (= lower voltages and clocks).

G92 featured "just" 750 million transistors. Then nVidia went from there to 1.4 billion transistors for the GT200 chip at practically the same process node. The chip became so big and power hungry (*) that its clock speeds had to be scaled back considerably. nVidia no longer had the option to build a high-performance version. The initial GTX 260 with 192 cores cost them at least twice as much as a G92 (double the number of transistors; the actual cost is higher, as yield drops roughly with the square of the area), yet provided just the same single precision performance and about similar performance in games: 128 × 1.83 ≈ 192 × 1.25, simple as that. And even worse: the new chip was not even consuming less power under load, as would have been expected for a "low clock & voltage" efficient version. That's due to the leakage from twice the number of transistors.

(*) Cooling a GPU with stock coolers becomes difficult (=loud and expensive) at about 150 W and definitely painful at 200 W.
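
The throughput comparison above, plus a toy yield model to show why cost rises faster than die area (a sketch only; the defect density and die area are arbitrary illustrative values, not TSMC data):

    import math

    # 128 shaders at 1.83 GHz (G92) vs. 192 shaders at 1.25 GHz (initial GTX 260)
    print(f"G92: {128 * 1.83:.0f} shader-GHz vs GTX260: {192 * 1.25:.0f} shader-GHz (roughly equal)")

    # Toy Poisson yield model: yield = exp(-defect_density * area),
    # so cost per good die ~ area / yield and grows faster than area alone.
    defect_density = 0.2   # defects per cm^2, illustrative only
    small_area = 3.3       # cm^2, assumed for the smaller die, illustrative only

    def cost_per_good_die(area_cm2):
        return area_cm2 / math.exp(-defect_density * area_cm2)

    ratio = cost_per_good_die(2 * small_area) / cost_per_good_die(small_area)
    print(f"doubling the die area multiplies cost per good die by ~{ratio:.1f}x")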

At 55 nm the GT200 eventually reached up to 1.5 GHz, but that's still considerably lower than the previous generation - due to the chip being very big and running into the power limit. ATI on the other hand could drive their 1 billion transistor chip quite hard (1.3 V compared to 1.1 V for nVidia), extract high clock speeds and thereby get more performance from cheaper silicon. They lose power efficiency this way, but proved to be more competitive in the end.

That's why nVidia stopped making GT200 - because it's too expensive to produce for what you can sell it for. Not because they wouldn't be interested in the high-end market any more; that's total BS. Actually the opposite is true: GF100 was designed very aggressively to banish ATI from the high-end market.

From a full process shrink (130-90-65-45-32 nm etc.) you could traditionally expect double the number of transistors at comparable power consumption. However, recent shrinks fell somewhat short of this mark. With GF100, nVidia took an already power-limited design (GT200 at 1.4 billion transistors) and stretched this rule quite a bit (GF100 at 3.2 billion transistors). And they had to deal with a process which performs much worse than the already-not-applicable-any-more rule. Put it all together and you run into real trouble, i.e. even if you can fix the yield issues, your design will continue to be very power limited. That means you can't use the clock speed potential of your design at all, you get less performance and have to sell at a lower price. At the same time you have to deal with the high production cost of a huge chip. That's why I thought they had gone mad when I first heard about the 3.2 billion transistors back in September...
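
A rough check on how far GF100 stretches the traditional shrink rule, using the transistor counts mentioned above:

    gt200_transistors = 1.4e9
    gf100_transistors = 3.2e9
    traditional_full_shrink = 2.0   # ~2x transistors per full node, historically
    actual = gf100_transistors / gt200_transistors
    print(f"GF100 packs {actual:.2f}x the transistors of GT200, "
          f"about {actual / traditional_full_shrink - 1:.0%} beyond the traditional 2x rule")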

MrS
Scanning for our furry friends since Jan 2002
ID: 15477
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 15478 - Posted: 27 Feb 2010, 15:48:52 UTC

SK wrote:
The CPU manufacturers have demonstrated the way forward by moving to quad, hex and even 8 core CPUs. This is scaling done correctly! Trying to stick everything on the one massive chip is backwards, more so for GPUs.


Sorry, that's just not true.

One CPU core is a hand-crafted, highly optimized piece of hardware. Expanding it is difficult and requires lots of time, effort and ultimately money. Furthermore, we've reached the region of diminishing returns here: there's only so much you can do to speed up the execution of one thread. That's why they started to put more cores onto single chips. Note that they try to avoid using multiple chips, as signalling from chip to chip adds power consumption and a performance penalty (added latency for off-chip communication, even if you can get the same bandwidth). The upside is lower manufacturing cost, which is why e.g. AMD's upcoming 12-core chips will be 2 chips with 6 cores each.

For a GPU the situation is quite different: each shader processor (512 for Fermi, 320 for Cypress) is (at least I hope) a highly optimized and possibly hand-crafted piece of hardware. But all the other shader processors are identical, so you could refer to these chips as 512- or 320-core processors. That wouldn't be quite true, as these single "cores" can only be used in larger groups, but I hope you get the point: GPUs are already much more "multi-core" than CPUs. Spreading these cores over several chips increases yield, but at the same time reduces performance for the same reasons it hurts CPUs (power, latency, bandwidth). That's why scaling in SLI or Crossfire is not perfect.

MrS
Scanning for our furry friends since Jan 2002
ID: 15478
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 15480 - Posted: 27 Feb 2010, 16:25:41 UTC

SK wrote:
NVidia's mid-range cards such as the GT220 and GT240 offer much lower power usage than previous generations, making these cards very attractive to the occasional gamer, home media centre users (with their HDMI interface), the office environment, and of course to GPUgrid. The GT240 offers similar performance to an 8800GS or 9600GT in out-and-out GPU processing, but incorporates new features and technologies. Yet, where the 8800GS and 9800GT use around 69W when idle and 105W under high usage, the GT240 uses about 10W idle and up to 69W (typically 60 to 65W) when in high use. As these cards do not require any special power connectors, are quiet and not oversized, they fit into more systems and are more ergonomically friendly.


IMO you're being a little too optimistic here ;)

First off: the new shader model 1.2 cards are great for GPU-Grid, no question. However, the applications these cards' performance is normally judged by are games, and those don't really benefit from 1.2 shaders and DX 10.1. Performance-wise it trades blows with the 9600GT. I'd say overall it's a little slower, but nothing to worry about. And performance-wise it can't touch either a 9800GT or its Green edition. Power consumption is quite good, but please note that the idle power of the 9600GT and 9800GT lies between 20 and 30 W, not 70 W! BTW: the numbers XBit-Labs measures are lower than what you'll get from the usual total-system power measurements, as they measure the card directly, so power conversion inefficiency in the PSU is eliminated.

If the 9600GT were the only alternative to the GT240, it would be a nice improvement. However, the 9800GT Green is its real competitor. In Germany you can get either this one or a GT240 GDDR5 for 75€. The 9800GT Green is faster and doesn't have a PCIe connector either, so both are around 70 W max under load. The GT240 wins by 10 - 15 W at idle, whereas the 9800GT Green wins on performance outside of GPU-Grid.

I'm not saying the GT240 is a bad card. It's just not as much of an improvement over the older cards as you made it sound.

I've got a nice suggestion for nVidia to make better use of these chips: produce a GT250. Take the same chip, give the PCB an extra PCIe 6-pin connector and let it draw 80 - 90 W under load. Current shader clock speeds of these cards already reach 1.7 GHz easily, so with a slight voltage bump they'd get to ~1.8 GHz at the power levels I mentioned. That would give them 34% more performance, would take care of the performance debate compared to the older generation and would still be more power efficient than the older cards. nVidia would need a slight board modification, a slightly larger cooler and possibly some more MHz on the memory. Overall cost might increase by 10€. Sell it for 20€ more and everyone's happy.
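
The 34% comes straight from the clock ratio; a quick sketch, assuming a stock shader clock of ~1.34 GHz for the GDDR5 GT240 and that GPU-Grid performance scales with shader clock (the power figure is left out, since it depends on more than the shader domain):

    stock_shader_ghz = 1.34    # assumed GT240 GDDR5 stock shader clock
    bumped_shader_ghz = 1.80   # with the suggested voltage bump
    gain = bumped_shader_ghz / stock_shader_ghz - 1
    print(f"expected performance gain: ~{gain:.0%}")  # ~34%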

MrS
Scanning for our furry friends since Jan 2002
ID: 15480
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 15482 - Posted: 27 Feb 2010, 17:24:09 UTC

Regarding the article from SemiAccurate: the basic facts Charlie states are probably true, but I really disagree with his analysis and conclusions.

He states that:
(1) GF100 went from revision A1 to A2 and is now at A3
(2) A-revisions are usually to fix logic errors in the metal interconnect network
(3) GF100 has huge yield issues
(4) and draws too much power, resulting in low clock speeds, resulting in lower performance
(5) nVidia should use a B revision, i.e. changes to the silicon, to combat (4)
(6) since they didn't, they used the wrong tool to address the problem and are therefore stupid

(1-5) are probably true. But he also states (correctly, according to Anand, AMD and nVidia) that there were serious problems with the reliability of the vias - the little metal wires used to connect the transistors, which are part of the metal interconnects. Could it be that nVidia used the A-revisions to fix, or at least relieve, those problems in order to improve yields? Without further knowledge I'd rather give nVidia the benefit of the doubt here and assume they're not total idiots, rather than claiming (6).

(7) nVidia designed GF100 to draw more amperes upon voltage increases than Cypress does

That's just a complicated way of saying that every transistor consumes power. Increasing the average power draw of every transistor by, say, a factor of 1.2 (e.g. due to a voltage increase) takes a 100 W chip to 120 W, whereas a 200 W chip faces a larger increase of 40 W, to 240 W.

(8) Fermi groups shader processors into groups of 32
(9) this is bad because you can only disable them in full groups (to improve yields from bad chips)

G80 and G92 used groups of 16 shaders, whereas GT200 went to groups of 24. Fermi just takes this natural evolution towards more shading power per texture unit and fixed-function hardware one step further. Going with a smaller granularity also means more control logic (shared by all shaders of one group), which means you need more transistors and get a larger and more expensive chip, which runs hotter but may extract more performance from the same number of shaders for certain code.
Charlie's words are "This level of granularity is bad, and you have to question why that choice was made in light of the known huge die size." I'd say the choice was made because of the known large die size, not despite it. Besides, arguing this way would paint AMD in an even worse light, as they can "only" disable shaders in groups of 80 in the 5000 series.
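
For comparison, the salvage (die-harvest) configurations each granularity allows, as a small sketch using the group sizes mentioned above (the helper function is purely illustrative):

    def harvest_options(total_shaders, group_size, steps=4):
        """Shader counts reachable by disabling whole groups of defective shaders."""
        return [total_shaders - i * group_size for i in range(steps)]

    print("GF100   (groups of 32):", harvest_options(512, 32))    # 512, 480, 448, 416
    print("Cypress (groups of 80):", harvest_options(1600, 80))   # 1600, 1520, 1440, 1360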

(10) nVidia should redesign the silicon layer to reduce the impact of the transistor variances

I'm not aware of any "magic" design tricks one can use to combat this, except changing the actual process, but that would be TSMC's business. One could obviously use fewer transistors (at least one of the measures AMD chose), but that's a chip redesign, not a revision. One could also combat transistor gate leakage by using a thicker gate oxide (resulting in lower clock speeds), but that addresses neither the variances nor the other leakage mechanisms.
If anyone can tell me what else could be done I'd happily listen :)

(11) The only way to rescue Fermi is to switch to 28 nm

While I agree that GF100 is just too large for TSMC's 40 nm process and will always be very power constrained, I wouldn't count on 28 nm as the magic fairy (as I said somewhere above). They'd rather get the high-end-to-mainstream variant of the Fermi design out of the door as soon as possible. If they haven't already been pushing this option for a couple of months, the term "idiots" would be quite a euphemism...

MrS
Scanning for our furry friends since Jan 2002
ID: 15482
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 15483 - Posted: 27 Feb 2010, 17:35:58 UTC - in response to Message 15480.  

AMD did not suddenly jump to 2x6 cores. They moved from single to dual core (on one die, unlike Intel, who preferred to use glue), then moved to quad, hex and now 12 cores. Along the way they moved from 90nm to 45nm, step by step. In fact Intel will be selling a 6-core 32nm CPU soon.

Unfortunately, NVidia tried to cram too much onto one core too early.
As you say MrS, an increase in transistor count results in energy loss (leakage, in the form of heat), and it probably rises with the square too. The more transistors you have, the higher the heat loss and the lower you can clock the GPU.

There are only two ways round this: thinner wafers, or breaking the total number of transistors up into cool, manufacturable sizes. I know this may not scale well, but at least it scales.

The 65nm G200 cards had issues with heat, and thus did not clock well - the first GTX 260 sp192 (65nm) clocked to just 576MHz, poor compared to some G92 cores. The GTX 285 managed to clock to 648MHz, about 12.5% faster in itself. An excellent result, given its 240:80:32 core config compared to the GTX 260's 192:64:28. This demonstrated that with design improvements and dropping from 65 to 55nm, excess leakage could be overcome. But to try to move from 1.4 billion transistors to 3 billion, even on a 40nm core, was madness; not least because 40nm was untested technology and a huge leap from 55nm.

They "bit off more than they could chew"!

If they had designed a 2 billion transistor GPU that could be paired with another on one board, and also dropped to 40nm, they could have made a very fast card, and one we could have been using for some time - it would have removed some manufacturing problems and vastly increased yields at 40nm, making the project financially viable in itself.

I think they could simply have reproduced a GTX 285 core at 40nm that could have been built into a dual-GPU card, called it a GTX 298 or something, and it would have yielded at least a 30% performance gain compared to a GTX 295. An opportunity missed.

I still see a quad-GPU card as being a possibility, even more so with Quad SLI emerging. If it is not on any blueprint drafts I would be surprised.
ID: 15483
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 15485 - Posted: 27 Feb 2010, 19:12:07 UTC - in response to Message 15483.  

AMD did not suddenly jump to 2x6 cores. They moved from single to dual core (on one die, unlike Intel, who preferred to use glue), then moved to quad, hex and now 12 cores. Along the way they moved from 90nm to 45nm, step by step. In fact Intel will be selling a 6-core 32nm CPU soon.


I know, but what's your point? What I was trying to say is that the CPU guys are trying to avoid multi-chip packages. They lead to larger chips (additional circuitry is needed for communication), higher power consumption (from that additional circuitry), lower performance (due to higher latency) and higher costs due to the more complex packaging. The only upsides are improved yields on the smaller chips and faster time to market if you already have the smaller ones. So they only use them when either (i) the yield on one large chip would be so low that it offsets all the penalties of going multi-chip, or (ii) the cost of a separate design, mask set etc. for the large chip would be too high compared to the expected profit from it, so in the end using two existing chips is more profitable and good enough.

There are only two ways round this: thinner wafers, or breaking the total number of transistors up into cool, manufacturable sizes. I know this may not scale well, but at least it scales.


Thinner wafers?

I think they could simply have reproduced a GTX 285 core at 40nm that could have been built into a dual-GPU card, called it a GTX 298 or something, and it would have yielded at least a 30% performance gain compared to a GTX 295. An opportunity missed.


And I'd rather see this chip based on the Fermi design ;)

I still see a quad GPU card as being a possibility, even more so with Quad Sli emerging.


Quad SLI: possibly, just difficult to make it scale without microstutter.

Quad chips: a resounding no. Not as an add-in card for ATX or BTX cases. Why? Simple: a chip like Cypress at ~300 mm² generally gets good enough yields, and die harvesting works well. However, push a full Cypress a little and you easily approach 200 W. Use two of them and you're at 400 W, or just short of 300 W and downclocked, as in the case of the 5970. What could you gain by trying to put 3 Cypress chips onto one board / card? You'd have to downclock further and thereby reduce the performance benefit, while at the same time struggling to stay within power and space limits. The major point is that producing a Cypress-class chip is not too hard, and you can't put more than two of these onto a card. And using 4 Junipers is going to be slower and more power hungry than 2 Cypress for the same reasons AMD is not composing their 12-core CPU from 12 single dies.
I could imagine an external box with its own PSU, massive fans and 800+ W of power draw though. But I'm not sure there's an immediate large market for this ;)
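
A toy model of the board power argument above, showing why more than two chips per card buys nothing within a fixed power budget (a sketch; it assumes a ~300 W board limit, ~200 W per full-speed Cypress-class chip, power scaling roughly linearly with clock at fixed voltage, and it ignores memory and board overhead):

    BOARD_LIMIT_W = 300.0   # roughly the add-in card ceiling
    CHIP_FULL_W = 200.0     # a pushed, full-speed Cypress-class chip (assumed)

    for n_chips in (1, 2, 3, 4):
        # Downclock each chip so the whole board stays within its power budget
        clock_scale = min(1.0, BOARD_LIMIT_W / n_chips / CHIP_FULL_W)
        throughput = n_chips * clock_scale   # relative to one full-speed chip
        print(f"{n_chips} chip(s): clock x{clock_scale:.2f}, throughput x{throughput:.2f}")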

MrS
Scanning for our furry friends since Jan 2002
ID: 15485
Profile robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 769,991,668
RAC: 0
Message 15486 - Posted: 27 Feb 2010, 19:28:25 UTC - in response to Message 15483.  

I still see a quad-GPU card as being a possibility, even more so with Quad SLI emerging. If it is not on any blueprint drafts I would be surprised.


Sounds reasonable, IF they can reduce the amount of heat per GPU enough that the new boards don't produce any more heat than the current ones. How soon do you expect that?

Or do you mean boards that occupy four slots instead of just one or two?

If they decide to do it by cutting the clock rate in half instead, would you buy the result?
ID: 15486
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 15487 - Posted: 27 Feb 2010, 21:14:23 UTC - in response to Message 15486.  
Last modified: 27 Feb 2010, 21:54:26 UTC

Yeah, I was technically inaccurate to say that the 9800GT uses 69W when idle; I looked it up and that is what I found! It would be more accurate, albeit excessive, to say that system power with the GPU installed and sitting idle can increase by up to 69W, with the GPU itself perhaps using about 23W (the accurate bit); the additional losses come from the components the GPU's power draw depends on: the PSU and motherboard (the vague and variable bit). Of course, if I say this for the 9800GT I would need to take the same approach with the GT240 (6W idle), and mention that in reality idle power consumption also depends on your motherboard's ability to turn power off, your PSU, your chipset and the motherboard's general efficiency. Oh, and the operating system's power settings, but again that is excessive for a generalisation.
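
A small sketch of how conversion losses inflate the measured at-the-wall delta relative to the card's own draw (the efficiency figures are assumptions for illustration; real PSU and VRM losses vary, so this only accounts for part of the gap):

    card_idle_w = 23.0        # 9800GT idle draw at the card, the figure quoted above
    psu_efficiency = 0.80     # assumed PSU efficiency at light load
    vrm_efficiency = 0.90     # assumed motherboard/VRM conversion efficiency
    wall_delta = card_idle_w / (psu_efficiency * vrm_efficiency)
    print(f"~{wall_delta:.0f} W extra at the wall for {card_idle_w:.0f} W at the card")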

XbitLabs compared a GT240 to a 9800GT (p4) and then replaced the 9800GT with a GTS250 (p6)! Crunching on GPUGrid, and overclocked by between 12 and 15%, my temperatures on three different types of GT240 stay around 49°C. The only exception is one card that sits a bit too close to a chipset heatsink in an overclocked i7 system. How they got 75°C at stock clocks is beyond me. Even my hot (58°C) card's fan is at 34%.

Anyway, I think the GT240 is a good all-round midrange card for its many features, and an excellent card for GPUGrid. For the occasional light gamer who does not use the system as a media centre or for GPUGrid, you are correct, there are other, better options; but those cards are of less use here, and I don't think gamers who don't crunch are too interested in this GPUGrid forum.

I've got a nice suggestion for nVidia to make better use of these chips: produce a GT250. Take the same chip, give the PCB an extra PCIe 6-pin connector and let it draw 80 - 90 W under load. Current shader clock speeds of these cards already reach 1.7 GHz easily, so with a slight voltage bump they'd get to ~1.8 GHz at the power levels I mentioned. That would give them 34% more performance, would take care of the performance debate compared to the older generation and would still be more power efficient than the older cards. nVidia would need a slight board modification, a slightly larger cooler and possibly some more MHz on the memory. Overall cost might increase by 10€. Sell it for 20€ more and everyone's happy.


Your suggested card would actually be a decent gaming card, competitive with ATI's HD 5750. NVidia has no modern cards (just G92s) to fill the vast gap between the GT240 and the 55nm GTX 260 sp216. A GTX 260 is about two to two and a half times as fast on GPUGrid. Perhaps something is on its way? A GTS 240 or GTS 250 could find its way onto a 40nm 2xx die and into the mainstream market (it is presently an OEM card and likely to stay that way as is). Anything higher would make it a top-end card.

I could see two or four 40nm GPUs on one board if they had a smaller transistor count. Certainly not Fermi (with 3 billion), at least not until 28nm fabrication is feasible, and that's probably 2+ years away, by which time there will be other GPU designs. I think a single card with fewer limitations offers up more possibilities than 2 or 4 separate cards, and even though the circuitry would increase, it would be less than the circuitry of 4 cards together. I expect there will be another dual card within the next 2 years.

Fermi could do with fewer transistors, even if that means a chip redesign!
A design that allows NVidia to use more imperfect chips by switching areas off might make them more financially viable. We know AMD can disable cores if they are not up to standard. Could NVidia take this to a new level? If it means 4 or 5 cards with fewer and fewer ROPs and shaders, surely that's better for the consumer too? GTX 480, 470, 460, 450, 440, 430. More choice, better range, better prices.

NVidia and AMD want to cram more and more transistors into a chip. But this increase in density causes a heat increase. Could chips be designed to include spaces, like fire breaks?
ID: 15487
Profile liveonc
Joined: 1 Jan 10
Posts: 292
Credit: 41,567,650
RAC: 0
Message 15488 - Posted: 27 Feb 2010, 23:41:05 UTC - in response to Message 15485.  
Last modified: 27 Feb 2010, 23:55:45 UTC

Quad chips: a resounding no. Not as an add-in card for ATX or BTX cases. Why? Simple: a chip like Cypress at ~300 mm² generally gets good enough yield and die harvesting works well. However, push a full Cypress a little and you easily approach 200 W. Use 2 of them and you're at 400 W, or just short of 300 W and downclocked as in the case of the 5970. What could you gain by trying to put 3 Cypress onto one board / card? You'd have to downclock further and thereby reduce the performance benefit, while at the same time struggling to stay within power and space limits. The major point is that producing a Cypress-class chip is not too hard, and you can't put more than 2 of these onto a card. And using 4 Junipers is going to be slower and more power hungry than 2 Cypress due to the same reasons AMD is not composing their 12 core CPU from 12 single dies.
I could imagine an external box with its own PSU, massive fans and 800+ W of power draw though. But I'm not sure there's an immediate large market for this ;)


Sounds fascinating, an external option. I'm thinking about the ExpressBox that was tried with an external GPU to boost laptops through the ExpressCard slot. The problem was that it was an x1 PCI-E link, and the idea was too costly and produced too little benefit.

But what about an x16 PCI-E 2.0 external extender riser card connected to an external GPU? GPUs are already making so much heat and taking up so much space, so why not just separate them once and for all? No need for a redesign of the motherboard... If there were an option to stack external GPUs in a way that could use SLI, an external GPU option could give enthusiasts the opportunity to spend insane amounts of money on external GPUs and stack them on top of each other without running into heat or power issues. Even if a Fermi required 800W, if it were external and had its own dedicated PSU, you'd still be able to stack three on top of each other to get that Fermi tri-SLI.
ID: 15488