GTX 590 coming?

Message boards : Graphics cards (GPUs) : GTX 590 coming?


Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 20788 - Posted: 27 Mar 2011, 0:57:07 UTC - in response to Message 20787.  

Perspective,

AMD cards do not work here, so any discussion of them is not yet relevant - hopefully some day, but not yet. Anyway, whatever the manufacturer, I'm not forking out £500 for a GPU.

I very much doubt that NVidia will release a 28nm GPU in the second quarter of this year. They may have a prototype ready in the third quarter, but I can't see a 28nm release any time this year.

While I would expect a speed bump with the move to 28nm, I seriously doubt that any 28nm NVidia card will ever outperform a GTX 590 by fifteen times.

Still, I'm in no hurry to replace my GTX 470s with 500-series cards; the GTX 470 is still the best GPU in terms of performance per purchase price, if you can still get one. Maybe next year.
ID: 20788
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 20790 - Posted: 27 Mar 2011, 13:30:44 UTC

28 nm cards should be a lot more attractive... but don't expect miracles!

For Cayman, ATI changed the fundamental shader architecture (-> different software optimization -> expensive!) for 10% more efficiency per transistor. Sure, more improvements will be made, but we've already reached a quite mature state.

And without architectural revolutions (which would mostly need software to catch up to be useful), there's only brute force left: "more transistors, more shaders". They will gain some more headroom at 28 nm, but barring a power-efficiency miracle (which didn't happen at TSMC 40 nm, and didn't happen for Intel at 32 nm - just regular, solid improvements), I don't see how they could suddenly use 2 to 3 times as many transistors. Sure, at 28 nm they can pack twice as many transistors into the same area as before, but the power savings from the new process alone will not be 2x; i.e. twice as many transistors will require a lot more power than the old design.

You can't have both more transistors and significantly reduced power consumption; you have to choose one of them. And I think we all know how hard nVidia was pushing Fermi already... so upping power consumption further is not really a promising option.

MrS
Scanning for our furry friends since Jan 2002
ID: 20790
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level
Trp
Message 20791 - Posted: 27 Mar 2011, 15:23:20 UTC - in response to Message 20787.  

The transition from 55nm to 40nm, plus creating a working, brand-new Fermi architecture, took NVidia more than a year (that was the GTX 480-470-465 line), and then another 9 months to reach the original goal of 512 shaders (GTX 580-570). Then it took them 3 months to select the chips for a twin-chip "world's fastest GPU", which is clearly made for one purpose: to boost NVidia's prestige. If a new GPU made with the 28nm fabrication process were about to be released, why would NVidia (and AMD) waste their time and resources coming up with something like the GTX 590? (It is clearly more expensive to produce than its selling price; that's why they limit its production.)
Changing from 40nm lithography to 28nm means double the transistor count in the same die area (or half-sized chips, which results in higher yields = cheaper dies), but twice as many transistors at the same clock speed and voltage dissipate twice as much heat, which would be impossible to keep below 90°C. The only way to maintain the heat dissipation without reducing the speed is to reduce the voltage to 71% of the 40nm part's (since dynamic power scales with the square of the voltage), but achieving this takes more technological development than just shrinking the transistors. The real basis of the performance gain is how much the chip's voltage can be reduced. I guess it will be around 90%, which means 81% of the 40nm chip's TDP per transistor. If so, then the overall performance gain on a single (GPU) chip could be around 23.5% (=1/0.81) over the 40nm chips, because the heat dissipation has already reached its peak. If NVidia is about to create as large a die as Fermi was, they will face similar difficulties in releasing a working product. If they were to release a smaller chip, it would not be much faster, but would cost less; the latter would not be much motivation for enthusiasts to buy the new GPU.
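The voltage/TDP arithmetic above can be sketched with a rough dynamic-power model (P proportional to V squared per transistor at fixed frequency; the 90% voltage figure is the post's guess, not measured data):

```python
# Rough dynamic-power model: per-transistor power ~ V^2 at a fixed clock.
# Numbers follow the post's assumptions, not measurements.

def relative_power(v_scale: float) -> float:
    """Dynamic power per transistor scales with the square of voltage."""
    return v_scale ** 2

# Ideal case: to halve power per transistor (so 2x transistors fit the
# same TDP), voltage must drop to sqrt(0.5) ~ 71% of the 40 nm value.
ideal = 0.5 ** 0.5
print(f"voltage for half power: {ideal:.0%}")            # 71%

# Guessed realistic case: voltage only drops to 90% of the 40 nm value.
tdp_per_transistor = relative_power(0.90)                # 0.81
gain_at_same_tdp = 1 / tdp_per_transistor                # ~1.235
print(f"TDP per transistor: {tdp_per_transistor:.0%}")   # 81%
print(f"headroom at the same TDP: +{gain_at_same_tdp - 1:.1%}")  # +23.5%
```

This reproduces the 71%, 81% and ~23.5% figures quoted in the post.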
ID: 20791
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 20793 - Posted: 27 Mar 2011, 21:05:51 UTC - in response to Message 20791.  

A GPU with many small, efficient cores could be the answer to significantly increasing crunching and server-farm performance per Watt, but such cards would not sell to gamers, or be of much use as entry-level GPUs. Perhaps market changes will open gateways to such cards, and the development of software such as OpenCL/OpenGL will drive NVidia and/or ATI towards them, in a similar way to Intel being driven towards developing a range of small, efficient CPU cores for server farms.
ID: 20793
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 20794 - Posted: 27 Mar 2011, 21:45:06 UTC

Well said, Retvari. I've just got to add one detail: in a new process node, the reduction in power consumption comes not only from reduced voltage. The smaller transistors usually also require less power for switching because, well, they're smaller (simply put). This adds to the voltage scaling but is difficult to predict (and TSMC's 40 nm was rather disappointing in this regard).

@SK: well... that's actually what a modern GPU is. You can view the individual "shader multiprocessors" in Fermi as individual "cores". Making them even more fine-grained (i.e. each one smaller) would actually cause more overhead for instruction decoding, distribution etc. Utilization may be better, but you'd get fewer shaders / less computational power per transistor. That's why the SMs actually grew even bigger in CC 2.1.

MrS
Scanning for our furry friends since Jan 2002
ID: 20794
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 20795 - Posted: 27 Mar 2011, 22:03:09 UTC - in response to Message 20794.  

What I meant is having more cores, as in the GTX 590 having two cores. A card with 4, 8 or 16 might well be a better performer in terms of computational power per Watt. I'm not talking about 3 billion transistors per core, though; perhaps 1B or 1.5B at 28/22nm for 8 cores.
ID: 20795
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level
Trp
Message 20797 - Posted: 28 Mar 2011, 10:07:39 UTC - in response to Message 20794.  
Last modified: 28 Mar 2011, 10:31:01 UTC

The smaller transistors usually also require less power for switching because, well, they're smaller (simply said).

Well, this is *almost* true, and a transistor is only a part of the chip. It is logical that smaller transistors require less power for switching, but on a chip there are parasitic capacitors everywhere, across the entire area of the chip. A capacitor's capacitance (practical or parasitic in our case - their physics is the same) is directly proportional to its electrodes' surface area, and inversely proportional to the distance between the electrodes. So if a transistor's area is reduced, this is good. But with this reduction, the transistor's semiconductor part also gets thinner, which is bad (and also good, because it lowers the voltage needed to switch between logic levels). Also, the wiring comes closer together, which is also bad. Moreover, a chip built on a finer lithography can contain more components, connected by (overall) longer wires, which is also bad.
The only way to overcome these bad factors is to create better insulation between the conductors (the dielectric, in capacitor terms).
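The proportionality described above can be illustrated with the parallel-plate relation C = eps * A / d (purely illustrative numbers; real parasitics are far messier than this model):

```python
# Parallel-plate capacitor model: C = eps * A / d.
# Illustrative sketch only - real on-chip parasitics are more complex.

def capacitance(eps: float, area: float, dist: float) -> float:
    """Capacitance grows with electrode area, shrinks with separation."""
    return eps * area / dist

c_old = capacitance(eps=1.0, area=1.0, dist=1.0)

# Shrink every linear dimension by s = 0.7 (roughly a 40 nm -> 28 nm step):
# area scales with s^2, but separation/thickness scales with s.
s = 0.7
c_new = capacitance(eps=1.0, area=s**2, dist=s)

# Capacitance falls only linearly with the shrink, not quadratically -
# and denser wiring adds coupling on top, hence low-k dielectrics.
print(f"parasitic capacitance ratio: {c_new / c_old:.2f}")  # 0.70
```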

This adds to the voltage scaling but is difficult to predict (and TSMCs 40 nm was rather dissappointing in this regard).

I agree with that.
ID: 20797
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level
Trp
Message 20798 - Posted: 28 Mar 2011, 10:13:43 UTC

BTW I have ordered a Gainward GTX 590 yesterday, and it will be delivered tomorrow.
ID: 20798
Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Level
Gly
Message 20799 - Posted: 28 Mar 2011, 12:09:52 UTC - in response to Message 20798.  

We, as a lab, are waiting to buy the new 28nm cards and PCIe 3.0 motherboards (for parallel runs). We are expecting a factor of at least 2 in performance for Kepler, because the Fermi core design is now mature (see G80 to G200).

gdf
ID: 20799
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 20801 - Posted: 28 Mar 2011, 21:01:49 UTC

Well said again, Retvari. I think I could add some more, but don't see a need to, especially since we'd be getting OT.

@SK: oh, you mean distributing the shaders among several smaller chips (I don't think we should use the term "core" for GPUs, as it's too unspecific).

That could give you several benefits regarding power efficiency:
- You could physically spread the chips out, so you've got more space for cooling. This generally means higher performance cooling is possible. Reduced chip temperatures lead to higher power efficiency.

- You could spread the memory bandwidth demands among several narrower busses, which are themselves easier to implement and can provide more aggregate bandwidth than the current solutions, allowing for lower memory clocks.

- You could bin the chips more carefully (less variation in 200 mm² than in 500 mm²) and drive them at optimal working points (voltage & clock).

- Yield would improve (which has no direct influence on power efficiency).

I do see some problems, though:
- You already said such cards would be worse at gaming due to non-ideal scaling in SLI / Crossfire. Well, screw gaming... but there's more to it. The same principle limits performance in HPC applications. For BOINC we'd be fine with that, since each chip could run its own code without any need for communication with the other ones. However, such GPUs are not built for this market alone, but rather for the entire HPC crowd. And there you've got many problems which you want to solve as fast as possible. Think of a single time step at GPU-Grid - you cannot run this across different GPUs efficiently, because the communication would be too slow; far slower than on-chip. Even if we could match the bandwidth between chips (e.g. using optical chip-to-chip communication), this would still add latency, which would make the multi-chip solution slower than the fat chip (everything else being equal). High-bandwidth long-range communication can also quickly become a power hog, reducing efficiency.

- In a multi-chip solution you still want to produce only a single type of chip (lithography mask sets are really expensive). Therefore you'd duplicate units, which you need only once for a single card. You'd need more transistors to get the same job done, which usually reduces power efficiency.

- If the individual shader multiprocessors in a Fermi execute different code (or just different stages of the same code), the requirements regarding cache size, cache bandwidth and memory bandwidth may differ. In the fat chip you get automatic load balancing across all shared resources. In the multi-chip solution you lose this advantage, potentially reducing power efficiency.

All in all I think it's probably not worth it, as long as the fat chips can be handled. Anyway, worth thinking about :)

MrS
Scanning for our furry friends since Jan 2002
ID: 20801
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level
Trp
Message 20802 - Posted: 28 Mar 2011, 21:20:07 UTC - in response to Message 20799.  

We, as a lab, are waiting to buy the new 28nm cards and PCIe 3.0 motherboards (for parallel runs). We are expecting a factor of at least 2 in performance for Kepler, because the Fermi core design is now mature (see G80 to G200).

gdf

This is a realistic expectation, but fifteen times, as Zydor said, is unrealistic. I expected that before NVidia releases Kepler, they would transfer the GF110 to 28nm, as they did with the GT200 (65 to 55nm), and release a dual-GPU card built on this smaller, faster, cooler, cheaper chip. But neither NVidia nor ATI released their new dual GPUs on 28nm, so there must be some reason behind it (I mean technological difficulties at TSMC).

I've found the corresponding technological articles on Wikipedia (in my previous post I *almost* recalled everything right... :)
http://en.wikipedia.org/wiki/Low-k_dielectric
http://en.wikipedia.org/wiki/High-k_dielectric
http://en.wikipedia.org/wiki/Silicon_on_insulator
Light reading :)
ID: 20802
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 20803 - Posted: 28 Mar 2011, 22:54:03 UTC - in response to Message 20802.  

Easy: TSMC 28 nm is not ready yet :p

MrS
Scanning for our furry friends since Jan 2002
ID: 20803
Profile Zydor

Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Level
Ala
Message 20804 - Posted: 29 Mar 2011, 2:43:06 UTC

The 15x figure referred specifically to the Maxwell chip in 2013, which will be on 22nm. That's NVidia's claim, not mine. I share a healthy scepticism, particularly as it's 2+ years away and referred to a 2013 release, not relevant to Kepler. I don't believe it's impossible; I guess time will tell in 2+ years. It serves to illustrate NVidia's claims and direction, but is not relevant to a 590 successor.

2X for Kepler is virtually a given, in that such an increase is almost automatic in a move from 40nm to 28nm. How much more than that remains to be seen. I believe it's almost certainly more, but let's wait for reality, despite NVidia claims which centre either side of 3X the Fermi series. However, the minimum 2X is enough to at least give pause for thought on a 590 purchase, given the time frames. How much thought is an individual matter; we all have different pressures, needs and wants.

Regards
Zy
ID: 20804
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level
Trp
Message 20806 - Posted: 29 Mar 2011, 14:21:32 UTC

My GTX 590 has arrived, and I've successfully installed it in this host. More precisely, I replaced the GTX 480 in this host with the new GTX 590. I will not overclock it for the time being. So this host now has an overclocked GTX 580 and a GTX 590.
ID: 20806
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 20807 - Posted: 29 Mar 2011, 16:01:19 UTC - in response to Message 20806.  
Last modified: 29 Mar 2011, 16:52:55 UTC

Zoltan, could you post up the details here - thanks

In theory the GTX 590 (at reference clocks) should be about 50% more productive than the GTX 580. The GTX 580 is about 50% faster than the GTX 470, which is about 27% faster than a GTX 465. So we are now in the position that the top CC 2.0 Fermi has about 290% of the performance of the lowest at GPUGrid.
Should the GTX 590 clock to the same frequency as a reference GTX 580 (772MHz), then a GTX 590 could do about 369% of the work of a GTX 465.

In terms of maturity, the GTX 500 series cards are definitely more energy efficient than their GTX 400 counterparts; in terms of performance per Watt a GT240/GT340 could outperform a GTX 470, but this is not the case against the CC 2.0 GTX 500 series cards. I still think that more refinements are possible, and these would be augmented by a move to 28nm. I just hope they do not take the architectural frailties of CC 2.1 to 28nm.
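The percentage chain above can be checked on the back of an envelope (the 607 MHz GTX 590 reference clock used below is my assumption, not stated in the post):

```python
# Compound the relative speed-ups quoted in the post.
gtx470_vs_465 = 1.27
gtx580_vs_470 = 1.50
gtx590_vs_580 = 1.50   # twin-GPU card, non-ideal scaling

top_vs_bottom = gtx470_vs_465 * gtx580_vs_470 * gtx590_vs_580
print(f"GTX 590 vs GTX 465: {top_vs_bottom:.0%}")   # 286%, i.e. "about 290%"

# If the GTX 590 ran at a reference GTX 580's 772 MHz instead of its
# (assumed) 607 MHz reference clock, scale roughly linearly with frequency:
overclocked = top_vs_bottom * 772 / 607
print(f"at 772 MHz: {overclocked:.0%}")             # 363%, near the ~369% quoted
```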
ID: 20807
Profile Carlesa25
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 0
Level
Thr
Message 20809 - Posted: 29 Mar 2011, 17:43:50 UTC - in response to Message 20806.  


Hello: Congratulations Zoltan - it performs 54% better than my GTX 295.

One thing that hasn't been clear to me for some time: why do Fermi-series cards report, in the task details, "Number of cores: 128" instead of the 512 they actually have? Best regards.
ID: 20809
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level
Trp
Message 20811 - Posted: 29 Mar 2011, 21:21:00 UTC - in response to Message 20809.  


Hello: Congratulations Zoltan - it performs 54% better than my GTX 295.

One thing that hasn't been clear to me for some time: why do Fermi-series cards report, in the task details, "Number of cores: 128" instead of the 512 they actually have? Best regards.

Thank you!

This is a reporting bug in BOINC. BOINC assumes that a streaming multiprocessor has 8 CUDA cores, which is true for G80 and G200 GPUs, but this was changed to 32 (and to 48 in CC 2.1 GPUs) in the Fermi architecture.
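The mismatch can be reproduced with a small lookup of cores per SM by compute capability (a sketch of the arithmetic, not BOINC's actual source code):

```python
# CUDA cores per streaming multiprocessor (SM) by compute capability.
CORES_PER_SM = {
    "1.x": 8,    # G80 / G200 - what the buggy BOINC assumed for everything
    "2.0": 32,   # Fermi GF100/GF110
    "2.1": 48,   # Fermi GF104/GF114
}

def core_counts(num_sms: int, cc: str, boinc_assumption: int = 8):
    """Return (what BOINC reported, the actual core count)."""
    return num_sms * boinc_assumption, num_sms * CORES_PER_SM[cc]

# A GF110 (e.g. GTX 580) has 16 SMs at CC 2.0:
reported, actual = core_counts(16, "2.0")
print(f"BOINC reports {reported} cores; the card has {actual}")  # 128 vs 512
```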
ID: 20811
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level
Trp
Message 20812 - Posted: 30 Mar 2011, 13:43:05 UTC - in response to Message 20799.  

We, as a lab, are waiting to buy the new 28nm cards and PCIe 3.0 motherboards (for parallel runs). We are expecting a factor of at least 2 in performance for Kepler, because the Fermi core design is now mature (see G80 to G200).

gdf

The ACEMD client has trouble keeping the Fermi GPUs busy. I wonder how the client will keep those faster GPUs busy?
ID: 20812
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 20813 - Posted: 30 Mar 2011, 15:41:20 UTC - in response to Message 20812.  

The ACEMD client has trouble keeping the Fermi GPUs busy.

I take it you mean the GPU utilization percentage is low?
ID: 20813
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Level
Trp
Message 20814 - Posted: 30 Mar 2011, 18:15:02 UTC - in response to Message 20813.  
Last modified: 30 Mar 2011, 18:28:49 UTC

The ACEMD client has trouble keeping the Fermi GPUs busy.

I take it you mean the GPU utilization percentage is low?

I mean the Fermi GPU utilization is pretty much CPU-speed dependent, and WU-type dependent. If these factors don't change for the better in the future, then the ACEMD client will need 8GHz+ CPU cores to feed the new GPUs.
ID: 20814

©2025 Universitat Pompeu Fabra