NVidia GTX 650 Ti & comparisons to GTX660, 660Ti, 670 & 680
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
A reference GTX 660 has a boost clock of 1084 MHz. My GTX 660 Ti operates at around 1200 MHz, and a reference GTX 660 Ti runs at 1058 MHz, so you can't say the GTX 660 has faster clocks. I would like to see some actual results from your GTX 660, to see if it really is completing Nathan long tasks in 6 h. I doubt it, because my GTX 660 Ti takes around 5.5 h and it's 40% faster by my reckoning:

I34R5-NATHAN_dhfr36_5-17-32-RND3107_0 (workunit 4392017) · sent 24 Apr 2013 11:12:40 UTC · returned 24 Apr 2013 19:20:22 UTC · Completed and validated · run time 19,505.27 s · CPU time 18,807.06 s · credit 70,800.00 · Long runs (8-12 hours on fastest card) v6.18 (cuda42)

A 960-CUDA-core GTX 660 should take ~7.5 h. The 1152-shader versions are OEM. These GeForce Kepler cards are either GK106 or GK104, but architecturally they are very similar; it's not like comparing high-end and mid-range Fermis, which were quite different. All the GK106 and GK104 cards are super-scalar.

FAQs · HOW TO: Opt out of Beta Tests · Ask for Help
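The 40% figure can be sanity-checked with a quick sketch, assuming as an idealisation that throughput scales with shader count times core clock (real GPU-Grid scaling turns out to be lower, as the thread goes on to show):

```python
# Idealised throughput model: shaders x core clock.
def rel_throughput(shaders, clock_mhz):
    return shaders * clock_mhz

# GTX 660 Ti (1344 shaders @ ~1200 MHz) vs GTX 660 (960 @ 1084 boost)
ratio = rel_throughput(1344, 1200) / rel_throughput(960, 1084)
print(f"GTX 660 Ti vs GTX 660: {ratio:.2f}x")  # ~1.55x

# If the 660 Ti takes ~5.5 h, the ideal model predicts for the 660:
print(f"predicted GTX 660 time: {5.5 * ratio:.1f} h")  # ~8.5 h
```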
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
This "not using all shaders" is not the best wording if you don't know what it means. As a brief explanation: it's a property of the fundamental chip architecture. Every chip/card since compute capability 2.1 has it, which covers the mainstream Fermis and then all Keplers. This move allowed nVidia to increase the number of shaders dramatically compared to older designs, but at the cost of sometimes not being able to use all of them. At GPU-Grid it seems the "bonus shaders" cannot be used at all, but this matters only when comparing to older cards. The newer ones are far more power efficient, even with this handicap.

Regarding the actual choice between 650Ti, 650Ti Boost, 660 and 660Ti: Jim is right that higher clock speeds are to be preferred over more shaders, but in this case all these cards are based on the same architecture and lithography, and hence reach similar clock speeds at similar voltages (= efficiencies). I don't see a penalty here for the bigger GK104 cards. My GTX660Ti runs GPU-Grid happily at 1.23 GHz and POEM at 1.33 GHz. It was one of the first, and from what I've heard those clocks are more the norm than the mark of an exceptionally good card.

Regarding the claim "GTX660Ti is not faster than GTX660", I'd say "show us the numbers". Theoretically the Ti should be 30% faster in the same setup, and this rule hasn't failed us up to now.

I'd go for the largest of these cards which fits nicely into your budget, but don't go higher than a GTX660Ti, since the GTX670 has the same number-crunching power but higher game performance, which you'd unnecessarily pay for.

MrS

Scanning for our furry friends since Jan 2002
Joined: 28 Jul 12 · Posts: 819 · Credit: 1,591,285,971 · RAC: 0
> I would like to see some actual results from your GTX660, to see if it really is completing Nathan Long tasks in 6h. I doubt it because my GTX660Ti takes around 5 1/2h and it's 40% faster by my reckoning:

Here is the first one, at 5 hours 50 minutes:

6811663 · host 150803 · sent 29 Apr 2013 17:21:11 UTC · returned 30 Apr 2013 0:44:44 UTC · Completed and validated · run time 21,043.76 s · CPU time 21,043.76 s · credit 70,800.00 · Long runs (8-12 hours on fastest card) v6.18 (cuda42)

You can look at the others for a while, though I will be moving this card to a different PC shortly.
http://www.gpugrid.net/results.php?hostid=150803

This GTX 660 is running non-overclocked at 1110 MHz boost (993 MHz default), and is supported at the moment by a full i7-3770. However, when I run it on a single virtual core, the time will increase to about 6 hours 15 minutes.
Joined: 5 Dec 11 · Posts: 147 · Credit: 69,970,684 · RAC: 0
> I would like to see some actual results from your GTX660, to see if it really is completing Nathan Long tasks in 6h. I doubt it because my GTX660Ti takes around 5 1/2h and it's 40% faster by my reckoning:

That's amazingly fast. My 660Ti does Nathans in about 19,000 s @ 1097 MHz. I would like to see what a 660 can do with a Noelia unit (when they start working again), to see if the more complex task takes a proportionate time increase.
Joined: 18 Jun 12 · Posts: 297 · Credit: 3,572,627,986 · RAC: 0
My GTX 670s do the current NATHANs in about 4 hours at 1200 MHz; that's not the same as a 660. When we were doing the NOELIAs, the difference was even greater. I know some will disagree with this, but I believe the 256-bit onboard bus makes a difference: I'm not pushing data through a smaller pipe. The 660s do have a much greater advantage in power consumption and the amount of heat they put off.
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
@flashawk: you've also got the significant performance bonus of running Win XP; the other numbers here are from Win 7/8.

@Jim: that is impressive performance, thanks for showing the numbers. As a comparison I'll use your config with only a logical core assisting the GPU, as this is what I'm also running. In this case we've got 6:15 h = 22,500 s at 1110 MHz. I'm running at 1228 MHz and should hence get 20,400 s. The difference in shaders is 1344/960 = 1.4, which means I should be getting 14,530 s. Well, this is clearly not the case :D Instead I'm seeing 19,030 s for these WUs, just 7% faster at the same clock speed.

I can't look up the memory controller load right now, but it was fairly low, so it shouldn't limit performance this much (especially since memory OC didn't really help GPU-Grid historically). It seems these WUs are too small for the larger cards to stretch their legs, i.e. there's too much context switching, PCIe transfer etc. happening.

MrS

Scanning for our furry friends since Jan 2002
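The arithmetic above can be reproduced step by step (all figures from the post):

```python
# Scale Jim's GTX 660 time to my clock, apply the ideal shader
# ratio, and compare with the observed GTX 660 Ti runtime.
gtx660_time = 6.25 * 3600          # Jim's 6:15 h = 22,500 s at 1110 MHz
my_clock, jims_clock = 1228, 1110  # MHz

# What the GTX 660 would take if it ran at my clock speed:
clock_adjusted = gtx660_time * jims_clock / my_clock   # ~20,340 s

# Ideal 660 Ti time, scaling by the shader ratio 1344/960 = 1.4:
ideal_ti_time = clock_adjusted / (1344 / 960)          # ~14,530 s

observed = 19030  # actual GTX 660 Ti runtime in seconds
print(f"clock-adjusted GTX 660 time: {clock_adjusted:.0f} s")
print(f"ideal GTX 660 Ti time: {ideal_ti_time:.0f} s")
print(f"actual speedup at equal clock: {clock_adjusted / observed - 1:.0%}")
```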
Joined: 28 Jul 12 · Posts: 819 · Credit: 1,591,285,971 · RAC: 0
> That's amazingly fast. My 660Ti does Nathan's in about 19000 @ 1097Mhz. I would like to see what a 660 can do with a Noelia unit (when they start working again).

Good question. I am changing this machine around today, so the link will no longer be good for this card, but I will post later once I get some.
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
> I recommend the GTX 660. It takes about 6 hours to complete a Nathan long, which seems to be about the same (or not much longer) as for a GTX 660 Ti. And I just completed a new build, and was able to measure the power into the card as 127 watts while crunching Nathan longs (that includes 10 watts static power, and accounts for the 91% efficiency of my power supply). The GTX 650 Ti is very efficient too, but took about 9 or 10 hours on Nathan longs the last time I tried it a couple of months ago.

I'm running 3 MSI Power Edition GTX 650 Ti cards and the times on the long-run Nathans range from 8:15 to 8:19. They're all OCed at +110 core and +350 memory (these cards use very fast memory chips) and none of them has failed a WU yet. Temps range from 46°C to 53°C with quite low fan settings.

> I like the Asus cards too; they are very well built, but to be safe I would avoid the overclocked ones, though Asus seems to do a better job than most in testing the chips used in their overclocked cards.

Interesting; for years I have run scores of GPUs (21 running at the moment on various projects) of pretty much all brands, and for me ASUS has had BY FAR the highest failure rate. In fact ASUS cards are the ONLY ones that have had catastrophic failures; other brands have only had fan failures. Of the many ASUS cards I've had, only 1 is still running; every other ASUS has failed completely, except for 1 that is waiting for a new fan. Personally I have had good luck with the XFX cards (among other brands) with their double lifetime warranty: out of many XFX cards I have had 2 fan failures, and they've shipped me complete new HS/fan assemblies within 2 days both times. I have also had good luck with MSI, Sapphire, PowerColor and Diamond.
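For reference, the power accounting quoted above works like this. The 139.6 W wall-side delta below is hypothetical, back-calculated to reproduce the quoted 127 W; only the 91% efficiency and the 10 W static draw are from the post:

```python
# DC power delivered to the card, from a wall-socket reading.
def power_into_card(wall_delta_w, psu_efficiency):
    return wall_delta_w * psu_efficiency  # power left after PSU losses

total = power_into_card(139.6, 0.91)  # hypothetical wall delta in watts
static = 10.0                         # static draw stated in the post
print(f"{total:.0f} W into the card ({total - static:.0f} W crunching load)")
```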
Joined: 21 Feb 09 · Posts: 497 · Credit: 700,690,702 · RAC: 0
Many thanks to all who contributed to my question. Quite stimulating! With apologies to Beyond, who is not impressed with ASUS, I've decided to go with the ASUS GTX 660. All the ASUS GPUs I've had have turned in exemplary performance. €190 (about £161) on Amazon France looks like the best deal available to me. I live in France so I get free shipping! All I have to do now is to persuade Her That Matters that it's a good idea! Tom |
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
> With apologies to Beyond, who is not impressed with ASUS, I've decided to go with the ASUS GTX 660. All the ASUS GPUs I've had have turned in exemplary performance.

No need to apologize; I'm just relating my experience with the 7 ASUS cards I've owned, as opposed to the scores of other brands. For me the ASUS cards failed at an astounding rate. Of course YMMV.
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Jim1348, is that definitely a 960-shader version and not an 1152-shader OEM card? Is it a 2 GB or 3 GB card? It might be the case that some WUs benefit from the extra memory bandwidth.

The bus width of the GTX 660 Ti is 192 bits, the same as the GTX 660 and GTX 650 Ti Boost, but because the GTX 660 Ti has more shaders and SMs, its bandwidth per shader is relatively less. I typically see a 40% memory controller load when running GPUGrid WUs on the GTX 660 Ti. This is a lot higher than on any previous cards I've had.

One problem in assessing the impact of bandwidth is that these tasks can vary by some margin; just looking at 8 WUs, I've seen a 7% variation in runtime on my own system. It's also hard to know what influence the operating system has. It might be just 11% from XP to W7, as it used to be, or it could be more for some WUs. The 11% difference isn't sufficient to move from 14,500 s to 18,800 s, though. flashawk's GTX 670 (on XP) is about 30% faster than my GTX 660 Ti on W7 x64. So some factor other than the OS is involved, and the likely candidate is bandwidth. Another consideration might be CPU usage. I might look at this at the weekend.

FAQs · HOW TO: Opt out of Beta Tests · Ask for Help
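The bandwidth-per-shader point can be made concrete with the published reference specs (bus widths and effective GDDR5 data rates below are the reference figures; treat the per-shader number as a rough comparison metric only):

```python
# Memory bandwidth = (bus width in bytes) x (effective data rate).
# Effective GDDR5 rate is 4x the base memory clock.
def bandwidth_gbps(bus_bits, effective_mhz):
    return bus_bits / 8 * effective_mhz / 1000

cards = {
    # name: (bus width in bits, effective memory clock MHz, shaders)
    "GTX 650 Ti": (128, 5400, 768),
    "GTX 660":    (192, 6008, 960),
    "GTX 660 Ti": (192, 6008, 1344),
    "GTX 670":    (256, 6008, 1344),
}
for name, (bus, mem, shaders) in cards.items():
    bw = bandwidth_gbps(bus, mem)
    print(f"{name:10s} {bw:6.1f} GB/s  {bw / shaders * 1000:5.1f} MB/s per shader")
```

The GTX 660 Ti ends up with the least bandwidth per shader of the four, which is consistent with the suspicion above.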
Joined: 18 Jun 12 · Posts: 297 · Credit: 3,572,627,986 · RAC: 0
40% is really good; I think you're actually using more of the shaders than I am on my 670s. The highest I've seen on the memory controllers is 32% for a 670 and 34% for a 680. SK, out of curiosity, what brand are your 660s?
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
My GTX 660 Ti is made by Gigabyte and comes with a dual fan. It's a decent model, but fairly standard; most boost up to around 1200 MHz.

The theory has been that because the GTX 660 Ti has the same number of shaders as a GTX 670, it should be just as fast. However, that doesn't seem to be the case, so perhaps the memory bandwidth is having more of an influence. My CPU is an i7-3770 @ 4.2 GHz, and I have 2133 MHz system memory and a SATA 6 Gb/s drive, so it should not be bottlenecked anywhere else. I do use the CPU, but I usually have 2 threads free. It's difficult to compare GPUs on different operating systems, but the fact that the GTX 660 also performs so well comparatively leads me to think that GPU memory bandwidth is more of an issue than was previously thought.

It just occurred to me that the drop from 32 to 24 ROPs could be the issue for the GTX 660 Ti, rather than (or as well as) the memory bandwidth: the GTX 670 has 32 ROPs, but the GTX 660 Ti only has 24. The GTX 660 also has 24 ROPs, perhaps giving it a better ROPs-to-CUDA-core ratio. If that is the case, then we might also see a similar difference between the GTX 650 Ti Boost and the GTX 650 Ti; the Boost also has 24 ROPs, but the 650 Ti only has 16.

Going back over some old posts, it looks like the new apps use the PCIe a bit less than before but use the GPU memory a bit more. The new apps also favor the super-scalar cards, making the old comparison charts redundant, even for CC 2.0 vs CC 2.1 architectures.

FAQs · HOW TO: Opt out of Beta Tests · Ask for Help
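The ROP-to-core ratios mentioned above, computed from the published specs of these cards:

```python
# ROP and CUDA-core counts (reference specs) and the resulting
# cores-per-ROP ratio; a higher ratio means fewer ROPs per shader.
cards = {
    "GTX 670":          (32, 1344),
    "GTX 660 Ti":       (24, 1344),
    "GTX 660":          (24, 960),
    "GTX 650 Ti Boost": (24, 768),
    "GTX 650 Ti":       (16, 768),
}
for name, (rops, cores) in cards.items():
    print(f"{name:16s} {rops} ROPs, {cores} cores -> {cores / rops:.0f} cores/ROP")
```

On this metric the GTX 660 Ti is the outlier at 56 cores per ROP, which would fit the pattern described, if ROPs matter for this workload at all.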
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
> It just occurred to me that the drop from 32 to 24 ROPs could be the issue for the GTX660Ti, rather than or as well as the memory bandwidth; the GTX670 has 32 ROPs, but the GTX660Ti only has 24. The GTX660 also has 24 ROPs, perhaps a better ROPs-to-CUDA-core ratio.

I'd be interested in seeing the performance of the 650 Ti Boost if anyone has some figures for GPUGrid. So far I've been adding 650 Ti cards, as they seem to run at about 65% of the speed of the 660 Ti, cost less than half as much, and use very little power. The 650 Tis I've been adding have very fast memory chips, and boosting the memory speed makes them significantly faster; that makes me think more memory bandwidth might also help. PCIe speed (version) has no effect at all in my tests, but that's on the 650 Ti; I can't say for faster cards. I will say, though, that the 650 Ti is about 35-40% faster than the GTX 460 at GPUGrid, yet the 460 is faster at other projects (including OpenCL). As an aside, it's also interesting that the 660 is faster than the 660 Ti at OpenCL Einstein. I wonder why?
Joined: 28 Jul 12 · Posts: 819 · Credit: 1,591,285,971 · RAC: 0
> Jim1348, is that definitely a 960shader version and not an 1152 OEM card?

Yes, 960 shaders at 2 GB; nothing special.
http://www.newegg.com/Product/Product.aspx?Item=N82E16814500270

I noticed when I bought it that it had good memory bandwidth, relatively speaking. But another factor is that the GTX 660 runs on a GK106 chip, whereas the GTX 660 Ti uses a GK104, as you noted above. The GK106 is not in general a better chip, but there may be features of the architecture that favor one chip over another for a given type of work unit. Nvidia sells them mainly for gaming, with number-crunching being an afterthought. It could well be that the GK104 runs Noelias better; we won't know until we see.
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
Some quick measurements/observations:

- Running my stock config (with GPU OC and everything) I'm seeing 38% memory controller load and 86-90% GPU utilization.
- Deactivating 8 Einstein WUs on the i7 (with HT) increases power consumption a bit and GPU utilization to 92%.
- Increasing the memory speed by 130 MHz increases power by ~1%, but nothing else changes (from 1.5 GHz to 1.63 GHz, without multiplying by 4 for the DDR data rate).
- Decreasing the memory speed by 250 MHz lowers power by about 5-6%, increases the memory controller load to 41% (±1%) and increases GPU utilization to 93%.

It seems "waiting for memory" does not count as idle time for the GPU, and that some of it is happening here. This was on a regular Nathan.

MrS

Scanning for our furry friends since Jan 2002
Joined: 18 Jun 12 · Posts: 297 · Credit: 3,572,627,986 · RAC: 0
It looks as though you found part of the puzzle. Just for reference, my GTX 680s run at 96% GPU utilization and the 670s at 94%. I'm sure you and sk will figure this out. What about downclocking due to heat? All my rigs are liquid cooled and don't get over 45°C.
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
Heat isn't an issue: dual-fan card, open case with additional large case fans, 137 W used. The GTX 660 Ti is presently at 55°C and would rarely reach 60°C. The problem isn't lack of memory either; the 3 GB versions don't perform any better running one GPUGrid task at a time.

I noticed that even when running two GPUGrid tasks at a time, the GPU memory controller load only rose by about 1%, even though GPU utilization rose from around 90% to around 98%. I suspect that the reported memory controller load does not accurately reflect saturation, i.e. the full extent of the bottleneck. The way the memory is accessed could be part of the issue; it certainly seems to struggle to go much past 40%.

I'm still not fully aware of how the ROPs are used. Since the shaders were trimmed down, a lot now takes place on the GPU cores, so having relatively fewer ROPs could be key to the relatively poor performance. The drop to 75% of the ROPs from the 670 to the 660 Ti could be the problem. It's either the memory bandwidth/controller load or the ROP count, or both. I suspect the ROP count is limiting the memory controller load, but I don't know enough about ROP usage to be confident.

FAQs · HOW TO: Opt out of Beta Tests · Ask for Help
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
Ah, I forgot to comment on the ROP suggestion yesterday. ROP stands for "render output unit". I've never heard of these units being used in GP-GPU at all, which seems logical to me, since calculation results don't need to be assembled into a framebuffer the size of the screen, or blended/softened via anti-aliasing, which you really wouldn't want to do with accurate calculation results :D

And if there's a bottleneck in memory bandwidth, it might not show up as 100% memory controller utilization; for example, the need for new data from memory might not be distributed evenly over time.

Could anyone run some comparable tests with the current GPU-Grid app, preferably on the Nathan long-runs, and ideally within the same PC? We'd need an exceptionally low-bandwidth GPU like the GT 640 or GTX 660 Ti, compared to a regularly balanced one, i.e. pretty much any other. Overclock and downclock the memory by a noticeable amount (about 10% should be fine, though I'm not sure my memory could take that as an OC) and observe the change in performance. Ideally no CPU tasks would run alongside, so the measurements aren't disturbed by changing CPU load. Run a few units in each configuration, average the runtimes and compare: is the performance change with memory clock significantly higher on the low-bandwidth card?

MrS

Scanning for our furry friends since Jan 2002
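A minimal sketch of the comparison proposed above; the runtimes are invented placeholders, and the point is only the averaging and percentage comparison:

```python
# Average runtimes per memory-clock setting and report the
# relative change; run this once per card and compare the results.
def pct_change(baseline_times, changed_times):
    base = sum(baseline_times) / len(baseline_times)
    changed = sum(changed_times) / len(changed_times)
    return (base - changed) / base * 100

stock  = [19100, 18950, 19200]  # hypothetical runtimes at stock, seconds
mem_oc = [18500, 18650, 18400]  # hypothetical runtimes at +10% memory clock
print(f"speedup from memory OC: {pct_change(stock, mem_oc):.1f}%")
```

If the low-bandwidth card shows a clearly larger speedup than the balanced card, that would point at memory bandwidth as the limiter.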
Joined: 4 Oct 12 · Posts: 53 · Credit: 333,467,496 · RAC: 0
I've been following this thread as I recently put together a dedicated folding machine on a tight budget, hence looking for the best 'bang per buck per watt'. My tuppence: I suspect the performance improvement of the GTX 660 over, say, the GTX 650 Ti is due to the larger cache size (384 KB vs 256 KB) rather than bandwidth (the GTX 670/680, incidentally, have 512 KB). I agree that it's hard to see how ROPs could affect CUDA performance, or the amount of physical memory for that matter for single WUs; personally I have not seen memory utilisation above 880 MB with the largest of WUs for the past year. It would be interesting to see if anybody has seen otherwise. Incidentally, I also suspect memory latency could play an important part for GPUGrid; increasing the memory clock could of course affect reliability as well, potentially outputting inaccurate results.

For those who may be interested: with a budget of 400 Euro I ended up with 2x GTX 650 Ti (1 GB), a G2020 CPU, an MSI Z77 mATX motherboard (supporting 2x PCIe x16 slots @ 8x), a Fractal 1000 case and a single 4 GB DIMM. I recycled an old 300 W 80+ PSU and a laptop HDD. I'm hoping the overall daily folding output will be similar to a single GTX 680 when OCed to 1006/1500 (MemtestG80 tested for 12 hrs). Power measured at the plug is around the 180 W mark at full load, for an expected running cost of 0.7 Euro/day.

Interestingly, in my case (Win XP SP3) I found fixing the CPU affinity helped with GPU utilisation, which seems opposite to what I have been reading. Using a little tool called imagecfg, I set the WU's exe to run in uniprocessor mode; this appears to have the effect of binding the WU's exe for the run duration to a 'free' core, hence reducing context switching (e.g. 'imagecfg.exe -u acemd.2865P.exe'). My 650 Tis are now running at an almost constant 99%, which is cool.
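The 0.7 Euro/day figure is consistent with a quick calculation; the electricity price of ~0.16 EUR/kWh below is an assumption, as the post only gives the 180 W wall draw:

```python
# Daily running cost from power measured at the plug.
watts = 180            # wall draw at full load, from the post
eur_per_kwh = 0.16     # assumed electricity price
cost_per_day = watts / 1000 * 24 * eur_per_kwh
print(f"{cost_per_day:.2f} EUR/day")  # ~0.69 EUR/day
```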
©2026 Universitat Pompeu Fabra