Message boards : News : Tests on GTX680 will start early next week [testing has started]
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
The beta WUs are from before; they don't go out because there is no beta app now.

1) We will upload a NEW application for Linux, faster for any Fermi card, and it will work for a GTX680.
2) It will be compiled with CUDA 4.2.
3) Some days later the same app will be provided for Windows.
4) Later there will be an optimized app for the GTX680, for Linux and Windows.

Note that we are testing a new app, a new CUDA and a new architecture. Expect some problems and some time. Within 10 days, we should have 1 and 2. Some variations on the plan are also possible. We might put out a CUDA 3.1 new app, for instance.

gdf
Joined: 8 Mar 12 · Posts: 411 · Credit: 2,083,882,218 · RAC: 0
Thank you for the detailed post. Much appreciated.
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
Any progress? Could you please share some information about the performance of the GTX 680, perhaps? I'm afraid that the CPU-intensive GPUGrid tasks will suffer a much greater performance penalty on the GTX 680 than on the GTX 580 (and on the other CC2.0 GPUs). Maybe an Ivy Bridge CPU overclocked to 5 GHz could compensate for this penalty.
Joined: 8 Mar 12 · Posts: 411 · Credit: 2,083,882,218 · RAC: 0
Besides that fact, Zoltan, from what I can tell the CPU is basically what's crippling the 680 across the board for every project. However, I've been steadily re-reading several lines from AnandTech's in-depth review of the card itself:

> Note however that NVIDIA has dropped the shader clock with Kepler, opting instead to double the number of CUDA cores to achieve the same effect, so while 1536 CUDA cores is a big number it's really only twice the number of cores of GF114 as far as performance is concerned.

So if I am correct (and it's 12:22 am, so give me a break if I'm wrong), since we use the shader clock, this means that if you were to double the cores of the 580 to 1024 you would be operating at 772 MHz (setting ROPs and everything else aside, as crazy as that sounds). I can't quite figure this math out, but as posted earlier the PrimeGrid sieve ran 25% faster on the 680 (240s vs 300s), and I keep looking at the fact that this is roughly the same as the difference between the 680's 1005 MHz clock and the 580's 772 MHz. I don't really know where I was going with this, but is that why the sieve improved by 25%? There's also the 20% decrease in TDP. Compared to the 580, it has 1/3 more cores (1536 vs the adjusted 1024), but 1/3 fewer ROPs. Again, sorry for the confused typing, it's late, but that 25% clock difference just kept staring at me. My bet goes to the 25% increase until the optimized app comes out to take more advantage of CC 3.0. Good night, and I can't wait to see what happens.
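For what it's worth, the back-of-the-envelope math above can be written out explicitly. This is only a rough sketch using the clocks quoted in the post and public core counts; it deliberately ignores ROPs, memory bandwidth, and every other architectural difference that actually matters:

```python
# Rough shader-throughput comparison: GTX 680 vs GTX 580.
# Spec-sheet numbers only; real performance depends on far more than this.
gtx580_cores = 512
gtx580_shader_mhz = 1544      # Fermi shader clock = 2x the 772 MHz core clock
gtx680_cores = 1536
gtx680_mhz = 1005             # Kepler has no separate shader clock

# Normalizing Fermi's doubled shader clock, a GTX 580 behaves like
# 1024 "Kepler-style" cores running at the 772 MHz core clock.
ratio = (gtx680_cores * gtx680_mhz) / (gtx580_cores * gtx580_shader_mhz)
print(f"theoretical shader-throughput ratio: {ratio:.2f}x")  # ~1.95x

# The observed 25% PrimeGrid sieve speedup is much closer to the raw
# clock ratio alone than to the theoretical core-count gain:
print(f"clock ratio, 1005 vs 772 MHz: {gtx680_mhz / 772:.2f}x")  # ~1.30x
```

On paper the 680 has almost twice the shader throughput of a 580, so an observed 25% gain suggests something other than raw shader math (CPU, PCIe, or the unoptimized app) is the limit, which matches the thread's speculation.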
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
> Any progress?

The Kepler-optimized application is 25% faster than a GTX 580 regardless of the processor, for a typical WU. I don't see why the CPU should have a different impact compared to Fermi.

gdf
Joined: 8 Mar 12 · Posts: 411 · Credit: 2,083,882,218 · RAC: 0
:) If they had kept the ROPs at 48 as on the 580, it would have been 50% faster, but 25% sounds good to me. Keep up the good work, guys; can't wait till it's released. EDIT: Are you testing on PCIe 2 or 3? I've heard additional increases come from this, roughly 5% from what I've seen on other sites.
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
> Compared to 580, it has 1/3 more cores than 580 (1536 vs 1024), but a 1/3 less ROPS.

A GTX 580 has 512 CUDA cores and a GTX 680 has 1536.

CUDA is different from OpenCL. On several OpenCL projects a high CPU requirement appears to be the norm.

I would expect a small improvement when using PCIe3 with one GPU. If you have two GTX680s in a PCIe2 system that drops from PCIe2 x16 to PCIe2 x8, the difference would be much more noticeable compared to a board supporting two PCIe3 x16 lanes. If you're going to get 3 or 4 PCIe3-capable GPUs, it would be wise to build a system that properly supports PCIe3. The difference would be around 35% of one card on a PCIe3 x16/x16/x8 system compared to a PCIe2 x8/x8/x4 system. For one card it's not really worth the investment.

If we are talking 25% faster at 20% less power, then in terms of performance per Watt the GTX680 is ~50% better than a GTX580. However, that doesn't consider the rest of the system. Of the 300W a GTX680 system might use, for example, ~146W is down to the GPU. Similarly, for a GTX580 it would be ~183W. The difference is ~37W, so the overall system would use ~11% less power. If the card can do ~25% more work, then the overall system improvement is ~39% in terms of performance per Watt. Add a second or third card to a new 22nm CPU system and factor in the PCIe improvements, and the new system's performance per Watt would improve more significantly, perhaps up to ~60%.

FAQ's - HOW TO: Opt out of Beta Tests - Ask for Help
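The system-level arithmetic in that post can be sketched as follows. The ~146W/~183W GPU figures and the 300W whole-system figure are the post's own assumptions, not measurements:

```python
# Performance-per-watt sketch using the numbers assumed in the post above.
gpu_680, gpu_580 = 146.0, 183.0   # assumed GPU power draw (W); 680 is ~20% lower
system_680 = 300.0                # assumed whole-system draw with a GTX 680 (W)
system_580 = system_680 + (gpu_580 - gpu_680)  # same box with a GTX 580: ~337W

speedup = 1.25                    # GTX 680 assumed to do ~25% more work

# Card alone: ~25% faster at ~20% less power
card_gain = speedup / (gpu_680 / gpu_580)
print(f"card-level perf/W gain: {card_gain:.2f}x")       # ~1.57x

# Whole system: total power drops only ~11%, so the gain is smaller
power_ratio = system_680 / system_580
print(f"system power ratio: {power_ratio:.2f}")          # ~0.89
system_gain = speedup / power_ratio
print(f"system-level perf/W gain: {system_gain:.2f}x")   # ~1.40x
```

The card-level and system-level results line up with the "~50% better" and "~39%" figures the post arrives at.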
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
> The kepler optimized application is 25% faster than a gtx580 regardless of the processor for a typical WU.

It sounds promising and a little disappointing at the same time (as expected).

> I don't see why the CPU should have any different impact compared to Fermi.

Because there is already a 25-30% variation in GPU usage between different types of workunits on my GTX 580. For example, NATHAN_CB1 runs at 99% GPU usage while NATHAN_FAX4 runs at only 71-72%. I wonder how well the GPUGrid client could feed a GPU with as many CUDA cores as the GTX 680 has, when it can feed a GTX 580 to run at only 71-72% (and the GPU usage drops as I raise the GPU clock, so the performance is CPU and/or PCIe limited). To be more specific, I'm interested in the GPU usage of a NATHAN_CB1 and a NATHAN_FAX4 on a GTX 680 (and on a GTX 580 with the new client).
Joined: 8 Mar 12 · Posts: 411 · Credit: 2,083,882,218 · RAC: 0
I brought the core count up to 1024 instead of 512 because I kept trying to figure out the math for what the improvement was going to be. Meaning, if I doubled the core count, I could do away with the shader clock, as they did in Kepler (I know Kepler was quadrupled, but in terms of performance it was just doubled). The math SEEMED to work out OK.

So I was working with 1024 cores at a core clock of 772 MHz, which meant 1/3 more cores on the 680 than the 580 (adjusted for the doubled shader frequency). This led to a shader clock difference of 23.2% in Kepler's favour (772/1005). Which meant (to me and my zero engineering knowledge) a benefit of 56.6% (increase in number of cores times increase in adjusted frequency). However, since there are 1/3 fewer ROPs, that got me down to 23.4% (but if I'm not mistaken, the ROP frequency is calculated off the core clock; I learned this after adjusting for the 570, 480 and 470, and once I learned the ROP frequency I quit trying).

What's weird is that this math kept LOOKING correct the further I went. There was roughly a 45% increase compared to a 570 (as shown on sieve tasks); against a 480 my math showed an increase of roughly 35%, but compared to a 470 it jumped to 61%. Again, not an engineer, just someone who had the day off. It strikes me as odd that it seemed to work. But adding ROPs in may have been the mistake; I honestly don't even know how important they are for what we do. Since they're correlated with pixel output (again, out of my league :) ) it could be like high memory bandwidth and not mean as much to us. The 25% and 45% increases were the ones that kept my math going, because that's what was seen on PPS sieve tasks. Ah, coincidences... ;)

Oh, and I have been looking for a mobo that supports PCIe 3.0 at x16/x16, but I think I've only found one, and it was like $300. However, I refuse to get one that doesn't, merely because I want everything at 100% (even if the extra bandwidth isn't used).
Joined: 8 Mar 12 · Posts: 411 · Credit: 2,083,882,218 · RAC: 0
One more thing. I'm assuming Zoltan meant, as he already explained in relation to GPUGrid WUs, that like the Einstein apps, we may have hit a "wall" where the CPU matters more than the GPU once you reach a certain point. As per his description, some tasks are dependent on a fast CPU; someone in another forum is failing tasks because he has a 470 or a 480 (can't remember which) in a Xeon @ 2.5 GHz, which is currently causing him issues.
Joined: 17 Aug 10 · Posts: 1 · Credit: 376,817,016 · RAC: 0
I have a 550 Ti, and my CPU is an AMD Athlon X4 underclocked to 800 MHz; GPUGrid uses only 10% of my CPU (running in Linux). I doubt the CPU can be an issue, at least with the GPUGrid app in Linux.
Joined: 8 Mar 12 · Posts: 411 · Credit: 2,083,882,218 · RAC: 0
FYI: half of your tasks error out.
Joined: 8 Mar 12 · Posts: 411 · Credit: 2,083,882,218 · RAC: 0
Oh, and it's not about whether or not they'll finish; it's about whether the CPU will bottleneck the GPU. I reference Einstein because, as mentioned, anything above a 560 Ti 448 will finish the task in roughly the same GPU time, and what makes the difference in how fast you finish the WU is how fast your CPU is. This can SEVERELY cripple performance.
robertmiles · Joined: 16 Apr 09 · Posts: 503 · Credit: 769,991,668 · RAC: 0
CUDA 4.2 is publicly available from the NVIDIA forums, but not widely advertised. Could you mention which of the new drivers allow using it without the sleep bug?
Joined: 8 Mar 12 · Posts: 411 · Credit: 2,083,882,218 · RAC: 0
The newest R300 series doesn't have the sleep bug, but it is a beta. CUDA 4.2 came out with 295, so it's either the beta or wait till the WHQL is released. The beta version is 301.24. Or, if possible in your situation, you can tell Windows to never turn off the display. This prevents the sleep bug, and you can do whatever you want.
robertmiles · Joined: 16 Apr 09 · Posts: 503 · Credit: 769,991,668 · RAC: 0
Thanks. Now I'll go look for GTX680 specs to see if it will fit the power limits for my computer room and the length limits for my computers.
Joined: 8 Mar 12 · Posts: 411 · Credit: 2,083,882,218 · RAC: 0
Oh, if you're possibly getting a 680, use 301.10.
robertmiles · Joined: 16 Apr 09 · Posts: 503 · Credit: 769,991,668 · RAC: 0
> Thanks. Now I'll go look for GTX680 specs to see if it will fit the power limits for my computer room, and the length limits for my computers.

The GTX 680 exceeds both the power limit and the length limit for my computers. I'll have to look for later members of the GTX6nn family instead.
Joined: 8 Mar 12 · Posts: 411 · Credit: 2,083,882,218 · RAC: 0
Just a friendly reminder about what you're getting with anything less than a 680/670: the 660 Ti will be based off of the 550 Ti's board. Depending on each user's power requirements, I would HIGHLY recommend waiting for results from said boards, or would recommend the 500 series. Since a 660 Ti will most likely have half the cores and a 15% decrease in clock compared to the 580, this could severely cripple the other 600-series cards as far as crunching is concerned. Meaning, a 560 Ti 448 and above will, IMO (I can't stress this enough), probably be able to beat a 660 Ti when it's released. Again, IMHO. This is as far as speed is concerned. Performance per Watt may be a different story, but a 660 Ti will be based off of the 550 Ti's specs (keep that in mind). As always, Happy Crunching.
Joined: 8 Mar 12 · Posts: 411 · Credit: 2,083,882,218 · RAC: 0
Sorry, meant to say half the cores of the 680 in my prior statement. Again, this new design is not meant for crunching, and all boards are effectively "one off", so a 660 Ti = a 550 Ti BOARD. Sorry for the typo. P.S. Hopefully next week, Gianni?
©2025 Universitat Pompeu Fabra