Message boards : Graphics cards (GPUs) : Maxwell now
Author | Message |
---|---|
There's a rumor going around that Maxwell is coming out next month. I wonder if this was planned or if AMD's sales are hurting them? | |
ID: 34732 | Rating: 0 | rate: / Reply Quote | |
It looks more like a delaying action to hold off AMD until the 20 nm process arrives, probably later than they had originally hoped. A GTX 750 Ti won't set the world on fire in performance, and won't make them a ton of money. But it gives them a chance to see how well the design works in practice, and to give the software developers a head start before the real Maxwell arrives. | |
ID: 34736 | Rating: 0 | rate: / Reply Quote | |
It likely is just a false rumor. No proof has been shown that these cards use Maxwell chips, even though relatively complete benchmarks have already appeared. It's probably just GK106 with 768 shaders. | |
ID: 34807 | Rating: 0 | rate: / Reply Quote | |
Producing a Maxwell on the 28 nm process would be a complete change of direction for NVidia, so I agree this is likely a false rumor. There are two revision models (Rev. 2) of GPUs in the GF600 lineup (GT630 and GT640), so perhaps NVidia wants to fill out their GF700 range with a lower-end card; if there is a Rev. 2 version of the GTX650Ti en route, it makes more sense to shift it to the GF700 range. | |
ID: 34823 | Rating: 0 | rate: / Reply Quote | |
ARM can already be used to support an OS, so IMO it's inevitable that ARM will bolster their CPU with an NVidia GPU. That's what the market really wants, isn't it: sufficient CPU processing power to start up and run an OS, and a high-end GPU for the video interface, gaming and so on? I guess a fairly large part of the market wants that. I would be happy with a motherboard with a BIOS that can boot from PXE; no SuperIO (USB, RS-232, parallel port, PS/2); an I2C bus for onboard temperature sensors; no IDE or SATA (no disks); just lots of RAM, an RJ-45 connector and gigabit ethernet; no wifi; enough CPU processing power to start up and run a minimal OS that has a good terminal and SSH and can run the BOINC client and project apps. No desktop or anything to do with a GUI, no TWAIN or printer drivers/daemons, no PnP or print service, no extra fonts (just a decent terminal and 1 font), only the required network services; Python or some other scripting language would be nice, but not much more. If they could fit all that onto a slightly larger video card I'd be happy; otherwise put it on a 2" x 5" board with a PCIe slot and power connectors and call it a cruncher. Something so no-frills IKEA would consider stocking it. What else would be unnecessary... no RTC (get the time off the LAN), no sound, no disk activity LED. ____________ BOINC <<--- credit whores, pedants, alien hunters | |
ID: 34825 | Rating: 0 | rate: / Reply Quote | |
Seems like the cat is out of the bag.. and we were all wrong, as usual for a new generation ;) | |
ID: 35023 | Rating: 0 | rate: / Reply Quote | |
Sounds like a big performance per watt increase will be coming too. I think I'll put planned purchases on hold, build savings and see what the picture looks like 4 months from now. | |
ID: 35024 | Rating: 0 | rate: / Reply Quote | |
That's not what nVidia would like you to do.. but I agree ;) | |
ID: 35029 | Rating: 0 | rate: / Reply Quote | |
For GPUGrid, performance touting is premature - we don't even know if it will work with the current app. It could take 6 months of development and debugging. It took ages before the Titans worked. | |
ID: 35030 | Rating: 0 | rate: / Reply Quote | |
I still feel like I'm stuck between a rock and a hard place. Haswell-E will have an 8-core variant in Q3, so that is definitely going to be bought. However, I would like this to be my last system build for more than a year, as pumping 5k annually into hardware is something I cannot continue. Every other year, sure. | |
ID: 35038 | Rating: 0 | rate: / Reply Quote | |
But with Volta and its stacked DRAM... I'm very cautious about dropping 1.8k+ on GPUs that most likely won't be that large of a change. We'll see, I suppose. Volta will still take some time, as GPUs have matured quite a bit (compared to the wild early days of a new chip every 6 months!) and progress is generally slower. That's actually not so bad, because we can keep GPUs longer and the software guys have some time to actually think about using those beasts properly. If you still have Fermis or older running, get rid of them while you can still find (casual) gamers willing to pay something for them. If you're thinking about upgrading from Kepler to Maxwell and don't want to spend too much, I propose the following: replace 2 Keplers with 1 Maxwell for about the same throughput, which should hopefully be possible with 20 nm and the architectural improvements. This way you don't have to spend as much, and you reduce power usage significantly (further savings). Your throughput won't increase, but so what? If you feel like spending again you could always add another GPU. MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 35097 | Rating: 0 | rate: / Reply Quote | |
There is no ARM CPU on the block diagram of the GM107: | |
ID: 35103 | Rating: 0 | rate: / Reply Quote | |
As far as I remember this "ARM on chip" was still a complete rumor. Could well be that someone confused some material about future nVidia server chips with GPU (project Denver) for the regular GPUs. | |
ID: 35104 | Rating: 0 | rate: / Reply Quote | |
I would be a bit concerned about the 1306 GFLOPS rating for the GTX750Ti. That's actually below the GTX650Ti (1420). The 750Ti also has a 128-bit bus and bandwidth of 86.4 GB/s. While the theoretical 21.8 SP GFLOPS/W is good, it's still an entry-level card; it would take 4 of these cards to match the overall performance of a GTX780Ti. There should be plenty of OC models and potential for these GPUs to boost further. | |
ID: 35114 | Rating: 0 | rate: / Reply Quote | |
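For anyone wanting to check those theoretical numbers: peak single-precision GFLOPS is conventionally 2 FLOPs (one FMA) per core per clock. A rough sketch, assuming the 750Ti's 1020 MHz and the 650Ti's 925 MHz reference clocks and the 60 W TDP quoted above:

```python
def sp_gflops(cores, clock_mhz):
    # Theoretical peak SP GFLOPS: 2 FLOPs per CUDA core per clock (one FMA)
    return 2 * cores * clock_mhz / 1000.0

gtx750ti = sp_gflops(640, 1020)  # assumed 1020 MHz reference clock
gtx650ti = sp_gflops(768, 925)   # assumed 925 MHz reference clock

print(round(gtx750ti))           # ~1306 GFLOPS, matching the rating above
print(round(gtx650ti))           # ~1421 GFLOPS
print(round(gtx750ti / 60, 1))   # GFLOPS/W at 60 W TDP -> ~21.8
```

The clock figures are my assumption, not from this thread; the point is only that the quoted 1306/1420 GFLOPS and 21.8 GFLOPS/W are internally consistent.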
I see EVGA are selling a GTX750Ti with a 1268MHz Boost. In theory that's 16.8% faster than the reference model, though I would expect the reference card to boost higher than the quoted 1085MHz (if it works)! | |
ID: 35128 | Rating: 0 | rate: / Reply Quote | |
I have some GTX750Tis on order; should have them in my hands next week. | |
ID: 35139 | Rating: 0 | rate: / Reply Quote | |
I read that the 128-bit bus is a bottleneck, but as the card uses 6 GHz GDDR5 a 10% OC is a given. The GPU also OCs well (as the temps are low). So these cards could be tweaked to be significantly more competitive than the reference model. | |
ID: 35147 | Rating: 0 | rate: / Reply Quote | |
Don't be fooled by the comparably low maximum Flops. We got many of those with Kepler, and complained initially that we couldn't make proper use of them, as the performance per shader per clock was significantly below non-superscalar Fermis. Now we're going non-superscalar again and gain some efficiency through that, as well as through other tweaks. | |
ID: 35157 | Rating: 0 | rate: / Reply Quote | |
OK no purchases but I would rather a professional or a Ph.D. test the pretend Maxwells so we can be sure of what we're looking at ;-) | |
ID: 35158 | Rating: 0 | rate: / Reply Quote | |
Professional enough? Or shall I search for a review written by someone with a PhD in ancient Greek history? ;) | |
ID: 35159 | Rating: 0 | rate: / Reply Quote | |
EVGA (Europe) just announced they have the Titan Black and the GTX 750 and 750Ti for sale, the latter for 150 euro - really cheap with 2GB. However, they're not in stock so they can't be ordered yet; not that I will. | |
ID: 35162 | Rating: 0 | rate: / Reply Quote | |
The GPU memory latency is supposedly better, so the GTX750Ti's memory bandwidth bottleneck might not be as bad as I first suspected. That said, compute performance is a bit 'all over the place'. It's definitely a wait-and-see situation here. | |
ID: 35165 | Rating: 0 | rate: / Reply Quote | |
Some more info on Maxwell. | |
ID: 35171 | Rating: 0 | rate: / Reply Quote | |
NVidia are comparing the GTX750Ti to a 3-generation-old GTX480 for performance and a GT640 for power usage, but not a GTX650Ti! For some games it's roughly equal to a GTX480, and in terms of performance/Watt the GTX750Ti is 1.7 times better than a GT640 (and similar cards). While it is a GM107 product, the name suggests it's an upgrade to the GTX650Ti. | |
ID: 35173 | Rating: 0 | rate: / Reply Quote | |
One of my team mates has a couple of 750Tis and they keep failing here, while running fine at Einstein. I have encouraged our team members that have the new Maxwell cards to post in here. | |
ID: 35236 | Rating: 0 | rate: / Reply Quote | |
Thanks. That is important information to share! | |
ID: 35237 | Rating: 0 | rate: / Reply Quote | |
I think I might be the one Coleslaw is referring to. I installed a pair of 750Ti cards in my computer yesterday and tried to run GPUGRID. No go - instant fail within a couple of seconds. Einstein seems to run just fine. I bought these cards to run GPUGRID and I'm not too happy that they can't. I'm not that interested in running Einstein, and other than F@H there isn't much else out there for GPUs. I refuse to participate in F@H anymore. If you need any info feel free to ask. | |
ID: 35238 | Rating: 0 | rate: / Reply Quote | |
I think I might be the one Coleslaw is referring to. I installed a pair of 750Ti cards in my computer yesterday and tried to run GPUGRID. No go - instant fail within a couple of seconds. Einstein seems to run just fine. I bought these cards to run GPUGRID and I'm not too happy that they can't. I'm not that interested in running Einstein, and other than F@H there isn't much else out there for GPUs. I refuse to participate in F@H anymore. If you need any info feel free to ask. Yes you are. Thanks for volunteering. :) Gilthanis ____________ | |
ID: 35239 | Rating: 0 | rate: / Reply Quote | |
That's pretty annoying - it likely means that we'll not be able to use them until CUDA 6 goes public. Matt | |
ID: 35240 | Rating: 0 | rate: / Reply Quote | |
Sorry for being annoying. | |
ID: 35241 | Rating: 0 | rate: / Reply Quote | |
I bought these cards to run GPUGRID and I'm not too happy that they can't. Well, it's a new architecture (or more precisely: a significantly tweaked and rebalanced one), so some "unexpected problems" can almost be expected. Be a bit patient, I'm sure this can be fixed. Maxwell is the new architecture for all upcoming nVidia chips in the next 1-2 years, after all. MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 35270 | Rating: 0 | rate: / Reply Quote | |
Don't worry, support is coming just as soon as possible. These new cards are very exciting for us! Unfortunately, because of the way we build our application, we need to wait for the next version of CUDA which contains explicit Maxwell support. The other GPU-using projects don't have this limitation because they build their applications in a different way. The other side of that is that they aren't able to make GPU-specific optimisations the way we do. Matt | |
ID: 35272 | Rating: 0 | rate: / Reply Quote | |
Don't worry, support is coming just as soon as possible. These new cards are very exciting for us! Unfortunately, because of the way we build our application, we need to wait for the next version of CUDA which contains explicit Maxwell support. I hope Nvidia comes out with the new version of CUDA soon. I expect GPUs that use much less electricity will become very popular quickly. I'll be switching back to GPUGRID when you get the Maxwell compatible client out. | |
ID: 35275 | Rating: 0 | rate: / Reply Quote | |
The new Haswell-E 6-core is available in the Netherlands, but pricey. Any idea when the real Maxwell will launch? I read "soon" in some articles on the net, but did not find any date. | |
ID: 37914 | Rating: 0 | rate: / Reply Quote | |
The new Haswell-E 6-core is available in the Netherlands, but pricey. Any idea when the real Maxwell will launch? I read "soon" in some articles on the net, but did not find any date. Rumor has it the GTX980 (or whatever the board will be called) will be showcased (or released) at NVidia's Game24 event on September 18th, along with a 343-branch driver. The GTX 970/960 could be released by early/mid October. Leaked benchmarks (if they're not fake) show the GM204 Maxwell at reference GTX780Ti (5 teraFLOPS) performance levels with a lower TDP. Maxwell's integer/AES-256/TMU/ROP performance is higher than Kepler's core. The GTX 980 will have a 256-bit memory interface. Float (double/single) will be similar to the GK110 cards with disabled DP cores (GTX780/780Ti). A Titan with the 64-DP-core SMX enabled for double precision tasks won't be replaced until another Maxwell stack is created for the Titan's market position. A dual-Maxwell board with 11/12 single-precision teraFLOPS and 3/4 teraFLOPS double would be the ultimate board. | |
ID: 37915 | Rating: 0 | rate: / Reply Quote | |
| |
ID: 37919 | Rating: 0 | rate: / Reply Quote | |
Thanks for this Jozef J. | |
ID: 37920 | Rating: 0 | rate: / Reply Quote | |
So here is deciding which card would be best for GPUgrid The GTX 780Ti is superscalar, so not all of its 2880 CUDA cores can be utilized by the GPUGrid client. The actual number of utilized CUDA cores of the GTX 780Ti is somewhere between 1920 and 2880 (most likely near the lower end), and it could be different for each workunit batch. If they really manufacture the GM204 on 28 nm lithography, then this is only a half step towards a new GPU generation. The performance-per-power ratio of the new GPUs will be slightly better, and (if the data in this chart are correct) I expect the GTX980 could be 15~25% faster than the GTX780Ti (here at GPUGrid). When we have the real GPUGrid performance of the GTX980, we'll know how many of the 2880 CUDA cores of the GTX780Ti are actually utilized by the GPUGrid client. But as NVidia chose to move back to a scalar architecture, I suspect the superscalar architecture of the Keplers (and the later Fermis) wasn't as successful as expected. | |
ID: 37921 | Rating: 0 | rate: / Reply Quote | |
So here is deciding which card would be best for GPUgrid Is NVidia skipping 20 nm for 16 nm? After a couple of years of development, TSMC is struggling badly to find proper die sizes for 20 nm. NVidia changes lithography every two years or so. Now, after two and a half years, boards are still at 28 nm, after three series releases (600, 700, 800M) of 28 nm generations, and the GTX980 will be the fourth 28 nm release. What could be the problem with finding a pattern to fit cores on 20 nm? The change from superscalar to scalar? How does a 5-SMM, 640-core/40-TMU/60W-TDP GTX750Ti perform (~7%) better than a 4-SMX, 768-core/110-130W-TDP Kepler with more TMUs (64), while smashing the GTX650Ti/boost's compute-time/power-consumption ratios? Core/memory speed differences? The GTX 750Ti is close (~5%) to GTX660 (5 SMX/960 cores/140W TDP) compute times. Are Maxwell's cache subsystem architecture and TMU rendering that much better than Kepler's at running GPUGRID code? Maxwell's core architecture may be more efficient than Kepler's, but is Maxwell really more advanced, when its float processing is similar to Kepler's? Maxwell's integer performance is higher, due to having more integer cores in the SMM vs. the SMX, and the added barrel shifter, which is missing in Kepler. | |
ID: 37923 | Rating: 0 | rate: / Reply Quote | |
How does a 5-SMM, 640-core/40-TMU/60W-TDP GTX750Ti perform (~7%) better than a 4-SMX, 768-core/110-130W-TDP Kepler with more TMUs (64), while smashing the GTX650Ti/boost's compute-time/power-consumption ratios? Core/memory speed differences? The GTX 750Ti is close (~5%) to GTX660 (5 SMX/960 cores/140W TDP) compute times. That's very easy to answer: the SMXes of the GTX650Ti and the GTX660 are superscalar, so only (approximately) 2/3 of their cores can be utilized (512 and 640, respectively). | |
ID: 37924 | Rating: 0 | rate: / Reply Quote | |
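The 2/3 figure above is easy to sanity-check against the core counts being discussed (a sketch; the exact usable fraction on superscalar Kepler varies by workload):

```python
def effective_cores(cores, superscalar):
    # On superscalar Kepler SMXes, only roughly 2/3 of the CUDA cores can
    # be kept busy by scalar code like the GPUGrid client; on scalar
    # Maxwell SMMs, all of them can.
    return cores * 2 // 3 if superscalar else cores

print(effective_cores(768, True))   # GTX 650 Ti -> ~512 usable cores
print(effective_cores(960, True))   # GTX 660    -> ~640 usable cores
print(effective_cores(640, False))  # GTX 750 Ti -> all 640 cores
```

Which is exactly why a 640-core Maxwell can keep up with a 768-core Kepler: the usable core counts are much closer than the spec sheets suggest.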
How does a 5-SMM, 640-core/40-TMU/60W-TDP GTX750Ti perform (~7%) better than a 4-SMX, 768-core/110-130W-TDP Kepler with more TMUs (64), while smashing the GTX650Ti/boost's compute-time/power-consumption ratios? Core/memory speed differences? The GTX 750Ti is close (~5%) to GTX660 (5 SMX/960 cores/140W TDP) compute times. If this is the case, then why do GPU utilization programs (MSI Afterburner, eVGA Precision) show 90%+ for most GPUGRID tasks? Are these programs not accounting for the type of (scalar or superscalar) architecture? If only 2/3 of the cores are active, wouldn't GPU utilization be at ~66% instead of the typical 90%? These programs are capable of monitoring bus usage, the memory controller (frame buffer), video processing, power draw, and much more. | |
ID: 37925 | Rating: 0 | rate: / Reply Quote | |
I estimate that the new GM204 will be about 45% faster than a 780ti. | |
ID: 37926 | Rating: 0 | rate: / Reply Quote | |
How does a 5-SMM, 640-core/40-TMU/60W-TDP GTX750Ti perform (~7%) better than a 4-SMX, 768-core/110-130W-TDP Kepler with more TMUs (64), while smashing the GTX650Ti/boost's compute-time/power-consumption ratios? Core/memory speed differences? The GTX 750Ti is close (~5%) to GTX660 (5 SMX/960 cores/140W TDP) compute times. The "GPU utilization" is not equivalent to "CUDA core utilization". These monitoring utilities are right to show that high GPU utilization, as they show the utilization of the units which feed the CUDA cores with work. I think the actual CUDA core utilization can't be monitored. | |
ID: 37928 | Rating: 0 | rate: / Reply Quote | |
I estimate that the new GM204 will be about 45% faster than a 780ti. That's a bit of an optimistic estimate, as (1216/928)*(16/15)=1.3977, but... 1. my GTX780Ti always boosts to 1098MHz, 2. the 1219MHz boost clock seems a bit high, as the GTX750Ti's boost clock is only 1085MHz, and it's a lesser chip. We'll see soon. BTW there's an error in the chart, as the GTX780Ti has 15*192 CUDA cores. | |
ID: 37929 | Rating: 0 | rate: / Reply Quote | |
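Spelling out the arithmetic above: 1216 and 928 MHz are presumably the chart's boost clocks, and 16/15 the ratio of the GTX980's 16 SMMs to the GTX780Ti's 15 SMXs (which, given the ~2/3 usable-core rule for superscalar Kepler, is also roughly the ratio of usable cores):

```python
# Hypothetical GTX 980 vs. GTX 780 Ti scaling estimate, as in the post:
# clock ratio times unit (SMM vs. SMX) ratio.
clock_ratio = 1216 / 928
unit_ratio = 16 / 15
estimate = clock_ratio * unit_ratio
print(round(estimate, 4))  # -> 1.3977, i.e. ~40% faster at best
```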
[url]http://images.anandtech.com/doci/7764/SMMrecolored_575px.png | |
ID: 37931 | Rating: 0 | rate: / Reply Quote | |
http://images.anandtech.com/doci/7764/SMMrecolored_575px.png | |
ID: 37933 | Rating: 0 | rate: / Reply Quote | |
http://images.anandtech.com/doci/7764/SMMrecolored_575px.png Thank you for fixing links. | |
ID: 37934 | Rating: 0 | rate: / Reply Quote | |
The GTX980 does quite well in the Folding@home benchmarks. | |
ID: 37941 | Rating: 0 | rate: / Reply Quote | |
The GTX980 does quite well in the Folding@home benchmarks. Wow! Then it's possible that the GTX980's performance improvement over the GTX780Ti will be in the 25-45% range. | |
ID: 37942 | Rating: 0 | rate: / Reply Quote | |
The GTX980 does quite well in the Folding@home benchmarks. Well, if I read the graph correctly, it's 6.2 while the 780Ti is 11. The GTX 980 is available in the Netherlands and about €80 cheaper than a GTX 780Ti. However no EVGA boards are available yet. I am anxious to see the results of the GTX 980 here. ____________ Greetings from TJ | |
ID: 37943 | Rating: 0 | rate: / Reply Quote | |
The GTX980 does quite well in the Folding@home benchmarks. Double precision performance is lower, but SP, which is the most common for folding, is higher. | |
ID: 37946 | Rating: 0 | rate: / Reply Quote | |
OK guys, who's got the first one up and running? I'd like to pull the trigger on a GTX970, but would prefer to know beforehand that it works as well as, or better than, expected over here. | |
ID: 37948 | Rating: 0 | rate: / Reply Quote | |
Well, I am waiting for the 20 nm Maxwell. However, I will buy a GTX980 to replace my GTX770 as soon as there is one from EVGA with a radial fan. As soon as I have it installed I will let you all know. | |
ID: 37949 | Rating: 0 | rate: / Reply Quote | |
Hopefully they develop an XP Driver too for these. | |
ID: 37951 | Rating: 0 | rate: / Reply Quote | |
Aha you changed the name of the thread ETA? Yes :) MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 37952 | Rating: 0 | rate: / Reply Quote | |
Hopefully they develop an XP Driver too for these. The 344.11 driver is released for Windows XP (and for x64 too), and there are GTX 980 and 970 in it (I've checked the nv_dispi.inf file). However if you search for drivers on the NVidia homepage, it won't display any results for WinXP / GTX 980. | |
ID: 37953 | Rating: 0 | rate: / Reply Quote | |
GTX970 with driver 344.11 - Stderr output. I also tried the latest beta driver 344.16, but had the same error. Is this a problem with my computer, or is the GTX970 not supported yet? | |
ID: 37954 | Rating: 0 | rate: / Reply Quote | |
GTX970 with driver 344.11 I think the problem is that Compute Capability 5.2 is not supported yet. | |
ID: 37956 | Rating: 0 | rate: / Reply Quote | |
I've sent Matt a PM. Hopefully this is easy to fix! | |
ID: 37958 | Rating: 0 | rate: / Reply Quote | |
ext2097, did you also try other projects like Einstein@Home? | |
ID: 37962 | Rating: 0 | rate: / Reply Quote | |
It's a new compute capability. CC 5.2 | |
ID: 37963 | Rating: 0 | rate: / Reply Quote | |
SETI@home - http://setiathome.berkeley.edu/results.php?hostid=7376773 | |
ID: 37965 | Rating: 0 | rate: / Reply Quote | |
It's a bit off topic, but it's a quite interesting Maxwell advertisement: | |
ID: 37966 | Rating: 0 | rate: / Reply Quote | |
It's a bit off topic, but it's a quite interesting Maxwell advertisement: Cool. However, the folks at the flat earth society are not impressed with Nvidia's effort. :) http://forum.tfes.org/index.php?topic=1914.0 While we are waiting for GPUgrid GTX980/970 numbers, F@H performance looks encouraging. http://forums.evga.com/Someone-needs-to-post-980-andor-970-folding-numbers-here-when-they-get-one-m2218148.aspx | |
ID: 37969 | Rating: 0 | rate: / Reply Quote | |
Ok gang, | |
ID: 37972 | Rating: 0 | rate: / Reply Quote | |
..or not. The current CUDA release seems not to support that architecture yet. | |
ID: 37974 | Rating: 0 | rate: / Reply Quote | |
Have you tried this yet? https://developer.nvidia.com/cuda-downloads-geforce-gtx9xx - Driver 343.98 is included, offering support for C.C. 5.2 cards (GTX980/970) | |
ID: 37975 | Rating: 0 | rate: / Reply Quote | |
Yeah, looked straight through that. | |
ID: 37976 | Rating: 0 | rate: / Reply Quote | |
It's a bit off topic, but it's a quite interesting Maxwell advertisement: NVidia showing off VXGI global illuminati... err, illumination power. The Metro series Redux games utilize this tech with the Kepler and later generations (even the Xbox/PS4 ports). GM204's technical advances compared to Kepler are rather striking. The Fermi 480/580-to-GTX680 GPUGrid performance jump won't be nearly the jump of the GTX980 compared to the GTX680. Filtering and sorting performance - for images, or for programs working with atoms and DNA/RNA strands - is higher than Kepler's. Maxwell also has more internal memory bandwidth, and third-generation color compression enhancements offer more ROP performance, with better cache latency. A single GM204 CUDA core is 25-40% faster than a single GK104 CUDA core, thanks to the added atomic memory enhancements and more registers per thread. For gaming, a GTX 980 is equal to two GTX 680s. In every area the GPGPU performance jump from GK104 to GM204 is significant. GK110 now only offers higher TMU performance, but without the new type of filtering offered by GM204, unless NVidia offers this for the Kepler generation. GK110 has higher double precision. A full-fat Maxwell (20 nm or 16 nm FinFET? 250W? ~3072 CUDA cores?) that offers a 1/3 DP/SP core ratio like GK110 is going to be a card for the ages. (Will it first be a Tesla, Quadro, or Titan variant?) Maybe AMD will shortly come out with a board to challenge GM204 at similar power consumption. Continuous performance competition should raise standards for each company. These next few years are key. GM204 replaces the GK104 stack, not the GK110 [GTX 780(Ti)] disabled-64-DP-core-SMX stack (the driver allows 8 to be active per SMX). GM204's per-mm² transistor density is only a few percent more than GK104's (3.54B transistors/294mm²) and GK110's (7.1B transistors/551mm²). Kepler GK110 desktop cards (4500-6000 GFLOPS) are near 20 single-precision GFLOPS/W, while GK104 (2500-3500 GFLOPS) is 15-18 GFLOPS/W, depending on clock speeds and voltage. Maxwell GM107 (~1400 GFLOPS) is 20-23 GFLOPS/W. Maxwell GM204 (~5000 GFLOPS) is 30-35 GFLOPS/W, depending on voltage and GDDR5/core speeds. GK104's highest-rated mobile card (GTX880M, ~2900 GFLOPS) is 27-29 GFLOPS/W. GM204's compute-time/power-usage ratios with a new app will be world class compared to more power-hungry cards. For crunchers in states/countries with higher electricity prices, a GTX970/980 is a top-notch choice. | |
ID: 37978 | Rating: 0 | rate: / Reply Quote | |
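The GFLOPS/W bands quoted above can be tabulated directly from peak SP GFLOPS over TDP. A sketch using round figures from the post plus spec-sheet numbers I'm assuming for the GK110/GK104 examples (real efficiency shifts with boost clocks and voltage):

```python
def gflops_per_watt(gflops, tdp_w):
    # Peak single-precision efficiency from rated GFLOPS and board TDP
    return gflops / tdp_w

cards = {
    # name: (peak SP GFLOPS, TDP in W)
    "GTX 750 Ti (GM107)": (1400, 60),   # figures from the post above
    "GTX 980 (GM204)":    (5000, 165),  # ~5 TFLOPS per the post
    "GTX 780 Ti (GK110)": (5046, 250),  # assumed spec-sheet values
    "GTX 770 (GK104)":    (3213, 230),  # assumed spec-sheet values
}

for name, (gflops, tdp) in cards.items():
    print(f"{name}: {gflops_per_watt(gflops, tdp):.1f} GFLOPS/W")
```

The results land in the quoted bands: GM107 at ~23, GM204 at ~30, GK110 at ~20, GK104 in the mid-teens.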
Yeah, looked straight through that. CUDA 6.5.19 for the 343.98 driver - are the 344 drivers the same? Updated documents for PTX, the programming guide, and many others are included with the 343.98/6.5 CUDA SDK. Before updating to the 6.5.19 driver, the GPUGrid tasks I completed were CUDA 6.5.12. Linux also has a new CUDA 6.5 driver for download. | |
ID: 37979 | Rating: 0 | rate: / Reply Quote | |
Right. New app version cuda65 for acemdlong. Windows only, needs driver version 344. | |
ID: 37981 | Rating: 0 | rate: / Reply Quote | |
I ordered a GTX980 from Newegg. Zotac and Gigabyte were my only options, so I went with Gigabyte. All other manufacturers' cards were "out of stock". | |
ID: 37983 | Rating: 0 | rate: / Reply Quote | |
Time to replace my trusty GTX 460, which has been GPUGrid-ing for years! At ~£100 the GTX 750Ti fits my budget nicely, but I need some guidance. I installed a pair of 750Ti cards in my computer yesterday and tried to run GPUGRID. No go - instant fail within a couple of seconds. This quote is from January. I hope the problem has been fixed! | |
ID: 37984 | Rating: 0 | rate: / Reply Quote | |
The 750tis are great and work just fine. | |
ID: 37985 | Rating: 0 | rate: / Reply Quote | |
I ordered a GTX980 from Newegg. Zotac and Gigabyte were my only options, so I went with Gigabyte. All other manufacturers' cards were "out of stock". Good luck with your card biodoc. In the Netherlands only Asus and MSI are available, but I will wait for EVGA. They are not out of stock, just still in production. A few weeks more is no problem. Moreover, I first want to see some results. ____________ Greetings from TJ | |
ID: 37986 | Rating: 0 | rate: / Reply Quote | |
http://www.techpowerup.com/ | |
ID: 37989 | Rating: 0 | rate: / Reply Quote | |
Does anybody already have a working GTX 980 or 970? | |
ID: 37990 | Rating: 0 | rate: / Reply Quote | |
ext2097, would you mind giving GPU-Grid another try? | |
ID: 37991 | Rating: 0 | rate: / Reply Quote | |
I had a CUDA6.5 task on one of my GTX680s, but it failed after 6 seconds with the following error: # The simulation has become unstable. Terminating to avoid lock-up (1) 40x35-NOELIA_5bisrun2-2-4-RND5486_0 Has anybody had a successful CUDA6.5 task on any older card? | |
ID: 37992 | Rating: 0 | rate: / Reply Quote | |
And there's another failed CUDA6.5 task on my GTX780Ti: # Simulation unstable. Flag 11 value 1
# The simulation has become unstable. Terminating to avoid lock-up
# The simulation has become unstable. Terminating to avoid lock-up (2) | |
ID: 37993 | Rating: 0 | rate: / Reply Quote | |
There were two more CUDA6.5 workunits on my GTX780Ti, both of which failed the same way: # Simulation unstable. Flag 11 value 1
# The simulation has become unstable. Terminating to avoid lock-up
# The simulation has become unstable. Terminating to avoid lock-up (2) I16R23-SDOERR_BARNA5-32-100-RND7031_0 I12R83-SDOERR_BARNA5-32-100-RND2687_0 Now this host received a CUDA6.0 task, so I don't want to try again, but I think that the CUDA6.5 app has a bug. | |
ID: 37994 | Rating: 0 | rate: / Reply Quote | |
I ordered a GTX980 from Newegg. Zotac and Gigabyte were my only options, so I went with Gigabyte. All other manufacturers' cards were "out of stock". Yes, my preference would have been EVGA or PNY (lifetime warranty, but fixed core voltage). This will be my first Gigabyte card, so I hope it works out. The F@H numbers sold me; I think Nvidia GPU performance on F@H generally translates to GPUGrid. Besides, I usually spend the month of December folding, so the GTX980 will be a nice companion to my 780Ti. | |
ID: 37995 | Rating: 0 | rate: / Reply Quote | |
I have more failed CUDA6.5 tasks on my GTX680: # The simulation has become unstable. Terminating to avoid lock-up (1) Actually all CUDA6.5 tasks are failing on my GTX680 (OC) and GTX780Ti (Non-OC) | |
ID: 37996 | Rating: 0 | rate: / Reply Quote | |
I haven't found any successfully finished CUDA6.5 tasks (obviously these are too fresh). # Simulation unstable. Flag 11 value 1
# The simulation has become unstable. Terminating to avoid lock-up
# The simulation has become unstable. Terminating to avoid lock-up (2) I2R31-SDOERR_BARNA5-30-100-RND8191_0 (GTX770 OC) I11R57-SDOERR_BARNA5-32-100-RND3266_0 (GTX770 non-OC) # The simulation has become unstable. Terminating to avoid lock-up (1) | |
ID: 37997 | Rating: 0 | rate: / Reply Quote | |
I just had one of these errors on a 780Ti as well. | |
ID: 37998 | Rating: 0 | rate: / Reply Quote | |
I7R110-SDOERR_BARNA5-32-100-RND5097_1 10082727 181608 22 Sep 2014 | 17:59:10 UTC 23 Sep 2014 | 0:04:53 UTC Error while computing 6.07 2.70 --- Long runs (8-12 hours on fastest card) v8.41 (cuda65) | |
ID: 37999 | Rating: 0 | rate: / Reply Quote | |
Jozef, there's a CUDA 6.0 task among these. Maybe you need to reboot the host after those CUDA 6.5 failures? Or is it running normally again? | |
ID: 38000 | Rating: 0 | rate: / Reply Quote | |
CUDA 65s should only have been going to the new Maxwells, sorry about that. | |
ID: 38001 | Rating: 0 | rate: / Reply Quote | |
I had two CUDA65 tasks, but got the same "FATAL: cannot find image for module [.nonbonded.cu.] for device version 520" error. | |
ID: 38002 | Rating: 0 | rate: / Reply Quote | |
Right. The new 65 app is failing for non-obvious reasons, so I've moved it to the acemdbeta queue. If you have a GTX9x0, please get some work from that queue. | |
ID: 38003 | Rating: 0 | rate: / Reply Quote | |
I have GTX970 and checked at "Run test applications?" and "ACEMD beta", but BOINC says "No tasks are available for ACEMD beta version". | |
ID: 38004 | Rating: 0 | rate: / Reply Quote | |
I have GTX970 and checked at "Run test applications?" and "ACEMD beta", but BOINC says "No tasks are available for ACEMD beta version". On the GPUGRID preference page, about in the middle, is an option called "Run test applications?". You have to set it to yes as well. But perhaps you did; then sorry about this post. ____________ Greetings from TJ | |
ID: 38005 | Rating: 0 | rate: / Reply Quote | |
Use NVIDIA GPU : yes | |
ID: 38006 | Rating: 0 | rate: / Reply Quote | |
Should I open that ESD bag? :) | |
ID: 38016 | Rating: 0 | rate: / Reply Quote | |
Here are two GK104/GK110 times from your hosts to compare with the new GM204. | |
ID: 38017 | Rating: 0 | rate: / Reply Quote | |
The good news is that I've successfully installed the GTX980 under Windows XP x64. 23/09/2014 19:38:23 | GPUGRID | Requesting new tasks for NVIDIA
23/09/2014 19:38:25 | GPUGRID | Scheduler request completed: got 0 new tasks
23/09/2014 19:38:25 | GPUGRID | No tasks sent
23/09/2014 19:38:25 | GPUGRID | No tasks are available for ACEMD beta version
23/09/2014 19:38:25 | GPUGRID | No tasks are available for the applications you have selected.
23/09/2014 19:41:53 | GPUGRID | update requested by user
23/09/2014 19:41:55 | GPUGRID | Sending scheduler request: Requested by user.
23/09/2014 19:41:55 | GPUGRID | Requesting new tasks for NVIDIA
23/09/2014 19:41:57 | GPUGRID | Scheduler request completed: got 0 new tasks Before you ask: I did all the necessary settings. | |
ID: 38018 | Rating: 0 | rate: / Reply Quote | |
Seems like Matt has to fill the beta queue, or already got enough failed results from the batch he submitted :p | |
ID: 38019 | Rating: 0 | rate: / Reply Quote | |
Seems like Matt has to fill the beta queue, or already got enough failed results from the batch he submitted :p According to the server status page there are 100 unsent beta workunits, and the application page shows that only the v8.42 CUDA6.5 beta app exists. Somehow these didn't get matched up. I think this could be another scheduler issue. | |
ID: 38022 | Rating: 0 | rate: / Reply Quote | |
Nice card you have there Zoltan. Would love to see the results as soon as Matt has got it working. | |
ID: 38024 | Rating: 0 | rate: / Reply Quote | |
Nice card you have there Zoltan. Would love to see the results as soon as Matt has got it working. I second that! BTW: what are you currently running on the card? Any results from other projects to share? :) MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 38026 | Rating: 0 | rate: / Reply Quote | |
Thank you TJ & ETA! | |
ID: 38027 | Rating: 0 | rate: / Reply Quote | |
Thank you TJ & ETA! Any comment on your phenomenal card's wattage usage for tasks, or temps? | |
ID: 38028 | Rating: 0 | rate: / Reply Quote | |
Any comment on your phenomenal card's wattage usage for tasks, or temps? It's awesome! :) The Einstein@home app is CUDA3.2 - ancient in terms of GPU computing, as this version was released for the GTX 2xx series - so the data you've asked for is almost irrelevant, but here it is: Ambient temperature: 24.8°C Task: p2030.20140610.G63.60-00.95.S.b6s0g0.00000_3648_1 Binary Radio Pulsar Search (Arecibo, GPU) v1.39 (BRP4G-cuda32-nv301) GPU temperature: 53°C GPU usage: 91-92% (muhahaha) GPU wattage: 90W (the difference between the idle GPU and the GPU in use, though the CPU is consuming a little to keep the GPU busy) GPU clock: 1240MHz GPU voltage: 1.218V GPU power 55% | |
ID: 38030 | Rating: 0 | rate: / Reply Quote | |
Any comment on your phenomenal card's wattage usage for tasks, or temps? An integer task? 90 watts (at 91-92% usage) for 1024 cores on the CUDA 3.2 API shows the Maxwell(2) GM204 internal core structure enhancements. Other components will be under less "stress" from the drop in energy usage. Any efficiency update helps when running 24/7 for weeks/months/years at a time: would you rather have a 250W TDP card or a 175W one? Compare the 50W-105W TDP GM204 parts to the 225W/250W GK110: the 1664-core GTX970 has a 145W TDP, and the GTX980's TDP is only 30 watts away from a 6/8-core Haswell-E @ 140W. (A few 6/8-core E5 Haswell Xeons are 85W.) With multiple cards the energy savings add up, higher motherboard/PSU efficiency included. | |
ID: 38031 | Rating: 0 | rate: / Reply Quote | |
I think I know why we don't receive beta tasks. | |
ID: 38032 | Rating: 0 | rate: / Reply Quote | |
Zoltan - how many WUs are you crunching at once at Einstein? | |
ID: 38036 | Rating: 0 | rate: / Reply Quote | |
Zoltan - how many WUs are you crunching at once at Einstein? Only one. Now I've changed my settings to run two simultaneously, but the power consumption hasn't changed; only the GPU usage has risen to 97%. | |
ID: 38039 | Rating: 0 | rate: / Reply Quote | |
May I quote all this data at Einstein forum? | |
ID: 38040 | Rating: 0 | rate: / Reply Quote | |
May I quote all this data at Einstein forum? Sure. | |
ID: 38041 | Rating: 0 | rate: / Reply Quote | |
Any comment on your phenomenal card's wattage usage for tasks, or temps? The Einstein numbers look great. Congrats on the new card Zoltan! | |
ID: 38042 | Rating: 0 | rate: / Reply Quote | |
What does BOINC say about the amount of (peak) FLOPS in the event log for a GTX980? Near 5 TeraFLOPS? Over at Mersenne trial-factoring, a GTX980 is listed @ 1.126GHz and 4,710 GFLOPS. | |
ID: 38045 | Rating: 0 | rate: / Reply Quote | |
What does BOINC say about the amount of (peak) FLOPS in the event log for a GTX980? Near 5 TeraFLOPS? Over at Mersenne trial-factoring, a GTX980 is listed @ 1.126GHz and 4,710 GFLOPS. Could somebody running a Maxwell-aware version of BOINC check and report this, please, and do a sanity-check of whether BOINC's figure is correct from what you know of the card's SM count, cores per SM, shader clock, flops_per_clock etc.? We got the figures for the 'baby Maxwell' 750/Ti into BOINC on 24 February (3edb124ab4b16492d58ce5a6f6e40c2244c97ed6), but I think that was just too late to catch v7.2.42. We're in a similar position this time, with v7.4.22 at release-candidate stage - I'd say that one was safe to test with, if nobody here has upgraded yet. TIA. | |
ID: 38048 | Rating: 0 | rate: / Reply Quote | |
No idea why the scheduler wasn't giving out the 842 beta app. Look out for 843 now. | |
ID: 38049 | Rating: 0 | rate: / Reply Quote | |
no, that's deliberate. It's a Maxwell-only build | |
ID: 38050 | Rating: 0 | rate: / Reply Quote | |
There's now a linux build on acemdbeta. You'll definitely be needing to use a Linux client that reports the right driver version. | |
ID: 38051 | Rating: 0 | rate: / Reply Quote | |
No idea why the scheduler wasn't giving out the 842 beta app. Look out for 843 now. I still could not get beta work. 24/09/2014 18:16:35 | GPUGRID | update requested by user
24/09/2014 18:16:38 | GPUGRID | Sending scheduler request: Requested by user.
24/09/2014 18:16:38 | GPUGRID | Requesting new tasks for NVIDIA
24/09/2014 18:16:41 | GPUGRID | Scheduler request completed: got 0 new tasks
24/09/2014 18:16:41 | GPUGRID | No tasks sent
24/09/2014 18:16:41 | GPUGRID | No tasks are available for ACEMD beta version
24/09/2014 18:16:41 | GPUGRID | No tasks are available for the applications you have selected. | |
ID: 38052 | Rating: 0 | rate: / Reply Quote | |
I gave Folding@home a try, and the power consumption rose by 130W when I started folding on the GPU (GTX980) only. When I started folding on the CPU as well, the power consumption went up by a further 68W (Core i7-870@3.2GHz, 7 threads). | |
ID: 38054 | Rating: 0 | rate: / Reply Quote | |
Thanks Zoltan! Those numbers are really encouraging and show GM204 power consumption to be approximately where we expected it to be. This is in stark contrast to the ~250 W THG measured under "some GP-GPU load". Maybe it was FurMark? With these results we can rest assured that the cards won't draw more than their power target running GPU-Grid. | |
ID: 38055 | Rating: 0 | rate: / Reply Quote | |
What does BOINC say about the amount of (peak) FLOPS in the event log for a GTX980? Near 5 TeraFLOPS? Over at Mersenne trial-factoring, a GTX980 is listed @ 1.126GHz and 4,710 GFLOPS. The GPU info is in the sched_request_(project) file or the slot's init_data file. Also, client_state provides the working size? | |
ID: 38056 | Rating: 0 | rate: / Reply Quote | |
Time to replace my trusty GTX 460, which has been GPUGrid-ing for years! At ~£100 the GTX 750Ti fits my budget nicely but I need some guidance. Something you didn't mention: If possible, get one that blows the hot air out of the case rather than blowing it around within the case. That should reduce the temperature for both the graphics board and the CPU, and therefore make both of them last longer. | |
ID: 38057 | Rating: 0 | rate: / Reply Quote | |
Something I'm having trouble finding: How well do the new cards using PCIE3 work if the motherboard has only PCIE2 sockets? | |
ID: 38058 | Rating: 0 | rate: / Reply Quote | |
Something I'm having trouble finding: How well do the new cards using PCIE3 work if the motherboard has only PCIE2 sockets? Physically, the sockets are the same. Electrically, they're compatible: a PCIe3 card in a PCIe2 socket simply negotiates down to PCIe2 speeds, so the only cost is bus bandwidth. | |
ID: 38059 | Rating: 0 | rate: / Reply Quote | |
Something I'm having trouble finding: How well do the new cards using PCIE3 work if the motherboard has only PCIE2 sockets? We'll find out once there is a working GPUGrid app, as I will move my GTX 980 to another host which has PCIe3. | |
ID: 38060 | Rating: 0 | rate: / Reply Quote | |
And while we're at it: what about memory controller load? Folding@home: 23% Einstein@home 2 tasks: 62-69% (Perseus arm survey/BRP5 & Arecibo, GPU/BRP4G) Einstein@home 1 task : 46-48% (Perseus arm survey/BRP5) Einstein@home 1 task : 58-64% (Arecibo, GPU/BRP4G) | |
ID: 38061 | Rating: 0 | rate: / Reply Quote | |
I've got my GTX980 running on linux. | |
ID: 38062 | Rating: 0 | rate: / Reply Quote | |
Just got two GTX 980's, will install tomorrow. Should be interesting to see how we go! | |
ID: 38066 | Rating: 0 | rate: / Reply Quote | |
What does Boinc say, about amount of (peak) FLOPS in event log for GTX980? Near 5TeraFLOPS? Over at Mersenne trial-factoring--- a GTX980 is listed @ 1,126GHz and 4,710 GFLOPS. Here's what boinc 7.4.22 (64bit-linux version) is reporting: Starting BOINC client version 7.4.22 for x86_64-pc-linux-gnu CUDA: NVIDIA GPU 0: GeForce GTX 980 (driver version 343.22, CUDA version 6.5, compute capability 5.2, 4096MB, 3557MB available, 4979 GFLOPS peak) OpenCL: NVIDIA GPU 0: GeForce GTX 980 (driver version 343.22, device version OpenCL 1.1 CUDA, 4096MB, 3557MB available, 4979 GFLOPS peak) | |
ID: 38067 | Rating: 0 | rate: / Reply Quote | |
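The 4979 GFLOPS figure above can be sanity-checked the way Richard asked. A minimal sketch, assuming the published GM204 layout for the GTX 980 (16 SMMs of 128 CUDA cores, boost clock near 1216 MHz, one FMA per core per clock); these spec values are assumptions from the spec sheet, not read from BOINC:

```python
# Sanity-check BOINC's peak-GFLOPS figure for a GTX 980.
sm_count = 16          # SMMs on GM204 (GTX 980), assumed from spec sheet
cores_per_sm = 128     # CUDA cores per Maxwell SMM
clock_mhz = 1215.5     # approximate boost clock reported by the driver
flops_per_clock = 2    # one single-precision FMA counts as 2 flops

gflops = sm_count * cores_per_sm * clock_mhz * flops_per_clock / 1000.0
print(f"{gflops:.0f} GFLOPS peak")  # -> 4979, matching BOINC's event log
```

So BOINC's figure is consistent with all 2048 shaders clocked at the boost clock.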
What does BOINC say about the amount of (peak) FLOPS in the event log for a GTX980? Near 5 TeraFLOPS? Over at Mersenne trial-factoring, a GTX980 is listed @ 1.126GHz and 4,710 GFLOPS. OpenCL 1.1! A spec from 2010 (the Fermi era), while the OpenCL 2.0 spec has been out for almost a year. This is NVidia telling Intel and AMD they don't give a hoot about OpenCL, because of CUDA. | |
ID: 38068 | Rating: 0 | rate: / Reply Quote | |
Any more thoughts on when we might see a revised app for the 980 - mine looks very nice, but I'd like to put it to work!! | |
ID: 38069 | Rating: 0 | rate: / Reply Quote | |
Just got two GTX 980's, will install tomorrow. Should be interesting to see how we go! We're waiting for a working app, so prepare a spare project for a while. | |
ID: 38070 | Rating: 0 | rate: / Reply Quote | |
There's now a linux build on acemdbeta. You'll definitely be needing to use a Linux client that reports the right driver version. I've got the latest boinc client for linux but am still getting no tasks for my GTX 980. Thu 25 Sep 2014 07:29:53 AM EDT | | Starting BOINC client version 7.4.22 for x86_64-pc-linux-gnu Thu 25 Sep 2014 07:29:53 AM EDT | | log flags: file_xfer, sched_ops, task Thu 25 Sep 2014 07:29:53 AM EDT | | Libraries: libcurl/7.35.0 OpenSSL/1.0.1f zlib/1.2.8 libidn/1.28 librtmp/2.3 Thu 25 Sep 2014 07:29:53 AM EDT | | Data directory: /home/mark/BOINC Thu 25 Sep 2014 07:29:53 AM EDT | | CUDA: NVIDIA GPU 0: GeForce GTX 980 (driver version 343.22, CUDA version 6.5, compute capability 5.2, 4096MB, 3566MB available, 4979 GFLOPS peak) Thu 25 Sep 2014 07:29:53 AM EDT | | OpenCL: NVIDIA GPU 0: GeForce GTX 980 (driver version 343.22, device version OpenCL 1.1 CUDA, 4096MB, 3566MB available, 4979 GFLOPS peak) Thu 25 Sep 2014 09:48:27 AM EDT | GPUGRID | Sending scheduler request: Requested by user. Thu 25 Sep 2014 09:48:27 AM EDT | GPUGRID | Requesting new tasks for NVIDIA GPU Thu 25 Sep 2014 09:48:29 AM EDT | GPUGRID | Scheduler request completed: got 0 new tasks Thu 25 Sep 2014 09:48:29 AM EDT | GPUGRID | No tasks sent Thu 25 Sep 2014 09:48:29 AM EDT | GPUGRID | No tasks are available for ACEMD beta version Thu 25 Sep 2014 09:48:29 AM EDT | GPUGRID | No tasks are available for the applications you have selected. | |
ID: 38071 | Rating: 0 | rate: / Reply Quote | |
Regarding the power consumption of the new BigMaxwell: | |
ID: 38073 | Rating: 0 | rate: / Reply Quote | |
Regarding the power consumption of the new BigMaxwell: Project 7621 uses the GPU "core 15" (Fahcore:0x15) version, which is the oldest GPU client and runs exclusively on Windows machines. Those WUs generally run hot and use very little CPU, as you've noticed. They are fixed-credit WUs, so PPD is low. Core 17 WUs are more efficient since they use a more recent version of openMM and are distributed to both Windows and Linux via an OpenCL app. They generally use 100% of a core on machines with an Nvidia card. These WUs offer a quick return bonus (QRB) and are very popular because the faster the card, the higher the bonus. My GTX980 has finished several project 9201 (core17) WUs and is averaging 330,000 ppd. Amazing. Linux users have an advantage in that only core 17 WUs are delivered to Linux machines. There are core 18 WUs now available to Windows users. I don't know anything about them yet. | |
ID: 38074 | Rating: 0 | rate: / Reply Quote | |
The acemd.841-65.exe file is 3,969,024 bytes long, but the acemd.842-65.exe is only 1,112,576 bytes long, so something went wrong with the latter. I made my BOINC manager start this acemd.842-65.exe as acemd.841-60.exe by overwriting the latter and setting <dont_check_file_sizes> in the cc_config.xml, and I modified the client_state.xml to copy the cudart32_65.dll and the cufft32_65.dll to the slot with the app, but I got the same result as before with the 841-65 client. #SWAN: FATAL: cannot find image for module [.nonbonded.cu.] for device version 520 http://www.gpugrid.net/result.php?resultid=13130843 http://www.gpugrid.net/result.php?resultid=13132835 http://www.gpugrid.net/result.php?resultid=13135543 | |
ID: 38075 | Rating: 0 | rate: / Reply Quote | |
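For anyone wanting to reproduce the file-size override Zoltan describes, a minimal cc_config.xml sketch (option name as documented for the BOINC client; placed in the BOINC data directory, then re-read via "read config files" or a client restart):

```xml
<cc_config>
  <options>
    <!-- skip size checks on application files, so a substituted
         executable isn't rejected by the client -->
    <dont_check_file_sizes>1</dont_check_file_sizes>
  </options>
</cc_config>
```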
I have data comparing my 780Ti with the 980 at Folding@home. | |
ID: 38079 | Rating: 0 | rate: / Reply Quote | |
Well, I've not fixed the scheduler, but would you like to try that trick again with the new version 844? | |
ID: 38080 | Rating: 0 | rate: / Reply Quote | |
Well, I've not fixed the scheduler, but would you like to try that trick again with the new version 844? At once, sire. :) | |
ID: 38081 | Rating: 0 | rate: / Reply Quote | |
Well, I've not fixed the scheduler, but would you like to try that trick again with the new version 844? ...aaaaand we have a lift-off! It's crunching. # GPU [GeForce GTX 980] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 980
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:03:00.0
# Device clock : 1215MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# Driver version : r343_98 : 34411
# GPU 0 : 41C
# GPU 0 : 43C
# GPU 0 : 44C
# GPU 0 : 46C
# GPU 0 : 47C
# GPU 0 : 49C
# GPU 0 : 50C
# GPU 0 : 52C
# GPU 0 : 53C
# GPU 0 : 54C
# GPU 0 : 55C
# GPU 0 : 56C
# GPU 0 : 57C
# GPU 0 : 58C
# GPU 0 : 59C
# GPU 0 : 60C
# GPU 0 : 61C
# GPU 0 : 62C
# GPU 0 : 63C 709-NOELIA_20MGWT-1-5-RND4766_0 GPU usage: 93-97% (CPU 100%, PCIe2.0x16) GPU power 93% (~160W increase measured at the wall outlet) GPU temperature: 64°C (ambient: 24°C) GPU memory controller load: 50% GPU memory used: 825MB GPU voltage: 1.218V GPU core clock: 1240MHz I estimate it will take 19,200 sec to finish this workunit (5h20m), which is more than it takes on a GTX780Ti (16,712), so I really should move this card to another host with PCIe3.0. | |
ID: 38082 | Rating: 0 | rate: / Reply Quote | |
Good news that the app is working but disappointing performance. | |
ID: 38083 | Rating: 0 | rate: / Reply Quote | |
Good news that the app is working but disappointing performance. Disappointing compared to GK110, or to GK104 boards? The GTX980 (64 DP cores / 4 DP per SMM / 1 DP per 32-core block) is the replacement for the GTX680 (64 DP / 8 DP per SMX), NOT for the 96-DP-core GTX780 or the 120-DP-core GTX780ti. The Titan (Black) at 250W TDP has 896/960 DP cores (64 DP per SMX). Compared to the GTX680, I'd say the GTX980 is an excellent performer, other than in double-precision float. | |
ID: 38084 | Rating: 0 | rate: / Reply Quote | |
Good news that the app is working but disappointing performance. I believe the GPUGrid app uses SP floating point calculations. F@H also uses SP. | |
ID: 38085 | Rating: 0 | rate: / Reply Quote | |
Good news that the app is working but disappointing performance. You're right about GPUGrid. I'll swap my GTX670 and GTX980 and we'll see how it performs in a PCIe3.0x16 slot. I expect that a GTX980 should be faster than a GTX780Ti by at least 10%. Maybe it won't be faster in the beginning, but in time the GPUGrid app could be refined for Maxwells. Besides, different workunit batches will show different performance (it could even be a loss of performance). | |
ID: 38086 | Rating: 0 | rate: / Reply Quote | |
Good news that the app is working but disappointing performance. What do you think the difference between PCIe2 x16 and PCIe3 x16 is for GPUGRID and similar programs? Also, do you have an idea how many of those "scalar" GM204 cores are cooking? Earlier in this thread you estimated 1920-2880 cores are being utilized on the "superscalar" GK110. | |
ID: 38087 | Rating: 0 | rate: / Reply Quote | |
Could you crop me the performance information from the *0_0 output file, please? | |
ID: 38088 | Rating: 0 | rate: / Reply Quote | |
biodoc, I sent you an off-topic PM this very moment. | |
ID: 38090 | Rating: 0 | rate: / Reply Quote | |
Could you crop me the performance information from the *0_0 output file, please? It's already finished and uploaded. I'll swap my cards when I get home. 709-NOELIA_20MGWT-1-5-RND4766_0 18,458.00 sec | |
ID: 38091 | Rating: 0 | rate: / Reply Quote | |
Could you crop me the performance information from the *0_0 output file, please? I've successfully swapped my GTX670 and GTX980 and hacked this client, so now I have another workunit in progress. The workunit is 13.103% complete at 40 minutes; the estimated total computing time is 18,316 sec (5h5m). A similar workunit took 16,616 sec (4h37m) to finish on my GTX780Ti (@1098MHz). CPU: Core i7-4770K @4.3GHz, 8GB DDR3 1866MHz GPU usage: 98% (CPU thread 100%, PCIe3.0x16) GPU Temperature: 62°C GPU Memory Controller load: 52% GPU Memory usage: 804MB GPU Voltage: 1.218V GPU Power: 95% (haven't measured at the wall outlet) GPU Core Clock: 1240MHz # Simulation rate 83.10 (ave) 83.10 (inst) ns/day. Estimated completion Sat Sep 27 11:06:30 2014
# Simulation rate 88.80 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 10:19:41 2014
# Simulation rate 91.00 (ave) 95.75 (inst) ns/day. Estimated completion Sat Sep 27 10:03:10 2014
# Simulation rate 92.05 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:55:35 2014
# Simulation rate 92.69 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:51:02 2014
# Simulation rate 93.18 (ave) 95.75 (inst) ns/day. Estimated completion Sat Sep 27 09:47:33 2014
# Simulation rate 93.49 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:45:27 2014
# Simulation rate 93.76 (ave) 95.75 (inst) ns/day. Estimated completion Sat Sep 27 09:43:32 2014
# Simulation rate 93.94 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:42:21 2014
# Simulation rate 94.03 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:41:41 2014
# Simulation rate 94.19 (ave) 95.75 (inst) ns/day. Estimated completion Sat Sep 27 09:40:38 2014
# Simulation rate 94.28 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:39:59 2014
# Simulation rate 94.39 (ave) 95.75 (inst) ns/day. Estimated completion Sat Sep 27 09:39:13 2014
# Simulation rate 94.46 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:38:46 2014
# Simulation rate 94.49 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:38:33 2014
# Simulation rate 94.57 (ave) 95.75 (inst) ns/day. Estimated completion Sat Sep 27 09:38:02 2014
# Simulation rate 94.64 (ave) 95.75 (inst) ns/day. Estimated completion Sat Sep 27 09:37:34 2014
# Simulation rate 94.68 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:18 2014
# Simulation rate 94.73 (ave) 95.75 (inst) ns/day. Estimated completion Sat Sep 27 09:36:55 2014
# Simulation rate 94.76 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:43 2014
# Simulation rate 94.79 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:32 2014
# Simulation rate 94.83 (ave) 95.75 (inst) ns/day. Estimated completion Sat Sep 27 09:36:15 2014
# Simulation rate 94.86 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:06 2014
# Simulation rate 94.88 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:35:58 2014
# Simulation rate 94.88 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:35:57 2014
# Simulation rate 94.85 (ave) 94.12 (inst) ns/day. Estimated completion Sat Sep 27 09:36:09 2014
# Simulation rate 94.82 (ave) 94.12 (inst) ns/day. Estimated completion Sat Sep 27 09:36:20 2014
# Simulation rate 94.84 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:12 2014
# Simulation rate 94.84 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:11 2014
# Simulation rate 94.83 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:36:15 2014
# Simulation rate 94.81 (ave) 94.12 (inst) ns/day. Estimated completion Sat Sep 27 09:36:25 2014
# Simulation rate 94.67 (ave) 90.65 (inst) ns/day. Estimated completion Sat Sep 27 09:37:20 2014
# Simulation rate 94.57 (ave) 91.40 (inst) ns/day. Estimated completion Sat Sep 27 09:38:01 2014
# Simulation rate 94.51 (ave) 92.55 (inst) ns/day. Estimated completion Sat Sep 27 09:38:26 2014
# Simulation rate 94.52 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:38:21 2014
# Simulation rate 94.54 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:38:12 2014
# Simulation rate 94.56 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:38:03 2014
# Simulation rate 94.59 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:55 2014
# Simulation rate 94.60 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:47 2014
# Simulation rate 94.61 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:37:44 2014
# Simulation rate 94.63 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:37 2014
# Simulation rate 94.65 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:30 2014
# Simulation rate 94.66 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:24 2014
# Simulation rate 94.68 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:18 2014
# Simulation rate 94.68 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:37:15 2014
# Simulation rate 94.71 (ave) 95.75 (inst) ns/day. Estimated completion Sat Sep 27 09:37:06 2014
# Simulation rate 94.72 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:01 2014
# Simulation rate 94.72 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:59 2014
# Simulation rate 94.74 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:54 2014
# Simulation rate 94.74 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:52 2014
# Simulation rate 94.75 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:48 2014
# Simulation rate 94.75 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:36:49 2014
# Simulation rate 94.74 (ave) 94.12 (inst) ns/day. Estimated completion Sat Sep 27 09:36:54 2014
# Simulation rate 94.72 (ave) 94.12 (inst) ns/day. Estimated completion Sat Sep 27 09:36:59 2014
# Simulation rate 94.73 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:57 2014
# Simulation rate 94.74 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:53 2014
# Simulation rate 94.75 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:49 2014
# Simulation rate 94.76 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:45 2014
# Simulation rate 94.76 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:43 2014
# Simulation rate 94.74 (ave) 93.33 (inst) ns/day. Estimated completion Sat Sep 27 09:36:53 2014
# Simulation rate 94.73 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:36:55 2014
# Simulation rate 94.74 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:51 2014
# Simulation rate 94.75 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:47 2014
# Simulation rate 94.76 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:46 2014
# Simulation rate 94.76 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:45 2014
# Simulation rate 94.76 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:44 2014
# Simulation rate 94.77 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:40 2014
# Simulation rate 94.77 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:36:42 2014
# Simulation rate 94.76 (ave) 94.12 (inst) ns/day. Estimated completion Sat Sep 27 09:36:46 2014
# Simulation rate 94.74 (ave) 93.72 (inst) ns/day. Estimated completion Sat Sep 27 09:36:52 2014
# Simulation rate 94.72 (ave) 93.33 (inst) ns/day. Estimated completion Sat Sep 27 09:37:00 2014
# Simulation rate 94.73 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:56 2014
# Simulation rate 94.73 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:36:58 2014
# Simulation rate 94.73 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:56 2014
# Simulation rate 94.73 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:36:58 2014
# Simulation rate 94.73 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:56 2014
# Simulation rate 94.73 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:55 2014
# Simulation rate 94.72 (ave) 94.12 (inst) ns/day. Estimated completion Sat Sep 27 09:36:59 2014
# Simulation rate 94.72 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:00 2014
# Simulation rate 94.72 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:01 2014
# Simulation rate 94.73 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:58 2014
# Simulation rate 94.73 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:57 2014
# Simulation rate 94.73 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:56 2014
# Simulation rate 94.73 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:36:57 2014
# Simulation rate 94.73 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:56 2014
# Simulation rate 94.73 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:55 2014
# Simulation rate 94.73 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:36:56 2014
# Simulation rate 94.73 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:36:57 2014
# Simulation rate 94.73 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:56 2014
# Simulation rate 94.72 (ave) 93.72 (inst) ns/day. Estimated completion Sat Sep 27 09:37:00 2014
# Simulation rate 94.71 (ave) 94.12 (inst) ns/day. Estimated completion Sat Sep 27 09:37:03 2014
# Simulation rate 94.72 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:37:02 2014
# Simulation rate 94.72 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:37:01 2014
# Simulation rate 94.72 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:02 2014
# Simulation rate 94.71 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:03 2014
# Simulation rate 94.71 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:04 2014
# Simulation rate 94.71 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:05 2014
# Simulation rate 94.72 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:02 2014
# Simulation rate 94.71 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:03 2014
# Simulation rate 94.71 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:04 2014
# Simulation rate 94.71 (ave) 94.12 (inst) ns/day. Estimated completion Sat Sep 27 09:37:06 2014
# Simulation rate 94.71 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:03 2014
# Simulation rate 94.71 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:04 2014
# Simulation rate 94.71 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:37:03 2014
# Simulation rate 94.71 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:04 2014
# Simulation rate 94.71 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:05 2014
# Simulation rate 94.71 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:37:04 2014
# Simulation rate 94.71 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:05 2014
# Simulation rate 94.71 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:37:04 2014
# Simulation rate 94.70 (ave) 93.72 (inst) ns/day. Estimated completion Sat Sep 27 09:37:08 2014
# Simulation rate 94.70 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:37:07 2014
# Simulation rate 94.71 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:04 2014
# Simulation rate 94.71 (ave) 94.12 (inst) ns/day. Estimated completion Sat Sep 27 09:37:07 2014
# Simulation rate 94.70 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:07 2014
# Simulation rate 94.70 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:08 2014
# Simulation rate 94.71 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:06 2014
# Simulation rate 94.71 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:03 2014
# Simulation rate 94.72 (ave) 95.75 (inst) ns/day. Estimated completion Sat Sep 27 09:37:00 2014
# Simulation rate 94.72 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:01 2014
# Simulation rate 94.71 (ave) 94.12 (inst) ns/day. Estimated completion Sat Sep 27 09:37:03 2014
# Simulation rate 94.71 (ave) 94.12 (inst) ns/day. Estimated completion Sat Sep 27 09:37:05 2014
# Simulation rate 94.70 (ave) 94.12 (inst) ns/day. Estimated completion Sat Sep 27 09:37:07 2014
# Simulation rate 94.70 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:07 2014
# Simulation rate 94.71 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:05 2014
# Simulation rate 94.71 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:06 2014
# Simulation rate 94.71 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:04 2014
# Simulation rate 94.72 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:02 2014
# Simulation rate 94.72 (ave) 94.52 (inst) ns/day. Estimated completion Sat Sep 27 09:37:02 2014
# Simulation rate 94.72 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:37:00 2014
# Simulation rate 94.72 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:37:00 2014
# Simulation rate 94.72 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:59 2014
# Simulation rate 94.73 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:57 2014
# Simulation rate 94.73 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:55 2014
# Simulation rate 94.74 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:54 2014
[~35 similar lines trimmed: the average rate holds steady at 94.73-94.79 ns/day throughout]
# Simulation rate 94.78 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:38 2014 | |
ID: 38094 | Rating: 0 | rate: / Reply Quote | |
My GTX980 is crunching fine, a little slower than a GTX780Ti, while consuming much less power. So probably the GPUGrid client can use more than 1920 CUDA cores of the GTX780Ti (or it can't use all CUDA cores in Maxwell). | |
ID: 38097 | Rating: 0 | rate: / Reply Quote | |
Great work, Zoltan! biodoc wrote: Good news that the app is working but disappointing performance. I would say it's only disappointing if your expectations were set really high. So far GM204 is not performing miracles here, but it's performing solidly at almost the performance level of GK110 for far less power used. biodoc wrote: I believe the GPUGrid app uses SP floating point calculations. Correct. eXaPower wrote: Also, do you have an idea how many of those "scalar" GM204 cores are cooking? Earlier in this thread-- You estimated 1920-2880 cores are being utilized for "superscalar" GK110. It was always hard for GPU-Grid to use the superscalar shaders, which amount to 1/3 of all shaders in "all but the high-end Fermis" and all Keplers. That's where this number comes from. Maxwell has no such restrictions, hence all shaders can be used in principle. This says nothing about other potential bottlenecks, however: PCIe bus, memory bandwidth, CPU support etc. Translating these limitations into statements along the lines of "can only use xxxx shaders" would be misleading. Edit: BTW, what's the memory controller load for a GTX780Ti running such tasks? MrS ____________ Scanning for our furry friends since Jan 2002 | |
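MrS's shader-count reasoning can be reduced to quick arithmetic. A minimal sketch: the 2880-shader GK110 and 2048-shader GM204 figures are the thread's own, while the helper function is hypothetical.

```python
# Rough shader-utilization arithmetic for the "superscalar" point above.
# On Kepler, roughly 1/3 of shaders are superscalar and hard for
# GPU-Grid-style code to keep busy; Maxwell has no such split.

def usable_shaders(total, superscalar_fraction):
    """Shaders a non-superscalar-friendly kernel can readily occupy."""
    return int(total * (1 - superscalar_fraction))

gk110 = usable_shaders(2880, 1/3)   # Kepler GTX780Ti: 2/3 readily usable
gm204 = usable_shaders(2048, 0)     # Maxwell GTX980: all shaders usable

print(gk110, gm204)  # 1920 2048
```

This only models the occupancy side of the story; the other bottlenecks MrS lists (PCIe bus, memory bandwidth, CPU support) are not captured.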
ID: 38098 | Rating: 0 | rate: / Reply Quote | |
There are more potential variables in Zoltan's tests so far: | |
ID: 38099 | Rating: 0 | rate: / Reply Quote | |
http://www.anandtech.com/show/8568/the-geforce-gtx-970-review-feat-evga/13 | |
ID: 38100 | Rating: 0 | rate: / Reply Quote | |
http://www.anandtech.com/show/8568/the-geforce-gtx-970-review-feat-evga/13 | |
ID: 38101 | Rating: 0 | rate: / Reply Quote | |
@Biodoc: valid points. Regarding the clockspeed Zoltan said his GTX780Ti was running at 1098MHz, so it's got a "typical" overclock. And the new app claims to be CUDA 6.5. However, I don't think Matt changed the actual crunching code for this release, so any differences would come from changes in built-in functions. During the last few CUDA releases we haven't seen any large changes of GPU-Grid performance, so I don't expect it this time either. Anyway, for the best comparison both cards should run the new version. | |
ID: 38102 | Rating: 0 | rate: / Reply Quote | |
@Biodoc: valid points. Regarding the clockspeed Zoltan said his GTX780Ti was running at 1098MHz, so it's got a "typical" overclock. And the new app claims to be CUDA 6.5. However, I don't think Matt changed the actual crunching code for this release, so any differences would come from changes in built-in functions. During the last few CUDA releases we haven't seen any large changes of GPU-Grid performance, so I don't expect it this time either. Anyway, for the best comparison both cards should run the new version. Has dynamic parallelism (C.C 3.5/5.0/5.2) been introduced to ACEMD? Or Unified Memory from CUDA 6.0? Unified memory is a C.C 3.0+ feature. Quoted from newest CUDA programming guide-- "new managed memory space in which all processors see a single coherent memory image with a common address space. A processor refers to any independent execution unit with a dedicated MMU. This includes both CPUs and GPUs of any type and architecture. " | |
ID: 38104 | Rating: 0 | rate: / Reply Quote | |
I posted some power consumption data for my GTX980 (+/- overclock) at the F@H forum. | |
ID: 38106 | Rating: 0 | rate: / Reply Quote | |
Has dynamic parallelism (C.C 3.5/5.0/5.2) been introduced to ACEMD? Or Unified Memory from CUDA 6.0? Unified memory is a C.C 3.0+ feature. Dynamic parallelism: no. It would break compatibility with older cards or require two separate code paths. Besides, GPU-Grid doesn't have much of a problem occupying all shader multiprocessors (SM, SMX etc.). Unified memory: this is only meant to ease programming for new applications, at the cost of some performance. For any existing code with optimized manual memory management (e.g. GPU-Grid) this would actually be a drawback. MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 38108 | Rating: 0 | rate: / Reply Quote | |
Has dynamic parallelism (C.C 3.5/5.0/5.2) been introduced to ACEMD? Or Unified Memory from CUDA 6.0? Unified memory is a C.C 3.0+ feature. In your opinion: how can GPUGRID's SM/SMX/SMM occupancy be further enhanced and refined for generational (CUDA C.C) differences? Compatibility is important, as is finding the most efficient code path from CUDA programming. How can we further advance ACEMD? CUDA 5.0/PTX3.1~~~>6.5/4.1 provides new commands/instructions. | |
ID: 38109 | Rating: 0 | rate: / Reply Quote | |
That is a good question. One which I can unfortunately not answer. I'm just a forum mod and long-term user, not a GPU-Grid developer :) | |
ID: 38110 | Rating: 0 | rate: / Reply Quote | |
We have cc-specific optimisations for each of the most performance-sensitive kernels. Generally we don't use any of the features introduced post CUDA 4.2 though; nothing there we particularly need. I expect the GM204 performance will be markedly improved once I have my hands on one. Matt | |
ID: 38111 | Rating: 0 | rate: / Reply Quote | |
I expect the GM204 performance will be markedly improved once I have my hands on one. I can give you remote access to my GTX980 host, if you want to. | |
ID: 38112 | Rating: 0 | rate: / Reply Quote | |
In your opinion: how can GPUGRID's SM/SMX/SMM occupancy be further enhanced and refined for generational (CUDA C.C) differences? Compatibility is important, as is finding the most efficient code path from CUDA programming. How can we further advance ACEMD? CUDA 5.0/PTX3.1~~~>6.5/4.1 provides new commands/instructions. There was a huge jump in performance (around 40%) when the GPUGrid app was upgraded from CUDA3.1 to CUDA4.2. I think such a huge change doesn't come very often. I think the GM204 can run older code more efficiently than the Fermi or the Kepler based GPUs; that's why other projects benefit more than GPUGrid, as this project already had its boost at the transition from CUDA3.1 to CUDA4.2. | |
ID: 38113 | Rating: 0 | rate: / Reply Quote | |
I found one of many papers written by you and others-- "ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale", from the golden days of GT200. A Maxwell update, if applicable, would be very informative. | |
ID: 38115 | Rating: 0 | rate: / Reply Quote | |
Hey guys, I can't get any work for my two GTX 980's. Any thoughts, I'm a bit lost in the feed. | |
ID: 38118 | Rating: 0 | rate: / Reply Quote | |
You don't see these jumps often. A 32-core block with an individual warp scheduler, rather than Kepler's flat design (all cores sharing warp schedulers), is contributing to better core management, as are Maxwell's redesigned crossbar, dispatch and issue. | |
ID: 38121 | Rating: 0 | rate: / Reply Quote | |
http://www.gpugrid.net/forum_thread.php?id=3603&nowrap=true#38075 | |
ID: 38122 | Rating: 0 | rate: / Reply Quote | |
That change marked the transition to a new code base. The improvement wasn't down to the change in CUDA version so much as to our introduction of improved algorithms. Matt | |
ID: 38123 | Rating: 0 | rate: / Reply Quote | |
It's not ready just yet... Matt | |
ID: 38124 | Rating: 0 | rate: / Reply Quote | |
I'm doing a bit of work to improve the performance of the code for Maxwell hardware - expect an update before the end of the year. Matt | |
ID: 38125 | Rating: 0 | rate: / Reply Quote | |
Most kind, but I've got some on order already. Just waiting for the slow boat from China to wend its way across the Med. Matt | |
ID: 38126 | Rating: 0 | rate: / Reply Quote | |
Not had the time to look into this in great detail but my tuppence worth: GPU Memory Controller load: 52% That is very high and I expect it's impacting performance, and I don't think it's simply down to bus width (256bit) but also down to architectural changes. I was hoping this would not be the case, and it's somewhat surprising seeing as the GTX900's have 2MB L2 cache. That said, some of Noelia's WU's are more memory intensive than other WU's, and on my slightly underclocked GTX770 a 147-NOELIA_20MGWT WU's load is presently 30% (which is higher than other WU's). This suggests the GTX970 is a better choice than the GTX980, certainly when you consider the ~50% price difference (UK) for ~80% performance. That said, I would want to know what the memory controller utilization is on a GTX970 before concluding that it is definitely a problem and recommending the 970 over the 980 (which will still do more work despite the constraints). For the 970 it might be ~43%, which isn't great and suggests another problem (architecture/code) besides the 256bit limitation. Any readings for other WU's? In terms of performance these GPU's appear to only be on par with the high-ish end GTX700's, and performance is basically in line with the number of CUDA cores. Again suggesting that there is some potential for app improvement. It's possible that if the apps are recompiled with new CUDA Development Tools the new drivers will inherently offer improvements for the GTX900 series, but given that these are GM204 I'm not expecting miracles. The big question was always going to be: what's the performance per Watt like here? Apparently, when gaming a GTX970 uses up to 30W less than a GTX770, and significantly outperforms it (on reviewed games), but the TDP's are 145W and 230W. So a GTX970 might use ~63% of a GTX770's power and at first glance appears to outperform it by ~10%. Thus I'm expecting the performance/Watt to be about 1.75 times that of a GTX770 (ball park).
So from that point of view it's a winner, and maybe app tweaks can increase that further. PS. My GTX770's Memory Controller load is only 22% for a trphisx3-NOELIA_SH2 WU, so I'm guessing the same type of WU would have a 38% load on a GTX980. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help | |
ID: 38128 | Rating: 0 | rate: / Reply Quote | |
Trying to fix the scheduler now - if you have a 980, please sub to the acemdbeta app, accept beta work, and try again. It won't work, but I'm logging the problems now. | |
ID: 38136 | Rating: 0 | rate: / Reply Quote | |
Did as you requested and it's now crunching what looks to be a test WU: MJHARVEY_TEST | |
ID: 38138 | Rating: 0 | rate: / Reply Quote | |
My GTX980 has finished 2 of the beta WUs successfully. | |
ID: 38140 | Rating: 0 | rate: / Reply Quote | |
My GTX980 has finished 2 of the beta WUs successfully. The 8.44 CUDA65 application is available for the short queue, perhaps you should give it a try too. | |
ID: 38144 | Rating: 0 | rate: / Reply Quote | |
My GTX980 has finished 2 of the beta WUs successfully. I'm getting "no tasks available" for either the beta or the short run WUs. 9/29/2014 5:36:32 AM | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU 9/29/2014 5:36:33 AM | GPUGRID | Scheduler request completed: got 0 new tasks 9/29/2014 5:36:33 AM | GPUGRID | No tasks sent 9/29/2014 5:36:33 AM | GPUGRID | No tasks are available for Short runs (2-3 hours on fastest card) | |
ID: 38147 | Rating: 0 | rate: / Reply Quote | |
It looks like Matt has just added a new beta app (version 8.45). | |
ID: 38149 | Rating: 0 | rate: / Reply Quote | |
Just got a test WU with the new beta app (8.45). | |
ID: 38159 | Rating: 0 | rate: / Reply Quote | |
Not any more. The CUDA65 error rate is suspiciously high for non-GM204 cards. Matt | |
ID: 38160 | Rating: 0 | rate: / Reply Quote | |
Hello fellow crunchers, | |
ID: 38347 | Rating: 0 | rate: / Reply Quote | |
If the results from now are roughly comparable to older ones from early 2014, the cuda65 performance doesn't look very good? | |
ID: 38349 | Rating: 0 | rate: / Reply Quote | |
if the results from now are relative the same to older ones from early 2014, the performance with cuda65 doesn't look very well? I'm finding cuda65 to be just a bit faster than cuda60 on all my cards (Win7-64), including the Maxwell 750Ti. Could be the updated NV driver (344.11) or the app... | |
ID: 38351 | Rating: 0 | rate: / Reply Quote | |
I cannot test the old 660Ti with the current WUs, but I will test both at the same time and we will see. | |
ID: 38358 | Rating: 0 | rate: / Reply Quote | |
The 980 is less than 10% faster than a 780ti. Matt | |
ID: 38360 | Rating: 0 | rate: / Reply Quote | |
That is not what I found. Checking Retvari Zoltan's results on his GTX980 and his GTX780Ti, the 780Ti is about 2000 seconds faster. Zoltan runs both on XP and he knows how to set up a rig; proof is that his rigs are among the fastest. I would like to see some results with Win7. I'm interested as I am still undecided whether to buy another mighty 780Ti or a new 980; the price difference is not that huge, but the 980 uses less energy. But I think I'll wait for a 980Ti that is expected before the end of this year according to rumors. ____________ Greetings from TJ | |
ID: 38362 | Rating: 0 | rate: / Reply Quote | |
TJ, don't buy a GTX780Ti now! In fact, nobody else should do that for GPU-Grid. Even if you're not paying German electricity prices, the far more efficient Maxwell will save you money soon regardless of actual performance. TJ wrote:
Actually these don't contradict: if GTX980 is ~10% slower than GTX780Ti, it is also "less than 10% faster". I obviously see your point, though, that Matts formulation sounds much more euphemistic towards the new card ;) MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 38371 | Rating: 0 | rate: / Reply Quote | |
Thanks for the input ETA. I indeed want to see some real data as well. | |
ID: 38374 | Rating: 0 | rate: / Reply Quote | |
Thanks for the input ETA. I indeed want to see some real data as well. Some numbers:
Delta of ----------------------GPU----------------------
PC Power ---------------core---------------- ----Memory---
GPU Workunit Consumption usage clock voltage power temp MCL usage
980 NOELIA_5MG 167W 98% 1240MHz 1.218V 98% 66°C 52% 778MB
980 SDOERR_BARNA5 164W 99% 1227MHz 1.193V 92% 66°C 46-52% 946MB
780Ti NOELIA_5MG 249W 98% 993MHz 1.125V 99% 72°C 30-31% 765MB
780Ti SDOERR_BARNA5 247W 99% 993MHz 1.125V 96% 72°C 30-31% 925MB
780Ti NOELIA_5MG 260W 98% 1098MHz 1.125V 102% 74°C 33-34% 765MB
780Ti SDOERR_BARNA5 263W 98% 1098MHz 1.125V 102% 74°C 33% 925MB
CPU: Core i7-4770K @4.3GHz, 66-72°C RAM: 8GB DDR3 1866MHz 1.5V PSU: Enermax Modu 87+ 850W MB: Gigabyte GA-Z87X-OC Ambient temperature: 24.6°C I tried to raise the 780Ti's GPU clock and the voltage a little more, but as I raise the GPU clock, the GPU voltage is lowered automatically to stay within the power limit, which endangers the stability of the calculation. Conclusion: the GTX780Ti is faster by 8-10% than the GTX980, but the GTX980 consumes only about 2/3 of the power of the GTX780Ti. I don't recommend buying a GTX780Ti, unless it's a very cheap 2nd hand card. I think it's safe to buy a GTX980 or a GTX970 made on 28nm technology, as the 20nm version isn't around the corner, and I think the first chips made on 20nm wouldn't be much more energy efficient (probably they will be a little cheaper). | |
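Zoltan's table can be condensed into a rough performance-per-watt figure. A sketch using the table's own numbers: the ~9% speed edge for the GTX780Ti is the midpoint of the quoted 8-10% range, and `perf_per_watt` is a hypothetical helper, not anything from the app.

```python
# Rough performance-per-watt comparison from the table above.
# Runtime ratio (~8-10% in the GTX780Ti's favour) and the measured
# power deltas (167 W vs 249 W for NOELIA_5MG) are taken from the post.

def perf_per_watt(relative_perf, watts):
    return relative_perf / watts

gtx980   = perf_per_watt(1.00, 167)   # baseline performance, 167 W delta
gtx780ti = perf_per_watt(1.09, 249)   # ~9% faster, 249 W delta

print(round(gtx980 / gtx780ti, 2))  # ~1.37: GTX980 does ~37% more work per watt
```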
ID: 38375 | Rating: 0 | rate: / Reply Quote | |
Thank you for the numbers Zoltan. I will replace the 550Ti by a 970. And for the rest I wait. | |
ID: 38376 | Rating: 0 | rate: / Reply Quote | |
Thanks Zoltan, those seem to be very solid numbers! | |
ID: 38382 | Rating: 0 | rate: / Reply Quote | |
Some numbers: Wow, great information. The 980 looks like a winner. Question, are the above power draw figures for the GPU alone or for the system as a whole? If for the system are there any CPU WUs running? Thanks for the info! | |
ID: 38383 | Rating: 0 | rate: / Reply Quote | |
http://www.anandtech.com/show/8223/an-introduction-to-semiconductor-physics-technology-and-industry | |
ID: 38385 | Rating: 0 | rate: / Reply Quote | |
RZ, what is your metric for performance? The best measure of performance is to look in the output of tasks completed by the cuda65 app. You'll see the line:
The "ns/day" figure gives the rate of the simulation - the higher the better. The "Natoms" figure gives the size of the system - the greater the number of atoms, the slower the simulation, in a not-quite linear relationship (cf our performance estimator at http://www.acellera.com/products/molecular-dynamics-software-gpu-acemd/ ). Matt | |
ID: 38386 | Rating: 0 | rate: / Reply Quote | |
The output of the linux cuda6.5 app doesn't provide performance information. | |
ID: 38388 | Rating: 0 | rate: / Reply Quote | |
Taken from RZ GTX980 host---Let's see if I have this straight- NOELIA_5MG for a GTX980- # PERFORMANCE: 70205 Natoms 3.624 ns/day 0.000 ms/step 0.000 us/step/atom | |
ID: 38389 | Rating: 0 | rate: / Reply Quote | |
Actually, there's a bug there. | |
ID: 38390 | Rating: 0 | rate: / Reply Quote | |
biodoc
Quite right. Slightly different code-base. I'm not going to rev the applications just to include that. If you are really keen, you can get the rates by looking for the "Simulation rate" lines in the 0_0 output file in the task's slot directory. Matt | |
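For anyone following Matt's suggestion, here is a minimal sketch of pulling those rates out of an output file. The line format matches the log excerpt earlier in this thread; the parsing helper itself is hypothetical, not part of the app.

```python
# Minimal sketch: extract the average/instantaneous rates from the
# "# Simulation rate ..." lines in the 0_0 output file Matt mentions.
import re

RATE_RE = re.compile(
    r"# Simulation rate ([\d.]+) \(ave\) ([\d.]+) \(inst\) ns/day"
)

def parse_rates(text):
    """Return a list of (average, instantaneous) ns/day pairs."""
    return [(float(a), float(i)) for a, i in RATE_RE.findall(text)]

sample = "# Simulation rate 94.74 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 09:36:54 2014"
print(parse_rates(sample))  # [(94.74, 95.34)]
```

In practice you would read the whole file from the task's slot directory and feed its contents to `parse_rates`.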
ID: 38392 | Rating: 0 | rate: / Reply Quote | |
http://www.anandtech.com/show/8223/an-introduction-to-semiconductor-physics-technology-and-industry That's a nice article. If you want to get really technical, dive in to the International Technology Roadmap for Semiconductors reports. http://www.itrs.net/Links/2013ITRS/Summary2013.htm MJH | |
ID: 38393 | Rating: 0 | rate: / Reply Quote | |
If higher ns/day is better, then the GTX780Ti is still the better card. But it uses more energy and produces more heat. Have you already got your hands on a 980, Matt, to improve the app? ____________ Greetings from TJ | |
ID: 38394 | Rating: 0 | rate: / Reply Quote | |
70205 Natoms NOELIA_5MG tasks With 1000/[value]-- GTX980= 275.938ns/day -- GT650m= 36.293ns/day | |
ID: 38395 | Rating: 0 | rate: / Reply Quote | |
Those numbers are only directly comparable if the Natoms for the two tasks are very similar, remember. | |
ID: 38396 | Rating: 0 | rate: / Reply Quote | |
Yes, all mentioned ns/day were same task type- I edited 70205 Natoms NOELIA_5MG tasks accordingly. | |
ID: 38397 | Rating: 0 | rate: / Reply Quote | |
The GTX780Ti is faster by 8-10% than the GTX980, but the GTX980 consumes only the 2/3rd of the GTX780Ti.RZ, what is your metric for performance? My metric for performance is the data could be find under the "performance" tab, which is based on the time it takes to complete a WU from the same batch by different GPUs (hosts). GTX-980 GTX-780Ti GTX-780TiOC
SDOERR_BARNA5 15713~15843 14915~15019 14892~14928
NOELIA_5MG 18026~18165 16601~16713 16826~16924
NOELIA_20MGWT 18085~18099 16849 17034
NOELIA_20MGK36I 16617~16779 16844~17049
NOELIA_20MG2 16674~16831
NOELIA_UNFOLD 16533 15602
As it takes more time for the GTX-980 to complete similar workunits than it takes the GTX780Ti, I consider the GTX-980 slower (the motherboard, CPU and RAM are similar; actually my host with the GTX 980 has a slightly faster CPU and RAM). | |
ID: 38398 | Rating: 0 | rate: / Reply Quote | |
Yes, I can see that now looking at individual runs on your two machines. That is rather surprising; my testing in more controlled circumstances shows the opposite. I'll have to look into that a bit more, and see if it's peculiar to your systems or if it reflects a general trend. Matt | |
ID: 38399 | Rating: 0 | rate: / Reply Quote | |
Wow, great information. The 980 looks like a winner. Question, are the above power draw figures for the GPU alone or for the system as a whole? The heading of that column reads "Delta of PC power consumption", which is the difference in the whole PC's power consumption between when the GPU is crunching and when it is not. If for the system are there any CPU WUs running? Thanks for the info! There were 6 SIMAP CPU workunits running on that host; the total power consumption is 321W using the GTX-980. | |
ID: 38400 | Rating: 0 | rate: / Reply Quote | |
Yes, I can see that now looking at individual runs on your two machines. That is rather surprising, my testing in more controlled circumstances shows the opposite. I'd like to have a pair of those circumstance controllers you use. :) | |
ID: 38401 | Rating: 0 | rate: / Reply Quote | |
Hello fellow crunchers, You and me both. I upgrade based on this usually (CPU and GPU). Cheers. ____________ | |
ID: 38402 | Rating: 0 | rate: / Reply Quote | |
On Linux, using the cuda 6.5 app, the 980 is a bit slower than the 780Ti. I only have enough data on the long SDOERR_BARNA5 WUs. Student's t-test gives a very low p value, so it appears to be a statistically significant difference. Time (sec)
SDOERR_BARNA5 980 13188827 17379
SDOERR_BARNA5 980 13188792 17677
SDOERR_BARNA5 980 13188118 17309
SDOERR_BARNA5 980 13186657 17318
SDOERR_BARNA5 980 13186376 16934
SDOERR_BARNA5 980 13184193 17253
SDOERR_BARNA5 980 13183699 17361
SDOERR_BARNA5 980 13183697 17209
SDOERR_BARNA5 980 13182886 17455
SDOERR_BARNA5 980 13182196 17201
average 17310
std dev 191
SDOERR_BARNA5 780Ti 13189221 16503
SDOERR_BARNA5 780Ti 13187759 16546
SDOERR_BARNA5 780Ti 13187315 16562
SDOERR_BARNA5 780Ti 13186024 16571
SDOERR_BARNA5 780Ti 13185027 16597
SDOERR_BARNA5 780Ti 13183827 16544
SDOERR_BARNA5 780Ti 13183437 16904
SDOERR_BARNA5 780Ti 13182225 17374
SDOERR_BARNA5 780Ti 13181484 17380
average 16776
std dev 361 P value 0.00095 using Student's t-test | |
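biodoc's comparison can be reproduced with the standard library alone. A sketch using the times listed above: `welch_t` is a hypothetical helper computing a Welch-style two-sample t statistic; reproducing the exact p = 0.00095 would additionally need the t distribution's CDF (e.g. from scipy.stats).

```python
# Sketch of the two-sample comparison above, stdlib only.
from statistics import mean, stdev

gtx980 = [17379, 17677, 17309, 17318, 16934,
          17253, 17361, 17209, 17455, 17201]
gtx780ti = [16503, 16546, 16562, 16571, 16597,
            16544, 16904, 17374, 17380]

def welch_t(a, b):
    """Welch's t statistic: mean difference over its standard error."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / (va + vb) ** 0.5

print(round(mean(gtx980)), round(mean(gtx780ti)))  # 17310 16776, as posted
print(round(welch_t(gtx980, gtx780ti), 2))         # ~4: clearly significant
```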
ID: 38409 | Rating: 0 | rate: / Reply Quote | |
Thanks for the information. Your (Linux) GTX980 average time is 97% as fast (16776/17310) compared to your (Linux) GTX780Ti. Your cards' performance is closer to each other than RZ's (Windows XP), whose GTX980 is around 92% compared to his quicker GTX780Ti. | |
ID: 38411 | Rating: 0 | rate: / Reply Quote | |
Thanks for the information. Your (Linux) GTX980 average time is 97% as fast (16776/17310) compared to your (Linux) GTX780Ti. Your cards' performance is closer to each other than RZ's (Windows XP), whose GTX980 is around 92% compared to his quicker GTX780Ti. eXaPower, if you try to eco-tune the less efficient GK110 down to GM204 power consumption, you either lose the performance advantage or you still consume more power. Let's start with a GTX780Ti with a mild OC: 1100 MHz @ 1.1 V, 250 W. Some cards go higher, but let's not discuss extreme cases. And significantly higher clocked ones exceed 250 W. Maximum stable frequency scales approximately linearly with voltage, whereas power consumption scales approximately quadratically with voltage (let's neglect leakage). Hence we could get GK110 down to these operation points: - 1000 MHz @ 1.0 V -> 187 W, at 91% the performance - 900 MHz @ 0.9 V -> 137 W, at 81.8% the performance While these numbers look far nicer and are indeed more energy efficient than running stock, the first one illustrates my point: less performance and still higher power consumption than the GTX980. Trying the same with a GTX980 with a nice OC, starting from 1300 MHz @ 1.2 V, 165 W, I get the following: - 1192 MHz @ 1.1 V -> 127 W, at 91.7% the performance ... at this point it probably doesn't make sense to eco-tune further, since you just spent 500$/€ on that fast card. Summarizing this, I'm not saying everyone should rush out and buy a GTX980. At least consider the GTX970, but certainly don't buy a GTX780/Ti anymore for sustained GP-GPU! Even if you don't pay anything for your electricity it doesn't hurt to run the more energy-efficient setup. MrS ____________ Scanning for our furry friends since Jan 2002 | |
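MrS's scaling model (max stable frequency ~linear in voltage; power ~linear in frequency and ~quadratic in voltage, leakage neglected) can be sketched as follows. `eco_point` is a hypothetical helper; note it yields 187.8 W for the first operating point, which MrS rounded down to 187 W.

```python
# Sketch of the eco-tuning model above: P ~ P0 * (f/f0) * (V/V0)^2,
# relative performance ~ f/f0. Baseline: GTX780Ti mild OC,
# 1100 MHz @ 1.1 V, 250 W (figures from the post).

def eco_point(p0, f0, v0, f, v):
    """Power and relative performance at a lower operating point."""
    power = p0 * (f / f0) * (v / v0) ** 2
    perf = f / f0
    return power, perf

for f, v in [(1000, 1.0), (900, 0.9)]:
    p, r = eco_point(250, 1100, 1.1, f, v)
    print(f"{f} MHz @ {v} V -> {p:.0f} W at {r:.1%} performance")
# 1000 MHz @ 1.0 V -> 188 W at 90.9% performance
# 900 MHz @ 0.9 V -> 137 W at 81.8% performance
```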
ID: 38431 | Rating: 0 | rate: / Reply Quote | |
Some numbers for GTX970 from this linux host Time (sec)
SDOERR_BARNA5 970 13193684 19850
SDOERR_BARNA5 970 13191852 19534
SDOERR_BARNA5 970 13189355 19650
SDOERR_BARNA5 970 13188418 19544
SDOERR_BARNA5 970 13187452 19567
SDOERR_BARNA5 970 13185793 19548
SDOERR_BARNA5 970 13185723 19559
SDOERR_BARNA5 970 13184931 19586
SDOERR_BARNA5 970 13184168 19552
SDOERR_BARNA5 970 13182398 19627
average 19602
std dev low :D
That's 88% of the throughput of Zoltan's GTX980. The reported clock speed of 1250 MHz is relatively high for that host, but Zoltan's card isn't running at stock speeds either. Overall that's pretty strong performance from a card with just 81% the raw horse power per clock! MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 38434 | Rating: 0 | rate: / Reply Quote | |
For comparison here's the last number of SDOERR_BARNA5 times form one of my factory OCed PNY cards (no additional OC): SDOERR_BARNA5 750Ti 43,562.77
SDOERR_BARNA5 750Ti 43,235.24
SDOERR_BARNA5 750Ti 43,313.90
SDOERR_BARNA5 750Ti 43,357.37
SDOERR_BARNA5 750Ti 43,400.42
SDOERR_BARNA5 750Ti 43,392.66
SDOERR_BARNA5 750Ti 43,525.33
Average = 43398 This is on Windows 7-64, which if I remember correctly is about 11% slower than Linux on GPUGRID. That would make the above 970 about 2x faster than the 750Ti, at least on SDOERR_BARNA5 WUs (19602 x 2 x 1.11 = 43,516.44), if I haven't forgotten some important factor :-) | |
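Beyond's back-of-the-envelope check, redone explicitly. All numbers are the post's own; the ~11% Windows penalty is the assumption stated above.

```python
# Predict the Windows-7 750Ti time from the Linux GTX970 average,
# assuming the 970 is 2x faster and Windows is ~11% slower than Linux.

gtx970_linux = 19602      # avg SDOERR_BARNA5 time on Linux, seconds
win_penalty = 1.11        # assumed Windows slowdown vs Linux

predicted_750ti = gtx970_linux * 2 * win_penalty
print(round(predicted_750ti, 2))  # 43516.44, vs the measured 43398
```

The prediction lands within ~0.3% of the measured 750Ti average, which is why the "about 2x faster" conclusion holds up.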
ID: 38463 | Rating: 0 | rate: / Reply Quote | |
So, roughly, a 750Ti produces half of the points of a 970, its TDP is about a half of the 970 and its price is about a half of the price of a 970... is there a winner? | |
ID: 38468 | Rating: 0 | rate: / Reply Quote | |
This on Windows 7-64 that if I remember correctly is about 11% slower than Linux on GPUGRID. That would make the above 970 about 2x faster at least on SDOERR_BARNA5 WUs than the 750Ti (19602 x 2 x 1.11 = 43,516.44) if I haven't forgotten some important factor :-) Personal preference. I personally like running more low power boxes, and gold/platinum power supplies in the 450 watt range are often available on sale. I also like running CPU projects so again more machines equals more CPU power. Just bought 750Ti number 11, the Asus OC Edition which will be my first ASUS GPU in the last few years. With discounts and rebate ended up being $103, hard to beat. | |
ID: 38469 | Rating: 0 | rate: / Reply Quote | |
is there a winner? Not a clear one. Beyond made some good points for more of the smaller cards. I tend towards the larger ones for the following reasons: - They will be able to finish long run WUs (which yield the most credits per day here) within the time for maximum bonus for longer. The time per WU is increased slowly over time, as the average computing power increases. - You can run more of them with less overhead, by which I mean "system needed to run the GPUs in". This improves power efficiency and, if you don't go for extremely dense systems, purchase cost. This argument is actually the exact opposite of what Beyond likes with his many machines for CPU projects. - Resale value: once a GPU is not energy efficient enough any more to run 24/7 GP-GPU it can still provide a decent gaming experience. Finding interested gamers is easier if the GPU in question is 2-3 times as fast. This might not necessarily get you more money, since you're selling fewer cards, but IMO it makes things easier. MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 38476 | Rating: 0 | rate: / Reply Quote | |
Looking at the performance tab- someone has finally equaled RZ's GTX780Ti host time. Host 168841 [3] GTX980 with the same OS as RZ (WinXP) is completing tasks as fast. (RZ's GTX780Ti has been the fastest card for a while) | |
ID: 38488 | Rating: 0 | rate: / Reply Quote | |
Looking at the performance tab- someone has finally equaled RZ's GTX780Ti host time. Host 168841 [3] GTX980 with the same OS as RZ (WinXP) is completing tasks as fast. (RZ's GTX780Ti has been the fastest card for a while) That GTX980 is an overclocked one, so its performance/power ratio must be lower than the standard GTX980's. However it's still better than a GTX780Ti's. <core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
# GPU [GeForce GTX 980] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 980
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:04:00.0
# Device clock : 1342MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# Driver version : r343_98 : 34411
# GPU 0 : 79C
# GPU 1 : 74C
# GPU 2 : 78C
# GPU 1 : 75C
# GPU 1 : 76C
# GPU 1 : 77C
# GPU 1 : 78C
# GPU 1 : 79C
# GPU 1 : 80C
# GPU 0 : 80C
# Time per step (avg over 3750000 steps): 4.088 ms
# Approximate elapsed time for entire WU: 15331.500 s
# PERFORMANCE: 87466 Natoms 4.088 ns/day 0.000 ms/step 0.000 us/step/atom
00:19:43 (3276): called boinc_finish
</stderr_txt>
]]> 1342/1240=1.082258, so this card is overclocked by 8.2%, which equals the performance gap between a GTX780Ti and the GTX980. | |
ID: 38507 | Rating: 0 | rate: / Reply Quote | |
1342/1240=1.082258, so this card is overclocked by 8.2% which equal to the performance gap between a GTX780Ti and the GTX980. The base clock may not correspond to the real clock, with Maxwell more so than ever before. Still, it's safe to say that this card is significantly overclocked :) BTW: your GTX780Ti is (factory-)overclocked as well, isn't it? MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 38511 | Rating: 0 | rate: / Reply Quote | |
BTW: your GTX780Ti is (factory-)overclocked as well, isn't it? I have two GTX780Ti's: one standard, and one factory overclocked. I had to lower the memory clock of the overclocked one to 3.1GHz... | |
ID: 38515 | Rating: 0 | rate: / Reply Quote | |
The GTX970 Maxwell is only about 10% more energy efficient than a GTX750Ti Maxwell. Considering that efficiency usually scales well with core count, this suggests an issue with the GTX900s. | |
ID: 38551 | Rating: 0 | rate: / Reply Quote | |
Also at Einstein the GTX750Ti is slightly more efficient than GM204. Einstein is known to be very memory-bandwidth hungry. Compared to GTX750Ti it looks like this: | |
ID: 38552 | Rating: 0 | rate: / Reply Quote | |
L2 cache KB amount / total number of cores ratio for Maxwell and Kepler (desktop) cards from best to worst | |
ID: 38554 | Rating: 0 | rate: / Reply Quote | |
I have some more numbers from my hardware | |
ID: 38562 | Rating: 0 | rate: / Reply Quote | |
L2 cache (KB) to total core count ratio for Maxwell and Kepler (desktop) cards, from best to worst (with each card's cores as a percentage of the GTX 980's total). I neglected to mention the best GPUGRID Kepler power-usage/runtime ratio card: the GTX660Ti, at 384KB/1344 cores = 0.28:1.

Current cards with the best runtime/power-usage ratio for GPUGRID (eco-tuned), with core counts compared to a GTX 980 (2048 cores):
1. GTX970 [81.2%]
2. GTX750Ti [31.2%]
3. GTX980 [100%]
4. GTX660Ti [65.6%]
5. GTX780 [112.5%]

I think the reason Kepler memory controller usage is lower (20-40% MCU usage is being reported by GPUGRID users, depending on task type) is that the Maxwell GPC (4 SMM / 512 cores / 256 integer / 32 TMU / 16 ROP) is revised compared to the Kepler GPC (3 SMX / 576 cores / 96 integer / 48 TMU / 16 ROP). When more Maxwell GPCs are processing, the cache setup changes how information transfers.

A GTX660Ti has 3 GPC / 7 SMX / 112 TMU / 24 ROP. A GTX650Ti Boost (GK106: 2 or 3 GPC / 768 cores / 64 TMU / 24 ROP / 192-bit memory bus) has SMX disabled in different GPCs. These two cards are configured with SMX shut off in different GPCs rather than in the same GPC. Nvidia hasn't yet revealed how disabled SMMs within a GPC are shut off. (AnandTech wrote about this in their GTX660Ti and GTX650Ti Boost reviews.) The way an SMX/SMM is disabled within a GPC, and how this affects GPGPU processing, is not fully understood.

Nvidia's current programming guide is included in the newest CUDA toolkit release. On Windows it is located in Program Files/NVIDIA Corporation/Installer2/CUDAToolkit6.5[a bunch of numbers and letters]; the files can be read as HTML or PDF. With a custom install there is an option to install only the Samples or just the Documents instead of the whole toolkit.

Maxwell C.C. 5.0 has 64KB and C.C. 5.2 has 96KB of maximum shared memory per multiprocessor, while the Kepler C.C. 3.0/3.5 maximum is 48KB. The cache working set per multiprocessor for constant memory is the same for C.C. 3.0/3.5 at 8KB; for Maxwell C.C. 5.0/5.2 it is 10KB. The cache working set per multiprocessor for texture memory is 12KB for C.C. 3.0, and 12KB-48KB for C.C. 3.5/5.0/5.2.

If someone could post what AIDA64 reports for cache or memory properties, maybe we can discern why the Maxwell MCU load for GM204 is higher than for GM107/GK110/GK106/GK104/GK107 (GDDR5, 128/192/256/384-bit memory buses). Variations are seen, though: AIDA64 reports for my [2] GK107s that L1 Cache / Local Data Share is 64 KB per multiprocessor and L1 Texture Cache is 48 KB per multiprocessor. (GK110 has a 48KB read-only cache.) Here is the AIDA64 CUDA report for a GK107:

Memory Properties:
Memory Clock 2000 MHz
Global Memory Bus Width 128-bit
Total Memory 2 GB
Total Constant Memory 64 KB
Max Shared Memory Per Block 48 KB
Max Shared Memory Per Multiprocessor 48 KB
Max Memory Pitch 2147483647 bytes
Texture Alignment 512 bytes
Texture Pitch Alignment 32 bytes
Surface Alignment 512 bytes

Device Features:
32-bit Floating-Point Atomic Addition Supported
32-bit Integer Atomic Operations Supported
64-bit Integer Atomic Operations Supported
Caching Globals in L1 Cache Not Supported
Caching Locals in L1 Cache Supported
Concurrent Kernel Execution Supported
Concurrent Memory Copy & Execute Supported
Double-Precision Floating-Point Supported
ECC Disabled
Funnel Shift Supported
Host Memory Mapping Supported
Integrated Device No
Managed Memory Not Supported
Multi-GPU Board No
Stream Priorities Not Supported
Surface Functions Supported
TCC Driver No
Warp Vote Functions Supported
__ballot() Supported
__syncthreads_and() Supported
__syncthreads_count() Supported
__syncthreads_or() Supported
__threadfence_system() Supported

64-bit integer atomic operations (the 64-bit versions of atomicMin, atomicMax, atomicAnd, atomicOr and atomicXor) are supposed to be supported only on C.C. 3.5/5.0/5.2 boards.

A theory: Nvidia harvested GK110 cores to put into an unknown low-power Kepler, keeping some features locked (Dynamic Parallelism / funnel shift / 64 DP units per SMX) on some GK107/GK108 cards while allowing others the C.C. 3.5 compute features (GT730M/GT740M/GT640/GT635/GT630). Atomic functions operating on 64-bit integer values in shared memory are supported for C.C. 2.0/2.1/3.0/3.5/5.0/5.2.

http://images.anandtech.com/doci/8526/GeForce_GTX_980_Block_Diagram_FINAL_575px.png
http://images.anandtech.com/doci/8526/GeForce_GTX_980_SM_Diagram_FINAL_575px.png
http://images.anandtech.com/doci/8526/GM204DieB_575px.jpg
http://images.anandtech.com/doci/7764/GeForce_GTX_680_SM_Diagram_FINAL_575px.png
http://images.anandtech.com/doci/7764/GeForce_GTX_750_Ti_SM_Diagram_FINAL_575px.png
http://images.anandtech.com/doci/7764/SMX_575px.png
http://images.anandtech.com/doci/7764/SMMrecolored_575px.png

New high- and low-level optimizations: Nvidia's secret sauce is certainly being held close to the chest. AnandTech is supposed to investigate Maxwell's GPC in a future article, owing to the GTX970's 64 ROPs not being fully utilized. If someone with a GTX970 computing a GPUGRID task could comment on MCU usage and share any other information (cache amounts / CUDA), we could compare. Once a GTX960 (12 SMM / 3 GPC?) is released, more could be understood about the higher GM204 MCU usage. | |
ID: 38565 | Rating: 0 | rate: / Reply Quote | |
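The cache-per-core ratios discussed above are easy to reproduce. A quick sketch using the post's GTX660Ti figure plus commonly cited L2 sizes for the other cards (the GTX970's 1792KB reflects the corrected spec discussed later in this thread; treat all numbers as illustrative and verify against official spec sheets):

```python
# L2 cache size (KB) and CUDA core count per card.
# GTX660Ti figures are from the post; the rest are commonly
# cited specs and should be double-checked.
cards = {
    "GTX660Ti": (384, 1344),
    "GTX750Ti": (2048, 640),    # GM107 carries an unusually large 2MB L2
    "GTX970":   (1792, 1664),   # corrected spec: 1792KB, not 2048KB
    "GTX980":   (2048, 2048),
}

def l2_per_core(l2_kb, cores):
    """KB of L2 cache available per CUDA core."""
    return l2_kb / cores

for name, (l2, cores) in sorted(cards.items(),
                                key=lambda kv: -l2_per_core(*kv[1])):
    print(f"{name}: {l2_per_core(l2, cores):.2f} KB/core")
```

Sorting by the ratio makes the Maxwell cards' cache advantage over the GTX660Ti obvious at a glance.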
There is a new application (v8.47) distributed since yesterday. | |
ID: 38566 | Rating: 0 | rate: / Reply Quote | |
RZ, | |
ID: 38567 | Rating: 0 | rate: / Reply Quote | |
http://techreport.com/blog/27143/here-another-reason-the-geforce-gtx-970-is-slower-than-the-gtx-980 | |
ID: 38569 | Rating: 0 | rate: / Reply Quote | |
With a 970, I’ve seen Memory Controller loads from 27% for short NOELIA_SH2 tasks to 50% for several different long task types. | |
ID: 38571 | Rating: 0 | rate: / Reply Quote | |
Nvidia Inspector 1.9.7.3 supports GM204 boards (release notes). Inspector shows the brand name of the GDDR5. I suggest a clean uninstall of Afterburner so the driver doesn't become conflicted by having two programs' settings. I wish a way existed to monitor the non-standard internal workings of the GPU, beyond the typical readings all monitoring programs show. | |
ID: 38572 | Rating: 0 | rate: / Reply Quote | |
What was the improvement for GTX 680 compared to GTX580? The 680 was eventually ~42% faster and had a TDP of 195W vs 244W for the 580. Overall that jump improved performance slightly more whereas this jump has improved performance/Watt more. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help | |
ID: 38573 | Rating: 0 | rate: / Reply Quote | |
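The efficiency claim above can be checked with back-of-the-envelope arithmetic (the ~42% speedup and TDP figures are taken from the post; this is a rough sketch, not a measured result):

```python
def perf_per_watt_gain(speedup, new_tdp, old_tdp):
    """Relative performance/Watt improvement, given a throughput
    speedup factor and the old and new TDPs (Watts)."""
    return (speedup / new_tdp) / (1.0 / old_tdp) - 1.0

# GTX 680 vs GTX 580: ~1.42x throughput, 195 W vs 244 W TDP
gain = perf_per_watt_gain(1.42, 195, 244)
print(f"~{gain:.0%} better performance/Watt")  # roughly 78%
```

So the Kepler jump delivered a large perf/W gain too; the Maxwell jump simply leans further in that direction relative to raw throughput.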
Reference rated TDP wattage per Fermi 32-core SM / Kepler 192-core SMX / Maxwell 128-core SMM | |
ID: 38574 | Rating: 0 | rate: / Reply Quote | |
Reference rated TDP wattage per Fermi 32-core SM / Kepler 192-core SMX / Maxwell 128-core SMM Reflects efficiency (GFlops/Watt) quite accurately and goes some way to explaining the design rationale. I can boost the 970 core to 1400MHz but just can't shift the GDDR5, which here would be more productive (with most tasks)! I can lower the core and tweak for efficiency; dropping the Power and Temp targets results in an automatic voltage drop. Even @1265MHz I can drop the Power and Temp targets to 90% without reducing throughput. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help | |
ID: 38576 | Rating: 0 | rate: / Reply Quote | |
Do you have a GTX970 reference or custom partner board? | |
ID: 38578 | Rating: 0 | rate: / Reply Quote | |
With a 970, I’ve seen Memory Controller loads from 27% for short NOELIA_SH2 tasks to 50% for several different long task types. I see very similar numbers for my 980. The memory clock seems "locked" at 6000 MHz when running GPUGrid tasks. It doesn't respond to overclocking inputs. It does jump to 7000 MHz when running Heaven stress testing however. | |
ID: 38580 | Rating: 0 | rate: / Reply Quote | |
It's a Palit NE5X970014G2-2041F (1569) GM204-A Rev A1 with a default core clock of 1051MHz. | |
ID: 38581 | Rating: 0 | rate: / Reply Quote | |
It's a Palit NE5X970014G2-2041F (1569) GM204-A Rev A1 with a default core clock of 1051MHz. The same applies to my Gigabyte GTX-980. So while it can run at ~110% power @ 1.212V (1406MHz) @64C Fan@75%, I cannot reduce the MCL bottleneck (53% @1406MHz), which I would prefer to do. Is 53% MCL really a bottleneck? Shouldn't this bottleneck lower the GPU usage? Did you try to lower the memory clock to measure the effect of this 'bottleneck'? I've tried Furmark, and it seems to be limited by memory bandwidth, while GPUGrid seems to be limited by GPU speed. The history of the graph is: GPUGrid -> Furmark (1600x900) -> Furmark (1920x1200 fullscreen) -> GPUGrid. biodoc, thanks for letting us know you are experiencing the same GDDR5 issue. Anyone else seeing this (or not)? It's hard to spot (3005MHz instead of 3505MHz), but my GTX980 does the same; I don't think that this is an error. | |
ID: 38582 | Rating: 0 | rate: / Reply Quote | |
The mysteries of Maxwell continue. Here are some GTX970/980 board layout photos. Note: not all are reference designs. Maybe something changed concerning the GDDR5 circuitry, or overclocking utilities haven't accounted for all Maxwell PCBs? | |
ID: 38583 | Rating: 0 | rate: / Reply Quote | |
Is 53% MCL really a bottleneck? That's the question I started out trying to find the answer to: is the increased MCL really a bottleneck? Our point of reference is that we know it was with some Keplers. While that picture was complicated by cache variations, the GTX650TiBoost allowed us to determine that cache wasn't the only bottleneck and that the MCL was definitely a bottleneck in itself (for some other cards). Shouldn't this bottleneck lower the GPU usage? Depends on how GPU usage is being measured, but MCL should rise with GPU usage, as more bandwidth is required to support the GPU, and it appeared to do just that: when I reduced CPU usage from 100% to 55%, the GPU usage rose from 89% to 93% and the MCL increased from ~46% to 49%. At 100% CPU usage both the GPU usage and MCL were also more erratic. Also, when I increased the GPU clock the MCL increased:
1126MHz GPU - 45% MCL
1266MHz GPU - 49% MCL
1406MHz GPU - 53% MCL
So the signs are there. Being able to OC or boost the GDDR5 should offset the increase in MCL (it did with Keplers). Did you try to lower the memory clock to measure the effect of this 'bottleneck'? I tried but I can't change the memory clock - the Current Clock remains at 3005MHz (the default clock). It appears that NVidia Inspector, GPUZ (and previously MSI Afterburner) recognise that I've asked for the GDDR5 clocks to be increased, but they don't actually rise. I've tried Furmark, and it seems to be limited by memory bandwidth, while GPUGrid seems to be limited by GPU speed. I'm wondering if the measured MCL is actually measuring usage of the new compression system, and whether it reflects a real bottleneck or not. Increasing the GDDR5 would be the simple test, but that's a non-starter, which is another question in itself. The only way to confirm if the MCL increase is really a bottleneck is to run similar WU's at different GPU frequencies and plot the results, looking for diminishing returns. 
You would still expect to gain plenty from a GPU OC, but should see less gain as a result of MCL increases at higher GPU frequencies. Even with a frequency difference of 1406 vs 1126MHz (280MHz), the MCL difference is just 18% (53% vs 45% load), but six or seven data points down to around 1051MHz might be enough to spot the effect of an MCL bottleneck, if it exists. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help | |
ID: 38584 | Rating: 0 | rate: / Reply Quote | |
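The diminishing-returns test proposed above can be sketched as follows. The runtimes here are invented placeholders purely to show the analysis; real data would come from several same-type long WU's at each clock step:

```python
# Hypothetical (GPU clock MHz, runtime s) pairs for identical WU's.
# These values are illustrative only - replace with measurements.
samples = [(1051, 21000), (1126, 19800), (1196, 18900),
           (1266, 18150), (1336, 17550), (1406, 17100)]

def scaling_efficiency(samples):
    """For each clock step, compare the measured speedup with the
    ideal (clock-proportional) speedup. Ratios falling further below
    1.0 at higher clocks would indicate a bandwidth bottleneck."""
    base_clock, base_time = samples[0]
    out = []
    for clock, time in samples[1:]:
        ideal = clock / base_clock      # ideal speedup
        actual = base_time / time       # measured speedup
        out.append((clock, actual / ideal))
    return out

for clock, eff in scaling_efficiency(samples):
    print(f"{clock} MHz: {eff:.1%} of ideal scaling")
```

With the placeholder numbers the efficiency drifts steadily downward, which is exactly the signature a real MCL bottleneck would produce in measured data.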
Using NVIDIA Inspector you can make sure the Current GDDR5 clocks are high, but you have to match the P-State value on the Overclocking panel to the state shown on the left. For me the P-State is P2, so in order to ensure 3505MHz is used I have to set the overclocking Performance Level to P2. Then I can push the Memory Clock Offset to 3505MHz. | |
ID: 38604 | Rating: 0 | rate: / Reply Quote | |
Using NVIDIA Inspector you can make sure the Current GDDR5 clocks are high, but you have to match the P-State value on the Overclocking panel to the state shown on the left. For me the P-State is P2, so in order to ensure 3505MHz is used I have to set the overclocking Performance Level to P2. Then I can push the Memory Clock Offset to 3505MHz. Thanks skgiven. Do you see an increase in performance on GPUGrid WUs when the memory clock is increased to 3500 MHz? | |
ID: 38628 | Rating: 0 | rate: / Reply Quote | |
Thanks skgiven. Do you see an increase in performance on GPUGrid WUs when the memory clock is increased to 3500 MHz? I think so, but I don't have much to go on so far. I was really just looking for an MCL drop, which I found (~53% to ~45%). To confirm an actual runtime improvement (if any) that results solely from the memory frequency increase, I would really need to run several long same-type WU's at 3505MHz, then several at 3005MHz, all with the same GPU clock and Boinc settings. Ideally others would do the same to confirm findings. That will take two or three days, as there is a mixture of long task types and each takes 5 or 6h to run... I think you would be less likely to spot a small performance change from running short WU's, as those only have an MCL of around 27%; it's not like we are overclocking here, just making sure the GDDR5 runs at the speed it's supposed to. Most of us run the Long WU's anyway, so that's what we should focus on. ____________ FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help | |
ID: 38629 | Rating: 0 | rate: / Reply Quote | |
The improvement was only 2 or 3% for me with the GPU at around 1406MHz. | |
ID: 38677 | Rating: 0 | rate: / Reply Quote | |
Regarding the question of whether ~50% MCL could be a bottleneck: the memory accesses don't have to be distributed homogeneously over time. So even if the memory controllers are idle about 50% of the time, the chip may be waiting for new data to continue processing at other times. | |
ID: 38682 | Rating: 0 | rate: / Reply Quote | |
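That point about non-homogeneous memory traffic can be illustrated with a toy model: two request streams with the same 50% average controller load, one evenly spread and one bursty. The queue depth and traffic patterns below are invented purely for illustration; real GPU memory systems are far more complex:

```python
def stall_cycles(arrivals, queue_depth=4):
    """Toy memory-controller model: one request served per cycle,
    bounded request queue; the issuing unit stalls whenever the
    queue overflows. Returns the total stalled cycles."""
    queue = 0
    stalls = 0
    for n in arrivals:
        queue += n
        if queue > queue_depth:
            stalls += queue - queue_depth   # excess requests = stall time
            queue = queue_depth
        queue = max(0, queue - 1)           # serve one request per cycle
    return stalls

cycles = 100
uniform = [1, 0] * (cycles // 2)   # 50 requests, evenly spread over 100 cycles
bursty  = [5] * 10 + [0] * 90      # the same 50 requests, front-loaded

print(stall_cycles(uniform))  # 0 - no stalls at 50% average load
print(stall_cycles(bursty))   # many stalls despite identical average load
```

Same 50% average utilization, very different stall behaviour, which is why a ~50% MCL reading alone can't rule out a bandwidth bottleneck.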
I have not tried EVGA precision (yet) but I did eventually work out how to increase GDDR5 past 3505MHz using NV Inspector. | |
ID: 38706 | Rating: 0 | rate: / Reply Quote | |
Give that card a good spanking :D | |
ID: 38720 | Rating: 0 | rate: / Reply Quote | |
I finally got my GTX970 yesterday! | |
ID: 38811 | Rating: 0 | rate: / Reply Quote | |
Are you using the latest Nvidia Inspector? They mention a couple of changes that might be of interest. | |
ID: 38812 | Rating: 0 | rate: / Reply Quote | |
Yes, I'm using the current 1.9.7.3. | |
ID: 38816 | Rating: 0 | rate: / Reply Quote | |
Congratulations ETA. | |
ID: 38820 | Rating: 0 | rate: / Reply Quote | |
Thanks! Have fun with your as well :) | |
ID: 38825 | Rating: 0 | rate: / Reply Quote | |
BTW: the bitcoin folks are seeing the same problem. Some can't OC the memory, while one can. They don't get blue screen at stock memory clock, though. | |
ID: 38827 | Rating: 0 | rate: / Reply Quote | |
OK, I'm getting closer: when I have GPU-Grid crunching in the background not even running Heaven triggers P0! It just stays there at P2, in the same way as everything I've tried with inspector. | |
ID: 38835 | Rating: 0 | rate: / Reply Quote | |
I installed Nvidia Inspector and it shows the card runs in P2. The clock is at 1304MHz (I have set the max temp to 73°C and the power target to 105 with PrecisionX). The clock can run higher if I set the temperature limit higher. | |
ID: 38855 | Rating: 0 | rate: / Reply Quote | |
I suppose your memory clock is set to 1500 MHz (in GPU-Z) as well? I'm trying again to manually run 1750 MHz. I passed "memtestcl" at that and since a few minutes everything looks fine. I could live with manually forcing the clock, while staying in P2, as long as I don't get BS again! | |
ID: 38859 | Rating: 0 | rate: / Reply Quote | |
The 1920 DP64-core (5760 total cores / 450W TDP?) Titan-Z is currently priced at 1399-1499 USD, equaling the cost of [2] GTX980s (4096 total cores / 128 DP64 / 340W TDP) | |
ID: 38862 | Rating: 0 | rate: / Reply Quote | |
Clarification for new Maxwell card owners and Kepler owners looking to upgrade. What will Maxwell GM2*0 bring to its brethren? We know of a few proven GM107/GM204 changes from Kepler: refined crossbar/dispatch/issue - better power efficiency per SMM compared to SMX - runs cooler - lower DP64 performance from the loss of 64-bit banks and 8-byte shared banking mode (fewer DP cores per SMM) - higher integer throughput (more cores per SMM) - different TMU/CUDA core ratio - revised filtering/cache/memory algorithms - a barrel shifter (left out of Kepler) - and an enhanced SIP core block for video encoding/decoding (Kepler's is first generation). | |
ID: 38863 | Rating: 0 | rate: / Reply Quote | |
That is absolutely and exhaustively TLDR. | |
ID: 38864 | Rating: 0 | rate: / Reply Quote | |
I suppose your memory clock is set to 1500 MHz (in GPU-Z) as well? I'm trying again to manually run 1750 MHz. I passed "memtestcl" at that and since a few minutes everything looks fine. I could live with manually forcing the clock, while staying in P2, as long as I don't get BS again! Yes, indeed the memory clock runs at 1500MHz. I have now set the temperature maximum to 73°C. The GTX970 runs smoothly, throws hot air out of the case, and is around 10,000 seconds faster than my GTX770. Checked via an energy monitor, the system runs with 100W less power than with the GTX770 in it. So not bad at all. What will be the effect of getting it into the P0 state with memory higher? Will that result in faster processing of WU's? ____________ Greetings from TJ | |
ID: 38865 | Rating: 0 | rate: / Reply Quote | |
I'm looking forward to Matt's analysis. | |
ID: 38866 | Rating: 0 | rate: / Reply Quote | |
What will be the effect of getting it into the P0 state with memory higher? Will that result in faster processing of WU's? It should get the memory clock up to 1750 MHz automatically. This should be good for about a 4% performance boost according to SK. So it's not dramatic, but quite a few credits per day more. Memory controller load would reduce from almost 50% to a good 40%. Scaling with higher GPU clocks should be improved. And the GTX980 might like this a lot, as with higher performance it needs even more bandwidth to feed the beast. And I'm not sure if P2 uses a lower memory voltage. That's pretty much the point I care about most. Initially I got blue screens when I tried to run my memory at the stock 1750 MHz in P2, whereas Heaven ran fine for about 2h at that clock speed in P0. When I tried it again yesterday I got a calculation error. Other people are reporting memory clocks up to 2000 MHz for these cards, which might be totally out of reach if I cannot even get 1750 MHz stable. And finally: I'm asking myself, why is nVidia limiting the mem clock speed in P2? And why are they changing to P2 whenever CUDA or OpenCL is being used? It can't be low utilization (~50%, as in Heaven). Maybe they're using stricter timings in this mode, which might generally benefit GP-GPU more than higher throughput would? This could explain why my card would take this clock speed in P0, but not in P2. MrS ____________ Scanning for our furry friends since Jan 2002 | |
ID: 38867 | Rating: 0 | rate: / Reply Quote | |
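The arithmetic behind the expected gain is straightforward: raising the GDDR5 command clock from 3005 to 3505 MHz adds about 16.6% peak bandwidth, so at roughly 50% controller load a low-single-digit runtime improvement (like the ~4% figure above) is plausible. A minimal sketch:

```python
def bandwidth_gb_s(mem_clock_mhz, bus_bits):
    """Peak GDDR5 bandwidth: command clock x 2 (DDR) x bus width in bytes."""
    return mem_clock_mhz * 2 * (bus_bits / 8) / 1000

b_low  = bandwidth_gb_s(3005, 256)   # clock held in the P2 state
b_high = bandwidth_gb_s(3505, 256)   # full-speed clock (7 Gbps effective)
print(f"{b_low:.1f} -> {b_high:.1f} GB/s (+{b_high / b_low - 1:.1%})")
```

Note the full-speed figure lands on the GTX 980's rated 224 GB/s, which is a good sanity check on the formula.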
I'm asking myself, why is nVidia limiting the mem clock speed in P2? And why are they changing to P2 whenever CUDA or OpenCL is being used? It can't be low utilization (~50%, as in Heaven). Maybe they're using stricter timings in this mode, which might generally benefit GP-GPU more than higher throughput? This could explain why my card would take this clock speed in P0, but not in P2. I think that stability is a more important factor than speed while using CUDA or OpenCL. I had memory clock issues with my Gigabyte GTX780Ti OC, so perhaps NVidia is trying to avoid such issues as much as possible. I've overclocked my GTX980 to 1407MHz @ 1.237V; the memory controller usage has risen to 56-60%, but it's still running at 3005MHz. I'm not sure I can increase the memory clock to 3505MHz, as according to MSI Afterburner my card is using 104-108% power. | |
ID: 38869 | Rating: 0 | rate: / Reply Quote | |
I was able to clock the GDDR5 to 3500MHz and over, even when OC'ing the GPU core and with power >110%. That said I did get an error at silly speeds. Initially I was just being inquisitive to understand why the 256bit bus has such a high usage for some tasks, how much of a problem it was and looking for a happy spot WRT power and efficiency. | |
ID: 38870 | Rating: 0 | rate: / Reply Quote | |
skgiven: | |
ID: 38871 | Rating: 0 | rate: / Reply Quote | |
(Sorry for TLDR posts) Thanks eXaPower for those long posts. A lot of information concerning the NV architectures. Very interesting. | |
ID: 38873 | Rating: 0 | rate: / Reply Quote | |
Maxwell and Kepler's: CUDA / LD/ST / SFU / TMU / ROP / warp schedulers / instruction cache buffer / dispatch unit / issue / crossbar / PolyMorph Engine "SMM" "SMX" | |
ID: 38877 | Rating: 0 | rate: / Reply Quote | |
I wrote my vendor (Galax) an email - hopefully I'm not just getting some useless standard reply. | |
ID: 38886 | Rating: 0 | rate: / Reply Quote | |
Finally found the time to summarize it and post at Einstein. I also sent it as bug report to nVidia. | |
ID: 39082 | Rating: 0 | rate: / Reply Quote | |
Any information regarding the Maxwell update? | |
ID: 39341 | Rating: 0 | rate: / Reply Quote | |
In time there may well be a GTX960, 990, 960Ti, 950Ti, 950, 940, 930 &/or others, and when GM200/GM210 turns up there could be many variants in the GeForce, Quadro and Tesla ranges... | |
ID: 39372 | Rating: 0 | rate: / Reply Quote | |
Nvidia released a statement about the GTX970 memory allocation issue. There are reports the GTX970 can't properly utilize its 4GB. “The GeForce GTX 970 is equipped with 4GB of dedicated graphics memory. However the 970 has a different configuration of SMs than the 980, and fewer crossbar resources to the memory system. To optimally manage memory traffic in this configuration, we segment graphics memory into a 3.5GB section and a 0.5GB section. The GPU has higher priority access to the 3.5GB section. When a game needs less than 3.5GB of video memory per draw command then it will only access the first partition, and 3rd party applications that measure memory usage will report 3.5GB of memory in use on GTX 970, but may report more for GTX 980 if there is more memory used by other commands. When a game requires more than 3.5GB of memory then we use both segments.” http://www.pcper.com/news/Graphics-Cards/NVIDIA-Responds-GTX-970-35GB-Memory-Issue http://images.anandtech.com/doci/7764/SMX_575px.png http://images.anandtech.com/doci/7764/SMMrecolored_575px.png | |
ID: 39698 | Rating: 0 | rate: / Reply Quote | |
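The bandwidth implied by those two segments can be worked out from the bus split, assuming the GTX 970's rated 7 Gbps GDDR5: the 3.5GB segment sits on a 224-bit path and the 0.5GB segment on the remaining 32-bit path. A quick sketch:

```python
DATA_RATE_GBPS = 7.0                  # rated 7 Gbps GDDR5 on the GTX 970

def partition_bandwidth(bus_bits):
    """Peak bandwidth of a memory partition in GB/s."""
    return DATA_RATE_GBPS * bus_bits / 8

fast = partition_bandwidth(224)   # 3.5 GB segment
slow = partition_bandwidth(32)    # 0.5 GB segment
print(fast, slow, fast + slow)    # 196.0 28.0 224.0
```

So the last half-gigabyte runs at 28 GB/s peak, a seventh of the main segment's 196 GB/s, which is why spilling past 3.5GB hurts.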
Nvidia admits the GTX970 allows only 1792KB of L2 cache and 56 ROPs to be accessed. http://www.pcper.com/reviews/Graphics-Cards/NVIDIA-Discloses-Full-Memory-Structure-and-Limitations-GTX-970 Despite initial reviews and information from NVIDIA, the GTX 970 actually has fewer ROPs and less L2 cache than the GTX 980. NVIDIA says this was an error in the reviewer’s guide and a misunderstanding between the engineering team and the technical PR team on how the architecture itself functioned. That means the GTX 970 has 56 ROPs and 1792 KB of L2 cache compared to 64 ROPs and 2048 KB of L2 cache for the GTX 980. http://anandtech.com/show/8935/geforce-gtx-970-correcting-the-specs-exploring-memory-allocation The benchmarking program SiSoftware Sandra has reported 1.8MB (1792KB) of cache for the GTX970 since the beginning. It was always chalked up as a bug. | |
ID: 39744 | Rating: 0 | rate: / Reply Quote | |
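A quick sanity check of the corrected figures: both the reduced ROP count and the reduced L2 size correspond to one of GM204's eight 8-ROP/256KB units being disabled (the per-unit breakdown follows the AnandTech correction article linked above):

```python
units = 8                        # ROP/L2 partitions on a full GM204
rops_per_unit = 64 // units      # 8 ROPs per unit
l2_kb_per_unit = 2048 // units   # 256 KB of L2 per unit

# GTX 970: one unit disabled
print((units - 1) * rops_per_unit)   # 56 ROPs
print((units - 1) * l2_kb_per_unit)  # 1792 KB L2
```

So Sandra's "1.8MB bug" was the hardware telling the truth all along.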
For here, the MCU usage is lower on the GTX970 than on the GTX980. | |
ID: 40004 | Rating: 0 | rate: / Reply Quote | |
Well, we can surely say that the smaller L2 cache doesn't hurt GPU-Grid performance much. But we can't say anything more specific, can we? | |
ID: 40008 | Rating: 0 | rate: / Reply Quote | |
The 970 reminds me of the 660Ti, a lot. | |
ID: 40061 | Rating: 0 | rate: / Reply Quote | |
You're correct: Kepler disables a full memory controller and the 8 ROPs that come with the SMX. The GTX970 is still the best NVidia performance/cost card for the feature set included. In the prior generation of Kepler-derived GPUs, Alben explained, any chips with faulty portions of L2 cache would need to have an entire memory partition disabled. For example, the GeForce GTX 660 Ti is based on a GK104 chip with several SMs and an entire memory partition inactive, so it has an aggregate 192-bit connection to memory, down 64 bits from the full chip's capabilities. From Damien Triolet at Hardware.fr: The pixel fillrate can be linked to the number of ROPs for some GPUs, but it’s been limited elsewhere for years for many Nvidia GPUs. Basically there are 3 levels that might have a say at what the peak fillrate is: Testing (forums) reveals GTX970s from different manufacturers have varying results for peak rasterization rates and peak pixel fill rates (at the same clock/memory speeds). Is this because GTX970 SMM/cache/ROP/memory structures are disabled differently from one another? Reports include: not ALL GTX970s are affected by the 224+32-bit bus or the separate 512MB pool with its 20-28GB/s bandwidth slowdown. Does this reveal NVidia changed the SMM disablement process? Has a second revision of the GTX970 been produced yet? AnandTech's decent explanations of SMX structures can be found in the GTX660Ti and GTX650Ti reviews, to compare with the recent SMM articles. When sorting through for reliable information: some tech forum threads' misinformed 970 comments are hyperbolic, completely unprofessional, and full of trolling. | |
ID: 40077 | Rating: 0 | rate: / Reply Quote | |
Message boards : Graphics cards (GPUs) : Maxwell now