
Message boards : Graphics cards (GPUs) : Maxwell now

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Message 34732 - Posted: 19 Jan 2014 | 20:25:30 UTC

There's a rumor going around that Maxwell is coming out next month. I wonder if this was planned or if AMD's sales are hurting them?

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 34736 - Posted: 20 Jan 2014 | 2:33:11 UTC - in response to Message 34732.

It looks more like a delaying action to hold off AMD until the 20 nm process arrives, probably later than they had originally hoped. A GTX 750 Ti won't set the world on fire in performance, and won't make them a ton of money. But it gives them a chance to see how well the design works in practice, and to give the software developers a head start before the real Maxwell arrives.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 34807 - Posted: 24 Jan 2014 | 22:21:47 UTC

It likely is just a false rumor. No proof has been shown that these cards use Maxwell chips, even though relatively complete benchmarks have already appeared. It's probably just GK106 with 768 shaders.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 34823 - Posted: 26 Jan 2014 | 11:21:18 UTC - in response to Message 34807.
Last modified: 26 Jan 2014 | 11:33:19 UTC

Producing a Maxwell on the 28nm process would be a complete change of direction for NVidia, so I agree this is likely to be a false rumor. There are two revision models (Rev. 2) of GPUs in the GF600 lineup (GT630 and GT640), so perhaps NVidia wants to fill out its GF700 range with a lower end card; if there is a Rev. 2 version of the GTX650Ti en route, it makes more sense to shift it to the GF700 range.

The idea of constructing a Maxwell on 28nm does make a lot of sense, however; GM could be tested directly against GK, and they could produce more competitive entry to mid-range cards earlier. Small cards are the largest part of the GPU market, so why produce a big, immature card first? As GMs will have a built-in CPU (of sorts), it would be better to test these (and their usefulness / scalability) on smaller cards first - no point producing a fat GM which has an insufficient CPU to support it.

I've always wondered why they produced the larger cards first. It's just been a flag-waving exercise IMO. While that might be marketable, it makes no sense when dealing with the savvy buyer and other businesses (OEMs), especially for supercomputers.

NVidia could also have produced GF cards at 28nm, and they would have had a market. Perhaps they did, just for engineering, design and testing purposes, and managed to keep these chips completely in-house. While such designs might have been/will be marketable, from a business point of view they would primarily have been competing against other NVidia products - probably a bad thing - better to focus your development effort on one controllable long-term objective rather than on tweaking.

The eventual 40% reduction in die size will probably make for cooler GPUs. In the main, GK temperatures are significantly less of an issue than GF temps, but for several high end cards in one system it's still a problem. So while NVidia doesn't have temperature licked now, it should fall into place at 20nm.

In the meantime, entry to mid-range 28nm cards are easy to produce and easy to cool. 28nm Maxwells might be easier to work with now and for early 20nm products. When moving to 20nm, yields will inevitably be low (they always are), so it would make sense to start at the small end, where you are actually going to get enough samples to test with and enough product to release. The lesser bins tend to go to OEMs anyway, so it might be better to start there and get a product out which will compete with AMD and Intel's integrated GPUs ASAP. Let's face it, this is where the competition is highest and NVidia is weakest. So the first 28nm Maxwells could well be for laptops and other mobile devices. ARM can already be used to support an OS, so IMO it's inevitable that ARM will bolster their CPU with an NVidia GPU. That's what the market really wants; sufficient CPU processing power to start up and run an OS and a high end GPU for the video-interface, gaming... isn't it?
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Dagorath
Send message
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Message 34825 - Posted: 26 Jan 2014 | 16:21:02 UTC - in response to Message 34823.
Last modified: 26 Jan 2014 | 16:23:00 UTC

ARM can already be used to support an OS, so IMO it's inevitable that ARM will bolster their CPU with an NVidia GPU. That's what the market really wants; sufficient CPU processing power to start up and run an OS and a high end GPU for the video-interface, gaming... isn't it?


I guess a fairly large part of the market wants that. I would be happy with a motherboard with a BIOS that can boot from PXE, no SuperIO (USB, RS-232, parallel port, PS/2), an I2C bus for onboard temperature sensors, no IDE or SATA (no disks), just lots of RAM, an RJ-45 connector and gigabit ethernet, no wifi, enough CPU processing power to start up and run a minimal OS that has a good terminal, SSH, and can run the BOINC client and project apps. Don't need a desktop or anything to do with a GUI, no TWAIN or printer drivers/daemons, no PnP or printer service, no extra fonts (just a decent terminal and 1 font), only the network services required; Python or some other scripting language would be nice, but not much more.

If they could fit all that onto a slightly larger video card I'd be happy, otherwise put it on a 2" x 5" board with a PCIe slot and power connectors and call it a cruncher. Something so no-frills that IKEA would consider stocking it.

What else would be unnecessary... no RTC (get the time off the LAN), no sound, no disk activity LED.
____________
BOINC <<--- credit whores, pedants, alien hunters

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 35023 - Posted: 13 Feb 2014 | 20:09:11 UTC

Seems like the cat is out of the bag.. and we were all wrong, as usual for a new generation ;)
It's still not official, but far more solid than any rumors before this:

- at least 1, probably 2 small chips in 28 nm soon
- the bigger ones later in 20 nm
- architectural efficiency improvements
- much larger L2 cache
- the compute to texture ratio increases from 12:1 to 16:1 (like AMD's)
- the SMX goes down to 128 shaders (192 in Kepler)
-> that could mean they're going back to non-superscalar (i.e. just scalar)

If the latter is true this could mean significant per-clock per shader performance improvements here and in many other BOINC projects :)
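
For illustration, here's the back-of-envelope behind that hope - a minimal sketch assuming the published figures of 4 warp schedulers per SM and 32-wide warps (nothing GPU-Grid specific):

/* Why 192 shaders per Kepler SMX implies superscalar (dual) issue,
   while a 128-shader Maxwell SMM can be fed with plain scalar issue.
   Assumes 4 warp schedulers per SM and 32-wide warps. */
#include <stdio.h>

int main(void) {
    const int schedulers = 4, warp_width = 32;
    const int scalar_fed = schedulers * warp_width;  /* cores fed per clock: 128 */
    printf("cores fed by scalar issue alone: %d\n", scalar_fed);
    printf("Kepler SMX, 192 cores: %.2fx -> schedulers must dual-issue\n",
           192.0 / scalar_fed);
    printf("Maxwell SMM, 128 cores: %.2fx -> scalar issue saturates it\n",
           128.0 / scalar_fed);
    return 0;
}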

MrS
____________
Scanning for our furry friends since Jan 2002

Dagorath
Send message
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Message 35024 - Posted: 13 Feb 2014 | 23:05:14 UTC - in response to Message 35023.

Sounds like a big performance per watt increase will be coming too. I think I'll put planned purchases on hold, build savings and see what the picture looks like 4 months from now.
____________
BOINC <<--- credit whores, pedants, alien hunters

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 35029 - Posted: 14 Feb 2014 | 8:12:58 UTC - in response to Message 35024.

That's not what nVidia would like you to do.. but I agree ;)

MrS
____________
Scanning for our furry friends since Jan 2002

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 35030 - Posted: 14 Feb 2014 | 10:11:16 UTC - in response to Message 35024.
Last modified: 14 Feb 2014 | 11:48:48 UTC

For GPUGrid, performance touting is premature - we don't even know if it will work with the current app. It could take 6 months of development and debugging. It took ages before the Titans worked.

As the GTX750Ti will only have 640 CUDA cores, the 128bit bus probably won't be an issue. The CUDA core to bus lane ratio is about the same as a GTX670's. However, the 670 is super-scalar and the GTX480 had 384 lanes. Suggesting a 60W GTX750Ti will be slightly faster than a GTX480 still sounds unrealistic, but assuming the non-super-scalar CUDA cores aren't 'semi-skimmed' it might be powerful enough. I suspect they will not be 'full fat' in the way GF110 was, and there could be additional bottlenecks, driver bugs... So it's wait and see.
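
On paper the comparison is actually close - a quick sanity check using nothing but spec-sheet numbers (theoretical peak only; real GPUGrid throughput depends on how well the cores can be fed):

/* Theoretical single-precision GFLOPS = 2 flops (one FMA) * cores * clock.
   Spec-sheet clocks; says nothing about real-world GPUGrid throughput. */
#include <stdio.h>

static double sp_gflops(int cores, double mhz) {
    return 2.0 * cores * mhz / 1000.0;
}

int main(void) {
    printf("GTX480   (480 cores @ 1401 MHz hot clock): %.0f GFLOPS\n",
           sp_gflops(480, 1401.0));   /* ~1345 */
    printf("GTX750Ti (640 cores @ 1085 MHz boost):     %.0f GFLOPS\n",
           sp_gflops(640, 1085.0));   /* ~1389 */
    return 0;
}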

Having a 6-pin power connector means the GTX750Ti could be powered directly from the PSU, rather than through the motherboard. This is good if you want to use this card on a riser in, say, a 3rd slot (which might not actually be capable of supplying 75W).

Avoid cards with small fans - they don't last.

I still say 'stay clear of the GTX750' if it's only got 1GB GDDR5.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Message 35038 - Posted: 14 Feb 2014 | 13:53:07 UTC

I still feel like I'm stuck between a rock and a hard place. Haswell-E will have an 8-core variant in Q3, so that is definitely going to be bought. However, I would like this to be my last system build for more than a year, as pumping 5k in annually is something I cannot continue. Every other year, sure.

But with Volta and its stacked DRAM.... I'm very cautious about dropping 1.8k+ on GPUs that most likely won't be that large of a change. We'll see I suppose

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 35097 - Posted: 16 Feb 2014 | 14:24:22 UTC - in response to Message 35038.

But with Volta and its stacked DRAM.... I'm very cautious about dropping 1.8k+ on GPUs that most likely won't be that large of a change. We'll see I suppose

Volta will still take some time, as GPUs have matured quite a bit (compared to the wild early days of a new chip every 6 months!) and progress is generally slower. That's actually not so bad, because we can keep GPUs longer and the software guys have some time to actually think about using those beasts properly.

If you still have Fermis or older running, get rid of them as long as you can still find (casual) gamers willing to pay something for them. If you think about upgrading from Kepler to Maxwell and don't want to spend too much, I propose the following: replace 2 Keplers by 1 Maxwell for about the same throughput, which should hopefully be possible with 20 nm and the architectural improvements. This way you don't have to spend as much and you reduce power usage significantly (further savings). Your throughput won't increase, but so what? If you feel like spending again you could always add another GPU.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Message 35103 - Posted: 17 Feb 2014 | 18:48:21 UTC - in response to Message 35023.

There is no ARM CPU on the block diagram of the GM107.

After reading the article it seems to me that this is only a half step towards the new generation: it has a better performance/watt ratio because of the evolution of the 28nm process and because of the architectural changes (probably these two aspects are bound together: this architecture can achieve a higher CUDA core/chip area ratio than the GK architecture).
As its performance is expected to be like the GTX480's performance, perhaps there is no need for an on-chip CPU to fully utilize this GPU.
Also, it's possible that there is no need for big changes in the GPUGrid application to work with this GPU.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 35104 - Posted: 17 Feb 2014 | 21:13:20 UTC - in response to Message 35103.

As far as I remember this "ARM on chip" was still a complete rumor. It could well be that someone mistook material about future nVidia server chips with GPUs (Project Denver) for the regular GPUs.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 35114 - Posted: 18 Feb 2014 | 13:38:14 UTC - in response to Message 35104.
Last modified: 18 Feb 2014 | 13:42:48 UTC

I would be a bit concerned about the 1306 GFlops rating for the GTX750Ti. That's actually below the GTX650Ti (1420). The 750Ti also has a 128bit bus and bandwidth of 86.4GB/s. While the theoretical GFLOPS/W SP is 21.8, it's still an entry level card; it would take 4 of these cards to have the overall performance of a GTX780Ti. There should be plenty of OC models and potential for these GPUs to boost further.
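
For reference, the 86.4GB/s figure follows directly from the bus width and the stock memory speed (the chips are rated for 6GHz but run at 5.4GT/s effective, hence the OC headroom):

/* Where 86.4 GB/s comes from: a 128-bit bus moves 16 bytes per transfer,
   and the stock GTX750Ti's GDDR5 runs at 5.4 GT/s effective. */
#include <stdio.h>

int main(void) {
    const double bytes_per_transfer = 128.0 / 8.0;  /* 16 bytes */
    const double effective_gtps = 5.4;              /* stock memory speed */
    printf("bandwidth: %.1f GB/s\n", bytes_per_transfer * effective_gtps);
    return 0;
}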

There may also be a 1GB version of the GTX750Ti (avoid).

My confusion over ARM came from fudge reports which presumed Maxwell and ARM are joined at the hip. Just because Teslas might get an ARM this decade does not mean any other card will. It hasn't even been announced that Maxwell based Teslas will - that's just been interpreted that way.
The use of ARM doesn't require the Maxwell architecture; the Tegra K1 is based on Kepler and uses a quad-core ARM Cortex-A15 R3, and previous Tegras also used ARM.
It is the case that NVidia want to do more on the discrete GPU and be less reliant on the underlying system, but that doesn't in itself require an ARM processor.
The only really interesting change is that the Shader Model is 5.0 - so it's CC5.0. The non-super-scalar architecture probably helped throw people into thinking that these GPUs would come with ARM processors, but when you think about it, there is no sense in putting a discrete processor onto an entry level GPU. A potential obstacle to the use of ARM might be Windows licences, as these typically limit your software to 2 CPUs (which makes a second card a no-no).
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 35128 - Posted: 18 Feb 2014 | 20:48:02 UTC - in response to Message 35114.
Last modified: 18 Feb 2014 | 20:48:31 UTC

I see EVGA are selling a GTX750Ti with a 1268MHz boost. In theory that's 16.9% faster than the reference model, though I would expect the reference card to boost higher than the quoted 1085MHz (if it works)!
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 35139 - Posted: 19 Feb 2014 | 1:00:46 UTC

I have some GTX750Tis on order; should have them in my hands next week.
It's not yet clear whether we'll need to issue a new application build.
Stay tuned!

Matt

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 35147 - Posted: 19 Feb 2014 | 14:02:13 UTC - in response to Message 35139.
Last modified: 19 Feb 2014 | 14:03:26 UTC

I read that the 128bit bus is a bottleneck, but as the card uses 6GHz-rated GDDR5, a 10% memory OC is a given. The GPU also OCs well (as the temps are low). So these cards could be tweaked to be significantly more competitive than the reference model.

Compute is a bit mixed going by Anandtech, so it's wait and see about the performance (if they work):

http://anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell/22
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 35157 - Posted: 19 Feb 2014 | 20:51:05 UTC

Don't be fooled by the comparatively low maximum Flops. We got many of those with Kepler, and complained initially that we couldn't make proper use of them, as the performance per shader per clock was significantly below the non-superscalar Fermis. Now we're going non-superscalar again and gain some efficiency through that, as well as through other tweaks.

And this shows in the compute benchmarks at Anandtech: the GTX750Ti beats the GTX650Ti easily and consistently, often hangs with the GTX650Ti Boost and GTX660, and sometimes performs more than twice as fast as a GTX660! None of those benchmarks is GPU-Grid, but this bodes well for Maxwell here, since GPU-Grid never really liked the super-scalarity all that much. Let's wait for Matt's test.. but I expect Maxwell to do pretty well.

The 128 bit memory bus on GM107 is somewhat limiting, but mitigated by the far larger L2 cache. To what extent for GPU-Grid.. I don't know. And those chips seem to clock ridiculously high. I've seen up to almost 1.3 GHz at stock voltage (1.13 - 1.17 V). I wish the testers had lowered the voltage to see what the chips really can do, instead of being limited by the software sliders. The bigger chips naturally won't clock as well, but 20 nm should shake things up anyway.

Bottom line: don't rush to buy those cards, since they're only mainstream models after all. But don't buy any other cards for GPU-Grid until we know how good Maxwell really is over here.

MrS
____________
Scanning for our furry friends since Jan 2002

Dagorath
Send message
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Message 35158 - Posted: 19 Feb 2014 | 21:23:05 UTC - in response to Message 35157.

OK, no purchases. But I would rather have a professional or a Ph.D. test the pretend Maxwells so we can be sure of what we're looking at ;-)

____________
BOINC <<--- credit whores, pedants, alien hunters

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 35159 - Posted: 19 Feb 2014 | 22:02:53 UTC - in response to Message 35158.

Professional enough? Or shall I search for a review written by someone with a PhD in ancient Greek history? ;)

MrS
____________
Scanning for our furry friends since Jan 2002

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 35162 - Posted: 19 Feb 2014 | 22:33:29 UTC

EVGA (Europe) just announced they have the Titan Black and the GTX 750 and 750Ti for sale. The latter for 150 euro - really cheap with 2GB. However they're not in stock, so they can't be ordered yet; not that I will.
____________
Greetings from TJ

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 35165 - Posted: 20 Feb 2014 | 0:12:35 UTC - in response to Message 35162.

The GPU memory latency is supposedly better, so the GTX750Ti's memory bandwidth bottleneck might not be as bad as I first suspected. That said, compute performances are a bit 'all over the place'. It's definitely a wait and see situation for here.

My concerns for these cards are first and foremost compatibility; will it work straight out of the box, will an app revamp be required, will we have to wait on drivers or might they never be compatible (time, money and effort developing for an entry level GPU might well be better spent elsewhere).

If the apps work, the memory bandwidth may or may not be an issue, but the performance/Watt should be very good nonetheless. Some of the Folding benchmarks are promising, so if these cards are not up to much for here, they are good for over there (SP), and possibly for a Boinc project or two.

I get the distinct feeling that the 20nm Maxwell cards will bring exceptional performances for here, when they turn up. They won't all have memory bottlenecks, and performance/Watt is likely to be much better than with the 28nm versions (which are already great). I think it's really a good time to watch this space, and start to think about and prepare for future purchases; sell on existing hardware when the value is still good!
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Send message
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 35171 - Posted: 20 Feb 2014 | 12:52:48 UTC - in response to Message 35165.

Some more info on Maxwell.

http://international.download.nvidia.com/geforce-com/international/pdfs/GeForce-GTX-750-Ti-Whitepaper.pdf

gdf

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 35173 - Posted: 20 Feb 2014 | 14:05:00 UTC - in response to Message 35171.

NVidia are comparing the GTX750Ti to a 3-generation-old GTX480 for performance and a GT640 for power usage, but not a GTX650Ti! For some games it's roughly equal to a GTX480, and in terms of performance/Watt the GTX750Ti is 1.7 times better than a GT640 (and similar). While it is a GM107 product, the name suggests it's an upgrade to a GTX650Ti.

____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile Coleslaw
Send message
Joined: 24 Jul 08
Posts: 36
Credit: 363,857,679
RAC: 0
Message 35236 - Posted: 22 Feb 2014 | 22:52:05 UTC
Last modified: 22 Feb 2014 | 22:52:54 UTC

One of my team mates has a couple of 750Ti's and they keep failing here. They are running fine at Einstein. I have encouraged our team members who have the new Maxwell cards to post in here.

http://www.gpugrid.net/show_host_detail.php?hostid=167781
____________

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 35237 - Posted: 22 Feb 2014 | 23:07:56 UTC - in response to Message 35236.
Last modified: 22 Feb 2014 | 23:10:50 UTC

Thanks. That is important information to share!

Both long and short run tasks are failing - all present task types - and they fail quickly, in 0 to 4 seconds.

This strongly suggests that the present app would need to be modified to support these GPUs. It's likely that the different architecture (non-super-scalar...) is the reason for the GPUGrid failures under the CUDA 5.5 apps. At Einstein they use a CUDA 3.x app IIRC - a much earlier CUDA version; more tolerant, but slower.
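
For what it's worth, that failure signature is what you'd expect from an application binary that simply contains no code image for the new architecture. A minimal sketch (just an illustration, not the actual GPUGrid app) of how an app sees a device's compute capability:

/* Sketch: query the device's compute capability before launching kernels.
   A binary that only embeds machine code for older targets (and no PTX)
   fails immediately on a CC 5.0 Maxwell - matching the 0-4 second errors. */
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    std::printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    if (prop.major >= 5)  /* hypothetical check for an app built only for Fermi/Kepler */
        std::printf("warning: this binary may carry no code for Maxwell\n");
    return 0;
}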
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ChelseaOilman
Send message
Joined: 6 Jan 14
Posts: 2
Credit: 1,351,352,961
RAC: 0
Message 35238 - Posted: 22 Feb 2014 | 23:33:16 UTC

I think I might be the one Coleslaw is referring to. I installed a pair of 750Ti cards in my computer yesterday and tried to run GPUGRID. No go, instant fail within a couple of seconds. Einstein seems to run just fine. I bought these cards to run GPUGRID and I'm not too happy that they can't. I'm not that interested in running Einstein, and other than F@H there isn't much else out there for GPUs. I refuse to participate in F@H anymore. If you need any info feel free to ask.

Profile Coleslaw
Send message
Joined: 24 Jul 08
Posts: 36
Credit: 363,857,679
RAC: 0
Message 35239 - Posted: 23 Feb 2014 | 0:03:29 UTC - in response to Message 35238.

I think I might be the one Coleslaw is referring to. I installed a pair of 750Ti cards in my computer yesterday and tried to run GPUGRID. No go, instant fail within a couple of seconds. Einstein seems to run just fine. I bought these cards to run GPUGRID and I'm not too happy that they can't. I'm not that interested in running Einstein, and other than F@H there isn't much else out there for GPUs. I refuse to participate in F@H anymore. If you need any info feel free to ask.


Yes you are. Thanks for volunteering. :)

Gilthanis
____________

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 35240 - Posted: 23 Feb 2014 | 0:37:05 UTC - in response to Message 35237.


This strongly suggests that the present app would need to be modified to support these GPU's.


That's pretty annoying - it likely means that we'll not be able to use them until CUDA 6 goes public.

Matt

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 35241 - Posted: 23 Feb 2014 | 1:07:07 UTC - in response to Message 35240.

Sorry for being annoying.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 35270 - Posted: 23 Feb 2014 | 13:47:49 UTC - in response to Message 35238.

I bought these cards to run GPUGRID and I'm not too happy that they can't.

Well, it's a new architecture (or more precisely: significantly tweaked and rebalanced) so some "unexpected problems" can almost be expected. Be a bit patient, I'm sure this can be fixed. Maxwell is the new architecture for all upcoming nVidia chips in the next 1 - 2 years, after all.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 35272 - Posted: 23 Feb 2014 | 14:14:01 UTC - in response to Message 35270.


I bought these cards to run GPUGRID and I'm not too happy that they can't.


Don't worry, support is coming just as soon as possible. These new cards are very exciting for us! Unfortunately, because of the way we build our application, we need to wait for the next version of CUDA which contains explicit Maxwell support.
The other GPU-using projects don't have this limitation because they build their applications in a different way. The other side of that is that they aren't able to make GPU-specific optimisations the way we do.

Matt

ChelseaOilman
Send message
Joined: 6 Jan 14
Posts: 2
Credit: 1,351,352,961
RAC: 0
Message 35275 - Posted: 23 Feb 2014 | 14:50:43 UTC - in response to Message 35272.

Don't worry, support is coming just as soon as possible. These new cards are very exciting for us! Unfortunately, because of the way we build our application, we need to wait for the next version of CUDA which contains explicit Maxwell support.

I hope Nvidia comes out with the new version of CUDA soon. I expect GPUs that use much less electricity will become very popular quickly. I'll be switching back to GPUGRID when you get the Maxwell compatible client out.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 37914 - Posted: 15 Sep 2014 | 13:07:42 UTC

The new Haswell-E 6-core is available in the Netherlands, but pricey. Any idea when the real Maxwell will be launched? I read "soon" in some articles on the net, but did not find any date.
____________
Greetings from TJ

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 37915 - Posted: 15 Sep 2014 | 13:58:15 UTC - in response to Message 37914.
Last modified: 15 Sep 2014 | 14:03:28 UTC

The new Haswell-E 6-core is available in the Netherlands, but pricey. Any idea when the real Maxwell will be launched? I read "soon" in some articles on the net, but did not find any date.


Rumor has it the GTX980 (or whatever the board will be called) will be showcased (or released) at NVidia's Game24 event on September 18th, along with a 343-branch driver. The GTX 970/960 could be released by early/mid October. Leaked benchmarks (if they're not fake) show GM204 Maxwell to be at reference GTX780Ti (5 teraFLOPS) performance levels with a lower TDP. Maxwell's integer/AES-256/TMU/ROP performance is higher than Kepler's per core. The GTX 980 will have a 256bit memory interface. Float (double/single) will be similar to the GK110 cards with disabled DP cores (GTX780/780Ti). A Titan with the 64-DP-core SMXes enabled for double precision tasks won't be replaced until another Maxwell stack is created for the Titan's market position. A dual-Maxwell board with 11/12 teraflops single and 3/4 teraflops double would be the ultimate board.

Jozef J
Send message
Joined: 7 Jun 12
Posts: 112
Credit: 1,140,895,172
RAC: 19,729
Message 37919 - Posted: 16 Sep 2014 | 14:35:32 UTC



So here is what decides which card would be best for GPUGrid.
The boost clock for the GTX 980 looks very good, and so do the 64 ROPs.. but the 780Ti has 2880 CUDA cores...?

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 37920 - Posted: 16 Sep 2014 | 14:51:45 UTC - in response to Message 37919.
Last modified: 16 Sep 2014 | 14:54:53 UTC

Thanks for this Jozef J.
No hurry for me now to build a new rig, as this is still not the "real" Maxwell with the 20nm chip.
Despite using more energy, the 780Ti is still the best card in my opinion.

PS: I have a factory overclocked EVGA 780Ti and it runs at 1137MHz.
____________
Greetings from TJ

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Message 37921 - Posted: 16 Sep 2014 | 15:02:19 UTC - in response to Message 37919.
Last modified: 16 Sep 2014 | 15:02:50 UTC

So here is what decides which card would be best for GPUGrid.
The boost clock for the GTX 980 looks very good, and so do the 64 ROPs.. but the 780Ti has 2880 CUDA cores...?

The GTX 780Ti is superscalar, so not all of its 2880 CUDA cores can be utilized by the GPUGrid client. The actual number of utilized CUDA cores on the GTX 780Ti is somewhere between 1920 and 2880 (most likely near the lower end). This could be different for each workunit batch. If they really manufacture the GM204 on 28nm lithography, then this is only a half step towards a new GPU generation. The performance per power ratio of the new GPUs will be slightly better, and (if the data in this chart are correct) I expect the GTX980 could be 15~25% faster than the GTX780Ti (here at GPUGrid). When we have the real GPUGrid performance of the GTX980, we'll know how much of the 2880 CUDA cores of the GTX780Ti is actually utilized by the GPUGrid client. But as NVidia chose to move back to a scalar architecture, I suspect that the superscalar architecture of the Keplers (and the later Fermis) wasn't as successful as expected.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 37923 - Posted: 16 Sep 2014 | 15:53:33 UTC - in response to Message 37921.

So here is what decides which card would be best for GPUGrid.
The boost clock for the GTX 980 looks very good, and so do the 64 ROPs.. but the 780Ti has 2880 CUDA cores...?

The GTX 780Ti is superscalar, so not all of its 2880 CUDA cores can be utilized by the GPUGrid client. The actual number of utilized CUDA cores on the GTX 780Ti is somewhere between 1920 and 2880 (most likely near the lower end). This could be different for each workunit batch. If they really manufacture the GM204 on 28nm lithography, then this is only a half step towards a new GPU generation. The performance per power ratio of the new GPUs will be slightly better, and (if the data in this chart are correct) I expect the GTX980 could be 15~25% faster than the GTX780Ti (here at GPUGrid). When we have the real GPUGrid performance of the GTX980, we'll know how much of the 2880 CUDA cores of the GTX780Ti is actually utilized by the GPUGrid client. But as NVidia chose to move back to a scalar architecture, I suspect that the superscalar architecture of the Keplers (and the later Fermis) wasn't as successful as expected.


Is NVidia skipping 20nm for 16nm? After a couple of years of development, TSMC is struggling badly to find proper die size(s) for 20nm. Nvidia changes lithography every two years or so. Now, after two and a half years, boards are still at 28nm, after three series releases (600, 700, 800M) on 28nm, and the GTX980 will be the fourth 28nm release. What could be the problem with finding a pattern to fit cores on 20nm? The change from superscalar to scalar?

How does a 5-SMM, 640-core/40-TMU/60W-TDP GTX750Ti perform (~7%) better than a 4-SMX, 768-core/110-130W-TDP Kepler with more TMUs (64), while smashing the GTX650Ti/Boost compute time/power consumption ratios? Core/memory speed differences? The GTX750Ti is close (~5%) to GTX660 (5 SMX/960 cores/140W TDP) compute times. Are Maxwell's cache subsystem architecture and TMU rendering that much better than Kepler's at running GPUGRID code? Maxwell's core architecture may be more efficient than Kepler's, but is Maxwell really more advanced, when float processing is similar to Kepler? Maxwell integer performance is higher, due to having more integer cores in the SMM vs. the SMX, and the added barrel shifter, which is missing in Kepler.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Message 37924 - Posted: 16 Sep 2014 | 19:53:45 UTC - in response to Message 37923.

How does a 5-SMM, 640-core/40-TMU/60W-TDP GTX750Ti perform (~7%) better than a 4-SMX, 768-core/110-130W-TDP Kepler with more TMUs (64), while smashing the GTX650Ti/Boost compute time/power consumption ratios? Core/memory speed differences? The GTX750Ti is close (~5%) to GTX660 (5 SMX/960 cores/140W TDP) compute times.

That's very easy to answer:
The SMXes of the GTX650Ti and the GTX660 are superscalar, so only (approximately) 2/3rds of their cores can be utilized (512 and 640, respectively).

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 37925 - Posted: 16 Sep 2014 | 20:42:34 UTC - in response to Message 37924.
Last modified: 16 Sep 2014 | 20:46:10 UTC

How does a 5-SMM, 640-core/40-TMU/60W-TDP GTX750Ti perform (~7%) better than a 4-SMX, 768-core/110-130W-TDP Kepler with more TMUs (64), while smashing the GTX650Ti/Boost compute time/power consumption ratios? Core/memory speed differences? The GTX750Ti is close (~5%) to GTX660 (5 SMX/960 cores/140W TDP) compute times.

That's very easy to answer:
The SMXes of the GTX650Ti and the GTX660 are superscalar, so only (approximately) 2/3rds of their cores can be utilized (512 and 640, respectively).


If this is the case, then why do the GPU utilization programs (MSI Afterburner, EVGA Precision) show 90%+ for most GPUGRID tasks? Are these programs not accounting for the type of architecture (scalar or superscalar)? If only 2/3rds of the cores are active, wouldn't GPU utilization be at ~66% instead of the typical 90%? These programs are capable of monitoring bus usage, memory controller (frame buffer) load, video processing, the amount of power, and much more.

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 37926 - Posted: 16 Sep 2014 | 23:26:25 UTC - in response to Message 37919.

I estimate that the new GM204 will be about 45% faster than a 780ti.

Matt

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Message 37928 - Posted: 17 Sep 2014 | 0:05:37 UTC - in response to Message 37925.

How does a 5-SMM, 640-core/40-TMU/60W-TDP GTX750Ti perform (~7%) better than a 4-SMX, 768-core/110-130W-TDP Kepler with more TMUs (64), while smashing the GTX650Ti/Boost compute time/power consumption ratios? Core/memory speed differences? The GTX750Ti is close (~5%) to GTX660 (5 SMX/960 cores/140W TDP) compute times.

That's very easy to answer:
The SMXes of the GTX650Ti and the GTX660 are superscalar, so only (approximately) 2/3rds of their cores can be utilized (512 and 640, respectively).

If this is the case, then why do the GPU utilization programs (MSI Afterburner, EVGA Precision) show 90%+ for most GPUGRID tasks? Are these programs not accounting for the type of architecture (scalar or superscalar)? If only 2/3rds of the cores are active, wouldn't GPU utilization be at ~66% instead of the typical 90%? These programs are capable of monitoring bus usage, memory controller (frame buffer) load, video processing, the amount of power, and much more.

The "GPU utilization" is not equivalent of the "CUDA cores utilization". These monitoring utilities are right in showing that high GPU utilization, as they showing the utilization of the untis which feeding the CUDA cores with work. I think the actual CUDA cores utilization can't be monitored.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Message 37929 - Posted: 17 Sep 2014 | 0:22:59 UTC - in response to Message 37926.

I estimate that the new GM204 will be about 45% faster than a 780ti.

That's a bit of an optimistic estimate, as (1216/928)*(16/15)=1.3977.
but...
1. my GTX780Ti is always boosting to 1098MHz,
2. the 1219MHz boost clock seems to be a bit high, as the GTX750Ti's boost clock is only 1085MHz, and it's a lesser chip.

We'll see it soon.

BTW there's an error in the chart, as the GTX780Ti has 15*192 CUDA cores.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 37931 - Posted: 17 Sep 2014 | 9:44:20 UTC
Last modified: 17 Sep 2014 | 9:45:22 UTC

http://images.anandtech.com/doci/7764/SMMrecolored_575px.png
http://images.anandtech.com/doci/7764/SMX_575px.png

Here are Maxwell's "crossbar", "dispatch", and "issue" differences compared to Kepler, in feeding the CUDA cores, LD/ST units, and SFUs.

http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell/3

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 37933 - Posted: 17 Sep 2014 | 14:04:39 UTC - in response to Message 37931.

http://images.anandtech.com/doci/7764/SMMrecolored_575px.png
http://images.anandtech.com/doci/7764/SMX_575px.png

Here are Maxwell "crossbar", "dispatch", "issue", differences compared to Kelper feeding CUDA cores and LD/ST units, SFU.

http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell/


Same text but made it clickable.
____________
Greetings from TJ

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 37934 - Posted: 17 Sep 2014 | 15:01:53 UTC - in response to Message 37933.

http://images.anandtech.com/doci/7764/SMMrecolored_575px.png
http://images.anandtech.com/doci/7764/SMX_575px.png

Here are Maxwell "crossbar", "dispatch", "issue", differences compared to Kelper feeding CUDA cores and LD/ST units, SFU.

http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell/


Same text but made it clickable.



Thank you for fixing links.

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Message 37941 - Posted: 19 Sep 2014 | 8:32:28 UTC

The GTX980 does quite well in the Folding@home benchmarks.

http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/20


Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Message 37942 - Posted: 19 Sep 2014 | 11:09:52 UTC - in response to Message 37941.

The GTX980 does quite well in the Folding@home benchmarks.

http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/20

Wow! Then it's possible that the GTX980's performance improvement over the GTX780Ti will be in the 25-45% range.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 37943 - Posted: 19 Sep 2014 | 15:52:04 UTC - in response to Message 37942.
Last modified: 19 Sep 2014 | 15:52:22 UTC

The GTX980 does quite well in the Folding@home benchmarks.

http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/20

Wow! Then it's possible that the GTX980's performance improvement over the GTX780Ti will be in the 25-45% range.

Well, if I read the graph correctly, then it's 6.2 and the 780Ti is 11.

The GTX 980 is available in the Netherlands and about €80 cheaper than a GTX 780Ti. However no EVGA boards are available yet. I am anxious to see the results of the GTX 980 here.
____________
Greetings from TJ

popandbob
Send message
Joined: 18 Jul 07
Posts: 67
Credit: 41,551,724
RAC: 93,729
Message 37946 - Posted: 20 Sep 2014 | 1:46:55 UTC - in response to Message 37943.

The GTX980 does quite well in the Folding@home benchmarks.

http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/20

Wow! Then it's possible that the GTX980's performance improvement over the GTX780Ti will be in the 25-45% range.

Well, if I read the graph correctly, then it's 6.2 and the 780Ti is 11.

Double precision performance is lower, but SP, which is what folding mostly uses, is higher.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 37948 - Posted: 20 Sep 2014 | 15:13:33 UTC

OK guys, who's got the first one up and running? I'd like to pull the trigger on a GTX970, but would prefer to know beforehand that it works as well as, or better than, expected over here.

MrS
____________
Scanning for our furry friends since Jan 2002

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 37949 - Posted: 20 Sep 2014 | 16:12:42 UTC - in response to Message 37948.
Last modified: 20 Sep 2014 | 16:16:51 UTC

Well, I am waiting for the 20nm Maxwell. However, I will buy a GTX980 to replace my GTX770 as soon as there is one from EVGA with a single radial fan. As soon as I have it installed I will let you all know.

Aha, you changed the name of the thread, ETA?
____________
Greetings from TJ

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 460
Credit: 841,024,447
RAC: 1,557,949
Message 37951 - Posted: 20 Sep 2014 | 20:00:37 UTC

Hopefully they develop an XP Driver too for these.
____________
DSKAG Austria Research Team: http://www.research.dskag.at



ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 37952 - Posted: 20 Sep 2014 | 21:18:39 UTC - in response to Message 37949.

Aha, you changed the name of the thread, ETA?

Yes :)

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Message 37953 - Posted: 20 Sep 2014 | 22:26:32 UTC - in response to Message 37951.
Last modified: 20 Sep 2014 | 22:27:40 UTC

Hopefully they develop an XP Driver too for these.

The 344.11 driver is released for Windows XP (and for x64 too), and the GTX 980 and 970 are in it (I've checked the nv_dispi.inf file).
However if you search for drivers on the NVidia homepage, it won't display any results for WinXP / GTX 980.

ext2097
Send message
Joined: 3 Jul 14
Posts: 5
Credit: 5,618,275
RAC: 0
Message 37954 - Posted: 21 Sep 2014 | 4:55:25 UTC

GTX970 with driver 344.11
http://www.gpugrid.net/result.php?resultid=13109131

Stderr output

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -59 (0xffffffc5)
</message>
<stderr_txt>
# GPU [GeForce GTX 970] Platform [Windows] Rev [3301M] VERSION [60]
# SWAN Device 0 :
# Name : GeForce GTX 970
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:01:00.0
# Device clock : 1215MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# Driver version : r343_98 : 34411
#SWAN: FATAL: cannot find image for module [.nonbonded.cu.] for device version 520

</stderr_txt>
]]>

I also tried with the latest beta driver 344.16, but had the same error.

Is that a problem with my computer, or is the GTX970 not supported yet?

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Message 37956 - Posted: 21 Sep 2014 | 7:40:50 UTC - in response to Message 37954.

GTX970 with driver 344.11
http://www.gpugrid.net/result.php?resultid=13109131

Stderr output

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -59 (0xffffffc5)
</message>
<stderr_txt>
# GPU [GeForce GTX 970] Platform [Windows] Rev [3301M] VERSION [60]
# SWAN Device 0 :
# Name : GeForce GTX 970
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:01:00.0
# Device clock : 1215MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# Driver version : r343_98 : 34411
#SWAN: FATAL: cannot find image for module [.nonbonded.cu.] for device version 520

</stderr_txt>
]]>

I also tried with the latest beta driver 344.16, but had the same error.

Is that a problem with my computer, or is the GTX970 not supported yet?

I think the problem is that Compute Capability 5.2 is not supported yet.
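
That would fit the "cannot find image for module [.nonbonded.cu.] for device version 520" line: the app ships prebuilt code images per compute capability and evidently has none for CC 5.2, nor any PTX the driver could JIT-compile. Purely for illustration (this is not GPUGrid's actual build line), a fatbinary built along these lines keeps working on newer GPUs:

# Illustrative nvcc flags only - not GPUGrid's real build. Embedding
# machine code (SASS) for each known arch plus PTX for the newest one
# lets the driver JIT the PTX on future architectures such as CC 5.2:
nvcc -gencode arch=compute_35,code=sm_35 \
     -gencode arch=compute_50,code=sm_50 \
     -gencode arch=compute_50,code=compute_50 \
     -c nonbonded.cu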

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 37958 - Posted: 21 Sep 2014 | 9:42:33 UTC - in response to Message 37956.

I've sent Matt a PM. Hopefully this is easy to fix!

MrS
____________
Scanning for our furry friends since Jan 2002

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 37962 - Posted: 21 Sep 2014 | 14:28:49 UTC

ext2097, did you also try other projects like Einstein@Home?

MrS
____________
Scanning for our furry friends since Jan 2002

HA-SOFT, s.r.o.
Send message
Joined: 3 Oct 11
Posts: 100
Credit: 5,879,292,399
RAC: 0
Message 37963 - Posted: 21 Sep 2014 | 15:01:01 UTC - in response to Message 37962.

It's a new compute capability. CC 5.2

ext2097
Send message
Joined: 3 Jul 14
Posts: 5
Credit: 5,618,275
RAC: 0
Message 37965 - Posted: 21 Sep 2014 | 16:04:02 UTC - in response to Message 37962.

SETI@home - http://setiathome.berkeley.edu/results.php?hostid=7376773
SETI@home Beta - http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=72657
Einstein@Home - http://einstein.phys.uwm.edu/results.php?hostid=11669559

The GTX970 seems to work fine on those projects.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Message 37966 - Posted: 21 Sep 2014 | 18:34:51 UTC

It's a bit off topic, but it's a quite interesting Maxwell advertisement:
GAME24: Debunking Lunar Landing Conspiracies with Maxwell and VXGI

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Message 37969 - Posted: 21 Sep 2014 | 20:40:35 UTC - in response to Message 37966.

It's a bit off topic, but it's a quite interesting Maxwell advertisement:
GAME24: Debunking Lunar Landing Conspiracies with Maxwell and VXGI


Cool.

However, the folks at the flat earth society are not impressed with Nvidia's effort. :)

http://forum.tfes.org/index.php?topic=1914.0


While we are waiting for GPUgrid GTX980/970 numbers, F@H performance looks encouraging.

http://forums.evga.com/Someone-needs-to-post-980-andor-970-folding-numbers-here-when-they-get-one-m2218148.aspx

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 37972 - Posted: 22 Sep 2014 | 9:36:48 UTC - in response to Message 37963.

Ok gang,

Looks like we'll need an app update - let's see if I can push one out later today. Who's got the cards - do I need to do Linux or Windows first?


Matt

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 37974 - Posted: 22 Sep 2014 | 9:49:35 UTC

..or not. The current CUDA release seems not to support that architecture yet.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 37975 - Posted: 22 Sep 2014 | 10:37:10 UTC - in response to Message 37974.
Last modified: 22 Sep 2014 | 10:37:31 UTC

Have you tried this yet? https://developer.nvidia.com/cuda-downloads-geforce-gtx9xx - Driver 343.98 is included, offering support for CC 5.2 cards (GTX980/970).

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 37976 - Posted: 22 Sep 2014 | 10:51:04 UTC - in response to Message 37975.

Yeah, looked straight through that.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 37978 - Posted: 22 Sep 2014 | 12:07:33 UTC - in response to Message 37969.

It's a bit off topic, but it's a quite interesting Maxwell advertisement:
GAME24: Debunking Lunar Landing Conspiracies with Maxwell and VXGI


Cool.

However, the folks at the flat earth society are not impressed with Nvidia's effort. :)

http://forum.tfes.org/index.php?topic=1914.0


While we are waiting for GPUgrid GTX980/970 numbers, F@H performance looks encouraging.

http://forums.evga.com/Someone-needs-to-post-980-andor-970-folding-numbers-here-when-they-get-one-m2218148.aspx


NVidia showing off VX Global illuminati... err illumination power. The Metro series Redux games utilize this tech with Kepler and above generations. (Even X-box/PS4 ports)

GM204 technical advances compared to Kelper is rather striking. Fermi 480/580 compared to 680: GPU-GRID performance jump won't be nearly the GTX980 jump compared to GTX 680. Filtering and sorting performance for images, or to create programs with atoms and DNA/RNA strands- are higher than Kelper. Maxwell also has more internal memory bandwidth, and Third generation Color Compression enhancements offers more ROP performance, with better Cache latency.

A single GM204 CUDA core is 25-40% faster compared to a single CUDA core Gk104, from the added atomic memory enhancements, and more registers for a thread. For Gaming: a GTX 980 is equal to [2]GTX 680.

In every area GP(GPU) performance jump from GK104 to GM204 is significant. GK110 only now offers higher TMU performance, but without new type of filtering offer by GM204, unless Nvidia offers this for the Kelper Generation. GK110 has higher Double precision. A full fat Maxwell (?20nm/16nm FinFET ?/250W/3072~CUDA cores)that offers 1/3 SP/DP core ratio like GK110 is going to be a card for the ages. (Will first be a Tesla, Quadro, Titan variant?) Maybe AMD will come out with board shortly to challenge GM204 with a similar power consumption. Continuous performance competition should raise standards for each company. These next few years are key.

GM204 replaces the GK104 stack, not GK110 [GTX 780](ti)disabled 64DP core SMX stack.(Driver allows 8 be active in SMX) GM204 per nm transistor density is only few percent more than GK104 (3.54B transistors/294mm2) and (7.1B transistors/551mm2) for GK110.

Kepler's GK110 (4500-6000GFLOPS Desktop card are near [Single]20GFOLPS/W, while GK104 (2500-3500GFLOPS) is 15-18GFLOPS/W depending on clock speeds and voltage. Maxwell GM107(1400~FLOPS) is 20-23GFLOPS/W. Maxwell GM204(5000~FLOPS) is 30-35GFLOPS/W, depending on voltage and GDDR5/CORE speeds. GK104 highest rated mobile (GTX880m/2900~FLOPS) card is 27-29GFLOPS/W.

GM204's compute-time/power-usage ratio with a new app will be world class compared to more power-hungry cards. For crunchers in states or countries with high tax rates on power bills, a GTX970/980 is a top-notch choice.
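
For reference, these GFLOPS/W figures follow from the usual peak-FLOPS arithmetic - cores x clock x 2 flops per FMA, divided by board TDP. A minimal Python sketch, assuming the advertised base clocks and TDPs (boost clocks push the GTX 980 into the quoted 30-35 range):

# Peak single-precision GFLOPS and GFLOPS/W from published specs.
# Base clocks and reference TDPs; boost clocks raise both figures.
cards = {
    "GTX 680 (GK104)": (1536, 1.006, 195),   # cores, base GHz, TDP W
    "GTX 980 (GM204)": (2048, 1.126, 165),
}
for name, (cores, clock_ghz, tdp_w) in cards.items():
    gflops = cores * clock_ghz * 2           # 2 flops per core per clock (FMA)
    print(f"{name}: {gflops:.0f} GFLOPS, {gflops / tdp_w:.1f} GFLOPS/W")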

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 37979 - Posted: 22 Sep 2014 | 12:10:56 UTC - in response to Message 37976.
Last modified: 22 Sep 2014 | 12:22:43 UTC

Yeah, looked straight through that.


CUDA 6.5.19 ships with the 343.98 driver. Are the 344 drivers the same? Updated documents for PTX, the programming guide and many others are included with the 343.98 / CUDA 6.5 SDK.
Before updating to the 6.5.19 driver, the GPU-GRID tasks I completed were on CUDA 6.5.12.

Linux also has a new CUDA 6.5 driver available for download.

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 37981 - Posted: 22 Sep 2014 | 13:35:19 UTC - in response to Message 37979.
Last modified: 22 Sep 2014 | 14:26:11 UTC

Right. New app version cuda65 for acemdlong. Windows only, needs driver version 344.

Matt

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 37983 - Posted: 22 Sep 2014 | 15:04:49 UTC

I ordered a GTX980 from Newegg. Zotac and gigabyte were my only options so I went with gigabyte. All other manufacturers cards were "out of stock".

tomba
Send message
Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 37984 - Posted: 22 Sep 2014 | 15:30:25 UTC - in response to Message 35238.
Last modified: 22 Sep 2014 | 16:28:10 UTC

Time to replace my trusty GTX 460, which has been GPUGrid-ing for years! At ~£100 the GTX 750Ti fits my budget nicely but I need some guidance.

1. Is the power feed only from the mobo enough for 24/7 or should I go for one with a 6-pin connection?
2. One fan or two?
3. WHICH 750Ti do you recommend?

I installed a pair of 750Ti cards in my computer yesterday and tried to run GPUGRID. No go - instant fail within a couple of seconds.

This quote is from January. I hope the problem has been fixed!

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 37985 - Posted: 22 Sep 2014 | 15:35:16 UTC - in response to Message 37984.

The 750tis are great and work just fine.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 37986 - Posted: 22 Sep 2014 | 16:08:13 UTC - in response to Message 37983.

I ordered a GTX980 from Newegg. Zotac and gigabyte were my only options so I went with gigabyte. All other manufacturers cards were "out of stock".

Good luck with your card biodoc.

In the Netherlands only Asus and MSI, but I will wait for EVGA. They are not out of stock, but just in production. A few weeks more is no problem. Moreover I first want to see some results.
____________
Greetings from TJ

Jozef J
Send message
Joined: 7 Jun 12
Posts: 112
Credit: 1,140,895,172
RAC: 19,729
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 37989 - Posted: 22 Sep 2014 | 19:15:34 UTC

http://www.techpowerup.com/

This webpage has a massive summary of reviews for the 970/980 graphics cards.

But this is my insider tip :-)
http://www.zotac.com/products/graphics-cards/geforce-900-series/gtx-980/product/gtx-980/detail/geforce-gtx-980-amp-extreme-edition/sort/starttime/order/DESC/amount/10/section/specifications.html

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 37990 - Posted: 22 Sep 2014 | 19:52:31 UTC
Last modified: 22 Sep 2014 | 19:53:48 UTC

Does anybody already have a working GTX 980 or 970?

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 37991 - Posted: 22 Sep 2014 | 20:13:24 UTC

ext2097, would you mind giving GPU-Grid another try?

And regarding those Einstein@Home results: did you run 1 WU at a time, which is the default setting?

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 37992 - Posted: 22 Sep 2014 | 20:43:29 UTC
Last modified: 22 Sep 2014 | 20:50:45 UTC

I had a CUDA6.5 task on one of my GTX680s, but it's failed after 6 sec with the following error:

# The simulation has become unstable. Terminating to avoid lock-up (1)
40x35-NOELIA_5bisrun2-2-4-RND5486_0
Has anybody had a successful CUDA6.5 task on any older card?

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 37993 - Posted: 22 Sep 2014 | 20:49:32 UTC - in response to Message 37992.

And there's another failed CUDA6.5 task on my GTX780Ti:
I4R6-SDOERR_BARNA5-32-100-RND1539_0
It has failed after 2 sec.

# Simulation unstable. Flag 11 value 1 # The simulation has become unstable. Terminating to avoid lock-up # The simulation has become unstable. Terminating to avoid lock-up (2)

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 37994 - Posted: 22 Sep 2014 | 21:11:41 UTC - in response to Message 37993.

There were two more CUDA6.5 workunits on my GTX780Ti, both of them failed the same way:

# Simulation unstable. Flag 11 value 1 # The simulation has become unstable. Terminating to avoid lock-up # The simulation has become unstable. Terminating to avoid lock-up (2)

I16R23-SDOERR_BARNA5-32-100-RND7031_0
I12R83-SDOERR_BARNA5-32-100-RND2687_0
Now this host received a CUDA6.0 task, so I don't want to try again, but I think that the CUDA6.5 app has a bug.

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 37995 - Posted: 22 Sep 2014 | 22:46:37 UTC - in response to Message 37986.

I ordered a GTX980 from Newegg. Zotac and gigabyte were my only options so I went with gigabyte. All other manufacturers cards were "out of stock".

Good luck with your card biodoc.

In the Netherlands only Asus and MSI, but I will wait for EVGA. They are not out of stock, but just in production. A few weeks more is no problem. Moreover I first want to see some results.


Yes, my preference would have been EVGA or PNY (lifetime warranty but fixed core voltage). This will be my first Gigabyte card so I hope it works out.

The F@H numbers sold me. I think Nvidia GPU performance on F@H generally translates to GPUGrid. Besides, I usually spend the month of December folding, so the GTX980 will be a nice companion to my 780Ti.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 37996 - Posted: 22 Sep 2014 | 23:07:12 UTC - in response to Message 37994.

I have more failed CUDA6.5 tasks on my GTX680:
19x56-NOELIA_5bisrun2-2-4-RND7637_2
I14R24-SDOERR_BARNA5-25-100-RND2569_0
20mgx12-NOELIA_20MG2-9-50-RND2493_0
I12R11-SDOERR_BARNA5-24-100-RND0763_0
I2R35-SDOERR_BARNA5-31-100-RND8916_0

# The simulation has become unstable. Terminating to avoid lock-up (1)

Actually all CUDA6.5 tasks are failing on my GTX680 (OC) and GTX780Ti (Non-OC)

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 37997 - Posted: 22 Sep 2014 | 23:20:25 UTC - in response to Message 37996.
Last modified: 22 Sep 2014 | 23:33:22 UTC

I haven't found any successfully finished CUDA6.5 tasks (obviously these are too fresh).
But there are five which failed on other's hosts the same way they failed on mine:
19x56-NOELIA_5bisrun2-2-4-RND7637_1 (GTX780Ti OC)
I7R106-SDOERR_BARNA5-32-100-RND7602_1 (GTX TITAN non-OC)
19x64-NOELIA_5bisrun2-3-4-RND8765_0 (GTX TITAN non-OC)

# Simulation unstable. Flag 11 value 1 # The simulation has become unstable. Terminating to avoid lock-up # The simulation has become unstable. Terminating to avoid lock-up (2)

I2R31-SDOERR_BARNA5-30-100-RND8191_0 (GTX770 OC)
I11R57-SDOERR_BARNA5-32-100-RND3266_0 (GTX770 non-OC)
# The simulation has become unstable. Terminating to avoid lock-up (1)

Matt
Avatar
Send message
Joined: 11 Jan 13
Posts: 216
Credit: 846,538,252
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 37998 - Posted: 23 Sep 2014 | 3:22:02 UTC
Last modified: 23 Sep 2014 | 3:38:55 UTC

I just had one of these errors on a 780Ti as well.

20mgx76-NOELIA_20MG2-6-50-RND7695_0

Edit: Now three Cuda65 in a row have failed.

20mgx76-NOELIA_20MG2-6-50-RND7695_0

I8R16-SDOERR_BARNA5-29-100-RND7841_2

Jozef J
Send message
Joined: 7 Jun 12
Posts: 112
Credit: 1,140,895,172
RAC: 19,729
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 37999 - Posted: 23 Sep 2014 | 4:41:12 UTC

I7R110-SDOERR_BARNA5-32-100-RND5097_1 10082727 181608 22 Sep 2014 | 17:59:10 UTC 23 Sep 2014 | 0:04:53 UTC Error while computing 6.07 2.70 --- Long runs (8-12 hours on fastest card) v8.41 (cuda65)
I11R16-SDOERR_BARNA5-29-100-RND4924_1 10097574 179755 22 Sep 2014 | 16:41:21 UTC 23 Sep 2014 | 0:45:03 UTC Error while computing 10.75 3.53 --- Long runs (8-12 hours on fastest card) v8.41 (cuda65)
I12R49-SDOERR_BARNA5-30-100-RND2140_0 10097949 181608 22 Sep 2014 | 17:59:10 UTC 23 Sep 2014 | 0:04:53 UTC Error while computing 13.20 2.86
19x48-NOELIA_5bisrun2-3-4-RND9893_0 10097397 176407 22 Sep 2014 | 11:03:22 UTC 22 Sep 2014 | 11:08:37 UTC Error while computing 2.05 0.13
I6R67-SDOERR_BARNA5-31-100-RND9535_1 10089657 176407 22 Sep 2014 | 6:36:59 UTC 22 Sep 2014 | 11:08:37 UTC Error while computing 171.94 27.72 --- Long runs (8-12 hours on fastest card) v8.41 (cuda60)
43x63-NOELIA_5bisrun2-3-4-RND0357_0 10096302 176407 22 Sep 2014 | 1:39:43 UTC 22 Sep 2014 | 11:03:22 UTC Error while computing 14,762.32 3,420.89

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38000 - Posted: 23 Sep 2014 | 7:11:20 UTC - in response to Message 37999.

Jozef, there's a CUDA 6.0 task among these. Maybe you need to reboot the host after those CUDA 6.5 failures? Or is it running normally again?

MrS
____________
Scanning for our furry friends since Jan 2002

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38001 - Posted: 23 Sep 2014 | 7:21:03 UTC

CUDA 65s should only have been going to the new Maxwells, sorry about that.
You shouldn't see any new ones going to older cards - please shout if you do, it'll be a scheduler bug.

Matt

ext2097
Send message
Joined: 3 Jul 14
Posts: 5
Credit: 5,618,275
RAC: 0
Level
Ser
Scientific publications
watwat
Message 38002 - Posted: 23 Sep 2014 | 8:00:24 UTC - in response to Message 37991.

I had two CUDA65 tasks, and both failed with the same "FATAL: cannot find image for module [.nonbonded.cu.] for device version 520" error.
I3R87-SDOERR_BARNA5-32-100-RND2755_1
I16R41-SDOERR_BARNA5-29-100-RND8564_0

Two CUDA60 tasks had "ERROR: file mdioload.cpp line 81: Unable to read bincoordfile".
I957-SANTI_p53final-15-21-RND0451_5
I864-SANTI_p53final-17-21-RND8261_5

And I'm running only 1 WU at a time except setiathome_v7 tasks.

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38003 - Posted: 23 Sep 2014 | 8:11:13 UTC - in response to Message 38001.

Right. The new 65 app is failing for non-obvious reasons, so I've moved it to the acemdbeta queue. If you have a GTX9x0, please get some work from that queue.

ext2097
Send message
Joined: 3 Jul 14
Posts: 5
Credit: 5,618,275
RAC: 0
Level
Ser
Scientific publications
watwat
Message 38004 - Posted: 23 Sep 2014 | 9:46:33 UTC - in response to Message 38003.

I have a GTX970 and have checked "Run test applications?" and "ACEMD beta", but BOINC says "No tasks are available for ACEMD beta version".
Is there something I need to do in order to get beta tasks?

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38005 - Posted: 23 Sep 2014 | 9:52:15 UTC - in response to Message 38004.

I have GTX970 and checked at "Run test applications?" and "ACEMD beta", but BOINC says "No tasks are available for ACEMD beta version".
Is there something I need to do in order to get beta tasks?

On the GPUGRID preferences page, about in the middle, there is an option called "Run test applications?". You have to set it to yes as well.
But perhaps you already did - in that case, sorry about this post.
____________
Greetings from TJ

ext2097
Send message
Joined: 3 Jul 14
Posts: 5
Credit: 5,618,275
RAC: 0
Level
Ser
Scientific publications
watwat
Message 38006 - Posted: 23 Sep 2014 | 11:10:14 UTC - in response to Message 38005.

Use NVIDIA GPU : yes
Run test applications? yes
ACEMD beta: yes
Use Graphics Processing Unit (GPU) if available : yes

No beta tasks downloaded yet.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38016 - Posted: 23 Sep 2014 | 17:07:13 UTC

Should I open that ESD bag? :)

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38017 - Posted: 23 Sep 2014 | 17:32:34 UTC - in response to Message 38016.

Here are two GK104/GK110 times from your hosts to compare with the new GM204.

20mgx58-NOELIA_20MG2-6-50-RND1410_0
Time per step (avg over 5000000 steps): 5.600 ms
Approximate elapsed time for entire WU: 27997.766 s
Device clock : 1201MHz
Memory clock : 3004MHz
Memory width : 256bit
Driver version : r343_98 : 34411
GeForce GTX 680 Capability 3.0

20mgx81-NOELIA_20MG2-1-50-RND0462_1 Ti
time per step (avg over 5000000 steps): 3.306 ms
Approximate elapsed time for entire WU: 16531.969 s
Device clock : 928MHz
Memory clock : 3500MHz
Driver version : r340_00 : 34043
GeForce GTX 780 Ti Capability 3.5
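
As a cross-check, the elapsed times above are just step count times per-step time; a quick sketch (using only the numbers quoted above) confirms the two readouts are self-consistent:

# Elapsed time ~ steps x time-per-step for the two runs quoted above.
steps = 5_000_000
for card, ms_per_step in [("GTX 680", 5.600), ("GTX 780 Ti", 3.306)]:
    print(f"{card}: {steps * ms_per_step / 1000:.0f} s")  # ~28000 s, ~16530 s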

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38018 - Posted: 23 Sep 2014 | 17:45:12 UTC

The good news is that I've successfully installed the GTX980 under Windows XP x64.
The bad news is that I could not get beta work for it.

23/09/2014 19:38:23 | GPUGRID | Requesting new tasks for NVIDIA
23/09/2014 19:38:25 | GPUGRID | Scheduler request completed: got 0 new tasks
23/09/2014 19:38:25 | GPUGRID | No tasks sent
23/09/2014 19:38:25 | GPUGRID | No tasks are available for ACEMD beta version
23/09/2014 19:38:25 | GPUGRID | No tasks are available for the applications you have selected.
23/09/2014 19:41:53 | GPUGRID | update requested by user
23/09/2014 19:41:55 | GPUGRID | Sending scheduler request: Requested by user.
23/09/2014 19:41:55 | GPUGRID | Requesting new tasks for NVIDIA
23/09/2014 19:41:57 | GPUGRID | Scheduler request completed: got 0 new tasks

Before you ask: I did all the necessary settings.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38019 - Posted: 23 Sep 2014 | 18:03:15 UTC

Seems like Matt has to fill the beta queue, or already got enough failed results from the batch he submitted :p

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38022 - Posted: 23 Sep 2014 | 18:08:57 UTC - in response to Message 38019.
Last modified: 23 Sep 2014 | 18:09:32 UTC

Seems like Matt has to fill the beta queue, or already got enough failed results from the batch he submitted :p

MrS

According to the server status page there are 100 unsent beta workunits, and the applications page shows only the v8.42 CUDA6.5 beta app. Somehow these aren't bound together. I think this could be another scheduler issue.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38024 - Posted: 23 Sep 2014 | 18:40:22 UTC - in response to Message 38016.

Nice card you have there Zoltan. Would love to see the results as soon as Matt has got it working.
I hope you will have good luck with this card right from the start.
____________
Greetings from TJ

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38026 - Posted: 23 Sep 2014 | 18:46:26 UTC - in response to Message 38024.

Nice card you have there Zoltan. Would love to see the results as soon as Matt has got it working.
I hope you will have good luck with this card right from the start.

I second that! BTW: what are you currently running on the card? Any results from other projects to share? :)

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38027 - Posted: 23 Sep 2014 | 18:49:27 UTC - in response to Message 38024.
Last modified: 23 Sep 2014 | 18:56:12 UTC

Thank you TJ & ETA!
It's already crunching for Einstein@home.
As this card is a standard NVidia design, there's a good chance it won't have such problems as my Gigabyte GTX780Ti OC....

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38028 - Posted: 23 Sep 2014 | 19:01:27 UTC - in response to Message 38027.

Thank you TJ & ETA!
It's already crunching for Einstein@home.
As this card is a standard NVidia design, there's a good chance it won't have such problems as my Gigabyte GTX780Ti OC....


Any comment on your phenomenal card's wattage usage for tasks, or temps?

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38030 - Posted: 23 Sep 2014 | 19:22:05 UTC - in response to Message 38028.
Last modified: 23 Sep 2014 | 19:35:39 UTC

Any comment on you're phenomenal card's wattage usage for tasks, or temps?

It's awesome! :)
The Einstein@home app is CUDA3.2 - ancient in terms of GPU computing, as this version was released for the GTX 2xx series - so the data you've asked for is almost irrelevant, but here it is:
Ambient temperature: 24.8°C
Task: p2030.20140610.G63.60-00.95.S.b6s0g0.00000_3648_1 Binary Radio Pulsar Search (Arecibo, GPU) v1.39 (BRP4G-cuda32-nv301)
GPU temperature: 53°C
GPU usage: 91-92% (muhahaha)
GPU wattage: 90W (the difference between the idle GPU and the GPU in use, but the CPU is consuming a little to keep the GPU busy)
GPU clock: 1240MHz
GPU voltage: 1.218V
GPU power 55%

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38031 - Posted: 23 Sep 2014 | 19:55:51 UTC - in response to Message 38030.

Any comment on you're phenomenal card's wattage usage for tasks, or temps?

It's awesome! :)
The Einstein@home app is CUDA3.2 - ancient in terms of GPU computing, as this version is released for the GTX 2xx series - so the data you've asked for is almost irrelevant, but here it is:
Ambient temperature: 26°C
Task: p2030.20140610.G63.60-00.95.S.b6s0g0.00000_3648_1 Binary Radio Pulsar Search (Arecibo, GPU) v1.39 (BRP4G-cuda32-nv301)
GPU temperature: 53°C
GPU usage: 91-92% (muhahaha)
GPU wattage: 90W (the difference between the idle GPU and the GPU in use, but the CPU is consuming a little to keep the GPU busy)


An integer task? 90 watts (at 91-92% usage) for 1024 cores on the CUDA 3.2 API shows off GM204's (second-generation Maxwell's) internal core-structure enhancements. Other components will be under less "stress" thanks to the drop in energy usage.
With power rates only rising, any efficiency improvement helps.

Running 24/7 for weeks/months/years at a time - would you rather feed a 250W-TDP card or a 175W one? That's a 50-105W TDP reduction for GM204 compared to the 225W/250W GK110, and the 1664-core GTX970 is rated at just 145W. The GTX980's TDP is 30 watts away from a 6/8-core Haswell-E at 140 watts (a few 6/8-core Haswell E5 Xeons are 85W). With multiple cards the energy savings add up, higher motherboard/PSU efficiency included.
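
To put those TDP deltas in perspective over a year of 24/7 crunching, here is a rough yearly-cost sketch using the official 145W/165W Maxwell TDPs; the $0.15/kWh electricity rate is an assumed figure for illustration only:

# Yearly savings implied by the TDP deltas quoted above (24/7 operation).
rate_usd_per_kwh = 0.15   # assumed rate, not from the thread
for label, delta_w in [("250W GK110 vs 145W GTX 970", 105),
                       ("250W GK110 vs 165W GTX 980", 85)]:
    kwh = delta_w * 24 * 365 / 1000
    print(f"{label}: {kwh:.0f} kWh/yr, ~${kwh * rate_usd_per_kwh:.0f}/yr")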

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38032 - Posted: 23 Sep 2014 | 20:07:04 UTC

I think I know why we don't receive beta tasks.
The acemd.841-65.exe file is 3,969,024 bytes long, but the acemd.842-65.exe is only 1,112,576 bytes long, so something went wrong with the latter.

Profile tito
Send message
Joined: 21 May 09
Posts: 22
Credit: 1,916,690,043
RAC: 5,534,947
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 38036 - Posted: 24 Sep 2014 | 5:26:47 UTC

Zoltan - how many WUs are you crunching at once at Einstein?

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38039 - Posted: 24 Sep 2014 | 8:48:23 UTC - in response to Message 38036.

Zoltan - how many WU are You crunching at once at Einstein?

Only one.
Now I've changed my settings to run two simultaneously, but the power consumption hasn't changed; only the GPU usage has risen to 97%.

Profile tito
Send message
Joined: 21 May 09
Posts: 22
Credit: 1,916,690,043
RAC: 5,534,947
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 38040 - Posted: 24 Sep 2014 | 8:56:34 UTC

May I quote all this data on the Einstein forum?

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38041 - Posted: 24 Sep 2014 | 9:40:05 UTC - in response to Message 38040.

May I quote all this data at Einstein forum?

Sure.

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38042 - Posted: 24 Sep 2014 | 9:43:32 UTC - in response to Message 38030.

Any comment on you're phenomenal card's wattage usage for tasks, or temps?

It's awesome! :)
The Einstein@home app is CUDA3.2 - ancient in terms of GPU computing, as this version is released for the GTX 2xx series - so the data you've asked for is almost irrelevant, but here it is:
Ambient temperature: 24.8°C
Task: p2030.20140610.G63.60-00.95.S.b6s0g0.00000_3648_1 Binary Radio Pulsar Search (Arecibo, GPU) v1.39 (BRP4G-cuda32-nv301)
GPU temperature: 53°C
GPU usage: 91-92% (muhahaha)
GPU wattage: 90W (the difference between the idle GPU and the GPU in use, but the CPU is consuming a little to keep the GPU busy)
GPU clock: 1240MHz
GPU voltage: 1.218V
GPU power 55%


The Einstein numbers look great. Congrats on the new card Zoltan!

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38045 - Posted: 24 Sep 2014 | 9:58:01 UTC
Last modified: 24 Sep 2014 | 10:53:32 UTC

What does BOINC say about the (peak) FLOPS in the event log for a GTX980? Near 5 TeraFLOPS? Over at Mersenne trial-factoring, a GTX980 is listed @ 1.126GHz and 4,710 GFLOPS.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1626
Credit: 9,376,466,723
RAC: 19,051,824
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38048 - Posted: 24 Sep 2014 | 13:43:24 UTC - in response to Message 38045.

What does Boinc say, about amount of (peak) FLOPS in event log for GTX980? Near 5TeraFLOPS? Over at Mersenne trial-factoring--- a GTX980 is listed @ 1,126GHz and 4,710 GFLOPS.

Could somebody running a Maxwell-aware version of BOINC check and report this, please, and do a sanity-check of whether BOINC's figure is correct from what you know of the card's SM count, cores per SM, shader clock, flops_per_clock etc. etc? We got the figures for the 'baby Maxwell' 750/Ti into BOINC on 24 February (3edb124ab4b16492d58ce5a6f6e40c2244c97ed6), but I think that was just too late to catch v7.2.42

We're in a similar position this time, with v7.4.22 at release-candidate stage - I'd say that one was safe to test with, if nobody here has upgraded yet. TIA.
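
For anyone doing that sanity check, here is the arithmetic BOINC should be applying - a minimal sketch, assuming the GTX 980's published configuration (16 SMMs x 128 cores) and the usual 2 flops/clock (one FMA) per core:

# Peak-FLOPS sanity check for a GTX 980 (compute capability 5.2).
sm_count, cores_per_sm = 16, 128     # published GM204 configuration
flops_per_clock = 2                  # one FMA per CUDA core per clock
for label, clock_ghz in [("base", 1.126), ("boost", 1.216)]:
    gflops = sm_count * cores_per_sm * clock_ghz * flops_per_clock
    print(f"{label}: {gflops:.0f} GFLOPS peak")  # ~4612 base, ~4981 boost

The ~4979 GFLOPS peak reported further down the thread matches the boost figure almost exactly, suggesting the driver hands BOINC the boost clock rather than the base clock.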

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38049 - Posted: 24 Sep 2014 | 13:58:43 UTC

No idea why the scheduler wasn't giving out the 842 beta app. Look out for 843 now.

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38050 - Posted: 24 Sep 2014 | 13:59:39 UTC - in response to Message 38032.


The acemd.841-65.exe file is 3.969.024 bytes long, but the acemd.842-65.exe is only 1.112.576 bytes long, so something went wrong with the latter.


No, that's deliberate. It's a Maxwell-only build.

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38051 - Posted: 24 Sep 2014 | 14:28:56 UTC - in response to Message 38050.

There's now a Linux build on acemdbeta. You'll definitely need to use a Linux client that reports the right driver version.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38052 - Posted: 24 Sep 2014 | 16:16:18 UTC - in response to Message 38049.

No idea why the scheduler wasn't giving out the 842 beta app. Look out for 843 now.

I still could not get beta work.
24/09/2014 18:16:35 | GPUGRID | update requested by user
24/09/2014 18:16:38 | GPUGRID | Sending scheduler request: Requested by user.
24/09/2014 18:16:38 | GPUGRID | Requesting new tasks for NVIDIA
24/09/2014 18:16:41 | GPUGRID | Scheduler request completed: got 0 new tasks
24/09/2014 18:16:41 | GPUGRID | No tasks sent
24/09/2014 18:16:41 | GPUGRID | No tasks are available for ACEMD beta version
24/09/2014 18:16:41 | GPUGRID | No tasks are available for the applications you have selected.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38054 - Posted: 24 Sep 2014 | 17:24:00 UTC
Last modified: 24 Sep 2014 | 17:27:43 UTC

I gave Folding@home a try, and the power consumption rose by 130W when I started folding on the GPU (GTX980) only. When I started folding on the CPU as well (Core i7-870@3.2GHz, 7 threads), the power consumption went up by a further 68W.
GPU usage: 90-95%
GPU power 64-66%
GPU temperature: 56°C
GPU voltage: 1.218V
GPU core clock: 1240MHz

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38055 - Posted: 24 Sep 2014 | 18:49:25 UTC
Last modified: 24 Sep 2014 | 18:52:57 UTC

Thanks Zoltan! Those numbers are really encouraging and show GM204 power consumption to be approximately where we expected it to be. This is in stark contrast to the ~250 W THG measured under "some GP-GPU load". Maybe it was FurMark? With these results we can rest assured that the cards won't draw more than their power target running GPU-Grid.

And while we're at it: what about memory controller load? It should be comparably high at Einstein and will limit unbalanced cards badly. For reference:

GT640: 99% at Einstein, ~60% at GPU-Grid
GTX660Ti: ~60% at Einstein, ~40% at GPU-Grid

Edit: concerning the Einstein tasks: about 1740 s per Arecibo task running 1 WU at a time (RAC -> 50k), whereas 2740 s per pair was achieved running 2 of them concurrently (RAC -> 63k)?

MrS
____________
Scanning for our furry friends since Jan 2002
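
Those two Einstein numbers imply a sizeable throughput gain from running tasks in pairs - a quick check, assuming both tasks of a pair finish together at ~2740 s:

# Throughput gain implied by the Einstein@home times quoted above.
t_single = 1740.0   # s per task, run one at a time
t_pair = 2740.0     # s for two tasks run concurrently
gain = (2 / t_pair) / (1 / t_single)
print(f"concurrency gain: {gain:.2f}x")  # ~1.27x, in line with RAC 63k vs 50k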

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38056 - Posted: 24 Sep 2014 | 18:52:59 UTC - in response to Message 38048.

What does Boinc say, about amount of (peak) FLOPS in event log for GTX980? Near 5TeraFLOPS? Over at Mersenne trial-factoring--- a GTX980 is listed @ 1,126GHz and 4,710 GFLOPS.

Could somebody running a Maxwell-aware version of BOINC check and report this, please, and do a sanity-check of whether BOINC's figure is correct from what you know of the card's SM count, cores per SM, shader clock, flops_per_clock etc. etc? We got the figures for the 'baby Maxwell' 750/Ti into BOINC on 24 February (3edb124ab4b16492d58ce5a6f6e40c2244c97ed6), but I think that was just too late to catch v7.2.42

We're in a similar position this time, with v7.4.22 at release-candidate stage - I'd say that one was safe to test with, if nobody here has upgraded yet. TIA.


The GPU info is in the sched_request file for the project, or in the slot's init_data.xml file. Also, client_state.xml provides the working size?

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 503
Credit: 762,770,636
RAC: 85,619
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38057 - Posted: 24 Sep 2014 | 19:49:16 UTC - in response to Message 37984.

Time to replace my trusty GTX 460, which has been GPUGrid-ing for years! At ~£100 the GTX 750Ti fits my budget nicely but I need some guidance.

1. Is the power feed only from the mobo enough for 24/7 or should I go for one with a 6-pin connection?
2. One fan or two?
3. WHICH 750Ti do you recommend?

I installed a pair of 750Ti cards in my computer yesterday and tried to run GPUGRID. No go, instant fail within a couple seconds.

This quote is from January. I hope the problem has been fixed!


Something you didn't mention: If possible, get one that blows the hot air out of the case rather than blowing it around within the case. That should reduce the temperature for both the graphics board and the CPU, and therefore make both of them last longer.

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 503
Credit: 762,770,636
RAC: 85,619
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38058 - Posted: 24 Sep 2014 | 19:56:15 UTC

Something I'm having trouble finding: How well do the new cards using PCIE3 work if the motherboard has only PCIE2 sockets?

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1626
Credit: 9,376,466,723
RAC: 19,051,824
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38059 - Posted: 24 Sep 2014 | 20:45:42 UTC - in response to Message 38058.

Something I'm having trouble finding: How well do the new cards using PCIE3 work if the motherboard has only PCIE2 sockets?

Physically, the sockets are the same. Electrically, they're compatible.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38060 - Posted: 24 Sep 2014 | 22:18:25 UTC - in response to Message 38058.

Something I'm having trouble finding: How well do the new cards using PCIE3 work if the motherboard has only PCIE2 sockets?

We'll find out when there's a working GPUGrid app, as I will move my GTX 980 to another host which has PCIe3.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38061 - Posted: 24 Sep 2014 | 22:55:59 UTC - in response to Message 38055.

And while we're at it: what about memory controller load?

Folding@home: 23%
Einstein@home 2 tasks: 62-69% (Perseus arm survey/BRP5 & Arecibo, GPU/BRP4G)
Einstein@home 1 task : 46-48% (Perseus arm survey/BRP5)
Einstein@home 1 task : 58-64% (Arecibo, GPU/BRP4G)

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38062 - Posted: 25 Sep 2014 | 0:21:49 UTC

I've got my GTX980 running on linux.

I'm also unable to get any beta work on GPUGrid.


Profile @tonymmorley
Send message
Joined: 10 Mar 14
Posts: 24
Credit: 1,215,128,812
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 38066 - Posted: 25 Sep 2014 | 7:54:38 UTC

Just got two GTX 980's, will install tomorrow. Should be interesting to see how we go!

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38067 - Posted: 25 Sep 2014 | 8:47:13 UTC - in response to Message 38048.

What does Boinc say, about amount of (peak) FLOPS in event log for GTX980? Near 5TeraFLOPS? Over at Mersenne trial-factoring--- a GTX980 is listed @ 1,126GHz and 4,710 GFLOPS.

Could somebody running a Maxwell-aware version of BOINC check and report this, please, and do a sanity-check of whether BOINC's figure is correct from what you know of the card's SM count, cores per SM, shader clock, flops_per_clock etc. etc? We got the figures for the 'baby Maxwell' 750/Ti into BOINC on 24 February (3edb124ab4b16492d58ce5a6f6e40c2244c97ed6), but I think that was just too late to catch v7.2.42

We're in a similar position this time, with v7.4.22 at release-candidate stage - I'd say that one was safe to test with, if nobody here has upgraded yet. TIA.


Here's what boinc 7.4.22 (64bit-linux version) is reporting:

Starting BOINC client version 7.4.22 for x86_64-pc-linux-gnu
CUDA: NVIDIA GPU 0: GeForce GTX 980 (driver version 343.22, CUDA version 6.5, compute capability 5.2, 4096MB, 3557MB available, 4979 GFLOPS peak)
OpenCL: NVIDIA GPU 0: GeForce GTX 980 (driver version 343.22, device version OpenCL 1.1 CUDA, 4096MB, 3557MB available, 4979 GFLOPS peak)

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38068 - Posted: 25 Sep 2014 | 10:17:13 UTC - in response to Message 38067.

What does Boinc say, about amount of (peak) FLOPS in event log for GTX980? Near 5TeraFLOPS? Over at Mersenne trial-factoring--- a GTX980 is listed @ 1,126GHz and 4,710 GFLOPS.

Could somebody running a Maxwell-aware version of BOINC check and report this, please, and do a sanity-check of whether BOINC's figure is correct from what you know of the card's SM count, cores per SM, shader clock, flops_per_clock etc. etc? We got the figures for the 'baby Maxwell' 750/Ti into BOINC on 24 February (3edb124ab4b16492d58ce5a6f6e40c2244c97ed6), but I think that was just too late to catch v7.2.42

We're in a similar position this time, with v7.4.22 at release-candidate stage - I'd say that one was safe to test with, if nobody here has upgraded yet. TIA.


Here's what boinc 7.4.22 (64bit-linux version) is reporting:

Starting BOINC client version 7.4.22 for x86_64-pc-linux-gnu
CUDA: NVIDIA GPU 0: GeForce GTX 980 (driver version 343.22, CUDA version 6.5, compute capability 5.2, 4096MB, 3557MB available, 4979 GFLOPS peak)
OpenCL: NVIDIA GPU 0: GeForce GTX 980 (driver version 343.22, device version OpenCL 1.1 CUDA, 4096MB, 3557MB available, 4979 GFLOPS peak)


OpenCL 1.1! A spec from 2010 (Fermi-era).
The OpenCL 2.0 spec has been out for almost a year. This is NVidia telling Intel and AMD they don't give a hoot about OpenCL, because of CUDA.

localizer
Send message
Joined: 17 Apr 08
Posts: 113
Credit: 1,656,514,857
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38069 - Posted: 25 Sep 2014 | 10:43:59 UTC

Any more thoughts on when we might see a revised app for the 980? Mine looks very nice, but I'd like to put it to work!!

Thanks,
P.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38070 - Posted: 25 Sep 2014 | 12:21:07 UTC - in response to Message 38066.
Last modified: 25 Sep 2014 | 12:21:27 UTC

Just got two GTX 980's, will install tomorrow. Should be interesting to see how we go!

We're waiting for a working app, so keep a spare project ready for a while.

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38071 - Posted: 25 Sep 2014 | 13:49:27 UTC - in response to Message 38051.

There's now a linux build on acemdbeta. You'll definitely be needing to use a Linux client that reports the right driver version.


I've got the latest boinc client for linux but am still getting no tasks for my GTX 980.

Thu 25 Sep 2014 07:29:53 AM EDT | | Starting BOINC client version 7.4.22 for x86_64-pc-linux-gnu
Thu 25 Sep 2014 07:29:53 AM EDT | | log flags: file_xfer, sched_ops, task
Thu 25 Sep 2014 07:29:53 AM EDT | | Libraries: libcurl/7.35.0 OpenSSL/1.0.1f zlib/1.2.8 libidn/1.28 librtmp/2.3
Thu 25 Sep 2014 07:29:53 AM EDT | | Data directory: /home/mark/BOINC
Thu 25 Sep 2014 07:29:53 AM EDT | | CUDA: NVIDIA GPU 0: GeForce GTX 980 (driver version 343.22, CUDA version 6.5, compute capability 5.2, 4096MB, 3566MB available, 4979 GFLOPS peak)
Thu 25 Sep 2014 07:29:53 AM EDT | | OpenCL: NVIDIA GPU 0: GeForce GTX 980 (driver version 343.22, device version OpenCL 1.1 CUDA, 4096MB, 3566MB available, 4979 GFLOPS peak)
Thu 25 Sep 2014 09:48:27 AM EDT | GPUGRID | Sending scheduler request: Requested by user.
Thu 25 Sep 2014 09:48:27 AM EDT | GPUGRID | Requesting new tasks for NVIDIA GPU
Thu 25 Sep 2014 09:48:29 AM EDT | GPUGRID | Scheduler request completed: got 0 new tasks
Thu 25 Sep 2014 09:48:29 AM EDT | GPUGRID | No tasks sent
Thu 25 Sep 2014 09:48:29 AM EDT | GPUGRID | No tasks are available for ACEMD beta version
Thu 25 Sep 2014 09:48:29 AM EDT | GPUGRID | No tasks are available for the applications you have selected.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38073 - Posted: 25 Sep 2014 | 18:43:47 UTC
Last modified: 25 Sep 2014 | 18:48:29 UTC

Regarding the power consumption of the new BigMaxwell:
I had a different GPU workunit from folding@home (project 7621) on my GTX980, and it had different readouts:
GPU usage: 99-100% (this WU had a much lower CPU thread utilization ~1%)
GPU power 87-88% (~150W increase measured at the wall outlet)
GPU temperature: 63°C (ambient: 24°C)
GPU memory controller load: 26%
GPU memory used: 441MB
GPU voltage: 1.218V
GPU core clock: 1240MHz

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38074 - Posted: 25 Sep 2014 | 19:45:29 UTC - in response to Message 38073.

Regarding the power consumption of the new BigMaxwell:
I had a different GPU workunit from folding@home (project 7621) on my GTX980, and it had different readouts:
GPU usage: 99-100% (this WU had a much lower CPU thread utilization ~1%)
GPU power 87-88% (~150W increase measured at the wall outlet)
GPU temperature: 63°C (ambient: 24°C)
GPU memory controller load: 26%
GPU memory used: 441MB
GPU voltage: 1.218V
GPU core clock: 1240MHz


Project 7621 uses the GPU "core 15" (Fahcore 0x15) version, which is the oldest GPU client and runs exclusively on Windows machines. Those WUs generally run hot and use very little CPU, as you've noticed. They are fixed-credit WUs, so PPD is low.

Core 17 WUs are more efficient since they use a more recent version of OpenMM and are distributed to both Windows and Linux via an OpenCL app. They generally use 100% of a CPU core on machines with an NVidia card. These WUs offer a quick-return bonus (QRB) and are very popular because the faster the card, the higher the bonus.

My GTX980 has finished several project 9201 (core 17) WUs and is averaging 330,000 PPD. Amazing.

Linux users have an advantage in that only core 17 WUs are delivered to Linux machines.

There are core 18 WUs now available to Windows users. I don't know anything about them yet.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38075 - Posted: 26 Sep 2014 | 0:18:05 UTC - in response to Message 38050.

The acemd.841-65.exe file is 3.969.024 bytes long, but the acemd.842-65.exe is only 1.112.576 bytes long, so something went wrong with the latter.

no, that's deliberate. It's a Maxwell-only build

I made my BOINC client start this acemd.842-65.exe as acemd.841-60.exe by overwriting the latter and setting <dont_check_file_sizes> in cc_config.xml, and I modified client_state.xml so that cudart32_65.dll and cufft32_65.dll get copied to the slot with the app, but I got the same result as before with the 8.41 CUDA 6.5 app:
#SWAN: FATAL: cannot find image for module [.nonbonded.cu.] for device version 520

http://www.gpugrid.net/result.php?resultid=13130843
http://www.gpugrid.net/result.php?resultid=13132835
http://www.gpugrid.net/result.php?resultid=13135543
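
For anyone tempted to reproduce that hack: <dont_check_file_sizes> is a standard BOINC client option; a minimal cc_config.xml would look roughly like this (the client_state.xml surgery is host-specific and not shown):

<!-- Minimal cc_config.xml sketch enabling the option mentioned above;
     <dont_check_file_sizes> stops the client from re-fetching files
     whose size no longer matches what the project expects. -->
<cc_config>
  <options>
    <dont_check_file_sizes>1</dont_check_file_sizes>
  </options>
</cc_config>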

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38079 - Posted: 26 Sep 2014 | 8:46:35 UTC

I have data comparing my 780Ti with the 980 at Folding@home.

hardware:
780Ti (only gpu in system) in 2600K, PCIE2 slot, 64-bit linux mint 17 LTS, nvidia driver 343.22
980 (only gpu in system) in 3930K, PCIE2 slot, 64-bit linux mint 17 LTS, nvidia driver 343.22

Project 9201 (core 17)
780Ti@1106MHz (+100MHz OC): TPF=117 seconds, PPD=255,750
980@1352MHz (+100MHz OC): TPF=93 seconds, PPD=360,510

Looks like a 20.5% reduction in TPF. Seems to correlate with the difference in clock speed (22%)?

It's not a perfect comparison but it's the best I can do.
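
For what it's worth, the arithmetic behind that comparison (a small sketch using only the numbers quoted above):

# Checking the TPF-vs-clock comparison above. Perfect clock scaling
# would predict a 1 - 1106/1352 ~ 18.2% TPF reduction, so the observed
# 20.5% means the 980 does slightly better than clock-for-clock here.
tpf_780ti, tpf_980 = 117.0, 93.0      # seconds per frame
clk_780ti, clk_980 = 1106.0, 1352.0   # core clocks, MHz
print(f"TPF reduction:  {(tpf_780ti - tpf_980) / tpf_780ti:.1%}")  # 20.5%
print(f"clock increase: {(clk_980 - clk_780ti) / clk_780ti:.1%}")  # 22.2%
print(f"clock-bound prediction: {1 - clk_780ti / clk_980:.1%}")    # 18.2%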

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38080 - Posted: 26 Sep 2014 | 9:46:19 UTC - in response to Message 38075.

Well, I've not fixed the scheduler, but would you like to try that trick again with the new version 844?

Matt

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38081 - Posted: 26 Sep 2014 | 10:39:07 UTC - in response to Message 38080.

Well, I've not fixed the scheduler, but would you like to try that trick again with the new version 844?

Matt

At once, sire. :)

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38082 - Posted: 26 Sep 2014 | 10:48:33 UTC - in response to Message 38080.
Last modified: 26 Sep 2014 | 10:56:35 UTC

Well, I've not fixed the scheduler, but would you like to try that trick again with the new version 844?

Matt

...aaaaand we have a lift-off!
It's crunching.
# GPU [GeForce GTX 980] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 980
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:03:00.0
# Device clock : 1215MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# Driver version : r343_98 : 34411
# GPU 0 : 41C
# GPU 0 : 43C
# GPU 0 : 44C
# GPU 0 : 46C
# GPU 0 : 47C
# GPU 0 : 49C
# GPU 0 : 50C
# GPU 0 : 52C
# GPU 0 : 53C
# GPU 0 : 54C
# GPU 0 : 55C
# GPU 0 : 56C
# GPU 0 : 57C
# GPU 0 : 58C
# GPU 0 : 59C
# GPU 0 : 60C
# GPU 0 : 61C
# GPU 0 : 62C
# GPU 0 : 63C


709-NOELIA_20MGWT-1-5-RND4766_0
GPU usage: 93-97% (CPU 100%, PCIe2.0x16)
GPU power 93% (~160W increase measured at the wall outlet)
GPU temperature: 64°C (ambient: 24°C)
GPU memory controller load: 50%
GPU memory used: 825MB
GPU voltage: 1.218V
GPU core clock: 1240MHz

I estimate it will take 19,200 sec to finish this workunit (5h20m), which is more than it takes on a GTX780Ti (16,712 sec), so I really should move this card to another host with PCIe3.0.

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38083 - Posted: 26 Sep 2014 | 11:34:32 UTC

Good news that the app is working but disappointing performance.

Time to move the windows and (linux?) app to "non-beta"?

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38084 - Posted: 26 Sep 2014 | 12:09:02 UTC - in response to Message 38083.
Last modified: 26 Sep 2014 | 12:17:33 UTC

Good news that the app is working but disappointing performance.

Time to move the windows and (linux?) app to "non-beta"?


Disappointing compared to GK110, or to GK104 boards? The GTX980 (64 DP cores: 4 DP per SMM, 1 per 32-core block) is the replacement for the GTX680 (64 DP: 8 DP per SMX), NOT for the 96-DP-core GTX780 or the 120-DP-core GTX780Ti. The 250W-TDP Titan (Black) has 896/960 DP cores (64 DP per SMX).
Compared to the GTX680, I'd say the GTX980 is an excellent performer, other than in double-precision float.

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38085 - Posted: 26 Sep 2014 | 12:24:04 UTC - in response to Message 38084.

Good news that the app is working but disappointing performance.

Time to move the windows and (linux?) app to "non-beta"?


Disappointing compared to GK110? Or GK104 boards? GTX980 (64DP cores/4DPperSMM/1DPper32coreblock) is replacement for GTX680 (64DP/8DPperSMX), NOT 96DPcore GTX780 or 120DPcore GTX780ti. Titan(Black)250TDP have 896/960 DP cores (64DPperSMX)
Compared to GTX680, I'd say GTX980 is an excellent performer, other than Double Float.


I believe the GPUGrid app uses SP floating point calculations.

F@H also uses SP.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38086 - Posted: 26 Sep 2014 | 12:59:37 UTC - in response to Message 38085.
Last modified: 26 Sep 2014 | 13:02:20 UTC

Good news that the app is working but disappointing performance.

Time to move the windows and (linux?) app to "non-beta"?

Disappointing compared to GK110? Or GK104 boards? GTX980 (64DP cores/4DPperSMM/1DPper32coreblock) is replacement for GTX680 (64DP/8DPperSMX), NOT 96DPcore GTX780 or 120DPcore GTX780ti. Titan(Black)250TDP have 896/960 DP cores (64DPperSMX)
Compared to GTX680, I'd say GTX980 is an excellent performer, other than Double Float.

I believe the GPUGrid app uses SP floating point calculations.

F@H also uses SP.

You're right about GPUGrid.
I'll swap my GTX670 and GTX980 and we'll see how it performs in a PCIe3.0 x16 slot.
I expect a GTX980 to be faster than a GTX780Ti by at least 10%.
Maybe it won't be faster in the beginning, but in time the GPUGrid app could be refined for Maxwell. Besides, different workunit batches will see different gains (there could even be a loss of performance).

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38087 - Posted: 26 Sep 2014 | 13:09:14 UTC - in response to Message 38086.
Last modified: 26 Sep 2014 | 13:10:13 UTC

Good news that the app is working but disappointing performance.

Time to move the windows and (linux?) app to "non-beta"?

Disappointing compared to GK110? Or GK104 boards? GTX980 (64DP cores/4DPperSMM/1DPper32coreblock) is replacement for GTX680 (64DP/8DPperSMX), NOT 96DPcore GTX780 or 120DPcore GTX780ti. Titan(Black)250TDP have 896/960 DP cores (64DPperSMX)
Compared to GTX680, I'd say GTX980 is an excellent performer, other than Double Float.

I believe the GPUGrid app uses SP floating point calculations.

F@H also uses SP.

You're right about GPUGrid.
I'll swap my GTX670 and GTX980 and we'll see how it performs in a PCIe3.0 x16 slot.
I expect a GTX980 to be faster than a GTX780Ti by at least 10%.
Maybe it won't be faster in the beginning, but in time the GPUGrid app could be refined for Maxwell. Besides, different workunit batches will see different gains (there could even be a loss of performance).


What do you think the difference between PCIe2 x16 and PCIe3 x16 is for GPUGRID and similar programs? Also, do you have an idea how many of those "scalar" GM204 cores are cooking? Earlier in this thread you estimated 1920-2880 cores are being utilized on the "superscalar" GK110.

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38088 - Posted: 26 Sep 2014 | 14:12:00 UTC - in response to Message 38082.

Could you crop me the performance information from the *0_0 output file, please?

Matt

klepel
Send message
Joined: 23 Dec 09
Posts: 189
Credit: 4,736,673,079
RAC: 572,603
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38090 - Posted: 26 Sep 2014 | 15:30:31 UTC - in response to Message 38085.

biodoc, I sent you an off-topic PM this very moment.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38091 - Posted: 26 Sep 2014 | 16:23:14 UTC - in response to Message 38088.

Could you crop me the performance information from the *0_0 output file, please?

Matt

It's already finished, and uploaded.
I'll swap my cards when I get home.
709-NOELIA_20MGWT-1-5-RND4766_0 18,458.00 sec

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38094 - Posted: 26 Sep 2014 | 21:40:24 UTC - in response to Message 38088.

Could you crop me the performance information from the *0_0 output file, please?

Matt

I've successfully swapped my GTX670 and GTX980 and hacked this client, so now I have another workunit in progress.

The workunit is 13.103% complete at 40 minutes; the estimated total computing time is 18,316 sec (5h5m).
A similar workunit took 16,616 sec (4h37m) to finish on my GTX780Ti (@1098MHz).
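
(That estimate is just a linear extrapolation from the fraction done:)

# The completion estimate above, extrapolated linearly from progress.
elapsed_s = 40 * 60          # 40 minutes elapsed
fraction_done = 0.13103
print(f"{elapsed_s / fraction_done:.0f} s")  # ~18316 s, i.e. ~5h5m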

CPU: Core i7-4770K @4.3GHz, 8GB DDR3 1866MHz
GPU usage: 98% (CPU thread 100%, PCIe3.0x16)
GPU Temperature: 62°C
GPU Memory Controller load: 52%
GPU Memory usage: 804MB
GPU Voltage: 1.218V
GPU Power: 95% (Haven't measured at the wall outlet)
GPU Core Clock: 1240MHz

# Simulation rate 83.10 (ave) 83.10 (inst) ns/day. Estimated completion Sat Sep 27 11:06:30 2014
# Simulation rate 88.80 (ave) 95.34 (inst) ns/day. Estimated completion Sat Sep 27 10:19:41 2014
# Simulation rate 91.00 (ave) 95.75 (inst) ns/day. Estimated completion Sat Sep 27 10:03:10 2014
[...some 200 near-identical log lines trimmed: the average rate climbs steadily and settles around 94.7-94.8 ns/day...]
# Simulation rate 94.77 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:39 2014
# Simulation rate 94.78 (ave) 94.93 (inst) ns/day. Estimated completion Sat Sep 27 09:36:38 2014

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38097 - Posted: 27 Sep 2014 | 9:10:11 UTC

My GTX980 is crunching fine, a little slower than a GTX780Ti, while consuming much less power. So probably the GPUGrid client can use more than 1920 CUDA cores of the GTX780Ti (or it can't use all CUDA cores in Maxwell).
Here is the task list of this host.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38098 - Posted: 27 Sep 2014 | 11:24:11 UTC
Last modified: 27 Sep 2014 | 11:27:32 UTC

Great work, Zoltan!

For comparison, my GTX660Ti with "Eco-tuning" running a NOELIA_20MGWT, which yields the same credits as the WUs you used:
GPU usage: 93% (CPU 1-2% of an 8-threaded i7 3770K, PCIe3.0x16)
GPU power 100% of its 110 W limit (-> ~121W at the wall outlet, increase over idle ~105 W)
GPU temperature: 64°C (ambient: 22°C)
GPU memory controller load: 39%
GPU memory used: 978MB
GPU voltage: 1.05V
GPU core clock: 1084MHz

Runtime will be ~39000s, as usual. Taking a Win 8.1 tax of ~7% for my system into account, you achieve just about double the performance. The card's power consumption is 110 W vs. 165*0.93 = 153.5 W, i.e. your card consumes only about 40% more! (not taking PSU efficiencies into account here)
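
If anyone wants to redo this comparison with their own numbers, here is a minimal sketch in Python, with my runtime, power and OS-tax figures from above plugged in as assumptions:

# Rough perf/W comparison: Zoltan's GTX980 vs. my eco-tuned GTX660Ti.
# Runtimes, power figures and the ~7% Win 8.1 tax are assumptions from above.
gtx660ti_runtime_s = 39000.0          # eco-tuned GTX660Ti, NOELIA_20MGWT
gtx980_runtime_s = 39000.0 / 2.0      # about double the performance
gtx660ti_power_w = 110.0              # the card's power target
gtx980_power_w = 165.0 * 0.93         # 980 TDP scaled by its 93% power reading

perf_ratio = gtx660ti_runtime_s / gtx980_runtime_s     # ~2.0x
power_ratio = gtx980_power_w / gtx660ti_power_w        # ~1.40x
print("GTX980 vs eco GTX660Ti perf/W: %.2fx" % (perf_ratio / power_ratio))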

I'd be interested in how your numbers change if you eco-tune your card to ~1.1 V by reducing the power target. If you don't want to run such tests don't worry, I'll probably measure this myself soon with a GTX970 ;)

biodoc wrote:
Good news that the app is working but disappointing performance.

I would say it's only disappointing if your expectations were set really high. So far GM204 is not performing miracles here, but it's performing solidly at almost the performance level of GK110 for far less power used.

biodoc wrote:
I believe the GPUGrid app uses SP floating point calculations.

Correct.

eXaPower wrote:
Also, do have idea how many of those "scalar" GM204 cores are cooking? Earlier in this thread-- You estimated 1920-2880 cores are being utilized for "superscalar" GK110.

It was always hard for GPU-Grid to use the superscalar shaders, which amount to 1/3 of all shaders in all but the high-end Fermis, and in all Keplers. That's where this number comes from. Maxwell has no such restrictions, hence all shaders can be used in principle. This says nothing about other potential bottlenecks, however: PCIe bus, memory bandwidth, CPU support etc. Translating these limitations into statements along the lines of "can only use xxxx shaders" would be misleading.
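
To put rough numbers on that, a small sketch (my simplification: treating the superscalar third as entirely unusable, which is where the 1920 figure for GK110 comes from):

# Effective shader count if the superscalar third goes unused (Kepler),
# versus Maxwell, where all shaders are schedulable. Simplified model.
def effective_shaders(total, unusable_fraction):
    return int(total * (1.0 - unusable_fraction))

print(effective_shaders(2880, 1.0 / 3.0))   # GK110 / GTX780Ti -> 1920
print(effective_shaders(2048, 0.0))         # GM204 / GTX980   -> 2048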

Edit: BTW, what's the memory controller load for GTX780Ti running such tasks?

MrS
____________
Scanning for our furry friends since Jan 2002

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38099 - Posted: 27 Sep 2014 | 12:15:46 UTC

There are more potential variables in Zoltan's tests so far:

Cuda 6.5 vs 6.0
Zoltan's 780Ti cards: Are they reference cards or overclocked?

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38100 - Posted: 27 Sep 2014 | 12:46:34 UTC
Last modified: 27 Sep 2014 | 12:47:02 UTC

http://www.anandtech.com/show/8568/the-geforce-gtx-970-review-feat-evga/13

Very interesting comments about GTX970 GPC partition(s), requiring further investigation.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1626
Credit: 9,376,466,723
RAC: 19,051,824
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38101 - Posted: 27 Sep 2014 | 12:55:44 UTC - in response to Message 38100.

http://www.anandtech.com/show/8568/the-geforce-gtx-970-review-feat-evga/13

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38102 - Posted: 27 Sep 2014 | 12:57:04 UTC

@Biodoc: valid points. Regarding the clockspeed Zoltan said his GTX780Ti was running at 1098MHz, so it's got a "typical" overclock. And the new app claims to be CUDA 6.5. However, I don't think Matt changed the actual crunching code for this release, so any differences would come from changes in built-in functions. During the last few CUDA releases we haven't seen any large changes of GPU-Grid performance, so I don't expect it this time either. Anyway, for the best comparison both cards should run the new version.

MrS
____________
Scanning for our furry friends since Jan 2002

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38104 - Posted: 27 Sep 2014 | 13:23:13 UTC - in response to Message 38102.

@Biodoc: valid points. Regarding the clockspeed Zoltan said his GTX780Ti was running at 1098MHz, so it's got a "typical" overclock. And the new app claims to be CUDA 6.5. However, I don't think Matt changed the actual crunching code for this release, so any differences would come from changes in built-in functions. During the last few CUDA releases we haven't seen any large changes of GPU-Grid performance, so I don't expect it this time either. Anyway, for the best comparison both cards should run the new version.

MrS


Has dynamic parallelism (C.C 3.5/5.0/5.2) been introduced to ACEMD? Or Unified Memory from CUDA 6.0? Unified memory is a C.C 3.0+ feature.
Quoted from newest CUDA programming guide-- "new managed memory space in which all processors see a single coherent memory image with a common address space. A processor refers to any independent execution unit with a dedicated MMU. This includes both CPUs and GPUs of any type and architecture. "

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38106 - Posted: 27 Sep 2014 | 13:32:39 UTC

I posted some power consumption data for my GTX980 (+/- overclock) at the F@H forum.

Also, there's some early numbers for a GTX970 in the same thread.

https://foldingforum.org/viewtopic.php?f=38&t=26757&p=269043#p269043

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38108 - Posted: 27 Sep 2014 | 17:21:01 UTC - in response to Message 38104.

Has dynamic parallelism (C.C 3.5/5.0/5.2) been introduced to ACEMD? Or Unified Memory from CUDA 6.0? Unified memory is a C.C 3.0+ feature.

Dynamic parallelism: no. It would break compatibility with older cards or require two separate code paths. Besides, GPU-Grid doesn't have much of a problem occupying all shader multiprocessors (SM, SMX etc.).

Unified memory: this is only meant to ease programming for new applications, at the cost of some performance. For any existing code with optimized manual memory management (e.g. GPU-Grid) this would actually be a drawback.

MrS
____________
Scanning for our furry friends since Jan 2002

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38109 - Posted: 27 Sep 2014 | 18:13:10 UTC - in response to Message 38108.
Last modified: 27 Sep 2014 | 18:19:46 UTC

Has dynamic parallelism (C.C 3.5/5.0/5.2) been introduced to ACEMD? Or Unified Memory from CUDA 6.0? Unified memory is a C.C 3.0+ feature.

Dynamic parallelism: no. It would break compatibility with older cards or require two separate code paths. Besides, GPU-Grid doesn't have much of a problem occupying all shader multiprocessors (SM, SMX etc.).

Unified memory: this is only meant to ease programming for new applications, at the cost of some performance. For any existing code with optimized manual memory management (e.g. GPU-Grid) this would actually be a drawback.

MrS


In your opinion: how can GPUGRID's occupancy of the SM/SMX/SMM be further enhanced and refined for generational (CUDA C.C) differences? Compatibility is important, as is finding the most efficient code path from CUDA programming. How can we further advance ACEMD? CUDA 5.0/PTX3.1~~~>6.5/4.1 provides new commands/instructions.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38110 - Posted: 27 Sep 2014 | 20:20:48 UTC - in response to Message 38109.

That is a good question. One which I can unfortunately not answer. I'm just a forum mod and long-term user, not a GPU-Grid developer :)

MrS
____________
Scanning for our furry friends since Jan 2002

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38111 - Posted: 27 Sep 2014 | 20:40:32 UTC - in response to Message 38109.


In your opinion: how can GPUGRID's occupancy of the SM/SMX/SMM be further enhanced and refined for generational (CUDA C.C) differences? Compatibility is important, as is finding the most efficient code path from CUDA programming. How can we further advance ACEMD? CUDA 5.0/PTX3.1~~~>6.5/4.1 provides new commands/instructions.



We have cc-specific optimisations for each of the most performance-sensitive kernels. We generally don't use any of the features introduced post CUDA 4.2 though; nothing there we particularly need.

I expect the GM204 performance will be markedly improved once I have my hands on one.

Matt

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38112 - Posted: 27 Sep 2014 | 20:58:20 UTC - in response to Message 38111.

I expect the GM204 performance will be markedly improved once I have my hands on one.

Matt

I can give you remote access to my GTX980 host, if you want to.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38113 - Posted: 27 Sep 2014 | 21:08:28 UTC - in response to Message 38109.
Last modified: 27 Sep 2014 | 21:10:26 UTC

In your opinion: how can GPUGRID's occupancy of the SM/SMX/SMM be further enhanced and refined for generational (CUDA C.C) differences? Compatibility is important, as is finding the most efficient code path from CUDA programming. How can we further advance ACEMD? CUDA 5.0/PTX3.1~~~>6.5/4.1 provides new commands/instructions.

There was a huge jump in performance (around 40%) when the GPUGrid app was upgraded from CUDA3.1 to CUDA4.2.
I think a change this big doesn't come around very often.
I think the GM204 can run older code more efficiently than the Fermi or Kepler based GPUs; that's why other projects benefit more than GPUGrid does, as this project already had such a jump at the transition from CUDA3.1 to CUDA4.2.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38115 - Posted: 27 Sep 2014 | 21:43:11 UTC - in response to Message 38111.


In your opinion: how can GPUGRID's occupancy of the SM/SMX/SMM be further enhanced and refined for generational (CUDA C.C) differences? Compatibility is important, as is finding the most efficient code path from CUDA programming. How can we further advance ACEMD? CUDA 5.0/PTX3.1~~~>6.5/4.1 provides new commands/instructions.



We have cc-specific optimisations for each of the most performance-sensitive kernels. We generally don't use any of the features introduced post CUDA 4.2 though; nothing there we particularly need.

I expect the GM204 performance will be markedly improved once I have my hands on one.

Matt


I found one of many papers written by you and others - "ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale" - from the golden days of GT200. A Maxwell update, if applicable, would be very informative.

Profile @tonymmorley
Send message
Joined: 10 Mar 14
Posts: 24
Credit: 1,215,128,812
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 38118 - Posted: 28 Sep 2014 | 1:40:01 UTC

Hey guys, I can't get any work for my two GTX 980's. Any thoughts, I'm a bit lost in the feed.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38121 - Posted: 28 Sep 2014 | 11:03:35 UTC - in response to Message 38113.
Last modified: 28 Sep 2014 | 11:11:26 UTC

You don't see these jumps often. A 32-core block with an individual warp scheduler, rather than Kepler's flat design (all cores shared among the warp schedulers), is contributing to better core management, as are Maxwell's redesigned crossbar, dispatch and issue.
Even so, GM204 (2048c/1664c) is providing performance levels close to (within ~1600s of) the 2880-core GK110, while being ~2hr faster than a GTX780 (2304 cores). I think GM204, once tuned properly, will excel.
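
For reference, the organisational difference in rough numbers (core and scheduler counts from NVidia's published specs; the "issue" descriptions are just my shorthand):

# Shaders per multiprocessor and how they are fed.
sm_layouts = {
    "Kepler SMX (GK104/GK110)": {"cores": 192, "warp_schedulers": 4,
                                 "issue": "superscalar, shared core pool"},
    "Maxwell SMM (GM204)": {"cores": 128, "warp_schedulers": 4,
                            "issue": "scalar, 4 x 32-core partitions"},
}
for name, cfg in sm_layouts.items():
    print("%-26s %s" % (name, cfg))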

Snow Crash
Send message
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Level
Lys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38122 - Posted: 28 Sep 2014 | 11:50:13 UTC - in response to Message 38118.

http://www.gpugrid.net/forum_thread.php?id=3603&nowrap=true#38075
If this doesn't make sense then I would suggest waiting until the project can update the scheduler, etc. as the details of what Retvari did are a bit twisty.
____________
Thanks - Steve

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38123 - Posted: 28 Sep 2014 | 11:50:33 UTC - in response to Message 38113.


There was a huge jump in performance (around 40%) when the GPUGrid app was upgraded from CUDA3.1 to CUDA4.2.
I think this huge change doesn't come very often.


That change marked the transition to a new code base. The improvement wasn't down to the change in CUDA version so much as to us developing improved algorithms.

Matt

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38124 - Posted: 28 Sep 2014 | 11:51:30 UTC - in response to Message 38118.


Hey guys, I can't get any work for my two GTX 980's. Any thoughts, I'm a bit lost in the feed.


It's not ready just yet...

Matt

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38125 - Posted: 28 Sep 2014 | 11:54:08 UTC - in response to Message 38115.


I found one of many papers written by you and others - "ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale" - from the golden days of GT200. A Maxwell update, if applicable, would be very informative.


I'm doing a bit of work to improve the performance of the code for Maxwell hardware - expect an update before the end of the year.

Matt

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38126 - Posted: 28 Sep 2014 | 11:55:02 UTC - in response to Message 38112.


I can give you remote access to my GTX980 host, if you want to.


Most kind, but I've got some on order already. Just waiting for the slow boat from China to wend its way across the Med.

Matt

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38128 - Posted: 28 Sep 2014 | 11:58:45 UTC - in response to Message 38118.
Last modified: 28 Sep 2014 | 19:03:51 UTC

Not had the time to look into this in great detail but my tuppence worth:

The GTX980 and GTX970 are GM204 (non-super-scalar) but not GM210, so they are really the latest mid-range GPU's and very much aimed at the gaming community (FP64 at 1/32 of the FP32 rate, and 4GB).

These are generational updates to the GK104 models, and at the same time the big brothers of, and a revision of, the GM107 (GTX750 and GTX750Ti).
As such they should be seen as gaming replacements/upgrades for GPU's such as the GTX670 and even the GTX770.

As usual there is some naming inconsistency; the GTX980 is GM204 while the GTX780 is GK110, so it's not a straight comparison or upgrade there. However, if you go back to a GTX680 the comparison is somewhat more like-for-like (GK104 vs GM204). Note that the GM107 trailblazed Maxwell.

GPU Memory Controller load: 52%

That is very high and I expect it's impacting performance; I don't think it's simply down to bus width (256-bit) but also down to architectural changes. I was hoping this would not be the case, and it's somewhat surprising seeing as the GTX900's have 2MB of L2 cache.

That said, some of Noelia's WU's are more memory intensive than other WU's, and on my slightly underclocked GTX770 a 147-NOELIA_20MGWT WU's load is presently 30% (which is higher than other WU's).

This suggests the GTX970 is a better choice than the GTX980, certainly when you consider the ~50% price difference (UK) for ~80% of the performance. That said, I would want to know what the memory controller's utilization is on a GTX970 before concluding that it is definitely a problem and recommending the 970 over the 980 (which will still do more work despite the constraints). For the 970 it might be ~43%, which isn't great and suggests another problem (architecture/code) besides the 256-bit limitation.

Any readings for other WU's?

In terms of performance these GPU's appear to only be on par with the high-ish end GTX700's, and performance is basically in line with the number of CUDA cores, again suggesting that there is some potential for app improvement.

It's possible that if the apps are recompiled with new CUDA Development Tools the new drivers will inherently offer improvements for the GTX900 series, but given that these are GM204 I'm not expecting miracles.

The big question was always going to be, What's the performance per Watt like for here?
Apparently, when gaming a GTX970 uses up to 30W less than a GTX770, and significantly outperforms it (on reviewed games) but the TDP's are 145W and 230W. So a GTX970 might use ~63% of a GTX770's power and at first glance appears to outperform it by ~10%. Thus I'm expecting the performance/Watt to be about 1.75 times that of a GTX770 (ball park). So from that point of view it's a winner, and maybe app tweaks can increase that further.

PS. My GTX770's Memory Controller load is only 22% for a trphisx3-NOELIA_SH2 WU, so I'm guessing the same type of WU would have a 38% load on a GTX980.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38136 - Posted: 28 Sep 2014 | 20:54:54 UTC

Trying to fix the scheduler now - if you have a 980, please sub to the acemdbeta app, accept beta work, and try again. It won't work, but I'm logging the problems now.

Matt

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38138 - Posted: 28 Sep 2014 | 22:06:44 UTC - in response to Message 38136.
Last modified: 28 Sep 2014 | 22:16:04 UTC

Did as you requested and it's now crunching what looks to be a test WU: MJHARVEY_TEST

EDIT: The scheduler worked!

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38140 - Posted: 29 Sep 2014 | 0:28:07 UTC

My GTX980 has finished 2 of the beta WUs successfully.

http://www.gpugrid.net/results.php?hostid=142719

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38144 - Posted: 29 Sep 2014 | 7:37:44 UTC - in response to Message 38140.
Last modified: 29 Sep 2014 | 7:38:36 UTC

My GTX980 has finished 2 of the beta WUs successfully.

http://www.gpugrid.net/results.php?hostid=142719

The 8.44 CUDA65 application is available for the short queue, perhaps you should give it a try too.

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38147 - Posted: 29 Sep 2014 | 9:37:28 UTC - in response to Message 38144.

My GTX980 has finished 2 of the beta WUs successfully.

http://www.gpugrid.net/results.php?hostid=142719

The 8.44 CUDA65 application is available for the short queue, perhaps you should give it a try too.


I'm getting "no tasks available" for either the beta or the short run WUs.

9/29/2014 5:36:32 AM | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU
9/29/2014 5:36:33 AM | GPUGRID | Scheduler request completed: got 0 new tasks
9/29/2014 5:36:33 AM | GPUGRID | No tasks sent
9/29/2014 5:36:33 AM | GPUGRID | No tasks are available for Short runs (2-3 hours on fastest card)

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38149 - Posted: 29 Sep 2014 | 9:58:02 UTC

It looks like Matt has just added a new beta app (version 8.45).

I'll keep my preferences for both beta (test applications) and short runs for now unless he requests just beta.

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38159 - Posted: 29 Sep 2014 | 11:17:41 UTC - in response to Message 38149.

Just got a test WU with the new beta app (8.45).

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38160 - Posted: 29 Sep 2014 | 11:43:02 UTC - in response to Message 38144.


The 8.44 CUDA65 application is available for the short queue


Not any more. The CUDA65 error rate is suspiciously high for non-GM204 cards.

Matt

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38347 - Posted: 7 Oct 2014 | 14:28:54 UTC

Hello fellow crunchers,

are there any promising results comparing the performance of the GTX980 to the GTX780Ti, or do we have to wait for the GTX980Ti (which is what Jacob is doing too)?

I am still hoping for a "real" Maxwell at 20nm, but it seems that won't happen this year anymore.
____________
Greetings from TJ

eXtreme Warhead
Send message
Joined: 19 Nov 12
Posts: 2
Credit: 25,526,400
RAC: 0
Level
Val
Scientific publications
watwatwatwatwatwat
Message 38349 - Posted: 7 Oct 2014 | 17:32:25 UTC

if the results from now are about the same as the older ones from early 2014, the performance with cuda65 doesn't look very good?

I'm only at 23% of a long one after 94min with a GTX970, so that would be about 409min for the whole WU...

that would be much, much more than a 660Ti would need, because the long ones last about 345min... so what's the problem?




Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38351 - Posted: 7 Oct 2014 | 18:39:46 UTC - in response to Message 38349.

if the results from now are about the same as the older ones from early 2014, the performance with cuda65 doesn't look very good?

I'm finding cuda65 to be just a bit faster than cuda60 on all my cards (Win7-64), including the Maxwell 750Ti. Could be the updated NV driver (344.11) or the app...

eXtreme Warhead
Send message
Joined: 19 Nov 12
Posts: 2
Credit: 25,526,400
RAC: 0
Level
Val
Scientific publications
watwatwatwatwatwat
Message 38358 - Posted: 8 Oct 2014 | 4:16:43 UTC
Last modified: 8 Oct 2014 | 4:19:31 UTC

I cannot test the old 660Ti with a current WU, but I will test both at the same time and we will see.

http://www.gpugrid.net/result.php?resultid=13178859 - the last WU finished this morning; the clock reported for the card there is wrong. The card runs at 1266MHz the whole time, because it is fully stock with a constant boost clock.

The last long WU I ran with the 660Ti gave only about 70k points, so I think different batches are running at the moment than earlier this year.

PS: what's the problem with the short ones, btw? http://www.gpugrid.net/result.php?resultid=13146577

A bit more than half the time of a long one, but granted less than 20% of the points?

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38360 - Posted: 8 Oct 2014 | 6:35:10 UTC - in response to Message 38347.



are there any promising results comparing the performance of the GTX980 to the GTX780Ti,



The 980 is less than 10% faster than a 780ti.

Matt

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38362 - Posted: 8 Oct 2014 | 7:50:52 UTC - in response to Message 38360.



are there any promising results comparing the performance of the GTX980 to the GTX780Ti,



The 980 is less than 10% faster than a 780ti.

Matt

That is not what I found. Checking Retvari Zoltan's results on his GTX980 and his GTX780Ti, the 780Ti is about 2000 seconds faster. Zoltan runs both on XP and he knows how to set up a rig; proof is that his rigs are among the fastest.
Would like to see some results with Win7.

I'm interested as I am still undecided whether to buy another mighty 780Ti or a new 980; the price difference is not that huge, but the 980 uses less energy. But I think I'll wait for the 980Ti that is expected before the end of this year, according to rumors.
____________
Greetings from TJ

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38371 - Posted: 8 Oct 2014 | 20:48:44 UTC
Last modified: 8 Oct 2014 | 20:53:10 UTC

TJ, don't buy a GTX780Ti now! In fact, nobody else should do that for GPU-Grid either. Even if you're not paying German electricity prices, the far more efficient Maxwell will save you money soon, regardless of actual performance.

Having said that I'm not sure you should buy a GTX980 either. The performance advantage over the GTX970 does not seem worth it. I haven't had time to follow things recently, but I glanced over some data which showed GTX970 performance at GPU-Grid very close to GTX980. Do you guys have solid numbers for this?

@eXtreme Warhead: the WUs at GPU-Grid can differ considerably in the amount of work they contain, so make sure to only compare WUs which yield the same credits. Even better would be comparisons within the same batch of work, as indicated by the WU names. Anyway, when compared properly the GTX660Ti is about half as fast as a GTX980.

Regarding the short runs: it's normal that they yield less credits per day. This is a bonus being applied to the long runs, because only fairly beefy GPUs can run them, and return them quickly enough for GPU-Grid. And the probability of failure is higher for these longer WUs. Both aspects are being compensated by giving bonus credits for long-runs.

Edit:

TJ wrote:
Matt wrote:
The 980 is less than 10% faster than a 780ti.

That is not what I found. Checking Retvari Zoltan's results on his GTX980 and his GTX780Ti, the 780Ti is about 2000 seconds faster.

Actually these don't contradict: if the GTX980 is ~10% slower than the GTX780Ti, it is also "less than 10% faster". I obviously see your point, though, that Matt's formulation sounds much more euphemistic towards the new card ;)

MrS
____________
Scanning for our furry friends since Jan 2002

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38374 - Posted: 8 Oct 2014 | 23:42:40 UTC - in response to Message 38371.
Last modified: 8 Oct 2014 | 23:43:54 UTC

Thanks for the input ETA. I indeed want to see some real data as well.
So far I checked Zoltan's results and the 780Ti is still fastest.

I am not buying anything at the moment; I'm waiting for the real 20nm Maxwell for a new Haswell-E based rig that can crunch and that I can use to work from home. It will be a somewhat expensive rig, but it should last for at least five years, so I'll put in the best parts.

But I have a nice quad core with a 550Ti, and a GTX970 or 980 could go in there; this is an extra buy though, so I want to see some numbers first.

By the way I want EVGA GPU's and they are not in stock yet in the Netherlands.
____________
Greetings from TJ

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38375 - Posted: 9 Oct 2014 | 11:09:07 UTC - in response to Message 38374.
Last modified: 9 Oct 2014 | 12:08:52 UTC

Thanks for the input ETA. I indeed want to see some real data as well.
So far I checked Zoltan's results and the 780Ti is still fastest.

I am not buying anything at the moment, I wait for the real 20nm Maxwell for a new Haswell-E based rig, that can crunch and I can use to work from home with. Will be a little expensive rig but should last for at least five years so put in the best parts.

But I have a nice quad core with a 550Ti and there could be a GTX970 or 980 in but this is an extra buy, so I want to see some numbers first.

By the way I want EVGA GPU's and they are not in stock yet in the Netherlands.

Some numbers:
GPU     Workunit        PC power delta   GPU usage   Core clock   Core voltage   Core power   Core temp   MCL      GPU mem used
980     NOELIA_5MG      167W             98%         1240MHz      1.218V         98%          66°C        52%      778MB
980     SDOERR_BARNA5   164W             99%         1227MHz      1.193V         92%          66°C        46-52%   946MB
780Ti   NOELIA_5MG      249W             98%         993MHz       1.125V         99%          72°C        30-31%   765MB
780Ti   SDOERR_BARNA5   247W             99%         993MHz       1.125V         96%          72°C        30-31%   925MB
780Ti   NOELIA_5MG      260W             98%         1098MHz      1.125V         102%         74°C        33-34%   765MB
780Ti   SDOERR_BARNA5   263W             98%         1098MHz      1.125V         102%         74°C        33%      925MB

CPU: Core i7-4770K @4.3GHz, 66-72°C
RAM: 8GB DDR3 1866MHz 1.5V
PSU: Enermax Modu 87+ 850W
MB: Gigabyte GA-Z87X-OC
Ambient temperature: 24.6°C
I tried to raise the 780Ti's GPU clock and voltage a little more, but as I raise the GPU clock, the GPU voltage is lowered automatically to stay within the power limit, which endangers the stability of the calculation.

Conclusion:
The GTX780Ti is faster by 8-10% than the GTX980, but the GTX980 consumes only about 2/3 of the power of the GTX780Ti.
I don't recommend buying a GTX780Ti, unless it's a very cheap 2nd hand card.
I think it's safe to buy a GTX980 or a GTX970 made on the 28nm process, as the 20nm version isn't around the corner, and I think the first chips made on 20nm won't be much more energy efficient (they will probably be a little cheaper).

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38376 - Posted: 9 Oct 2014 | 14:40:49 UTC - in response to Message 38375.

Thank you for the numbers Zoltan. I will replace the 550Ti with a 970, and for the rest I'll wait.
____________
Greetings from TJ

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38382 - Posted: 9 Oct 2014 | 21:36:44 UTC

Thanks Zoltan, those seem to be very solid numbers!

Personally I want to buy a GTX970, but would like to see a roundup first to be sure to get a sufficiently quiet and powerful cooler. The cards on my list are:

1. Asus Strix: the cooler should be excellent, with zero noise at idle enhancing the resale value. The card is not too large. I also like the single 8-pin power connector, as it means I can get away with fewer power adaptors. It's not in stock, though, and starts at 350€.

2. Gigabyte Gaming: the cooler looks really powerful, but the card is very long, which might limit its resale value. It's not in stock either and starts at 340€.

3. Galax EXOC: I like the cooling solution of my Galax GTX660Ti and this one looks pretty similar. I suspect it could stand the additional power consumption of the GTX970 while still being quiet enough for me. It's not in stock and starts at 330€.

4. A card with stock cooler, onto which I mount my Thermalright Shaman. It should provide cooling & noise equivalent to the best open air coolers. The problem: it seems like it won't fit the short PCBs, and I haven't found any of these cards with a long PCB (not so easy to find out). The Galax is said to fit that cooler, but probably doesn't need it. They start at 305€ and some are in stock.

Currently option 3 is my favorite, unless I can find a card for option 4. I'd also gladly choose option 1, if it was less expensive.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38383 - Posted: 10 Oct 2014 | 1:38:19 UTC - in response to Message 38375.

Some numbers:
GPU     Workunit        PC power delta   GPU usage   Core clock   Core voltage   Core power   Core temp   MCL      GPU mem used
980     NOELIA_5MG      167W             98%         1240MHz      1.218V         98%          66°C        52%      778MB
980     SDOERR_BARNA5   164W             99%         1227MHz      1.193V         92%          66°C        46-52%   946MB
780Ti   NOELIA_5MG      249W             98%         993MHz       1.125V         99%          72°C        30-31%   765MB
780Ti   SDOERR_BARNA5   247W             99%         993MHz       1.125V         96%          72°C        30-31%   925MB
780Ti   NOELIA_5MG      260W             98%         1098MHz      1.125V         102%         74°C        33-34%   765MB
780Ti   SDOERR_BARNA5   263W             98%         1098MHz      1.125V         102%         74°C        33%      925MB

CPU: Core i7-4770K @4.3GHz, 66-72°C
RAM: 8GB DDR3 1866MHz 1.5V
PSU: Enermax Modu 87+ 850W
MB: Gigabyte GA-Z87X-OC

Wow, great information. The 980 looks like a winner. Question: are the above power draw figures for the GPU alone or for the system as a whole? If for the system, are there any CPU WUs running? Thanks for the info!

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38385 - Posted: 10 Oct 2014 | 7:06:04 UTC

http://www.anandtech.com/show/8223/an-introduction-to-semiconductor-physics-technology-and-industry

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38386 - Posted: 10 Oct 2014 | 8:01:51 UTC - in response to Message 38375.


The GTX780Ti is faster by 8-10% than the GTX980, but the GTX980 consumes only about 2/3 of the power of the GTX780Ti.


RZ, what is your metric for performance? The best measure of performance is to look in the output of tasks completed by the cuda65 app. You'll see the line:


# PERFORMANCE: 70208 Natoms 27.553 ns/day 0.000 ms/step 0.000 us/step/atom


The "ns/day" figure gives the rate of the simulation - the higher the better. The "Natoms" figure gives the size of the system - the greater the number of atoms, the slower the simulation, in a not-quite linear relationship (cf our performance estimator at http://www.acellera.com/products/molecular-dynamics-software-gpu-acemd/ ).

Matt

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38388 - Posted: 10 Oct 2014 | 9:03:48 UTC

The output of the linux cuda6.5 app doesn't provide performance information.

http://www.gpugrid.net/result.php?resultid=13184193

<stderr_txt>
# SWAN Device 0 :
# Name : GeForce GTX 980
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:01:00.0
# Device clock : 1215MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# Time per step (avg over 3750000 steps): 4.600 ms
# Approximate elapsed time for entire WU: 17250.095 s
03:26:48 (21843): called boinc_finish

</stderr_txt>

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38389 - Posted: 10 Oct 2014 | 10:17:15 UTC - in response to Message 38386.

Taken from RZ's GTX980 host. Let's see if I have this straight: NOELIA_5MG on a GTX980 - # PERFORMANCE: 70205 Natoms 3.624 ns/day 0.000 ms/step 0.000 us/step/atom
Approximate elapsed time for entire WU: 18118.868 s

GT650m, NOELIA_5MG - # PERFORMANCE: 70208 Natoms 27.553 ns/day 0.000 ms/step 0.000 us/step/atom
Approximate elapsed time for entire WU: 137765.137 s

[1] GTX980 completes an average of 8 long tasks in the time it takes [1] GT650m to finish a single task. How can one GT650m task report 27.553 ns/day, nearly the same ns/day as 8 GTX980 tasks (3.624 ns/day * 8 tasks = 28.992 ns/day)?

Comparing these cards, an 8:1 task completion ratio leads to a similar number of ns/day?

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38390 - Posted: 10 Oct 2014 | 10:35:23 UTC - in response to Message 38389.

Actually, there's a bug there.

The time reported isn't the daily rate, but the iteration time. That's inversely proportional to the daily rate, so use 1000/[value] instead.
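
A quick check of that workaround against the two figures quoted earlier in the thread (the GT650m and GTX980 values from the NOELIA_5MG results above):

# The buggy "ns/day" field actually holds the iteration time; invert it.
for ms_per_step in (27.553, 3.624):       # GT650m, GTX980
    print("%.3f ms/step -> %.1f ns/day" % (ms_per_step, 1000.0 / ms_per_step))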

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38392 - Posted: 10 Oct 2014 | 11:51:58 UTC - in response to Message 38388.

biodoc


The output of the linux cuda6.5 app doesn't provide performance information.


Quite right. Slightly different code-base. I'm not going to rev the applications just to include that. If you are really keen, you can get the rates by looking for the "Simulation rate" lines in the 0_0 output file in the task's slot directory.
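
Something along these lines should do it (a sketch; the slot path below is only an example and depends on your BOINC installation):

# Pull the latest average simulation rate out of a running task's log.
import re

rates = []
with open("/var/lib/boinc-client/slots/0/0_0") as log:    # example path
    for line in log:
        m = re.search(r"Simulation rate\s+([\d.]+)\s+\(ave\)", line)
        if m:
            rates.append(float(m.group(1)))

if rates:
    print("latest average rate: %.2f ns/day" % rates[-1])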


Matt

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38393 - Posted: 10 Oct 2014 | 11:54:21 UTC - in response to Message 38385.

http://www.anandtech.com/show/8223/an-introduction-to-semiconductor-physics-technology-and-industry


That's a nice article. If you want to get really technical, dive in to the International Technology Roadmap for Semiconductors reports.

http://www.itrs.net/Links/2013ITRS/Summary2013.htm

MJH

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38394 - Posted: 10 Oct 2014 | 13:02:08 UTC - in response to Message 38386.
Last modified: 10 Oct 2014 | 13:02:48 UTC


The GTX780Ti is faster by 8-10% than the GTX980, but the GTX980 consumes only the 2/3rd of the GTX780Ti.


RZ, what is your metric for performance? The best measure of performance is to look in the output of tasks completed by the cuda65 app. You'll see the line:


# PERFORMANCE: 70208 Natoms 27.553 ns/day 0.000 ms/step 0.000 us/step/atom


The "ns/day" figure gives the rate of the simulation - the higher the better. The "Natoms" figure gives the size of the system - the greater the number of atoms, the slower the simulation, in a not-quite linear relationship (cf our performance estimator at http://www.acellera.com/products/molecular-dynamics-software-gpu-acemd/ ).
Matt


If the higher the better for ns/day then is the GTX780Ti still better. But uses more energy and produces more heat.

Have you already your hands on a 980 Matt, to improve the app?
____________
Greetings from TJ

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38395 - Posted: 10 Oct 2014 | 13:26:50 UTC - in response to Message 38390.
Last modified: 10 Oct 2014 | 13:47:06 UTC

70205 Natoms NOELIA_5MG tasks, with 1000/[value]: GTX980 = 275.938 ns/day; GT650m = 36.293 ns/day.

[1] GT650m's ns/day rate is around 13.1% of [1] GTX980's. It would take [7.60] GT650m cards to match GM204's ns/day.

70205 Natoms NOELIA_5MG tasks: RZ's faster GTX780Ti (time per step 3.327ms) rates at 300.571 ns/day, compared to 275.938 ns/day for his GTX980 (time per step 3.624ms).

RZ's GTX980 ns/day rate is around 91.8% of his faster GTX780Ti's.

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38396 - Posted: 10 Oct 2014 | 13:43:41 UTC - in response to Message 38395.


[1] GT650m's ns/day rate is around 13.1% of [1] GTX980's. It would take [7.60] GT650m cards to match GM204's ns/day.


Those numbers are only directly comparable if the Natoms for the two tasks are very similar, remember.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38397 - Posted: 10 Oct 2014 | 13:52:31 UTC - in response to Message 38396.


[1] GT650m's ns/day rate is around 13.1% of [1] GTX980's. It would take [7.60] GT650m cards to match GM204's ns/day.


Those numbers are only directly comparable if the Natoms for the two tasks are very similar, remember.


Yes, all the mentioned ns/day figures were from the same task type - I edited my post to say 70205 Natoms NOELIA_5MG tasks accordingly.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38398 - Posted: 10 Oct 2014 | 14:18:10 UTC - in response to Message 38386.

The GTX780Ti is faster by 8-10% than the GTX980, but the GTX980 consumes only the 2/3rd of the GTX780Ti.
RZ, what is your metric for performance?

My metric for performance is the data that can be found under the "performance" tab, which is based on the time it takes different GPUs (hosts) to complete WUs from the same batch.
SDOERR_BARNA5: GTX-980 15713~15843, GTX-780Ti 14915~15019, GTX-780TiOC 14892~14928
NOELIA_5MG: GTX-980 18026~18165, GTX-780Ti 16601~16713, GTX-780TiOC 16826~16924
NOELIA_20MGWT: GTX-980 18085~18099, GTX-780Ti 16849, GTX-780TiOC 17034
NOELIA_20MGK36I: 16617~16779, 16844~17049
NOELIA_20MG2: 16674~16831
NOELIA_UNFOLD: 16533, 15602

As it takes more time for the GTX-980 to complete similar workunits than it takes the GTX780Ti, I consider the GTX-980 slower (the motherboard, CPU and RAM are similar; actually my host with the GTX980 has a slightly faster CPU and RAM).

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38399 - Posted: 10 Oct 2014 | 14:26:15 UTC - in response to Message 38398.


As it takes more time for the GTX-980 to complete similar workunits than it takes the GTX780Ti,


Yes, I can see that now looking at individual runs on your two machines. That is rather surprising; my testing in more controlled circumstances shows the opposite. I'll have to look into that a bit more, and see if it's peculiar to your systems or if it reflects a general trend.

Matt

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38400 - Posted: 10 Oct 2014 | 14:28:31 UTC - in response to Message 38383.

Wow, great information. The 980 looks like a winner. Question, are the above power draw figures for the GPU alone or for the system as a whole?

The heading of that column reads "Delta of PC power consumption", which is the difference in the whole PC's power consumption between the GPU crunching and not crunching.

If it's for the system, are there any CPU WUs running? Thanks for the info!

There were 6 SIMAP CPU workunits running on that host; the total power consumption was 321W using the GTX-980.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38401 - Posted: 10 Oct 2014 | 14:38:47 UTC - in response to Message 38399.

Yes, I can see that now looking at individual runs on your two machines. That is rather surprising, my testing in more controlled circumstances shows the opposite.

I'd like to have a pair of those circumstance controllers you use. :)

Profile Chilean
Avatar
Send message
Joined: 8 Oct 12
Posts: 98
Credit: 385,652,461
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38402 - Posted: 10 Oct 2014 | 15:16:00 UTC - in response to Message 38347.

Hello fellow crunchers,

are there any promising results comparing the performance of the GTX980 to the GTX780Ti, or do we have to wait for the GTX980Ti (which Jacob is doing too)?

I am still hoping for a "real" Maxwell at 20nm, but it seems that won't happen this year anymore.


You and me both. I upgrade based on this usually (CPU and GPU).

Cheers.
____________

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38409 - Posted: 11 Oct 2014 | 11:01:05 UTC

On Linux, using the CUDA 6.5 app, the 980 is a bit slower than the 780Ti. I only have enough data for the long SDOERR_BARNA5 WUs. Student's t-test gives a very low p-value, so it appears to be a statistically significant difference.

Time (sec):
SDOERR_BARNA5 980 13188827 17379
SDOERR_BARNA5 980 13188792 17677
SDOERR_BARNA5 980 13188118 17309
SDOERR_BARNA5 980 13186657 17318
SDOERR_BARNA5 980 13186376 16934
SDOERR_BARNA5 980 13184193 17253
SDOERR_BARNA5 980 13183699 17361
SDOERR_BARNA5 980 13183697 17209
SDOERR_BARNA5 980 13182886 17455
SDOERR_BARNA5 980 13182196 17201
average 17310, std dev 191

SDOERR_BARNA5 780Ti 13189221 16503
SDOERR_BARNA5 780Ti 13187759 16546
SDOERR_BARNA5 780Ti 13187315 16562
SDOERR_BARNA5 780Ti 13186024 16571
SDOERR_BARNA5 780Ti 13185027 16597
SDOERR_BARNA5 780Ti 13183827 16544
SDOERR_BARNA5 780Ti 13183437 16904
SDOERR_BARNA5 780Ti 13182225 17374
SDOERR_BARNA5 780Ti 13181484 17380
average 16776, std dev 361


P value 0.00095 using Student's t-test
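For reference, the same test takes a few lines of Python (a sketch, assuming SciPy is installed; the runtimes are copied from the table above):

from scipy import stats

# Runtimes in seconds, copied from the post above
gtx980 = [17379, 17677, 17309, 17318, 16934, 17253, 17361, 17209, 17455, 17201]
gtx780ti = [16503, 16546, 16562, 16571, 16597, 16544, 16904, 17374, 17380]

# Two-sample Student's t-test (equal variances, SciPy's default)
t, p = stats.ttest_ind(gtx980, gtx780ti)
print(t, p)  # p comes out around 0.001, in line with the figure above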

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38411 - Posted: 11 Oct 2014 | 14:05:16 UTC

Thanks for the information. Your (Linux) GTX980 is 97% as fast (16776/17310) as your (Linux) GTX780Ti. Your two cards perform closer to each other than RZ's (Windows XP) cards; his GTX980 is around 92% as fast as his quicker GTX780Ti.

NVidia is suspending GK110 (GTX780[Ti], 4000-6300 SP GFLOPS) shipments - if you can afford it, this is a good time to pick up a GK110. After much ACEMD testing, GM204 (3400-5500 SP GFLOPS) is on par with GK110 performance, depending on OS factors. All GTX780s are priced near the GTX970 (329-379 USD), and GTX780Ti prices are under 450 USD. GK110 may have higher energy consumption, but eco-tuning will very easily lower GK110 wattage at the expense of slightly longer runtimes.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38431 - Posted: 12 Oct 2014 | 16:05:42 UTC - in response to Message 38411.

Thanks for the information. Your (Linux) GTX980 is 97% as fast (16776/17310) as your (Linux) GTX780Ti. Your two cards perform closer to each other than RZ's (Windows XP) cards; his GTX980 is around 92% as fast as his quicker GTX780Ti.

NVidia is suspending GK110 (GTX780[Ti], 4000-6300 SP GFLOPS) shipments - if you can afford it, this is a good time to pick up a GK110. After much ACEMD testing, GM204 (3400-5500 SP GFLOPS) is on par with GK110 performance, depending on OS factors. All GTX780s are priced near the GTX970 (329-379 USD), and GTX780Ti prices are under 450 USD. GK110 may have higher energy consumption, but eco-tuning will very easily lower GK110 wattage at the expense of slightly longer runtimes.


eXaPower, if you try to eco-tune the less efficient GK110 down to GM204 power consumption, you either lose the performance advantage or you still consume more power.

Let's start with a GTX780Ti with a mild OC: 1100 MHz @ 1.1 V, 250 W. Some cards go higher, but let's not discuss extreme cases. And significantly higher clocked ones exceed 250 W.

Maximum stable frequency scales approximately linearly with voltage, whereas power consumption scales approximately quadratically with voltage (let's neglect leakage). Hence we could get GK110 down to these operating points:

- 1000 MHz @ 1.0 V -> 187 W, at 91% the performance
- 900 MHz @ 0.9 V -> 137 W, at 81.8% the performance

While these numbers look far nicer and are indeed more energy efficient than running stock, the first one illustrates my point: less performance and still higher power consumption than GTX980.

Trying the same with GTX980 with a nice OC, starting from 1300 MHz @ 1.2 V, 165 W, I get the following:

- 1192 MHz @ 1.1 V -> 127 W, at 91.7% the performance

... at this point it probably doesn't make sense to eco-tune further, since you just spent 500$/€ on that fast card. Summarizing this, I'm not saying everyone should rush out and buy a GTX980. At least consider the GTX970, but certainly don't buy GTX780/Ti anymore for sustained GP-GPU! Even if you don't pay anything for your electricity it doesn't hurt to run the more energy-efficient setup.
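To make the rule explicit: taking performance as proportional to clock, and dynamic power as proportional to clock times voltage squared (leakage neglected), the operating points above can be reproduced with a small Python sketch (these are my modeling assumptions, not measured data):

def scale(p0_watt, f0_mhz, v0, f_mhz, v):
    # perf ~ f; power ~ f * V^2 (dynamic power only, leakage neglected)
    power = p0_watt * (f_mhz / f0_mhz) * (v / v0) ** 2
    perf = f_mhz / f0_mhz
    return round(power), round(perf, 3)

# GTX780Ti, mild OC baseline: 1100 MHz @ 1.1 V, 250 W
print(scale(250, 1100, 1.1, 1000, 1.0))  # (188, 0.909)
print(scale(250, 1100, 1.1, 900, 0.9))   # (137, 0.818)

# GTX980 baseline: 1300 MHz @ 1.2 V, 165 W
print(scale(165, 1300, 1.2, 1192, 1.1))  # (127, 0.917)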

MrS
____________
Scanning for our furry friends since Jan 2002

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38434 - Posted: 12 Oct 2014 | 16:22:18 UTC - in response to Message 38409.


Time (sec):
SDOERR_BARNA5 980 13188827 17379
SDOERR_BARNA5 980 13188792 17677
SDOERR_BARNA5 980 13188118 17309
SDOERR_BARNA5 980 13186657 17318
SDOERR_BARNA5 980 13186376 16934
SDOERR_BARNA5 980 13184193 17253
SDOERR_BARNA5 980 13183699 17361
SDOERR_BARNA5 980 13183697 17209
SDOERR_BARNA5 980 13182886 17455
SDOERR_BARNA5 980 13182196 17201
average 17310, std dev 191


Some numbers for GTX970 from this linux host
Time (sec):
SDOERR_BARNA5 970 13193684 19850
SDOERR_BARNA5 970 13191852 19534
SDOERR_BARNA5 970 13189355 19650
SDOERR_BARNA5 970 13188418 19544
SDOERR_BARNA5 970 13187452 19567
SDOERR_BARNA5 970 13185793 19548
SDOERR_BARNA5 970 13185723 19559
SDOERR_BARNA5 970 13184931 19586
SDOERR_BARNA5 970 13184168 19552
SDOERR_BARNA5 970 13182398 19627
average 19602, std dev low :D

That's 88% of the throughput of biodoc's GTX980. The reported clock speed of 1250 MHz is relatively high for that host, but the 980 isn't running at stock speeds either. Overall that's pretty strong performance from a card with just 81% of the raw horse power per clock!

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38463 - Posted: 13 Oct 2014 | 15:37:28 UTC - in response to Message 38434.


Time (sec):
SDOERR_BARNA5 980 13188827 17379
SDOERR_BARNA5 980 13188792 17677
SDOERR_BARNA5 980 13188118 17309
SDOERR_BARNA5 980 13186657 17318
SDOERR_BARNA5 980 13186376 16934
SDOERR_BARNA5 980 13184193 17253
SDOERR_BARNA5 980 13183699 17361
SDOERR_BARNA5 980 13183697 17209
SDOERR_BARNA5 980 13182886 17455
SDOERR_BARNA5 980 13182196 17201
average 17310

Some numbers for GTX970 from this linux host
Time (sec):
SDOERR_BARNA5 970 13193684 19850
SDOERR_BARNA5 970 13191852 19534
SDOERR_BARNA5 970 13189355 19650
SDOERR_BARNA5 970 13188418 19544
SDOERR_BARNA5 970 13187452 19567
SDOERR_BARNA5 970 13185793 19548
SDOERR_BARNA5 970 13185723 19559
SDOERR_BARNA5 970 13184931 19586
SDOERR_BARNA5 970 13184168 19552
SDOERR_BARNA5 970 13182398 19627
average 19602

For comparison, here are the last several SDOERR_BARNA5 times from one of my factory-OCed PNY cards (no additional OC):

SDOERR_BARNA5 750Ti 43,562.77
SDOERR_BARNA5 750Ti 43,235.24
SDOERR_BARNA5 750Ti 43,313.90
SDOERR_BARNA5 750Ti 43,357.37
SDOERR_BARNA5 750Ti 43,400.42
SDOERR_BARNA5 750Ti 43,392.66
SDOERR_BARNA5 750Ti 43,525.33
Average = 43398

This is on Windows 7 64-bit, which if I remember correctly is about 11% slower than Linux on GPUGRID. That would make the above 970 about 2x faster than the 750Ti, at least on SDOERR_BARNA5 WUs (19602 x 2 x 1.11 = 43,516.44), if I haven't forgotten some important factor :-)
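The sanity check in code form (a tiny sketch; the ~11% Windows penalty is the rough from-memory figure above):

linux_970 = 19602   # GTX970 average on Linux (s)
win_750ti = 43398   # GTX750Ti average on Windows 7 (s)

win_equiv_970 = linux_970 * 1.11   # apply the ~11% Windows/Linux penalty
print(win_750ti / win_equiv_970)   # ~2.0 -> the 970 is about twice as fast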

Trotador
Send message
Joined: 25 Mar 12
Posts: 103
Credit: 13,920,977,393
RAC: 467,961
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38468 - Posted: 13 Oct 2014 | 19:16:47 UTC - in response to Message 38463.


.....
This is on Windows 7 64-bit, which if I remember correctly is about 11% slower than Linux on GPUGRID. That would make the above 970 about 2x faster than the 750Ti, at least on SDOERR_BARNA5 WUs (19602 x 2 x 1.11 = 43,516.44), if I haven't forgotten some important factor :-)



So, roughly, a 750Ti produces half the points of a 970, its TDP is about half that of the 970, and its price is about half the price of a 970... is there a winner?

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38469 - Posted: 13 Oct 2014 | 20:08:52 UTC - in response to Message 38468.

This is on Windows 7 64-bit, which if I remember correctly is about 11% slower than Linux on GPUGRID. That would make the above 970 about 2x faster than the 750Ti, at least on SDOERR_BARNA5 WUs (19602 x 2 x 1.11 = 43,516.44), if I haven't forgotten some important factor :-)

So, roughly, a 750Ti produces half the points of a 970, its TDP is about half that of the 970, and its price is about half the price of a 970... is there a winner?

Personal preference. I personally like running more low-power boxes, and gold/platinum power supplies in the 450 watt range are often available on sale. I also like running CPU projects, so again more machines equals more CPU power. I just bought 750Ti number 11, the Asus OC Edition, which will be my first ASUS GPU in the last few years. With discounts and a rebate it ended up being $103, hard to beat.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38476 - Posted: 13 Oct 2014 | 21:30:30 UTC - in response to Message 38468.

is there a winner?

Not a clear one. Beyond made some good points for more of the smaller cards. I tend towards the larger ones for the following reasons:

- They will be able to finish long-run WUs (which yield the most credits per day here) within the time limit for the maximum bonus for longer. The time per WU is increased slowly over time, as the average computing power increases.

- You can run more of them with less overhead, by which I mean "system needed to run the GPUs in". This improves power efficiency and, if you don't go for extremely dense systems, purchase cost. This argument is actually the exact opposite of what Beyond likes with his many machines for CPU projects.

- Resale value: once a GPU is no longer energy efficient enough to run 24/7 GP-GPU, it can still provide a decent gaming experience. Finding interested gamers is easier if the GPU in question is 2-3 times as fast. This might not necessarily get you more money, since you're selling fewer cards, but IMO it makes things easier.

MrS
____________
Scanning for our furry friends since Jan 2002

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38488 - Posted: 14 Oct 2014 | 10:07:28 UTC

Looking at the performance tab, someone has finally equaled RZ's GTX780Ti host times. Host 168841, with [3] GTX980s and the same OS as RZ (WinXP), is completing tasks as fast. (RZ's GTX780Ti has been the fastest card for a while.)

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38507 - Posted: 14 Oct 2014 | 16:19:14 UTC - in response to Message 38488.
Last modified: 14 Oct 2014 | 16:22:09 UTC

Looking at the performance tab, someone has finally equaled RZ's GTX780Ti host times. Host 168841, with [3] GTX980s and the same OS as RZ (WinXP), is completing tasks as fast. (RZ's GTX780Ti has been the fastest card for a while.)

That GTX980 is an overclocked one, so its performance/power ratio must be lower than the standard GTX980's. However it's still better than a GTX780Ti.

<core_client_version>7.2.42</core_client_version> <![CDATA[ <stderr_txt> # GPU [GeForce GTX 980] Platform [Windows] Rev [3212] VERSION [65] # SWAN Device 0 : # Name : GeForce GTX 980 # ECC : Disabled # Global mem : 4095MB # Capability : 5.2 # PCI ID : 0000:04:00.0 # Device clock : 1342MHz # Memory clock : 3505MHz # Memory width : 256bit # Driver version : r343_98 : 34411 # GPU 0 : 79C # GPU 1 : 74C # GPU 2 : 78C # GPU 1 : 75C # GPU 1 : 76C # GPU 1 : 77C # GPU 1 : 78C # GPU 1 : 79C # GPU 1 : 80C # GPU 0 : 80C # Time per step (avg over 3750000 steps): 4.088 ms # Approximate elapsed time for entire WU: 15331.500 s # PERFORMANCE: 87466 Natoms 4.088 ns/day 0.000 ms/step 0.000 us/step/atom 00:19:43 (3276): called boinc_finish </stderr_txt> ]]>

1342/1240=1.082258, so this card is overclocked by 8.2%, which equals the performance gap between a GTX780Ti and the GTX980.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38511 - Posted: 14 Oct 2014 | 18:11:18 UTC - in response to Message 38507.

1342/1240=1.082258, so this card is overclocked by 8.2%, which equals the performance gap between a GTX780Ti and the GTX980.

The base clock may not correspond to the real clock, with Maxwell more so than ever before. Still, it's safe to say that this card is significantly overclocked :)

BTW: your GTX780Ti is (factory-)overclocked as well, isn't it?

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38515 - Posted: 14 Oct 2014 | 18:42:25 UTC - in response to Message 38511.

BTW: your GTX780Ti is (factory-)overclocked as well, isn't it?

I have two GTX780Ti's: one standard, and one factory overclocked. I had to lower the memory clock of the overclocked one to 3.1GHz...

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38551 - Posted: 16 Oct 2014 | 20:24:28 UTC - in response to Message 38515.
Last modified: 16 Oct 2014 | 21:17:52 UTC

The GTX970 Maxwell is only about 10% more energy efficient than a GTX750Ti Maxwell. Considering efficiency usually scales well with core count, this suggests an issue with the GTX900s.

WRT the GTX980 and the GTX970, for most people the GTX970 is the better choice; it's significantly cheaper than the GTX980 (it started out at half the price) and, as pointed out, comes close to matching its performance (initially thought to be 80% but it looks more like 88% for here). Why? Both are Memory Controller constricted, but more so the 980. The 750Ti does not have such Memory Controller issues. We've seen this Memory Controller factor before, especially with smaller Kepler GPUs.
This obviously suggests better performance would come from OC'ing the GTX900's GDDR5, and it might even be worthwhile researching which memory chips various cards use before purchasing. It could also hint at what's to come, one way or another...
In the UK the GTX970 has increased in price from ~£250 at release to ~£263 (a 5% rise) while the GTX980 has fallen from ~£500 to £419.99 (a 19% drop). This mostly reflects the relative gaming value. It wouldn't surprise me if we found that the actual performance/Watt for the GTX970 here was slightly better than the GTX980's (2% or so)...
Anyway, unless you need quad SLI, the GTX980 is too pricey.
Presently in the UK three GTX970's would cost £789, while two GTX980's would cost £840. The three 970's would do 32% more work (assuming they actually perform at 88% of a GTX980 for here) and cost £51 less.
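The arithmetic, for anyone who wants to plug in their own prices (a tiny sketch; the 88% figure is the estimate from above):

gtx970_price, gtx980_price = 263, 420  # current UK prices in pounds
gtx970_perf = 0.88                     # GTX970 throughput relative to a GTX980

print(3 * gtx970_price, 2 * gtx980_price)  # 789 vs 840
print(3 * gtx970_perf / 2 - 1)             # ~0.32 -> 32% more work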
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38552 - Posted: 16 Oct 2014 | 21:36:59 UTC - in response to Message 38551.

Also at Einstein the GTX750Ti is slightly more efficient than GM204. Einstein is known to be very memory-bandwidth hungry. Compared to GTX750Ti it looks like this:

GTX970: 2.6 times the shaders, 2.5 times the bandwidth
GTX980: 3.2 times the shaders, 2.5 times the bandwidth

There's also the L2 cache size, which helps avoid memory accesses. It's 2 MB for all of them, with the bigger chips keeping many more "threads" in flight. This devalues the cache size for them compared to the smaller chip.

So far GTX970 seems to be able to make better use of its raw horse power than GTX980. Energy efficiency may be about equal, though, as the TDP of GTX980 is hardly any higher.

Regarding GM204 energy efficiency: German THG published a very good article, scaling GTX970's power target. It's obvious that the sweet spot is between 125 and 150 W, which is not by coincidence close to nVidias default setting. Most custom cards use higher power targets, though.

Especially when we consider that both current GM204 cards may be at least somewhat restricted by memory bandwidth, it may make a lot of sense to lower the power target for high efficiency (as the GPU couldn't make all that good use of higher core boost clocks anyway).

And regarding different memory chips: from what I've seen they seem to all be using Samsung 7 GHz chips. They can take up to 8 GHz (at least in games), sometimes less.

MrS
____________
Scanning for our furry friends since Jan 2002

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38554 - Posted: 16 Oct 2014 | 23:16:10 UTC
Last modified: 16 Oct 2014 | 23:26:36 UTC

L2 cache (KB) to total core count ratio for Maxwell and Kepler (desktop) cards, from best to worst

1. GM107- GTX750 (2048KB/512cores) 4-1
2. GM107- GTX750Ti (2048KB/640cores) 3-1
3. GM204- GTX970 (2048KB/1664cores) 1.23- 1
4. GM204- GTX980 (2048KB/2048cores) 1-1
5. GK110- GTX780 (1536KB/2304cores) 0.66-1
5. GK107- GTX650 (256KB/384cores) 0.66-1
6. GK110- Titan (1536KB/2688cores) 0.57-1
7. GK110- GTX780ti (1536KB/2880cores) 0.53-1
8. GK106- GTX650tiB (384KB/768cores) 0.50-1
9. GK104- GTX760 (512KB/1152cores) 0.44-1
10. GK106- GTX660 (384KB/960cores) 0.40-1
11. GK104- GTX670 (512KB/1344cores) 0.38-1
12. GK106- GTX650Ti(256KB/768cores) 0.33-1
12. GK104- GTX680 (512KB/1536cores) 0.33-1

Top-end Maxwell and Kepler mobile

GTX970m (2048KB/1280cores) 1.6-1
GTX980m (2048KB/1536cores) 1.33-1
GTX880m (512KB/1536cores) 0.33-1
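If anyone wants to recompute or extend the list, the ratios are trivial to generate (a quick Python sketch using a few of the entries above):

# (L2 cache KB, CUDA cores) per card, from the list above
cards = {
    "GTX750":   (2048, 512),
    "GTX750Ti": (2048, 640),
    "GTX970":   (2048, 1664),
    "GTX980":   (2048, 2048),
    "GTX780Ti": (1536, 2880),
    "GTX680":   (512, 1536),
}

# Sort from best (most L2 per core) to worst
for name, (l2, cores) in sorted(cards.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True):
    print(f"{name}: {l2 / cores:.2f} KB per core")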

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38562 - Posted: 18 Oct 2014 | 8:11:02 UTC - in response to Message 38554.

I have some more numbers from my hardware

GK104-GTX770 (2048KB/1536shaders) 1.33-1
GK110-GTX780Ti (3072KB/2880shaders) 1.06-1

However the GTX780Ti is a lot faster than the 770, despite the 770's better ratio.
____________
Greetings from TJ

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38565 - Posted: 18 Oct 2014 | 21:10:50 UTC - in response to Message 38554.
Last modified: 18 Oct 2014 | 21:15:12 UTC

L2 cache (KB) to total core count ratio for Maxwell and Kepler (desktop) cards, from best to worst (with the percentage of GTX980 total cores added)

1. GM107- GTX750 (2048KB/512cores) 4-1 [25%]
2. GM107- GTX750Ti (2048KB/640cores) 3-1 [31.2%]
3. GM204- GTX970 (2048KB/1664cores) 1.23- 1 [81.2%]
4. GM204- GTX980 (2048KB/2048cores) 1-1 [100%]
5. GK110- GTX780 (1536KB/2304cores) 0.66-1 [112.5%]
5. GK107- GTX650 (256KB/384cores) 0.66-1 [18.7%]
6. GK110- Titan (1536KB/2688cores) 0.57-1 [131.2%]
7. GK110- GTX780ti (1536KB/2880cores) 0.53-1 [140.6%]
8. GK106- GTX650tiB (384KB/768cores) 0.50-1 [37.5%]
9. GK104- GTX760 (512KB/1152cores) 0.44-1 [56.2%]
10. GK106- GTX660 (384KB/960cores) 0.40-1 [46.8%]
11. GK104- GTX670 (512KB/1344cores) 0.38-1 [65.6%]
12. GK106- GTX650Ti(256KB/768cores) 0.33-1 [37.5%]
12. GK104- GTX680 (512KB/1536cores) 0.33-1 [75%]

Top-end Maxwell and Kepler mobile

GTX970m (2048KB/1280cores) 1.6-1 [62.5%]
GTX980m (2048KB/1536cores) 1.33-1 [75%]
GTX880m (512KB/1536cores) 0.33-1 [75%]


I neglected to mention the card with the best GPUGRID Kepler power usage/runtime ratio: the GTX660Ti, at 384KB/1344cores = 0.28-1.

Current cards with the best runtime/power usage ratio for GPUGRID (eco-tuned), with core counts compared to a GTX980 (2048 cores):
1. GTX970 [81.2%]
2. GTX750ti [31.2%]
3. GTX980 [100%]
4. GTX660ti [65.6%]
5. GTX780 [112.5%]

I think the reason the Kepler memory controller unit load is lower (20-40% MCU usage is reported by GPUGRID users, depending on task type) is that the Maxwell GPC (4 SMM / 512 cores / 256 integer / 32 TMU / 16 ROP) is revised compared to the Kepler GPC (3 SMX / 576 cores / 96 integer / 48 TMU / 16 ROP). When more Maxwell GPCs are in use, the cache setup changes how information transfers.
A GTX660Ti has 3 GPC / 7 SMX / 112 TMU / 24 ROP. A GK106 GTX650Ti Boost (3 GPC / 768 cores / 64 TMU / 24 ROP / 192-bit memory bus) has SMX disabled in different GPCs. These two cards have GPC configurations that shut SMX off in different GPCs rather than in the same GPC. Nvidia hasn't yet revealed how disabled SMM within a GPC are shut off. (AnandTech wrote about this in their GTX660Ti and GTX650Ti Boost reviews.) The way an SMX/SMM is disabled in a GPC, and how this affects GPGPU processing, is not fully understood.

Nvidia's current programming guide is included in the newest CUDA toolkit release. On Windows it is located in Program Files/NVIDIA Corporation/Installer2/CUDAToolkit6.5[a bunch of numbers and letters]; the files can be read as HTML or PDF. With a custom install, there is an option to install only the Samples or just the Documents instead of the whole toolkit.

Maxwell C.C. 5.0 has 64KB and C.C. 5.2 has 96KB of maximum shared memory per multiprocessor, while the Kepler C.C. 3.0/3.5 maximum is 48KB.

The cache working set per multiprocessor for constant memory is the same for C.C. 3.0/3.5 at 8KB; for Maxwell C.C. 5.0/5.2 it is 10KB.

The cache working set per multiprocessor for texture memory is 12KB for C.C. 3.0, and 12KB-48KB for C.C. 3.5/5.0/5.2.

If someone could post what AIDA64 reports for cache or memory properties, maybe we can discern why the GM204 Maxwell MCU load is higher than that of GM107/GK110/GK106/GK104/GK107 with GDDR5 on 128/192/256/384-bit memory buses.

Variations are seen, though; AIDA64 reports for my [2] GK107:
L1 Cache / Local Data Share is 64 KB per multiprocessor
L1 Texture Cache is 48 KB per multiprocessor. (GK110 has a 48KB read-only cache)

Here is the AIDA64 CUDA report for a GK107:


Memory Properties:
Memory Clock 2000 MHz
Global Memory Bus Width 128-bit
Total Memory 2 GB
Total Constant Memory 64 KB
Max Shared Memory Per Block 48 KB
Max Shared Memory Per Multiprocessor 48 KB
Max Memory Pitch 2147483647 bytes
Texture Alignment 512 bytes
Texture Pitch Alignment 32 bytes
Surface Alignment 512 bytes

Device Features:
32-bit Floating-Point Atomic Addition Supported
32-bit Integer Atomic Operations Supported
64-bit Integer Atomic Operations Supported
Caching Globals in L1 Cache Not Supported
Caching Locals in L1 Cache Supported
Concurrent Kernel Execution Supported
Concurrent Memory Copy & Execute Supported
Double-Precision Floating-Point Supported
ECC Disabled
Funnel Shift Supported
Host Memory Mapping Supported
Integrated Device No
Managed Memory Not Supported
Multi-GPU Board No
Stream Priorities Not Supported
Surface Functions Supported
TCC Driver No
Warp Vote Functions Supported
__ballot() Supported
__syncthreads_and() Supported
__syncthreads_count() Supported
__syncthreads_or() Supported
__threadfence_system() Supported

64-bit integer atomic operations (the 64-bit versions of atomicMin, atomicMax, atomicAnd, atomicOr and atomicXor) are supposed to be supported only on C.C. 3.5/5.0/5.2 boards. A theory: Nvidia harvested GK110 features into an unknown low-power Kepler, keeping some features locked (Dynamic Parallelism / funnel shift / 64 DP per SMX) on some GK107/GK108 cards, but allowing others to have the C.C. 3.5 compute features (GT730m/GT740m/GT640/GT635/GT630).

Atomic functions operating on 64-bit integer values in shared memory are supported for C.C. 2.0/2.1/3.0/3.5/5.0/5.2.
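As an alternative to AIDA64, most of these properties can be read straight from the CUDA driver. A sketch using the third-party pycuda package (an assumption on my part that pycuda and a CUDA toolkit are installed; the attribute names come from the CUDA driver API):

import pycuda.driver as cuda

cuda.init()
dev = cuda.Device(0)
print(dev.name(), "- compute capability", dev.compute_capability())

# Dump every attribute the driver exposes (L2 cache size, memory bus
# width, shared memory per multiprocessor, and so on)
for attr, value in sorted(dev.get_attributes().items(), key=lambda kv: str(kv[0])):
    print(attr, value)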


http://images.anandtech.com/doci/8526/GeForce_GTX_980_Block_Diagram_FINAL_575px.png

http://images.anandtech.com/doci/8526/GeForce_GTX_980_SM_Diagram_FINAL_575px.png

http://images.anandtech.com/doci/8526/GM204DieB_575px.jpg

http://images.anandtech.com/doci/7764/GeForce_GTX_680_SM_Diagram_FINAL_575px.png

http://images.anandtech.com/doci/7764/GeForce_GTX_750_Ti_SM_Diagram_FINAL_575px.png

http://images.anandtech.com/doci/7764/SMX_575px.png

http://images.anandtech.com/doci/7764/SMMrecolored_575px.png

New high- and low-level optimizations - NVidia's secret sauce is certainly being held close to the chest. AnandTech is supposed to investigate Maxwell's GPC in a future article, due to the GTX970's 64 ROPs not being fully utilized.

If someone with a GTX970 computing a GPUGRID task could comment on MCU usage and share any other information (cache amounts/CUDA) to compare, that would help. Once a GTX960 (12SMM/3GPC?) is released, the higher GM204 MCU usage could be understood better.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38566 - Posted: 18 Oct 2014 | 22:39:58 UTC

There is a new application (v8.47) being distributed since yesterday.
I'd like to have some information about the changes since the previous version.
It's not faster than the previous one.

Profile MJH
Project administrator
Project developer
Project scientist
Send message
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Level
Val
Scientific publications
watwat
Message 38567 - Posted: 18 Oct 2014 | 23:42:03 UTC - in response to Message 38566.

RZ,

No significant changes - 847 just rolls up sm13 support into the cuda64 build, so that I can simplify the logic in the work scheduler.

MJH

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38569 - Posted: 19 Oct 2014 | 11:40:14 UTC

http://techreport.com/blog/27143/here-another-reason-the-geforce-gtx-970-is-slower-than-the-gtx-980

Information about SMM disabling for GM204 GPC partitions is scarcely available. Looking at SMM/SMX issue/dispatch/crossbar differences and the L1 cache/texture cache/L2 cache could reveal answers, but until a program is created that can measure or read out warp scheduler/TMU/ROP/cache/[32-]core subset usage, most everything is pure speculation. What is clearly established is that Nvidia has different die configuration options for GK106/GK104/GK110. Notable performance differences haven't been seen across SMX GPC configurations.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38571 - Posted: 19 Oct 2014 | 18:14:50 UTC - in response to Message 38569.

With a 970, I’ve seen Memory Controller loads from 27% for short NOELIA_SH2 tasks to 50% for several different long task types.

Running a NOELIA_SH2 WU, the reference 970 boosted to 1265MHz straight out of the box and hit 27% MC load with the CPU being over-used (100%); with less CPU usage the MC load went up to 31%.

With the GPU clocked @1354MHz MC load reached 50% running long NOELIA_20MG2, SDOERR_BARNA5 and NOELIA_UNFOLD WU's.

Unfortunately I cannot OC the GDDR using Afterburner!

When the CPU was completely saturated (100%) my stock GPU performance was 29% less than with the CPU at 50%.

@1354MHz my 970 is ~30% faster than my 770 was at stock on the same system. So I would expect 970's to generally be about 20 to 30% faster than 770's at reference.

____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38572 - Posted: 19 Oct 2014 | 19:52:52 UTC - in response to Message 38571.
Last modified: 19 Oct 2014 | 20:03:26 UTC

Nvidia Inspector 1.9.7.3 supports GM204 boards (per the release notes). Inspector shows the brand name of the GDDR5. I suggest a clean uninstall of Afterburner so the driver doesn't get conflicting settings from two programs. I wish a way existed to monitor the non-standard internal workings of the GPU, beyond the typical readings all monitoring programs show.

http://www.guru3d.com/files-details/nvidia-inspector-download.html

Given that a GTX970 has 6% more cores than a GTX770, a 20-30% GPUGRID performance increase at reference or higher speeds is certainly a decent generational improvement, with less power consumption than a GTX770. What was the improvement for the GTX680 compared to the GTX580? Similar to Kepler GK104 ~~~> Maxwell GM204?

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38573 - Posted: 19 Oct 2014 | 21:25:04 UTC - in response to Message 38572.
Last modified: 19 Oct 2014 | 21:26:03 UTC

What was the improvement for GTX 680 compared to GTX580?

The 680 was eventually ~42% faster and had a TDP of 195W vs 244W for the 580.
Overall that jump improved performance slightly more whereas this jump has improved performance/Watt more.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38574 - Posted: 19 Oct 2014 | 23:17:17 UTC

Reference rated TDP wattage per Fermi 32-core SM / Kepler 192-core SMX / Maxwell 128-core SMM

GTX580-244TDP [16SM/512cores] 15.25 watts per SM @ 0.47 watt per core

GTX680-195TDP [8SMX/1536cores] 24.37 watts per SMX @ 0.126 watt per core

GTX780-225TDP [12SMX/2304cores] 18.75 watts per SMX @ 0.097 watt per core

GTX780Ti-250TDP [15SMX/2880cores] 16.66 watts per SMX @ 0.086 watt per core

GTX750Ti-60TDP [5SMM/640cores] 12 watts per SMM @ 0.093 watt per core

GTX970-145TDP [13SMM/1664cores] 11.15 watts per SMM @ 0.087 watt per core

GTX980-170TDP [16SMM/2048cores] 10.62 watts per SMM @ 0.083 watt per core

GDDR5/VRM variations not included.
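A quick Python sketch to recompute (or extend) the table from the figures above:

# (TDP W, SM/SMX/SMM count, CUDA cores) per reference card
cards = {
    "GTX580":   (244, 16, 512),
    "GTX680":   (195, 8, 1536),
    "GTX780":   (225, 12, 2304),
    "GTX780Ti": (250, 15, 2880),
    "GTX750Ti": (60, 5, 640),
    "GTX970":   (145, 13, 1664),
    "GTX980":   (170, 16, 2048),
}

for name, (tdp, sm, cores) in cards.items():
    print(f"{name}: {tdp / sm:.2f} W per SM, {tdp / cores:.3f} W per core")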

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38576 - Posted: 20 Oct 2014 | 7:09:24 UTC - in response to Message 38574.
Last modified: 20 Oct 2014 | 7:11:38 UTC

Reference rated TDP wattage per Fermi 32-core SM / Kepler 192-core SMX / Maxwell 128-core SMM

GTX580-244TDP [16SM/512cores] 15.25 watts per SM @ 0.47 watt per core

GTX680-195TDP [8SMX/1536cores] 24.37 watts per SMX @ 0.126 watt per core

GTX780-225TDP [12SMX/2304cores] 18.75 watts per SMX @ 0.097 watt per core

GTX780Ti-250TDP [15SMX/2880cores] 16.66 watts per SMX @ 0.086 watt per core

GTX750Ti-60TDP [5SMM/640cores] 12 watts per SMM @ 0.093 watt per core

GTX970-145TDP [13SMM/1664cores] 11.15 watts per SMM @ 0.087 watt per core

GTX980-170TDP [16SMM/2048cores] 10.62 watts per SMM @ 0.083 watt per core

GDDR5/VRM variations not included.

This reflects efficiency (GFlops/Watt) quite accurately and goes some way to explaining the design rationale.

I can boost the 970 core to 1400MHz but just can't shift the GDDR5, which for here would be more productive (with most tasks)!
I can lower the core and tweak for efficiency; dropping the Power and Temp targets results in an automatic voltage drop. Even @1265MHz I can drop the Power and Temp targets to 90% without reducing throughput.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38578 - Posted: 20 Oct 2014 | 8:14:08 UTC - in response to Message 38576.
Last modified: 20 Oct 2014 | 8:16:46 UTC

Do you have a reference GTX970 or a custom partner board?
From your card's core clocks (3858-4128 GFLOPS) and power targets, efficiency is excellent @ 29.67-33 SP GFLOPS/watt, depending on clock/temp target/voltage. I knew GM204 was capable of 30-35 SP GFLOPS/watt if tuned properly. Even with a WDDM tax of ~6%, work unit completion is still at the lower-tier runtimes of the GTX780Ti (59% more cores) and the top-tier times of the GTX780 (looking at the top host list, your 970 is faster than a GTX780 by 1%) with 31% fewer cores.
I'd say Maxwell is proven for GPUGRID, being the runtime/power consumption winner (further app refinement could help, depending upon work unit step length and step counts).

As ETA mentioned earlier in this thread, even an eco-tuned GK110 can't match (or come close to) the runtime/power consumption ratio of GM204 Maxwell.

BTW- thanks for editing the zero I left out.

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38580 - Posted: 20 Oct 2014 | 9:30:39 UTC - in response to Message 38571.

With a 970, I’ve seen Memory Controller loads from 27% for short NOELIA_SH2 tasks to 50% for several different long task types.


I see very similar numbers for my 980. The memory clock seems "locked" at 6000 MHz when running GPUGrid tasks. It doesn't respond to overclocking inputs. It does jump to 7000 MHz when running Heaven stress testing however.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38581 - Posted: 20 Oct 2014 | 9:48:00 UTC - in response to Message 38578.
Last modified: 20 Oct 2014 | 9:49:58 UTC

It's a Palit NE5X970014G2-2041F (1569) GM204-A Rev A1 with a default core clock of 1051MHz.
It uses an exhaust fan (blower), so while it's a Palit shell it's basically of reference design. Don't know of any board alterations from reference designs.
My understanding is that Palit support GDDR5 from Elpida, Hynix and Samsung. This model has the Samsung GDDR5 and like other Palit models is supposed to operate at 3505MHz (7000MHz effectively). However it seems fixed at 3005MHz. While I can set the clock to 3555MHz the current clock remains at 3005MHz. Raising or lowering it does not change the MCL (so it appears that my settings are being ignored).
So while it can run at ~110% power @ 1.212V (1406MHz) @64C Fan@75% I cannot reduce the MCL bottleneck (53% @1406MHz) which I would prefer to do.

http://www.palit.biz/palit/vgapro.php?id=2406
PN : NE5X970014G2-2041F
Memory : 4096MB / 256bit
DRAM Type : GDDR5
Clock : 1051MHz (Base) / 1178MHz (Boost)
Memory Clock : 3500 MHz (DDR 7000 MHz)
mHDMI / DVI / DisplayPort

biodoc, thanks for letting us know you are experiencing the same GDDR5 issue. Anyone else seeing this (or not)?
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38582 - Posted: 20 Oct 2014 | 10:35:16 UTC - in response to Message 38581.
Last modified: 20 Oct 2014 | 10:37:24 UTC

It's a Palit NE5X970014G2-2041F (1569) GM204-A Rev A1 with a default core clock of 1051MHz.
It uses an exhaust fan (blower), so while it's a Palit shell it's basically of reference design. Don't know of any board alterations from reference designs.
My understanding is that Palit support GDDR5 from Elpida, Hynix and Samsung. This model has the Samsung GDDR5 and like other Palit models is supposed to operate at 3505MHz (7000MHz effectively). However it seems fixed at 3005MHz. While I can set the clock to 3555MHz the current clock remains at 3005MHz. Raising or lowering it does not change the MCL (so it appears that my settings are being ignored).

The same applies to my Gigabyte GTX-980.

So while it can run at ~110% power @ 1.212V (1406MHz) @64C Fan@75% I cannot reduce the MCL bottleneck (53% @1406MHz) which I would prefer to do.

Is 53% MCL really a bottleneck? Shouldn't this bottleneck lower the GPU usage? Did you try to lower the memory clock to measure the effect of this 'bottleneck'?

I've tried Furmark, and it seems to be limited by memory bandwidth, while GPUGrid seems to be limited by GPU speed:



The history of the graph is:
GPUGrid -> Furmark (1600x900) -> Furmark (1920x1200 fullscreen) -> GPUGrid

biodoc, thanks for letting us know you are experiencing the same GDDR5 issue. Anyone else seeing this (or not)?

It's hard to spot (3005MHz instead of 3505MHz), but my GTX980 does the same; I don't think this is an error, though.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38583 - Posted: 20 Oct 2014 | 10:44:33 UTC - in response to Message 38581.

Mysteries of Maxwell continue. Here are some GTX970/980 board layout photos. Note: not all are reference. Maybe something changed concerning the GDDR5 circuitry, or the overclocking utilities haven't accounted for all Maxwell PCBs?

http://cdn.videocardz.com/1/2014/09/GeForce-GTX-970-vs-GTX-760-974x1000.jpg

http://koolance.com/image/content_pages/product_help/video_card_pcb_layouts/pcb_nvidia_geforce_gtxtitan_reference.jpg

http://www.3dnews.ru/assets/external/illustrations/2014/09/30/902762/sm.board_back.800.jpg


http://forums.evga.com/EVGA-GTX-970-ACX-20-quality-concerns-m2219546.aspx

http://www.3dnews.ru/assets/external/illustrations/2014/09/30/902762/sm.board_front.800.jpg

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38584 - Posted: 20 Oct 2014 | 10:56:22 UTC - in response to Message 38582.
Last modified: 20 Oct 2014 | 12:37:03 UTC

Is 53% MCL really a bottleneck?

That's the question I started out trying to find the answer to - is the increased MCL really a bottleneck?
Our point of reference is that we know it was with some Keplers. While that picture was complicated by cache variations, the GTX650Ti Boost allowed us to determine that cache wasn't the only bottleneck and that the MCL was definitely a bottleneck in itself (for some other cards).

Shouldn't this bottleneck lower the GPU usage?

Depends on how GPU usage is being measured, but MCL should rise with GPU usage, as more bandwidth is required to support the GPU, and it appeared to do just that:
When I reduced CPU usage from 100% to 55% the GPU usage rose from 89% to 93% and the MCL increased from ~46% to 49%.
At 100% CPU usage both the GPU usage and MCL were also more erratic.

Also, when I increased the GPU clock the MCL increased:
1126MHz GPU - 45% MCL
1266MHz GPU - 49% MCL
1406MHz GPU - 53% MCL

So the signs are there.

Being able to OC or boost the GDDR5 should offset the increase in MCL (it did with Keplers).

Did you try to lower the memory clock to measure the effect of this 'bottleneck'?

I tried, but I can't change the memory clock - the current clock remains at 3005MHz (the default). NVidia Inspector, GPU-Z (and previously MSI Afterburner) show that I've asked for the GDDR5 clocks to be increased, but they don't actually rise.

I've tried Furmark, and it seems to be limited by memory bandwith, while GPUGrid seems to be limited by GPU speed:

I'm wondering if the measured MCL is actually measuring usage of the new compression system and if this actually reflects a bottleneck or not. Increasing the GDDR5 would be the simple test, but that's a non-starter, which is another question in itself.

The only way to confirm if the MCL increase is really a bottleneck is to run similar WU's at different GPU frequencies and plot the results looking for diminishing returns. You would still expect to gain plenty from a GPU OC, but should see less gain as a result of MCL increases at higher GPU frequencies. Even with a frequency difference of 1406 vs 1126 (280MHz) the MCL difference is just 18% (53% vs 45% load), but six or seven points down to around 1051MHz might be enough to spot the effect of a MCL bottleneck, if it exists.
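A sketch of how that analysis could look, assuming average long-WU runtimes have been collected at each core clock with the memory clock held fixed (the runtimes below are made-up placeholders, not measurements):

# (core clock MHz, average long-WU runtime s) - placeholder values
runs = [
    (1051, 21500),
    (1126, 20300),
    (1266, 18700),
    (1406, 17500),
]

base_clock, base_time = runs[0]
for clock, runtime in runs[1:]:
    clock_gain = clock / base_clock - 1
    speedup = base_time / runtime - 1
    # A throughput gain clearly short of the clock gain at the top end
    # would point to a bottleneck other than the core (e.g. the MCU).
    print(f"{clock} MHz: +{clock_gain:.1%} clock, +{speedup:.1%} throughput")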
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38604 - Posted: 21 Oct 2014 | 12:10:59 UTC - in response to Message 38600.
Last modified: 21 Oct 2014 | 12:56:20 UTC

Using NVIDIA Inspector you can make sure the current GDDR5 clocks are high, but you have to match the P-State value on the Overclocking panel to the state shown on the left. For me the P-State is P2, so in order to ensure 3505MHz is used I have to set the overclocking Performance Level to P2. Then I can push the Memory Clock Offset to 3505MHz.
When I did this with the GPU clock at 1406MHz-ish, the MCU load dropped to 45%
While I can select to unlock the clocks I cannot increase past 3505MHz - it just reverts. Hopefully this will allow for better performance and tuning...

For those with this issue, you might want to create a batch file setting your required (command line) values and have it run at startup, or create a clocks shortcut from NVIDIA Inspector and either double-click it after every restart or have it run automatically at startup.
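To verify that the card actually holds the clocks (and P-state) you set, nvidia-smi can be polled from a small script; a sketch, assuming nvidia-smi is on the PATH and your driver exposes these query fields (check nvidia-smi --help-query-gpu):

import subprocess, time

# Poll the P-state plus core and memory clocks once a minute
while True:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=pstate,clocks.gr,clocks.mem",
         "--format=csv,noheader"])
    print(out.decode().strip())
    time.sleep(60)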
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

biodoc
Send message
Joined: 26 Aug 08
Posts: 183
Credit: 10,085,929,375
RAC: 106,804
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38628 - Posted: 22 Oct 2014 | 8:44:43 UTC - in response to Message 38604.

Using NVIDIA Inspector you can make sure the current GDDR5 clocks are high, but you have to match the P-State value on the Overclocking panel to the state shown on the left. For me the P-State is P2, so in order to ensure 3505MHz is used I have to set the overclocking Performance Level to P2. Then I can push the Memory Clock Offset to 3505MHz.
When I did this with the GPU clock at 1406MHz-ish, the MCU load dropped to 45%
While I can select to unlock the clocks I cannot increase past 3505MHz - it just reverts. Hopefully this will allow for better performance and tuning...

For those with this issue, you might want to create a batch file setting your required (command line) values and have it run at startup, or create a clocks shortcut from NVIDIA Inspector and either double-click it after every restart or have it run automatically at startup.



Thanks skgiven. Do you see an increase in performance on GPUGrid WUs when the memory clock is increased to 3500 MHz?

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38629 - Posted: 22 Oct 2014 | 9:28:30 UTC - in response to Message 38628.
Last modified: 22 Oct 2014 | 9:54:10 UTC

Thanks skgiven. Do you see an increase in performance on GPUGrid WUs when the memory clock is increased to 3500 MHz?

I think so but I don't have much to go on so far. I was really just looking for a MCL drop, which I found (~53% to ~45%).
To confirm actual runtime improvement (if any) that results solely from the memory freq. increase I would really need to run several long same type WU's at 3505MHz, then several at 3005MHz, all with the same GPU clock and Boinc settings. Ideally others would do the same to confirm findings.
That will take two or three days, as there is a mixture of Long task types and each takes 5 or 6h to run...
I think you would be less likely to spot a small performance change from running short WUs, as those only have an MCL of around 27%; it's not like we are overclocking here, just making sure the GDDR5 runs at the speed it's supposed to. Most of us run the Long WUs anyway, so that's what we should focus on.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38677 - Posted: 25 Oct 2014 | 14:08:39 UTC - in response to Message 38629.
Last modified: 25 Oct 2014 | 15:53:34 UTC

The improvement was only 2 or 3% for me with the GPU at around 1406MHz.
There is a new compression algorithm which presumably runs on the GPU so any increase in GPU rate also increases compression rate. Anyway it appears there isn't a lot to be gained from increasing the GDDR frequency. That said, I just had a quick look at one GPU clock setting on one system and it's not like I actually overclocked the memory; just forced it to work at 3500MHz (as far as I can tell). It might be the case that at higher GPU clocks a memory frequency increase would result in greater performance, or that increasing the memory (as opposed to the GPU) allows you to run more efficiently (or not).
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38682 - Posted: 25 Oct 2014 | 22:20:35 UTC

Regarding the question of whether ~50% MCL could be a bottleneck: the memory accesses don't have to be distributed homogeneously over time. So even if the memory controllers are idle about 50% of the time, the chip may be waiting for new data to continue processing at other times.

Another point to consider is reduced latency, which simple clock speed increases will yield. This is not terribly important for a GPU, as it's made to mask latency by keeping tons of threads in flight. But still, lower latency will certainly not perform worse.

SK, have you tried EVGA precision to set memory clocks?

I have to confess I'm a bit jealous: I ordered my GTX970 over a week ago, but so far no signs of it being shipped. The shop is trustworthy, though. I went for the Galax EXOC model, because it's the cheapest one which can take my Thermalright Shaman :D

MrS
____________
Scanning for our furry friends since Jan 2002

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38706 - Posted: 26 Oct 2014 | 19:51:04 UTC - in response to Message 38682.
Last modified: 26 Oct 2014 | 21:00:45 UTC

I have not tried EVGA precision (yet) but I did eventually work out how to increase GDDR5 past 3505MHz using NV Inspector.

You have to set the P-states when they are not in use!
Presently working at 3605MHz...
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38720 - Posted: 27 Oct 2014 | 21:58:08 UTC - in response to Message 38706.

Give that card a good spanking :D

MrS
____________
Scanning for our furry friends since Jan 2002

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38811 - Posted: 4 Nov 2014 | 23:05:22 UTC

I finally got my GTX970 yesterday!

And I have the same problem as SK: by default the card crunches in power state P2, whereas my OC utility (EVGA Precision) manipulates state P0. This works for games, and surprisingly for the GPU core clock as well, but not for the memory. Using nVidia Inspector I can increase the P2 memory clock to the default 3500 MHz.

However, when I do this I quickly get blue screens! The card runs at that memory clock just fine in Unigine Heaven, so I'm not completely sure it's this setting which causes the BS. If I don't use inspector things seem fine, though. Will test further.

Do you guys know anything about this? Does the memory use a lower voltage in P2? Can I force P0? Setting the power management to maximum performance had no effect. I also tried

nvidiainspector.exe -forcepstate:0,0

to put my single nVidia GPU into P0. But all this did was set it to P2 with fixed clocks, as configured in Inspector.

MrS
____________
Scanning for our furry friends since Jan 2002

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38812 - Posted: 4 Nov 2014 | 23:45:25 UTC - in response to Message 38811.

Are you using the latest Nvidia Inspector? They mention a couple of changes that might be of interest.
http://www.guru3d.com/files_details/nvidia_inspector_download.html

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38816 - Posted: 5 Nov 2014 | 7:55:49 UTC - in response to Message 38812.

Yes, I'm using the current 1.9.7.3.

Thanks,
MrS
____________
Scanning for our furry friends since Jan 2002

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38820 - Posted: 5 Nov 2014 | 19:31:30 UTC - in response to Message 38811.

Congratulations ETA.

Quite a coincidence, I ordered an EVGA GTX970 OC last week, but its shipment was delayed a few times, and I got it yesterday too.
I put it immediately in my system to replace the GTX770.

I notice that it can run very warm. I use PrecisionX for all settings, and when the temperature is allowed to run to 80°C, it quickly does - so it runs warmer than my GTX780Ti's.
When I allow 80°C, the clock boosts to 1341.6MHz and the TDP is ~95.3%.

Setting the maximum temperature to 74°C (prioritized), the clock boosts to 1303.2MHz and the TDP is ~81.1%.

From what I have read I thought that the 9xx cards ran cooler than the 780Ti, but that is not what I see.

By the way, this is the version with only one radial fan; that is what I wanted.

____________
Greetings from TJ

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38825 - Posted: 5 Nov 2014 | 21:01:06 UTC

Thanks! Have fun with yours as well :)

Mine is currently running at a manually set 1315 MHz @ 1.075 V. From a quick Unigine Heaven test I know it should be able to take about 40 MHz more at that voltage, which is really amazing for 28 nm. Power consumption is currently about 140 W at a moderate 87% GPU utilization. I hope I'll be able to increase this further when I get the memory clock up to where it should be.

Well, "cooler running" is only half of the story. It does consume much less power, for sure, which means it'S far easier to keep cool. But if you pair it with a weak cooler it will run just as hot as any other card. Mine is the Galax with an open air cooler, which handles the card nicely (quieter than my case fans at moderate speed, 66°C with 22°C ambient temperature).

But getting back to my question from yesterday: is your card also crunching with the memory running at 1500 / 3000 / 6000 MHz? Is it also in power state P2? (NV inspector can show this). So far I have left my card's memory clock at stock 1.5 GHz and have not faced a single blue screen since then.

MrS
____________
Scanning for our furry friends since Jan 2002

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38827 - Posted: 5 Nov 2014 | 21:18:57 UTC

BTW: the bitcoin folks are seeing the same problem. Some can't OC the memory, while at least one can. They don't get blue screens at stock memory clock, though.

MrS
____________
Scanning for our furry friends since Jan 2002

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38835 - Posted: 5 Nov 2014 | 23:35:01 UTC

OK, I'm getting closer: when I have GPU-Grid crunching in the background, not even running Heaven triggers P0! It just stays at P2, the same as with everything I've tried with Inspector.

This is so strange. Maybe it's related to running CUDA?

BTW: 344.60 WHQL behaves the same.

MrS
____________
Scanning for our furry friends since Jan 2002

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38855 - Posted: 6 Nov 2014 | 15:17:38 UTC

I installed NVIDIA Inspector and it shows the card runs in P2. The clock is at 1304 MHz (I have set the max temperature to 73°C and the power target to 105% with PrecisionX). The clock can run higher if I raise the temperature limit.
Voltage is 1125 mV, usage 90%, TDP 84%.


I am waiting for the real Maxwells, but in the meantime I want one GTX980 as well (or a GTX980Ti). What I noticed, however, is that EVGA cards are nowhere in stock; the price increased from EUR 585 last Friday to EUR 705 since this Monday. I check daily whether it is available to order. That is more expensive than the better GTX780Ti.
____________
Greetings from TJ

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38859 - Posted: 6 Nov 2014 | 22:14:00 UTC - in response to Message 38855.
Last modified: 6 Nov 2014 | 22:18:12 UTC

I suppose your memory clock is set to 1500 MHz (in GPU-Z) as well? I'm trying again to run 1750 MHz manually. I passed "memtestcl" at that clock, and for the past few minutes everything has looked fine. I could live with manually forcing the clock while staying in P2, as long as I don't get blue screens again!

Edit: don't hold your breath for a GTX980Ti. Rumors say "big Maxwell" is relatively far along in production, but after the very successful launch of GM204 there's no reason for nVidia to bring it out yet. Or to price it at some sane level. I expect they'll use those valuable chips for HPC, Tesla and Quadro first, just as they did with GK110.

And I don't think the GTX980 is worth the added money, certainly not at those EVGA-scarcity prices.

MrS
____________
Scanning for our furry friends since Jan 2002

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38862 - Posted: 7 Nov 2014 | 1:42:06 UTC - in response to Message 38820.

The 1920-DP64-core (5760 total cores / 450 W TDP?) Titan-Z is currently priced at 1399-1499 USD, equaling the cost of [2] GTX980s (4096 total cores / 128 DP64 / 340 W TDP).
Big Maxwell is around the corner, maybe 6-12 weeks. What will AMD's response be to GM204, never mind GM2*0? A GTX970 equals an AMD 290X at 1440p/2160p gaming - and that's NVidia's 5th-highest core-count board compared to AMD's top 2816-core card. GM204's higher clocks have an edge.
GK110 lasted three years without a replacement, which speaks to how well engineered Kepler is. Reading the NVidia dev boards, there are "some" who think the C.C 3.5 code architecture is better than C.C 5.2, efficiency aside. (DP64 cores use more wattage than 32-bit cores; that's one reason Maxwell's TDP is lower - the loss of DP64 cores, and the change from superscalar to scalar?) The struggle for DP64 continues: C.C 5.0/5.2 was trimmed even further away from Fermi's 1/2-or-1/8 and Kepler's 1/3-or-1/16 64-bit core ratios. (I'm curious to see how many DP64 cores will be in a GM2*0 SMM, and its TDP.) C.C 3.5 has 64 DP64 cores per SMX, 1/3rd of the SMX core count (driver-limited to 8 DP per SMX on the GTX780[Ti], 1/16th of the SMX total); Maxwell C.C 5.0/5.2 is 1/32 of the SMM total.

Not specific to GPUGRID: C.C 5.0/5.2 L1/L2 cache, global and shared memory behavior is different. Maxwell lacks the [32]-bank 64-bit addressing mode; Kepler 3.0/3.5 can use 64-bit/8-byte-wide banks (if that code path is chosen; the default is 4 bytes). Maxwell is faster at hashing, as Kepler is for DP64.

CUDA Programming guide states:

"section by reading it using the __ldg() function (see Read-Only Data Cache Load Function). When the compiler detects that the read-only condition is satisfied for some data, it will use __ldg() to read it. The compiler might not always be able to detect that the read-only condition is satisfied for some data. Marking pointers used for loading such data with both the const and __restrict__ qualifiers increases the likelihood that the compiler will detect the read-only condition.

Data that is not read-only for the entire lifetime of the kernel cannot be cached in the unified L1/texture cache for devices of compute capability 5.0. For devices of compute capability 5.2, it is, by default, not cached in the unified L1/texture cache, but caching may be enabled using the following mechanisms: •Perform the read using inline assembly with the appropriate modifier as described in the PTX reference manual;
• Compile with the -Xptxas -dlcm=ca compilation flag, in which case all reads are cached, except reads that are performed using inline assembly with a modifier that disables caching;
• Compile with the -Xptxas -fscm=ca compilation flag, in which case all reads are cached, including reads that are performed using inline assembly regardless of the modifier used.
When caching is enabled using one of the three mechanisms listed above, devices of compute capability 5.2 will cache global memory reads in the unified L1/texture cache for all kernel launches except for the kernel launches for which thread blocks consume too much of the multiprocessor's resources. These exceptions are reported by the profiler."
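
As an aside (my own minimal sketch, not from the guide; the kernel name and parameters are made up): on C.C 3.5+ hardware it only takes the const + __restrict__ qualifiers, or an explicit __ldg(), to route a load through the read-only path.

__global__ void scale(float* __restrict__ out,
                      const float* __restrict__ in,
                      float factor, int n)
{
    // const + __restrict__ lets the compiler prove the read-only
    // condition; __ldg() forces the texture/L1 path explicitly.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = factor * __ldg(&in[i]);
}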

C.C 3.0/3.5 shared memory:
"Shared memory has 32 banks with two addressing modes that are described below.
The addressing mode can be queried using cudaDeviceGetSharedMemConfig() and set using cudaDeviceSetSharedMemConfig() (see reference manual for more details). Each bank has a bandwidth of 64 bits per clock cycle."

64-Bit Mode
Successive 64-bit words map to successive banks.
"A shared memory request for a warp does not generate a bank conflict between two threads that access any sub-word within the same 64-bit word (even though the addresses of the two sub-words fall in the same bank): In that case, for read accesses, the 64-bit word is broadcast to the requesting threads and for write accesses, each sub-word is written by only one of the threads (which thread performs the write is undefined). In this mode, the same access pattern generates fewer bank conflicts than on devices of compute capability 2.x for 64-bit accesses and as many or fewer for 32-bit accesses."

32-Bit Mode
Successive 32-bit words map to successive banks.
"A shared memory request for a warp does not generate a bank conflict between two threads that access any sub-word within the same 32-bit word or within two 32-bit words whose indices i and j are in the same 64-word aligned segment (i.e., a segment whose first index is a multiple of 64) and such that j=i+32 (even though the addresses of the two sub-words fall in the same bank): In that case, for read accesses, the 32-bit words are broadcast to the requesting threads and for write accesses, each sub-word is written by only one of the threads (which thread performs the write is undefined).
In this mode, the same access pattern generates as many or fewer bank conflicts than on devices of compute capability 2.x."
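
The addressing mode above is the one you toggle through the runtime API. A minimal host-side sketch (my example, not from the guide); on Maxwell, banks are fixed at four bytes, so the preference should simply have no effect there:

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    // Opt in to Kepler's 8-byte bank mode (only helps 64/128-bit accesses).
    cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte);

    cudaSharedMemConfig cfg;
    cudaDeviceGetSharedMemConfig(&cfg);
    printf("shared memory bank size: %s\n",
           cfg == cudaSharedMemBankSizeEightByte ? "8 bytes" : "4 bytes");
    return 0;
}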

C.C 5.0/5.2 shared Memory:
"Shared memory has 32 banks that are organized such that successive 32-bit words map to successive banks. Each bank has a bandwidth of 32 bits per clock cycle."

C.C 2.0:
"Shared memory has 32 banks that are organized such that successive 32-bit words map to successive banks. Each bank has a bandwidth of 32 bits per two clock cycles."

can be applied to C.C 2.0/3.0/3.5/5.0/5.2:
"A shared memory request for a warp does not generate a bank conflict between two threads that access any address within the same 32-bit word (even though the two addresses fall in the same bank): In that case, for read accesses, the word is broadcast to the requesting threads and for write accesses, each address is written by only one of the threads (which thread performs the write is undefined)."

"A shared memory request for a warp does not generate a bank conflict between two threads that access any address within the same 32-bit word (even though the two addresses fall in the same bank): In that case, for read accesses, the word is broadcast to the requesting threads (and unlike for devices of compute capability 1.x, multiple words can be broadcast in a single transaction) and for write accesses, each address is written by only one of the threads (which thread performs the write is undefined).

This means, in particular, that unlike for devices of compute capability 1.x, there are no bank conflicts if an array of char is accessed as follows, for example:
extern __shared__ char shared[];
char data = shared[BaseIndex + tid];

Also, unlike for devices of compute capability 1.x, there may be bank conflicts between a thread belonging to the first half of a warp and a thread belonging to the second half of the same warp.

32-Bit Strided Access
"A common access pattern is for each thread to access a 32-bit word from an array indexed by the thread ID tid and with some stride s:
extern __shared__ float shared[];
float data = shared[BaseIndex + s * tid];

In this case, threads tid and tid+n access the same bank whenever s*n is a multiple of the number of banks (i.e., 32) or, equivalently, whenever n is a multiple of 32/d where d is the greatest common divisor of 32 and s. As a consequence, there will be no bank conflict only if the warp size (i.e., 32) is less than or equal to 32/d, i.e., only if d is equal to 1, i.e., s is odd."
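
The usual workaround for the even-stride conflicts described here is to pad the shared array so the effective stride becomes odd. A sketch of the well-known padded-transpose idiom (illustrative, my own naming; assumes a square matrix with dimensions that are multiples of 32, launched with 32x32 thread blocks):

__global__ void transpose32(float* out, const float* in)
{
    __shared__ float tile[32][33];   // 33 columns: odd stride, conflict-free

    int x = blockIdx.x * 32 + threadIdx.x;
    int y = blockIdx.y * 32 + threadIdx.y;
    int width  = gridDim.x * 32;
    int height = gridDim.y * 32;

    tile[threadIdx.y][threadIdx.x] = in[y * width + x];
    __syncthreads();

    // write back transposed; the column read from 'tile' would be a
    // 32-way bank conflict without the +1 padding
    x = blockIdx.y * 32 + threadIdx.x;
    y = blockIdx.x * 32 + threadIdx.y;
    out[y * height + x] = tile[threadIdx.x][threadIdx.y];
}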

Larger Than 32-Bit Access
"64-bit and 128-bit accesses are specifically handled to minimize bank conflicts as described below.

Other accesses larger than 32-bit are split into 32-bit, 64-bit, or 128-bit accesses. The following code, for example: struct type {
float x, y, z;
};
extern __shared__ struct type shared[];
struct type data = shared[BaseIndex + tid];
results in three separate 32-bit reads without bank conflicts since each member is accessed with a stride of three 32-bit words.
64-Bit Accesses: For 64-bit accesses, a bank conflict only occurs if two threads in either of the half-warps access different addresses belonging to the same bank. Unlike for devices of compute capability 1.x, there are no bank conflicts for arrays of doubles accessed as follows, for example:
extern __shared__ double shared[];
double data = shared[BaseIndex + tid];

128-Bit Accesses: The majority of 128-bit accesses will cause 2-way bank conflicts, even if no two threads in a quarter-warp access different addresses belonging to the same bank. Therefore, to determine the ways of bank conflicts, one must add 1 to the maximum number of threads in a quarter-warp that access different addresses belonging to the same bank."




eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38863 - Posted: 7 Nov 2014 | 15:11:14 UTC - in response to Message 38862.
Last modified: 7 Nov 2014 | 15:21:45 UTC

Clarification for new Maxwell card owners and Kepler owners looking to upgrade. What will Maxwell GM2*0 bring to its brethren? We know of a few proven GM107/GM204 changes from Kepler: a refined crossbar/dispatch/issue design - better power efficiency per SMM compared to SMX - runs cooler - lower DP64 performance from the loss of 64-bit banks and the 8-byte shared banking mode (fewer DP cores per SMM) - higher integer throughput (more cores per SMM) - a different TMU/CUDA core ratio - revised filtering/cache/memory algorithms - a barrel shifter (left out of Kepler) - and an enhanced SIP core block for video encoding/decoding (Kepler's is first generation).

From the Maxwell Tuning Guide, version 1.1, for further comparison:

The Maxwell Streaming Multiprocessor, SMM, is similar in many respects to the Kepler architecture's SMX. The key enhancements of SMM over SMX are geared toward improving efficiency without requiring significant increases in available parallelism per SM from the application.

1.4.1.1. Occupancy
The maximum number of concurrent warps per SMM remains the same as in SMX (i.e., 64), and factors influencing warp occupancy remain similar or improved over SMX: •The register file size (64k 32-bit registers) is the same as that of SMX.
•The maximum registers per thread, 255, matches that of Kepler GK110. As with Kepler, experimentation should be used to determine the optimum balance of register spilling vs. occupancy, however.
•The maximum number of thread blocks per SM has been increased from 16 to 32. This should result in an automatic occupancy improvement for kernels with small thread blocks of 64 or fewer threads (shared memory and register file resource requirements permitting). Such kernels would have tended to under-utilize SMX, but less so SMM.
•Shared memory capacity is increased (see Shared Memory Capacity).

As such, developers can expect similar or improved occupancy on SMM without changes to their application. At the same time, warp occupancy requirements (i.e., available parallelism) for maximum device utilization are similar to or less than those of SMX (see Instruction Latencies).

1.4.1.2. Instruction Scheduling
The number of CUDA Cores per SM has been reduced to a power of two, however with Maxwell's improved execution efficiency, performance per SM is usually within 10% of Kepler performance, and the improved area efficiency of SMM means CUDA Cores per GPU will be substantially higher vs. comparable Fermi or Kepler chips. SMM retains the same number of instruction issue slots per clock and reduces arithmetic latencies compared to the Kepler design.

As with SMX, each SMM has four warp schedulers. Unlike SMX, however, all SMM core functional units are assigned to a particular scheduler, with no shared units. Along with the selection of a power-of-two number of CUDA Cores per SM, which simplifies scheduling and reduces stall cycles, this partitioning of SM computational resources in SMM is a major component of the streamlined efficiency of SMM.

The power-of-two number of CUDA Cores per partition simplifies scheduling, as each of SMM's warp schedulers issue to a dedicated set of CUDA Cores equal to the warp width. Each warp scheduler still has the flexibility to dual-issue (such as issuing a math operation to a CUDA Core in the same cycle as a memory operation to a load/store unit), but single-issue is now sufficient to fully utilize all CUDA Cores.

1.4.1.3. Instruction Latencies
Another major improvement of SMM is that dependent math latencies have been significantly reduced; a consequence of this is a further reduction of stall cycles, as the available warp-level parallelism (i.e., occupancy) on SMM should be equal to or greater than that of SMX (see Occupancy), while at the same time each math operation takes less time to complete, improving utilization and throughput.

1.4.1.4. Instruction Throughput
The most significant changes to peak instruction throughputs in SMM are as follows: •The change in number of CUDA Cores per SM brings with it a corresponding change in peak single-precision floating point operations per clock per SM. However, since the number of SMs is typically increased, the result is an increase in aggregate peak throughput; furthermore, the scheduling and latency improvements also discussed above make this peak easier to approach.
•The throughput of many integer operations including multiply, logical operations and shift is improved. In addition, there are now specialized integer instructions that can accelerate pointer arithmetic. These instructions are most efficient when data structures are a power of two in size.

Note: As was already the recommended best practice, signed arithmetic should be preferred over unsigned arithmetic wherever possible for best throughput on SMM. The C language standard places more restrictions on overflow behavior for unsigned math, limiting compiler optimization opportunities.

1.4.2. Memory Throughput
1.4.2.1. Unified L1/Texture Cache
Maxwell combines the functionality of the L1 and texture caches into a single unit.
As with Kepler, global loads in Maxwell are cached in L2 only, unless using the LDG read-only data cache mechanism introduced in Kepler.
In a manner similar to Kepler GK110B, GM204 retains this behavior by default but also allows applications to opt-in to caching of global loads in its unified L1/Texture cache. The opt-in mechanism is the same as with GK110B: pass the -Xptxas -dlcm=ca flag to nvcc at compile time.

Local loads also are cached in L2 only, which could increase the cost of register spilling if L1 local load hit rates were high with Kepler. The balance of occupancy versus spilling should therefore be reevaluated to ensure best performance. Especially given the improvements to arithmetic latencies, code built for Maxwell may benefit from somewhat lower occupancy (due to increased registers per thread) in exchange for lower spilling.

The unified L1/texture cache acts as a coalescing buffer for memory accesses, gathering up the data requested by the threads of a warp prior to delivery of that data to the warp. This function previously was served by the separate L1 cache in Fermi and Kepler.

Two new device attributes were added in CUDA Toolkit 6.0: globalL1CacheSupported and localL1CacheSupported. Developers who wish to have separately-tuned paths for various architecture generations can use these fields to simplify the path selection process.

Note: Enabling caching of globals in GM204 can affect occupancy. If per-thread-block SM resource usage would result in zero occupancy with caching enabled, the CUDA driver will override the caching selection to allow the kernel launch to succeed. This situation is reported by the profiler.
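
(Aside, not from the guide: the two attributes can be queried like this - device 0 assumed, names per the CUDA 6.0 runtime API.)

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int globalL1 = 0, localL1 = 0;
    cudaDeviceGetAttribute(&globalL1, cudaDevAttrGlobalL1CacheSupported, 0);
    cudaDeviceGetAttribute(&localL1,  cudaDevAttrLocalL1CacheSupported, 0);
    printf("global loads cacheable in L1: %s\n", globalL1 ? "yes" : "no");
    printf("local loads cacheable in L1:  %s\n", localL1 ? "yes" : "no");
    return 0;
}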

1.4.3. Shared Memory
1.4.3.1. Shared Memory Capacity
With Fermi and Kepler, shared memory and the L1 cache shared the same on-chip storage. Maxwell, by contrast, provides dedicated space to the shared memory of each SMM, since the functionality of the L1 and texture caches have been merged in SMM. This increases the shared memory space available per SMM as compared to SMX: GM107 provides 64 KB shared memory per SMM, and GM204 further increases this to 96 KB shared memory per SMM.

This presents several benefits to application developers: •Algorithms with significant shared memory capacity requirements (e.g., radix sort) see an automatic 33% to 100% boost in capacity per SM on top of the aggregate boost from higher SM count.
•Applications no longer need to select a preference of the L1/shared split for optimal performance. For purposes of backward compatibility with Fermi and Kepler, applications may optionally continue to specify such a preference, but the preference will be ignored on Maxwell, with the full 64 KB per SMM always going to shared memory.

Note: While the per-SM shared memory capacity is increased in SMM, the per-thread-block limit remains 48 KB. For maximum flexibility on possible future GPUs, NVIDIA recommends that applications use at most 32 KB of shared memory in any one thread block, which would for example allow at least two such thread blocks to fit per SMM.

1.4.3.2. Shared Memory Bandwidth
Kepler SMX introduced an optional 8-byte shared memory banking mode, which had the potential to increase shared memory bandwidth per SM over Fermi for shared memory accesses of 8 or 16 bytes. However, applications could only benefit from this when storing these larger elements in shared memory (i.e., integers and fp32 values saw no benefit), and only when the developer explicitly opted into the 8-byte bank mode via the API.

To simplify this, Maxwell returns to the Fermi style of shared memory banking, where banks are always four bytes wide. Aggregate shared memory bandwidth across the chip remains comparable to that of corresponding Kepler chips, given increased SM count. In this way, all applications using shared memory can now benefit from the higher bandwidth, even when storing only four-byte items into shared memory and without specifying any particular preference via the API.

1.4.3.3. Fast Shared Memory Atomics
Kepler introduced a dramatically higher throughput for atomic operations to global memory as compared to Fermi. However, atomic operations to shared memory remained essentially unchanged: both architectures implemented shared memory atomics using a lock/update/unlock pattern that could be expensive in the case of high contention for updates to particular locations in shared memory.

Maxwell improves upon this by implementing native shared memory atomic operations for 32-bit integers and native shared memory 32-bit and 64-bit compare-and-swap (CAS), which can be used to implement other atomic functions with reduced overhead compared to the Fermi and Kepler methods.
Note: Refer to the CUDA C Programming Guide for an example implementation of an fp64 atomicAdd() using atomicCAS().

Source: Maxwell Tuning Guide. Version 1.1
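
For reference, the fp64 atomicAdd() the note above points to is built on atomicCAS() essentially like this (reproduced from memory and renamed, so check the Programming Guide for the authoritative version):

__device__ double atomicAddDouble(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // reinterpret the bits, add, and swap back only if no other
        // thread has written the location in the meantime
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);    // retry on contention
    return __longlong_as_double(old);
}

On Maxwell, the native 32-bit shared memory atomics make the equivalent lock/update/unlock loop unnecessary for 32-bit integers, which is exactly the improvement the tuning guide describes.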

Throughput of native arithmetic instructions (number of operations per clock cycle per multiprocessor). For total chip throughput, multiply each figure by the number of SMs/SMXs/SMMs.

Compute capability throughput from left to right (seven columns): 1.1-1.2 / 1.3 / 2.0 / 2.1 / 3.0 / 3.5 / 5.x

32-bit floating-point add, multiply, multiply-add: 8/8/32/48/192/192/128
64-bit floating-point add, multiply, multiply-add: N/A/1/16/4/8/64/1
32-bit floating-point reciprocal, reciprocal square root, base-2 logarithm (__log2f), base 2 exponential (exp2f), sine (__sinf), cosine (__cosf): 2/2/4/8/32/32/32
32-bit integer add, extended-precision add, subtract, extended-precision subtract: 10/10/32/48/160/160/128
32-bit integer multiply, multiply-add, extended-precision multiply-add: Multiple instructions/Multiple instructions/16/16/32/32/Multiple instructions
24-bit integer multiply (__[u]mul24): 8/8/Multiple instructions/Multiple instructions/Multiple instructions/Multiple instructions/Multiple instructions
32-bit integer shift: 8/8/16/16/32/64/64
compare, minimum, maximum: 10/10/32/48/160/160/64
32-bit integer bit reverse, bit field extract/insert: Multiple instructions/ Multiple instructions/16/16/32/32/64
32-bit bitwise AND, OR, XOR: 8/8/32/48/160/160/128
count of leading zeros, most significant non-sign bit: Multiple instructions/ Multiple instructions/16/16/32/32/Multiple instructions
population count: Multiple instructions/Multiple instructions/16/16/32/32/32
warp shuffle: N/A N/A N/A N/A 32/32/32
sum of absolute difference: Multiple instructions/Multiple instructions/16/16/32/32/64
SIMD video instructions vabsdiff2: N/A N/A N/A N/A/160/160/Multiple instructions
SIMD video instructions vabsdiff4: N/A N/A N/A N/A/160/160/Multiple instructions
All other SIMD video instructions: N/A N/A/16/16/32/32/Multiple instructions
Type conversions from 8-bit and 16-bit integer to 32-bit types: 8/8/16/16/128/128/32
Type conversions from and to 64-bit types Multiple instructions: Multiple instructions/1/16/4/8/32/4
All other type conversions: 8/8/16/16/32/32/32

Source: CUDA C Programming Guide
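
To turn those per-multiprocessor numbers into chip-level peak throughput, multiply by the SM count and the clock. A worked example for 32-bit FMA on a GTX980 (C.C 5.2, 16 SMM, 1126 MHz base clock): 128 ops/clock/SMM x 16 SMM x 2 FLOP per FMA x 1.126 GHz ~ 4.6 TFLOPS single precision, which roughly matches the card's advertised figure.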

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38864 - Posted: 7 Nov 2014 | 15:57:51 UTC - in response to Message 38863.

That is absolutely and exhaustively TLDR.
However - if you like - you can choose some further readings from this list.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38865 - Posted: 7 Nov 2014 | 19:10:22 UTC - in response to Message 38859.

I suppose your memory clock is set to 1500 MHz (in GPU-Z) as well? I'm trying again to manually run 1750 MHz. I passed "memtestcl" at that and since a few minutes everything looks fine. I could live with manually forcing the clock, while staying in P2, as long as I don't get BS again!

MrS

Yes indeed, the memory clock runs at 1500 MHz. I have now set the temperature maximum to 73°C. The GTX970 runs smoothly, throws its hot air out of the case, and is around 10,000 seconds faster than my GTX770. Checked via an energy monitor, the system runs on about 100 W less than with the GTX770 in it.
So not bad at all.

What would be the effect of getting it into the P0 state with the memory clocked higher? Would that result in faster processing of WUs?
____________
Greetings from TJ

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38866 - Posted: 7 Nov 2014 | 22:27:19 UTC - in response to Message 38865.

I'm looking forward to Matt's analysis.
(Sorry for the TLDR posts.)
GPUGRID notwithstanding, Kepler still reigns in many places. C.C 5.2 has room for further improvements. C.C 5.2 struggles against C.C 3.5 and does okay versus C.C 3.0 in some paths (C.C 3.0 is better at float/double work and at generating 64-bit random numbers).
This is similar to the C.C 2.0 / C.C 2.1 realignment (32SM/16DP ~~> 48SM/4DP).
A lot of GK110 features were transplanted into GM107/GM204.
The replacement for the flagship has become more interesting, judging by recent GK110/GM204 owner inquiries. C.C 3.5 is still faster in many ways. The C.C 5.2 differences became notable once many sectors had had time to put C.C 5.2 through its paces.

What will NVidia offer to replace a very successful and still meaningful flagship? A lower TDP with a less capable code structure in some areas is not an equal trade, as there are some CUDA C.C 5.2 limitations compared to C.C 3.5/3.0.
Tesla/Quadro/Titan had a price drop lately to clear inventory.
Looking at the current 500-900 USD market prices for the GTX980/GTX780[Ti], a 1000 USD Titan looks to have been a bargain two years ago, or even recently. (GTX780[Ti] cards were at a record low of 300-500 USD a couple of weeks after GM204 launched.) What will the specs be for GM2*0? Will it be a 1/2 DP64 ratio, with 64 DP per SMM and 16 DP per [4] 32-core block? More than the 960 DP64 cores of a fully enabled GK110?

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38867 - Posted: 7 Nov 2014 | 23:06:31 UTC - in response to Message 38865.

What would be the effect of getting it into the P0 state with the memory clocked higher? Would that result in faster processing of WUs?

It should get the memory clock up to 1750 MHz automatically. According to SK this should be good for about a 4% performance boost, so it's not dramatic, but quite a few credits per day more. Memory controller load would drop from almost 50% to a good 40%, and scaling with higher GPU clocks should improve. The GTX980 might like this a lot, as with its higher performance it needs even more bandwidth to feed the beast.

And I'm not sure if P2 uses a lower memory voltage. That's pretty much the point I care about most. Initially I got blue screens when I tried to run my memory at the stock 1750 MHz in P2, whereas Heaven ran fine for about 2 h at that clock speed in P0. When I tried it again yesterday I got a calculation error. Other people are reporting memory clocks up to 2000 MHz for these cards, which might be totally out of reach if I can't even get 1750 MHz stable.

And finally, I'm asking myself: why is nVidia limiting the memory clock in P2? And why do they switch to P2 whenever CUDA or OpenCL is being used? It can't be low utilization (~50%, as in Heaven). Maybe they're using stricter memory timings in this mode, which might generally benefit GP-GPU more than higher throughput would? That could explain why my card takes this clock speed in P0, but not in P2.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2356
Credit: 16,376,317,465
RAC: 3,485,168
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38869 - Posted: 8 Nov 2014 | 0:22:32 UTC - in response to Message 38867.
Last modified: 8 Nov 2014 | 0:24:06 UTC

I'm asking myself: why is nVidia limiting the memory clock in P2? And why do they switch to P2 whenever CUDA or OpenCL is being used? It can't be low utilization (~50%, as in Heaven). Maybe they're using stricter memory timings in this mode, which might generally benefit GP-GPU more than higher throughput would? That could explain why my card takes this clock speed in P0, but not in P2.

I think that stability is a more important factor than speed while using CUDA or OpenCL.
I had memory clock issues with my Gigabyte GTX780Ti OC, so perhaps NVidia is trying to avoid such issues as much as possible.
I've overclocked my GTX980 to 1407 MHz @ 1.237 V; the memory controller usage has risen to 56-60%, but the memory is still running at 3005 MHz.
I'm not sure I can increase the memory clock to 3505 MHz, as according to MSI Afterburner my card is already using 104-108% power.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38870 - Posted: 8 Nov 2014 | 11:34:49 UTC - in response to Message 38869.
Last modified: 8 Nov 2014 | 11:36:49 UTC

I was able to clock the GDDR5 to 3500 MHz and over, even when OC'ing the GPU core and with power >110%. That said, I did get an error at silly speeds. Initially I was just being inquisitive, trying to understand why the 256-bit bus shows such high usage for some tasks, how much of a problem that is, and where the happy spot is WRT power and efficiency.

The P-states are still a bit of a conundrum, and I suspect we are looking at a simplified GUI which attempts to control more complex functions than NV Inspector indicates. It's certainly not easy to control them.

P-states are apparently distinct from boost, and it's not clear what impact, if any, CPU usage has on boost:
When I don't fix the P-states, the GPU clocks drop under certain circumstances. For example, when running a multi-threaded CPU app the GPU had been at 1050 MHz (no boost); when the CPU load changed to 6 individual CPU apps, power went from 58% to 61%, GPU usage rose from 82% to 84%, and the GPU clock rose to 1075 MHz - still no boost (all while running a long SDOERR_thrombin WU).
When I force the GPU clock to 1190 MHz while using 6 CPU cores for CPU work, the GPU still does not boost. Not even when I reduce the CPU usage to 5, 4 or 3 cores. By that stage GPU usage is 86-87% and power is 66%.
I expect a system restart would allow the GPU to boost again, but IMO boost is basically broken and the P-states don't line up properly.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38871 - Posted: 8 Nov 2014 | 14:14:09 UTC - in response to Message 38870.

skgiven:

Have you tried NVAPI from the 343 branch? It might help to break through the P2 memory lock and the boost inconsistency, or at the very least provide new information. New data structures are included for Maxwell, and reference documents are also available. I've seen 157 API functions with 343.98 and 163(?) for 344.60:
NV_GPU_PERF_PSTATES20_INFO_V2
NV_GPU_CLOCK_FREQUENCIES_V1
NV_GPU_CLOCK_FREQUENCIES_V2
NV_GPU_DYNAMIC_PSTATES_INFO_EX
NV_GPU_PERF_PSTATES20_INFO_V1 Used in NvAPI_GPU_GetPstates20() interface call
NV_GPU_PERF_PSTATES20_PARAM_DELTA Used to describe both voltage and frequency deltas
NV_GPU_PERF_PSTATES_INFO_V1
NV_GPU_PERF_PSTATES_INFO_V2
NV_GPU_PSTATE20_BASE_VOLTAGE_ENTRY_V1 Used to describe single base voltage entry
NV_GPU_PSTATE20_CLOCK_ENTRY_V1 Used to describe single clock entry
NV_GPU_THERMAL_SETTINGS_V1
NV_GPU_THERMAL_SETTINGS_V2


https://developer.nvidia.com/nvapi
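
A hedged sketch of the entry point (the exact struct layout comes from nvapi.h in that SDK release, so treat the field name below as an assumption):

#include "nvapi.h"
#include <stdio.h>

int main(void)
{
    NvAPI_Initialize();

    NvPhysicalGpuHandle gpus[NVAPI_MAX_PHYSICAL_GPUS];
    NvU32 count = 0;
    NvAPI_EnumPhysicalGPUs(gpus, &count);

    // Query P-state 2.0 info for the first GPU; the struct is versioned.
    NV_GPU_PERF_PSTATES20_INFO info = { 0 };
    info.version = NV_GPU_PERF_PSTATES20_INFO_VER;
    if (count > 0 && NvAPI_GPU_GetPstates20(gpus[0], &info) == NVAPI_OK)
        printf("P-states reported: %u\n", (unsigned)info.numPstates);

    NvAPI_Unload();
    return 0;
}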

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38873 - Posted: 8 Nov 2014 | 14:27:54 UTC - in response to Message 38866.

(Sorry for TLDR posts)

Thanks eXaPower for those long posts. A lot of information concerning the NV architectures. Very interesting.

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 38877 - Posted: 9 Nov 2014 | 0:21:47 UTC

Maxwell's and Kepler's CUDA/LD/ST/SFU/TMU/ROP/warp scheduler/instruction cache buffer/dispatch unit/issue port/crossbar/PolyMorph Engine counts per "SMM" and "SMX":

"SMM": 128 CUDA / 32 LD/ST / 4 ROP / 32 SFU / 8 TMU = 204
plus 4 WS / 4 IB / 8 DPU / 9 issue / 1 PME / 4 CrB
SMM total: 234

"SMX": 192 CUDA / 16 TMU / 8 ROP / 32 LD/ST / 32 SFU = 280
plus 1 IC / 4 WS / 8 DPU / 1 CrB / 9 issue / 1 PME
SMX total: 304

An "SMM" equals 77% of a Kepler "SMX".

Kepler GK110: 304 * 15 = 4560
Maxwell GM204: 234 * 16 = 3744
Kepler GK104: 304 * 8 = 2432

GK104 is 65% of GM204.
GK104 is 53.3% of GK110.
The GK110/GK104 "SMX" consists of 63.1% CUDA cores;
the GM204 "SMM" is 54.7% CUDA cores.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 38886 - Posted: 9 Nov 2014 | 21:21:04 UTC

I wrote my vendor (Galax) an email - hopefully I'm not just getting some useless standard reply.

@eXaPower: how could I "try NVAPI for 343 branch"? I know the tweak tools are using it, but have no experience on using it myself.

@SK: "but IMO boost is basically broken". Quite the contrary, I would say! It's better than ever, quicker and more powerful with a wider dynamic range. However, it's obviously not perfect yet. Besides something sometimes getting messed up (e.g. your example), there's also a problem when a card with a borderline OC is switched from a load boost state to a high one. Here the clock si raised faster than the voltage, which can leave the card in an unstable state for a short period of time. Enough of time to cause driver reset for gamers. But this could easily be fixed by changing the internal timings of boost, it's certainly not fundamentally broken.

@Zoltan: in my example it's easy to agree that the lower memory clock makes my card stable, while the higher one doesn't. But why is it stable in Heaven for hours? Running GPU-Grid I'm getting BSODs within minutes. Surely a demanding benchmark can't be this fault-tolerant and game etc. should also crash frequently.

Or put another way: if GP-GPU isn't stable at those clocks, IMO they couldn't sell the cards like this, because other software would crash / error too frequently.

MrS
____________
Scanning for our furry friends since Jan 2002

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39082 - Posted: 5 Dec 2014 | 22:49:15 UTC

Finally found the time to summarize it and post at Einstein. I also sent it as bug report to nVidia.

MrS
____________
Scanning for our furry friends since Jan 2002

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 39341 - Posted: 31 Dec 2014 | 23:47:06 UTC - in response to Message 38125.


I found one of many papers written by you and others, "ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale", from the golden days of the GT200. A Maxwell update, if applicable, would be very informative.


I'm doing a bit of work to improve the performance of the code for Maxwell hardware - expect an update before the end of the year.

Matt

Any information regarding the Maxwell update?

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39372 - Posted: 2 Jan 2015 | 21:53:09 UTC - in response to Message 39341.
Last modified: 2 Jan 2015 | 22:00:37 UTC

In time there may well be a GTX960, 990, 960Ti, 950Ti, 950, 940, 930 &/or others, and when GM200/GM210 turns up there could be many variants in the GeForce, Quadro and Tesla ranges...
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 39698 - Posted: 25 Jan 2015 | 14:49:35 UTC

Nvidia has released a statement about the GTX970 memory allocation issue. There are reports that the GTX970 can't properly utilize all 4 GB.

For reference: Kepler's [8] dispatch units feed [1] large crossbar into [10] issue ports routed to the SMX CUDA/LD/ST/SFU units. An SMX has [1] issue port per 32 CUDA cores, [1] per 16 SFUs and [1] per 16 LD/ST units - totaling [2] issue ports for 32 SFUs, [2] for 32 LD/ST units and [6] for the 192 CUDA cores inside [1] SMX.

An SMM consists of [12] issue ports and [8] dispatch units. Maxwell's crossbar is split into 4 slices per SMM, with [2] dispatch units for each slice feeding [3] issue ports: [1] for 32 CUDA cores, [1] for 8 SFUs, [1] for 8 LD/ST units. An SMM totals [4] issue ports for 128 CUDA cores, [4] for 32 SFUs and [4] for 32 LD/ST units.

A GTX980 consists of 64 crossbar slices with 192 issue ports and 128 dispatch units, while the 970 has 52 slices with 156 issue ports and 104 dispatch units. A GTX780Ti has 15 crossbars with 150 issue ports and 120 dispatch units.
Keep in mind that, counting all resources within an SMX or SMM, the CUDA core percentage is higher in an SMX than in an SMM:
the GK110/GK104 "SMX" consists of 63.1% CUDA cores;
the GM204 "SMM" is 54.7% CUDA cores.

Nvidia states:

“The GeForce GTX 970 is equipped with 4GB of dedicated graphics memory. However the 970 has a different configuration of SMs than the 980, and fewer crossbar resources to the memory system. To optimally manage memory traffic in this configuration, we segment graphics memory into a 3.5GB section and a 0.5GB section. The GPU has higher priority access to the 3.5GB section. When a game needs less than 3.5GB of video memory per draw command then it will only access the first partition, and 3rd party applications that measure memory usage will report 3.5GB of memory in use on GTX 970, but may report more for GTX 980 if there is more memory used by other commands. When a game requires more than 3.5GB of memory then we use both segments.

We understand there have been some questions about how the GTX 970 will perform when it accesses the 0.5GB memory segment. The best way to test that is to look at game performance. Compare a GTX 980 to a 970 on a game that uses less than 3.5GB. Then turn up the settings so the game needs more than 3.5GB and compare 980 and 970 performance again."

http://www.pcper.com/news/Graphics-Cards/NVIDIA-Responds-GTX-970-35GB-Memory-Issue
http://images.anandtech.com/doci/7764/SMX_575px.png
http://images.anandtech.com/doci/7764/SMMrecolored_575px.png
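
If anyone wants to see the segmentation for themselves, a rough probe is to grab the card's memory in chunks and time a device-to-device copy within each one; chunks landing in the slow 0.5 GB segment should show markedly lower bandwidth. A quick sketch (my own, error handling trimmed; the chunk size is arbitrary):

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    const size_t chunk = 256u << 20;                  // 256 MB per chunk
    void* ptrs[64];
    int n = 0;

    // allocate until the card is full
    while (n < 64 && cudaMalloc(&ptrs[n], chunk) == cudaSuccess)
        ++n;
    cudaGetLastError();                               // clear the OOM error

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);

    for (int i = 0; i < n; ++i) {
        cudaEventRecord(t0);
        // copy the first half of the chunk onto its second half
        cudaMemcpy((char*)ptrs[i] + chunk / 2, ptrs[i], chunk / 2,
                   cudaMemcpyDeviceToDevice);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms = 0.f;
        cudaEventElapsedTime(&ms, t0, t1);
        printf("chunk %2d: %7.1f GB/s\n", i, (chunk / 2) / ms / 1e6);
    }

    for (int i = 0; i < n; ++i)
        cudaFree(ptrs[i]);
    return 0;
}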

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 39744 - Posted: 27 Jan 2015 | 12:38:38 UTC
Last modified: 27 Jan 2015 | 12:52:26 UTC

Nvidia admits that the GTX970 can only access 1792 KB of L2 cache and 56 ROPs.

http://www.pcper.com/reviews/Graphics-Cards/NVIDIA-Discloses-Full-Memory-Structure-and-Limitations-GTX-970

Despite initial reviews and information from NVIDIA, the GTX 970 actually has fewer ROPs and less L2 cache than the GTX 980. NVIDIA says this was an error in the reviewer’s guide and a misunderstanding between the engineering team and the technical PR team on how the architecture itself functioned. That means the GTX 970 has 56 ROPs and 1792 KB of L2 cache compared to 64 ROPs and 2048 KB of L2 cache for the GTX 980.

http://anandtech.com/show/8935/geforce-gtx-970-correcting-the-specs-exploring-memory-allocation

The benchmarking program SiSoftware Sandra has reported 1.8 MB (1792 KB) of cache for the GTX970 since the beginning. It was always chalked up as a bug.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 40004 - Posted: 2 Feb 2015 | 12:11:03 UTC - in response to Message 39744.

Here, the MCU usage is lower on the GTX970 than on the GTX980.
Although the 970's bus width is effectively 224+32 bits, and suggestions are that the 224 bits are predominantly used, the lower MCU usage might still be explained by the marginally favourable (by 7%) shader-to-bus ratio of the 970 over the 980 (1664 shaders over 224 bits vs 2048 over 256 bits) and its slightly lower GPU clocks.
However, and despite some interpretations, I think it's possible that all 256 bits of the bus are actually used while accessing up to 3.5 GB of GDDR5, and only beyond that does it become 224 bits for the first 3.5 GB and 32 bits for the next 0.5 GB.

While on the whole the 2 MB vs 1.75 MB cache difference doesn't appear to have any impact, it might explain some of the relative performance variation between different WU types.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 40008 - Posted: 2 Feb 2015 | 21:36:56 UTC - in response to Message 40004.

Well, we can surely say that the smaller L2 cache doesn't hurt GPU-Grid performance much. But we can't say anything more specific, can we?

Regarding the "new" bandwidth: from the explanation given to Anandtech it's clear that the card can not read or write with full 256 bit. The pipe between the memory controllers and the cross bar just isn't wide enough for this. You can see it where the traffic from the memory controller with the disabled ROP/L2 is routed through the ROP/L2 from its companion. At this point 2 x 32 bit would have to share a 1 x 32 bit bus.

This bus is bidirectional, though, so you can use all memory controllers at once if the 7- and 1-partitions are performing different operations. Which I imagine is difficult to exploit efficiently via software.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 40061 - Posted: 5 Feb 2015 | 23:39:20 UTC - in response to Message 40008.

The 970 reminds me of the 660Ti, a lot.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

eXaPower
Send message
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 40077 - Posted: 6 Feb 2015 | 23:19:11 UTC - in response to Message 40061.

You're correct: on Kepler, harvesting disables a full memory partition - the 64-bit memory controller and the 8 ROPs that come with it - alongside SMX units. The GTX970 is still the best NVidia card for performance per cost, given the feature set included.

To make matters even more confusing about SMX disabling: the GT630 (2 SMX, C.C 3.5) has a 64-bit bus with 512 KB of cache. Another example: the GTX650TiBoost (4 SMX, GK106) die includes the full GK106 cache (384 KB), 24 ROPs and memory bus (192-bit) along with 3 GPCs - the same as the 5-SMX GTX660.

In the prior generation of Kepler-derived GPUs, Alben explained, any chips with faulty portions of L2 cache would need to have an entire memory partition disabled. For example, the GeForce GTX 660 Ti is based on a GK104 chip with several SMs and an entire memory partition inactive, so it has an aggregate 192-bit connection to memory, down 64 bits from the full chip's capabilities.

Nvidia's engineers built a new feature into Maxwell that allows the company to make fuller use of a less-than-perfect chip. In the event that a memory partition has a bad section of L2 cache, the firm can disable the bad section of cache. The remaining L2 cache in the memory partition can then service both memory controllers in the partition thanks to a "buddy interface" between the L2 and the memory controllers. That "buddy interface" is shown as active, in a dark, horizontal arrow, in the bottom right memory partition on the diagram. In the other three memory partitions, this arrow is grayed out because the "buddy" interface is not used.

From Damien Triolet at Hardware.fr:
The pixel fillrate can be linked to the number of ROPs for some GPUs, but it’s been limited elsewhere for years for many Nvidia GPUs. Basically there are 3 levels that might have a say at what the peak fillrate is :
•The number of rasterizers
•The number of SMs
•The number of ROPs

On both Kepler and Maxwell each SM appears to use a 128-bit datapath to transfer pixel color data to the ROPs. Those appear to be converted from FP32 to the actual pixel format before being transferred to the ROPs. With classic INT8 rendering (32-bit per pixel) it means each SM has a throughput of 4 pixels/clock. With HDR FP16 (64-bit per pixel), each SM has a throughput of 2 pixels/clock.

On Kepler each rasterizer can output up to 8 pixels/clock. With Maxwell, the rate goes up to 16 pixels/clock (at least with the currently released Maxwell GPUs).

So the actual pixels/cycle peak rate when you look at all the limits (rasterizers/SMs/ROPs) would be :

    GTX 750 : 16/16/16
    GTX 750 Ti : 16/20/16
    GTX 760 : 32/24/32 or 24/24/32 (as there are 2 die configuration options)
    GTX 770 : 32/32/32
    GTX 780 : 40/48/48 or 32/48/48 (as there are 2 die configuration options)
    GTX 780 Ti : 40/60/48
    GTX 970 : 64/52/64
    GTX 980 : 64/64/64
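
Putting Triolet's three limits together: the sustained rate is just the minimum of the three stages multiplied by the clock. A toy calculation (my own helper function; the boost clock is an assumption):

#include <stdio.h>

static unsigned min3(unsigned a, unsigned b, unsigned c)
{
    unsigned m = a < b ? a : b;
    return m < c ? m : c;
}

int main(void)
{
    unsigned raster = 64, sm = 52, rop = 64;   // GTX 970 pixels/clock limits
    double clockGHz = 1.178;                   // assumed boost clock
    printf("peak INT8 fillrate ~ %.1f Gpixel/s\n",
           min3(raster, sm, rop) * clockGHz);
    return 0;
}

So despite its full 64 ROPs, the 970's pixel output is capped by the 52 pixels/clock its SMs can deliver.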


Testing (on forums) reveals that GTX970s from different manufacturers have varying results for peak rasterization rates and peak pixel fill rates (at the same clock/memory speeds). Is this because GTX970 SMM/cache/ROP/memory structures are disabled differently from one another?

Reports also say that not ALL GTX970s are affected by the 224+32-bit bus or by the separate 512 MB pool with its 20-28 GB/s slowdown. Does this suggest NVidia changed the SMM disablement process? Has a second revision of the GTX970 been produced yet?
AnandTech's decent explanations of SMX structures can be found in the GTX660Ti and GTX650Ti reviews, for comparison with the recent SMM articles.
When sorting through for reliable information: some tech forum threads' comments about the 970 are misinformed, hyperbolic, completely unprofessional and full of trolling.


Message boards : Graphics cards (GPUs) : Maxwell now
