Maxwell now

Message boards : Graphics cards (GPUs) : Maxwell now

Retvari Zoltan
Message 38113 - Posted: 27 Sep 2014, 21:08:28 UTC - in response to Message 38109.  
Last modified: 27 Sep 2014, 21:10:26 UTC

In your opinion, how can GPUGRID's use of occupied SM/SMX/SMM units be further enhanced and refined for generational (CUDA compute capability) differences? Compatibility is important, as is finding the most efficient code path in CUDA programming. How can we further advance ACEMD? CUDA 5.0/PTX 3.1 through CUDA 6.5/PTX 4.1 provide new commands/instructions.

There was a huge jump in performance (around 40%) when the GPUGrid app was upgraded from CUDA3.1 to CUDA4.2.
I think this huge change doesn't come very often.
I think the GM204 can run older code more efficiently than the Fermi- or Kepler-based GPUs. That's why other projects benefit more than GPUGrid: this project already got its gain at the transition from CUDA3.1 to CUDA4.2.

eXaPower
Message 38115 - Posted: 27 Sep 2014, 21:43:11 UTC - in response to Message 38111.  


In your opinion, how can GPUGRID's use of occupied SM/SMX/SMM units be further enhanced and refined for generational (CUDA compute capability) differences? Compatibility is important, as is finding the most efficient code path in CUDA programming. How can we further advance ACEMD? CUDA 5.0/PTX 3.1 through CUDA 6.5/PTX 4.1 provide new commands/instructions.



We have cc-specific optimisations for each of the most performance-sensitive kernels. We generally don't use any of the features introduced after CUDA 4.2 though; there's nothing there we particularly need.

I expect the GM204 performance will be markedly improved once I have my hands on one.

Matt


I found one of the many papers written by you and others, "ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale", from the golden days of GT200. A Maxwell update, if applicable, would be very informative.

@tonymmorley
Message 38118 - Posted: 28 Sep 2014, 1:40:01 UTC

Hey guys, I can't get any work for my two GTX 980s. Any thoughts? I'm a bit lost in the feed.

eXaPower
Message 38121 - Posted: 28 Sep 2014, 11:03:35 UTC - in response to Message 38113.  
Last modified: 28 Sep 2014, 11:11:26 UTC

You don't see these jumps often. A 32-core block with its own warp scheduler, rather than Kepler's flat design (sharing all cores among the warp schedulers), is contributing to better core management, as are Maxwell's redesigned crossbar, dispatch, and issue logic.
Even so, GM204 (2048/1664 cores) is providing performance levels close to (~1600s) the 2880-core GK110, while being ~2 hours faster than a GTX780 (2304 cores). I think GM204, once tuned properly, will excel.

Snow Crash
Message 38122 - Posted: 28 Sep 2014, 11:50:13 UTC - in response to Message 38118.  

http://www.gpugrid.net/forum_thread.php?id=3603&nowrap=true#38075
If this doesn't make sense then I would suggest waiting until the project can update the scheduler, etc. as the details of what Retvari did are a bit twisty.
Thanks - Steve

MJH
Message 38123 - Posted: 28 Sep 2014, 11:50:33 UTC - in response to Message 38113.  


There was a huge jump in performance (around 40%) when the GPUGrid app was upgraded from CUDA3.1 to CUDA4.2.
I think this huge change doesn't come very often.


That change marked the transition to a new code base. The improvement wasn't down to the change in CUDA version so much as to us developing and introducing improved algorithms.

Matt

MJH
Message 38124 - Posted: 28 Sep 2014, 11:51:30 UTC - in response to Message 38118.  


Hey guys, I can't get any work for my two GTX 980s. Any thoughts? I'm a bit lost in the feed.


It's not ready just yet...

Matt

MJH
Message 38125 - Posted: 28 Sep 2014, 11:54:08 UTC - in response to Message 38115.  


I found one of the many papers written by you and others, "ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale", from the golden days of GT200. A Maxwell update, if applicable, would be very informative.


I'm doing a bit of work to improve the performance of the code for Maxwell hardware - expect an update before the end of the year.

Matt

MJH
Message 38126 - Posted: 28 Sep 2014, 11:55:02 UTC - in response to Message 38112.  


I can give you remote access to my GTX980 host, if you want to.


Most kind, but I've got some on order already. Just waiting for the slow boat from China to wend its way across the Med.

Matt

skgiven
Volunteer moderator
Volunteer tester
Message 38128 - Posted: 28 Sep 2014, 11:58:45 UTC - in response to Message 38118.  
Last modified: 28 Sep 2014, 19:03:51 UTC

Not had the time to look into this in great detail but my tuppence worth:

The GTX980 and GTX970 are GM204 (non-super-scalar), not GM210, so they are really the latest mid-range GPUs and very much aimed at the gaming community (FP64 at 1/32 of the FP32 rate, and 4GB).

These are generational updates to the GK104 models, and both the big brother and a revision of the GM107 (GTX750 and GTX750Ti).
As such they should be seen as gaming replacements/upgrades for GPUs such as the GTX670 and even the GTX770.

As usual there is some naming inconsistency; the GTX980 is GM204 while the GTX780 is GK110, so it's not a straight comparison or upgrade there. However, if you go back to a GTX680 the comparison is somewhat more linear (GK104 vs GM204). Note that the GM107 trailblazed Maxwell.

GPU Memory Controller load: 52%

That is very high and I expect it's impacting performance, and I don't think it's simply down to bus width (256-bit) but also down to architectural changes. I was hoping this would not be the case, and it's somewhat surprising seeing as the GTX900s have 2MB of L2 cache.

That said, some of Noelia's WUs are more memory intensive than other WUs, and on my slightly underclocked GTX770 a 147-NOELIA_20MGWT WU's load is presently 30% (which is higher than other WUs).

This suggests the GTX970 is a better choice than the GTX980, certainly when you consider the ~50% price difference (UK) for ~80% performance. That said, I would want to know what the memory controller's utilization is on a GTX970 before concluding that it is definitely a problem and recommending the 970 over the 980 (which will still do more work despite the constraints). For the 970 it might be ~43%, which isn't great and suggests another problem (architecture/code) besides the 256-bit limitation.

Any readings for other WUs?

In terms of performance these GPUs appear to only be on par with the high-ish end GTX700s, and performance is basically in line with the number of CUDA cores, again suggesting that there is some potential for app improvement.

It's possible that if the apps are recompiled with new CUDA Development Tools the new drivers will inherently offer improvements for the GTX900 series, but given that these are GM204 I'm not expecting miracles.

The big question was always going to be: what's the performance per Watt like here?
Apparently, when gaming, a GTX970 uses up to 30W less than a GTX770 and significantly outperforms it (on reviewed games), but the TDPs are 145W and 230W. So a GTX970 might use ~63% of a GTX770's power, and at first glance it appears to outperform it by ~10%. Thus I'm expecting the performance/Watt to be about 1.75 times that of a GTX770 (ballpark). So from that point of view it's a winner, and maybe app tweaks can increase that further.
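The performance/Watt estimate above can be reproduced with a few lines of arithmetic. This is only a sketch using the figures quoted in the post; the function name is made up for illustration, and it assumes each card draws its full TDP while crunching, which is a rough proxy at best.

```python
def perf_per_watt_ratio(tdp_new_w, tdp_old_w, speedup):
    """Return how many times more work per Watt the new card delivers.

    speedup is new/old throughput (1.10 means 10% faster).
    Assumption: both cards draw their full TDP under load.
    """
    power_fraction = tdp_new_w / tdp_old_w
    return speedup / power_fraction


# Figures from the post: GTX970 at 145 W, GTX770 at 230 W, ~10% faster.
power_fraction = 145 / 230
ratio = perf_per_watt_ratio(145, 230, 1.10)

print(f"GTX970 power as a fraction of GTX770: {power_fraction:.0%}")
print(f"performance per Watt ratio: {ratio:.2f}")
```

With the unrounded power fraction the ratio comes out a touch under the ballpark 1.75 quoted above, which used the rounded 63%.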

PS. My GTX770's Memory Controller load is only 22% for a trphisx3-NOELIA_SH2 WU, so I'm guessing the same type of WU would have a 38% load on a GTX980.
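The post does not spell out how the 38% guess was derived. One reading that matches the numbers is to scale the GTX770 reading by the ratio of the two loads quoted earlier (52% on the GTX980, 30% on the GTX770). The sketch below assumes that ratio is roughly constant across WU types, which is an assumption and not something measured here; the function name is hypothetical.

```python
def scale_load(load_on_old_card, ref_load_old, ref_load_new):
    """Estimate memory controller load on the new card for a WU type.

    Assumption: the new/old load ratio seen on one reference WU type
    carries over to other WU types.
    """
    return load_on_old_card * (ref_load_new / ref_load_old)


# Reference pair from the thread: 30% on a GTX770, 52% on a GTX980.
# Reading to convert: 22% on the GTX770 for a trphisx3-NOELIA_SH2 WU.
estimate = scale_load(22.0, ref_load_old=30.0, ref_load_new=52.0)
print(f"estimated GTX980 memory controller load: {estimate:.0f}%")
```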

MJH
Message 38136 - Posted: 28 Sep 2014, 20:54:54 UTC

Trying to fix the scheduler now. If you have a 980, please subscribe to the acemdbeta app, accept beta work, and try again. It won't work, but I'm logging the problems now.

Matt

biodoc
Message 38138 - Posted: 28 Sep 2014, 22:06:44 UTC - in response to Message 38136.  
Last modified: 28 Sep 2014, 22:16:04 UTC

Did as you requested and it's now crunching what looks to be a test WU: MJHARVEY_TEST

EDIT: The scheduler worked!

biodoc
Message 38140 - Posted: 29 Sep 2014, 0:28:07 UTC

My GTX980 has finished 2 of the beta WUs successfully.

http://www.gpugrid.net/results.php?hostid=142719

Retvari Zoltan
Message 38144 - Posted: 29 Sep 2014, 7:37:44 UTC - in response to Message 38140.  
Last modified: 29 Sep 2014, 7:38:36 UTC

My GTX980 has finished 2 of the beta WUs successfully.

http://www.gpugrid.net/results.php?hostid=142719

The 8.44 CUDA65 application is available for the short queue, perhaps you should give it a try too.

biodoc
Message 38147 - Posted: 29 Sep 2014, 9:37:28 UTC - in response to Message 38144.  

My GTX980 has finished 2 of the beta WUs successfully.

http://www.gpugrid.net/results.php?hostid=142719

The 8.44 CUDA65 application is available for the short queue, perhaps you should give it a try too.


I'm getting "no tasks available" for either the beta or the short run WUs.

9/29/2014 5:36:32 AM | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU
9/29/2014 5:36:33 AM | GPUGRID | Scheduler request completed: got 0 new tasks
9/29/2014 5:36:33 AM | GPUGRID | No tasks sent
9/29/2014 5:36:33 AM | GPUGRID | No tasks are available for Short runs (2-3 hours on fastest card)
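For anyone collecting these messages from several hosts, the event log lines above are easy to split mechanically. The following is a minimal sketch that assumes the `timestamp | project | message` layout and the US-style 12-hour timestamps seen in this post; other client locales may format the timestamp differently, and the helper name is made up for illustration.

```python
from datetime import datetime

# Event log lines copied from the post above.
LOG = """\
9/29/2014 5:36:32 AM | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU
9/29/2014 5:36:33 AM | GPUGRID | Scheduler request completed: got 0 new tasks
9/29/2014 5:36:33 AM | GPUGRID | No tasks sent
9/29/2014 5:36:33 AM | GPUGRID | No tasks are available for Short runs (2-3 hours on fastest card)
"""


def parse_line(line):
    """Split one 'timestamp | project | message' line into typed fields."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    # Assumption: month/day/year with a 12-hour clock, as in the lines above.
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return when, project, message


entries = [parse_line(line) for line in LOG.splitlines() if line.strip()]

# Keep only the messages saying the scheduler had nothing to send.
no_work = [msg for _, project, msg in entries
           if project == "GPUGRID" and msg.startswith("No tasks")]

for msg in no_work:
    print(msg)
```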

biodoc
Message 38149 - Posted: 29 Sep 2014, 9:58:02 UTC

It looks like Matt has just added a new beta app (version 8.45).

I'll keep my preferences for both beta (test applications) and short runs for now unless he requests just beta.

biodoc
Message 38159 - Posted: 29 Sep 2014, 11:17:41 UTC - in response to Message 38149.  

Just got a test WU with the new beta app (8.45).

MJH
Message 38160 - Posted: 29 Sep 2014, 11:43:02 UTC - in response to Message 38144.  


The 8.44 CUDA65 application is available for the short queue


Not any more. The CUDA65 error rate is suspiciously high for non-GM204 cards.

Matt

TJ
Message 38347 - Posted: 7 Oct 2014, 14:28:54 UTC

Hello fellow crunchers,

Are there any promising results comparing the performance of the GTX980 to the GTX780Ti, or do we have to wait for the GTX980Ti (which is what Jacob is doing too)?

I am still hoping for a "real" Maxwell at 20nm, but it seems that won't happen this year.
Greetings from TJ

eXtreme Warhead
Message 38349 - Posted: 7 Oct 2014, 17:32:25 UTC

If the results now are roughly the same as the older ones from early 2014, the performance with CUDA65 doesn't look very good?

I'm only at 23% of a long one after 94 min on a GTX970, so that would be about 409 min for the whole WU...

That would be much more than a 660Ti would need, since the long ones take about 345 min on that card... so what's the problem?
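The 409-minute figure is a straight-line extrapolation from elapsed time and progress. A small sketch of that calculation follows, assuming the WU progresses at a constant rate (the reported fraction done is not guaranteed to be linear); the function name is hypothetical.

```python
def estimate_total_minutes(elapsed_min, fraction_done):
    """Extrapolate total run time, assuming progress is linear in time."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_min / fraction_done


# Figures from the post: 94 min elapsed at 23% on a GTX970,
# versus about 345 min for a long WU on a GTX660Ti.
total = estimate_total_minutes(94, 0.23)
remaining = total - 94
relative = total / 345

print(f"estimated total: {total:.0f} min, remaining: {remaining:.0f} min")
print(f"relative to the 660Ti time: {relative:.2f}x")
```

Because the progress figure is only given to two digits, a reading of 23% could sit anywhere from 22.5% to 23.5%, so the extrapolated total carries an uncertainty of several minutes.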





©2025 Universitat Pompeu Fabra