Maxwell now

Author	Message
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0	Message 38113 - Posted: 27 Sep 2014, 21:08:28 UTC - in response to Message 38109. Last modified: 27 Sep 2014, 21:10:26 UTC
> In your opinion: how can GPUGRID's SM/SMX/SMM occupancy be further enhanced, and refined for generational (CUDA C.C.) differences? Compatibility is important, as is finding the most efficient code path from CUDA programming. How can we further advance ACEMD? CUDA 5.0/PTX3.1 -> 6.5/4.1 provides new commands/instructions.
There was a huge jump in performance (around 40%) when the GPUGrid app was upgraded from CUDA3.1 to CUDA4.2. I don't think a change that large comes along very often. I think the GM204 can run older code more efficiently than the Fermi or Kepler based GPUs; that's why other projects benefit more than GPUGrid, as this project already had its gain at the transition from CUDA3.1 to CUDA4.2.

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 38115 - Posted: 27 Sep 2014, 21:43:11 UTC - in response to Message 38111.
> > In your opinion: how can GPUGRID's SM/SMX/SMM occupancy be further enhanced, and refined for generational (CUDA C.C.) differences? Compatibility is important, as is finding the most efficient code path from CUDA programming. How can we further advance ACEMD? CUDA 5.0/PTX3.1 -> 6.5/4.1 provides new commands/instructions.
> We have cc-specific optimisations for each of the most performance sensitive kernels. Generally don't use any of the features introduced post CUDA 4.2 though, nothing there we particularly need. I expect the GM204 performance will be markedly improved once I have my hands on one. Matt
I found one of the many papers written by you and others, "ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale", from the golden days of GT200. A Maxwell update, if applicable, would be very informative.

@tonymmorley Joined: 10 Mar 14 Posts: 24 Credit: 1,215,128,812 RAC: 0	Message 38118 - Posted: 28 Sep 2014, 1:40:01 UTC
Hey guys, I can't get any work for my two GTX 980s. Any thoughts? I'm a bit lost in the feed.

eXaPower Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0	Message 38121 - Posted: 28 Sep 2014, 11:03:35 UTC - in response to Message 38113. Last modified: 28 Sep 2014, 11:11:26 UTC
You don't see these jumps often. A 32-core block with its own warp scheduler, rather than Kepler's flat design (all cores shared among the warp schedulers), is contributing to better core management, as are Maxwell's redesigned crossbar, dispatch and issue. Even so, GM204 (2048c/1664c) is providing performance levels close to (~1600s) the 2880-core GK110, while being ~2 hr faster than a GTX780 (2304 cores). I think GM204, once tuned properly, will excel.

Snow Crash Joined: 4 Apr 09 Posts: 450 Credit: 539,316,349 RAC: 0	Message 38122 - Posted: 28 Sep 2014, 11:50:13 UTC - in response to Message 38118.
http://www.gpugrid.net/forum_thread.php?id=3603&nowrap=true#38075
If this doesn't make sense then I would suggest waiting until the project can update the scheduler, etc., as the details of what Retvari did are a bit twisty.
Thanks - Steve

MJH Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38123 - Posted: 28 Sep 2014, 11:50:33 UTC - in response to Message 38113.
> There was a huge jump in performance (around 40%) when the GPUGrid app was upgraded from CUDA3.1 to CUDA4.2. I think this huge change doesn't come very often.
That change marked the transition to a new code base. The improvement wasn't down to the change in CUDA version so much as to us developing and introducing improved algorithms.
Matt

MJH Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38124 - Posted: 28 Sep 2014, 11:51:30 UTC - in response to Message 38118.
> Hey guys, I can't get any work for my two GTX 980s. Any thoughts? I'm a bit lost in the feed.
It's not ready just yet...
Matt

MJH Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38125 - Posted: 28 Sep 2014, 11:54:08 UTC - in response to Message 38115.
> I found one of the many papers written by you and others, "ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale", from the golden days of GT200. A Maxwell update, if applicable, would be very informative.
I'm doing a bit of work to improve the performance of the code for Maxwell hardware; expect an update before the end of the year.
Matt

MJH Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38126 - Posted: 28 Sep 2014, 11:55:02 UTC - in response to Message 38112.
> I can give you remote access to my GTX980 host, if you want to.
Most kind, but I've got some on order already. Just waiting for the slow boat from China to wend its way across the Med.
Matt

skgiven Volunteer moderator Volunteer tester Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0	Message 38128 - Posted: 28 Sep 2014, 11:58:45 UTC - in response to Message 38118. Last modified: 28 Sep 2014, 19:03:51 UTC
Not had the time to look into this in great detail, but my tuppence worth:
The GTX980 and GTX970 are GM204 (non-super-scalar) but not GM210, so they are really the latest mid-range GPUs and very much aimed at the gaming community (FP64 at 1/32 of the FP32 rate, and 4GB). These are generational updates to the GK104 models, and both the big brother and a revision of the GM107 (GTX750 and GTX750Ti). As such they should be seen as gaming replacements/upgrades for GPUs such as the GTX670 and even the GTX770. As usual there is some naming inconsistency; the GTX980 is GM204 while the GTX780 is GK110, so it's not a straight comparison or upgrade there. However, if you go back to a GTX680 the comparison is somewhat more linear (GK104 vs GM204). Note that the GM107 trailblazed Maxwell.
> GPU Memory Controller load: 52%
That is very high and I expect it's impacting performance, and I don't think it's simply down to bus width (256-bit) but also down to architectural changes. I was hoping this would not be the case, and it's somewhat surprising seeing as the GTX900s have 2MB of L2 cache. That said, some of Noelia's WUs are more memory intensive than other WUs, and on my slightly underclocked GTX770 a 147-NOELIA_20MGWT WU's load is presently 30% (which is higher than for other WUs). This suggests the GTX970 is a better choice than the GTX980, certainly when you consider the ~50% price difference (UK) for ~80% of the performance. That said, I would want to know what the memory controller's utilization is on a GTX970 before concluding that it is definitely a problem and recommending the 970 over the 980 (which will still do more work despite the constraints). For the 970 it might be ~43%, which isn't great and suggests another problem (architecture/code) besides the 256-bit limitation. Any readings for other WUs?
In terms of performance these GPUs appear to only be on par with the high-ish end GTX700s, and performance is basically in line with the number of CUDA cores, again suggesting that there is some potential for app improvement. It's possible that if the apps are recompiled with new CUDA Development Tools the new drivers will inherently offer improvements for the GTX900 series, but given that these are GM204 I'm not expecting miracles.
The big question was always going to be: what's the performance per Watt like here? Apparently, when gaming, a GTX970 uses up to 30W less than a GTX770 and significantly outperforms it (on reviewed games), but the TDPs are 145W and 230W respectively. So a GTX970 might use ~63% of a GTX770's power, and at first glance it appears to outperform it by ~10%. Thus I'm expecting the performance/Watt to be about 1.75 times that of a GTX770 (ball park). So from that point of view it's a winner, and maybe app tweaks can increase that further.
PS. My GTX770's Memory Controller load is only 22% for a trphisx3-NOELIA_SH2 WU, so I'm guessing the same type of WU would have a 38% load on a GTX980.
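The ball-park figures in the post above can be reproduced with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, TDP stands in for real power draw, the ~10% speedup is the poster's first-glance estimate, and the load scaling assumes (as the PS appears to) that memory controller load scales linearly between the two cards using the 30% and 52% readings as one reference pair.

```python
# Back-of-envelope check of the estimates quoted in the post above.
# Every input is the poster's estimate, not a measurement.

def perf_per_watt_gain(speedup: float, power_new_w: float, power_old_w: float) -> float:
    """Performance/Watt of the new card relative to the old one.

    speedup: throughput of the new card divided by that of the old card.
    power_new_w, power_old_w: power draw in watts (TDP used as a stand-in).
    """
    power_fraction = power_new_w / power_old_w
    return speedup / power_fraction


def scale_load(load_old_pct: float, ref_new_pct: float, ref_old_pct: float) -> float:
    """Scale a load reading linearly using one reference pair of readings."""
    return load_old_pct * ref_new_pct / ref_old_pct


# GTX970 (145 W TDP) vs GTX770 (230 W TDP), with a ~10% speedup assumed.
power_fraction = 145 / 230
gain = perf_per_watt_gain(1.10, 145, 230)

# GTX770 reads 22% on one WU type; the reference pair is 52% (GTX980) vs 30% (GTX770).
gtx980_load = scale_load(22, 52, 30)

print(f"power fraction: {power_fraction:.2f}")
print(f"perf/Watt gain: {gain:.2f}x")
print(f"estimated GTX980 load: {gtx980_load:.0f}%")
```

The results land close to the ~63%, ~1.75x and 38% quoted in the post. Actual power draw while crunching can differ noticeably from TDP, so the performance/Watt figure is best treated as a rough bound.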

MJH Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38136 - Posted: 28 Sep 2014, 20:54:54 UTC
Trying to fix the scheduler now. If you have a 980, please subscribe to the acemdbeta app, accept beta work, and try again. It won't work, but I'm logging the problems now.
Matt

biodoc Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 0	Message 38138 - Posted: 28 Sep 2014, 22:06:44 UTC - in response to Message 38136. Last modified: 28 Sep 2014, 22:16:04 UTC
Did as you requested and it's now crunching what looks to be a test WU: MJHARVEY_TEST
EDIT: The scheduler worked!

biodoc Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 0	Message 38140 - Posted: 29 Sep 2014, 0:28:07 UTC
My GTX980 has finished 2 of the beta WUs successfully.
http://www.gpugrid.net/results.php?hostid=142719

Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0	Message 38144 - Posted: 29 Sep 2014, 7:37:44 UTC - in response to Message 38140. Last modified: 29 Sep 2014, 7:38:36 UTC
> My GTX980 has finished 2 of the beta WUs successfully. http://www.gpugrid.net/results.php?hostid=142719
The 8.44 CUDA65 application is available for the short queue; perhaps you should give it a try too.

biodoc Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 0	Message 38147 - Posted: 29 Sep 2014, 9:37:28 UTC - in response to Message 38144.
> > My GTX980 has finished 2 of the beta WUs successfully. http://www.gpugrid.net/results.php?hostid=142719
> The 8.44 CUDA65 application is available for the short queue; perhaps you should give it a try too.
I'm getting "no tasks available" for both the beta and the short run WUs.
9/29/2014 5:36:32 AM | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU
9/29/2014 5:36:33 AM | GPUGRID | Scheduler request completed: got 0 new tasks
9/29/2014 5:36:33 AM | GPUGRID | No tasks sent
9/29/2014 5:36:33 AM | GPUGRID | No tasks are available for Short runs (2-3 hours on fastest card)

biodoc Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 0	Message 38149 - Posted: 29 Sep 2014, 9:58:02 UTC
It looks like Matt has just added a new beta app (version 8.45). I'll keep my preferences set to both beta (test applications) and short runs for now, unless he requests just beta.

biodoc Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 0	Message 38159 - Posted: 29 Sep 2014, 11:17:41 UTC - in response to Message 38149.
Just got a test WU with the new beta app (8.45).

MJH Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 38160 - Posted: 29 Sep 2014, 11:43:02 UTC - in response to Message 38144.
> The 8.44 CUDA65 application is available for the short queue
Not any more. The CUDA65 error rate is suspiciously high for non-GM204 cards.
Matt

TJ Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0	Message 38347 - Posted: 7 Oct 2014, 14:28:54 UTC
Hello fellow crunchers, are there any promising results comparing the performance of the GTX980 to the GTX780Ti, or do we have to wait for the GTX980Ti (which is what Jacob is doing too)? I am still hoping for a "real" Maxwell at 20nm, but it seems that won't happen this year anymore.
Greetings from TJ

eXtreme Warhead Joined: 19 Nov 12 Posts: 2 Credit: 25,526,400 RAC: 0	Message 38349 - Posted: 7 Oct 2014, 17:32:25 UTC
If the current results are relatively the same as the older ones from early 2014, the performance with CUDA65 doesn't look very good. I'm only at 23% of a long one after 94 min on a GTX970, so that would be about 409 min for the whole WU. That would be much, much more than a 660Ti would need, because the long ones last about 345 min on that card. So what's the problem?
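The runtime estimate in the post above is a straight linear extrapolation from the progress so far. A minimal sketch of that calculation follows; the function name is hypothetical, and it assumes the reported progress percentage advances at a constant rate, which may not hold for every WU.

```python
# Linear extrapolation of total runtime from elapsed time and progress.
# Assumes progress advances at a constant rate (an assumption, not a given).

def estimate_total_minutes(elapsed_min: float, fraction_done: float) -> float:
    """Estimate the total runtime in minutes from the progress so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_min / fraction_done


# Figures from the post: 23% done after 94 minutes on a GTX970.
total = estimate_total_minutes(94, 0.23)
remaining = total - 94

# Comparison against the ~345 minutes quoted for the GTX660Ti.
slowdown = total / 345

print(f"estimated total: {total:.0f} min")
print(f"remaining: {remaining:.0f} min")
print(f"relative to 345 min: {slowdown:.2f}x")
```

With the numbers from the post this gives roughly 409 minutes in total, about 18% longer than the 345 minutes quoted for the older card.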