New acemdshort app 846
Author | Message |
---|---|
Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0 |
I've promoted the CUDA65 app version 846 from beta to short. You'll only get this if you have a Kepler or Maxwell card, and have a CUDA 6.5-capable driver, in practice rev 343 or higher. Please post any problems or regressions here. Matt |
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 |
Looking good. Boinc reporting 0.90 worth of CPU for 6.5, but task manager only at 1-2%. For Beta tasks boinc reported same, and task showed 1-2%. |
Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 4 |
First NOELIA_SH2 WU on GTX980 completed & validated with beta app. http://www.gpugrid.net/result.php?resultid=13145399 |
Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 4 |
"Looking good. Boinc reporting 0.90 worth of CPU for 6.5, but task manager only at 1-2%. For Beta tasks boinc reported same, and task showed 1-2%." I saw that too on Windows 8.1, so I added the environment variable swan_sync with a value of 0 and rebooted. Now I see ~100% core usage. I'm not sure if it will make a difference, but it makes me feel better. See this thread for discussion of swan_sync: http://www.gpugrid.net/forum_thread.php?id=2123 |
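As a rough aid for checking the setting described in the post above, here is a minimal Python sketch that reports whether a swan_sync variable is visible in the current environment. The helper name `find_swan_sync` is hypothetical, and the sketch makes no claim about how the ACEMD app interprets the value; it only inspects the environment. The name is matched case-insensitively because Windows ignores case in variable names while Linux does not.

```python
import os

def find_swan_sync(env=None):
    """Return (name, value) for the first variable named SWAN_SYNC in any
    letter case, or None if no such variable is set.
    Hypothetical helper for illustration; not part of BOINC or ACEMD."""
    env = os.environ if env is None else env
    for name, value in env.items():
        if name.upper() == "SWAN_SYNC":
            return name, value
    return None

if __name__ == "__main__":
    hit = find_swan_sync()
    if hit is None:
        print("SWAN_SYNC is not set in this environment")
    else:
        print(f"{hit[0]}={hit[1]}")
```

A newly added system-wide variable is only seen by processes started afterwards, so run the check from a fresh shell (the poster rebooted before seeing the change).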
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0 |
"First NOELIA_SH2 WU on GTX980 completed & validated with beta app." But biodoc, do you have run times of these WUs on non-Maxwell cards to compare? That is what I am very interested in. Greetings from TJ |
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 |
biodoc, thanks for the tip. Your Win8.1 system has a WDDM tax of ~7% compared to XP. Your Win8.1 system is blazing fast. Have you tested your GTX780Ti with the new short CUDA 6.5 app? I'm very curious to see how well the GTX 780Ti performs with the new refined code compared to GM204. Also, GM204 shows how Maxwell is able to carry more threads (atoms) per SMM than per SMX. It is very impressive to see the GTX970 (1664c/104TMU/64ROP) completing tasks in similar or faster times than the GK110 GTX780 (2304c/192TMU/48ROP) in the beta app performance chart. Considering that the GTX970 has fewer TMUs, and ACEMD TMU usage is high, this shows a 145 W TDP board performing at or above the level of the 225 W TDP GTX780. For anyone with higher taxes or energy rates, the GTX970 looks to be the card of choice (unless a future GTX960 loses no more than a couple of SMMs compared to the GTX 970). Excellent code refinement by Matt. The swan_sync variable is for higher-end cards like yours. For my two lowly GK107s, swan_sync makes no difference. |
Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 4 |
"First NOELIA_SH2 WU on GTX980 completed & validated with beta app." No, my 780Ti is in a Linux box and exclusively runs the long WUs. The beta app is for Windows only, so we need data from a 780Ti using the new app for a fair comparison, I think. |
Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 4 |
The NOELIA_SH2 WU I just finished is ranked #6 in the new Performance section: 2.79 hours. http://www.gpugrid.net/performance.php#! Windows 8.1, nvidia driver 344.16. For the NOELIA_SH2 WUs, my GPU load is only 76%, memory controller load is 25%, and power draw is 76% of TDP. At 65% fan speed, the GPU temp is 62 °C. Also, Swan_Sync=0. I'm anxious to test in Linux, but I can wait. |
Joined: 20 Jan 09 Posts: 2379 Credit: 16,799,896,044 RAC: 21,325,873 |
"Excellent code refinement by Matt." Was there any code refinement between 8.44 and 8.46? |
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 |
In the "Maxwell now" thread he mentioned----
I'm assuming there was. |
Joined: 8 Feb 13 Posts: 5 Credit: 6,750 RAC: 0 |
Nope, that's just a rebuild, modulo a fix for a compiler regression. The good stuff is still fermenting. M |
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 |
Can't wait for recipe to be added, when the grapes are wine. |
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 |
While searching for run times and processing rates for the GTX980/970, I noticed an abnormal variance in the 8.46 short app "Average processing rate". The number 653.09405051673 was taken from host113695, which has a GTX980, while my GT650m's average processing rate is 71.024125852776 for the same CUDA 6.5/8.46 app. What's the formula for average processing rate? How does a much more powerful GPU have the smaller number? If I'm misunderstanding the numbers, could someone explain how a GTX980 shows 11 digits after the decimal point, while a GT650m has 12? A GTX 980 finishes a NOELIA_SH2 task in 7,500-8,000 s; a GT650m completes the same task in 59,000-65,000 s. In GFLOPS terms, a GTX 980 is worth 7.75 GT650m cards. FYI: for the 8.46 beta app, the host113695 GTX980 has a processing rate of 1627.7621166778, while my GT650m's processing rate is 193.43498955808. The same user's GTX780Ti has a processing rate of 310.15025279058 for the CUDA 6.0/8.41 long app; for the same app a GT650m shows 41.592640653642, again with more digits after the decimal point. |
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
eXaPower wrote: "ACEMD TMU usage is high" I don't have insight into the actual code, but TMUs are Texture Mapping Units. They are fixed-function units that map textures to geometry, and I highly doubt they can be exploited for GPU-Grid. The same applies to ROPs: these are Render Output Units, i.e. they deal with assembling the finalized images ("pushing the pixels"). We're not pixelating anything at GPU-Grid or in other GP-GPU apps. Think of GP-GPU work as endless loops of matrix and vector operations, which are all performed on the shaders. eXaPower wrote: "could someone explain how a GTX980 shows 11 digits after the decimal point, while a GT650m has 12?" That seems to be simply caused by the total number of digits being equal to 14. BTW: consider the variance in WU completion times. You can easily round those numbers to 3 significant digits; anything else will be drowned in "experimental noise" anyway: GTX980: 653.09405051673 -> 653; GT650m: 71.024125852776 -> 71.0. This also answers your other question: "How does a much more powerful GPU have the smaller number?" It doesn't, see the numbers above. BTW2: you also mention a factor of about 8 in performance between these cards, based on other measures. The factor between the processing rates quoted above matches this, approximately. MrS Scanning for our furry friends since Jan 2002 |
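The arithmetic in the post above can be checked with a short Python sketch. The rates and run times are the figures quoted in the thread; the helper names `digit_counts` and `round_sig` are made up for illustration, and the `.14g` format is only an assumption used to mimic the observation that 14 digits are printed in total.

```python
import math

# Average processing rates quoted in the thread (units are not stated there).
rates = {
    "GTX980": 653.09405051673,
    "GT650m": 71.024125852776,
}

def digit_counts(text):
    """Count digits before and after the decimal point in a printed number."""
    whole, _, frac = text.partition(".")
    return len(whole), len(frac)

def round_sig(x, sig=3):
    """Format x with `sig` significant digits, keeping trailing zeros."""
    decimals = max(sig - 1 - math.floor(math.log10(abs(x))), 0)
    return f"{x:.{decimals}f}"

for name, rate in rates.items():
    printed = format(rate, ".14g")  # 14 significant digits in total
    print(name, printed, digit_counts(printed), round_sig(rate))

# Ratio of the two processing rates.
rate_ratio = rates["GTX980"] / rates["GT650m"]
# Ratio of run times, using the midpoints of the ranges quoted in the thread (s).
time_ratio = ((59_000 + 65_000) / 2) / ((7_500 + 8_000) / 2)
print(f"rate ratio {rate_ratio:.1f}, run-time ratio {time_ratio:.1f}")
```

With a fixed budget of 14 digits, a number with three digits before the decimal point leaves 11 after it, and one with two digits leaves 12, which is the pattern asked about.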
Joined: 25 Sep 13 Posts: 293 Credit: 1,897,601,978 RAC: 0 |
ETA, thank you for explaining what the processing rate numbers mean. The reason I mentioned Texture Mapping Units: http://multiscalelab.org/gianni/publications?action=AttachFile&do=get&... Texture Mapping Units are "capable of performing linear interpolation of values into multidimensional (up to 3D) arrays of floating point data." Quoted from Matt's "Accelerating Biomolecular Dynamics in the Microsecond Time Scale": "The texture units are used to assist the calculation of the electrostatic and van der Waals terms by providing linearly interpolated values for the radial components of those functions from lookup tables." Along with other processes. |
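To illustrate the idea in the quoted passage, here is a minimal CPU-side sketch in Python of a lookup table with linear interpolation, the operation a texture unit performs in hardware. This is not ACEMD code: the function names, the table size and the 1/r test function are all assumptions chosen for illustration.

```python
def build_table(func, r_min, r_max, n):
    """Sample func at n evenly spaced points on [r_min, r_max]."""
    step = (r_max - r_min) / (n - 1)
    return [func(r_min + i * step) for i in range(n)], step

def lookup_linear(table, r_min, step, r):
    """Linearly interpolate between the two table entries that bracket r."""
    x = (r - r_min) / step                   # fractional table index
    x = min(max(x, 0.0), len(table) - 1.0)   # clamp to the table range
    i = min(int(x), len(table) - 2)          # index of the lower neighbour
    frac = x - i
    return table[i] * (1.0 - frac) + table[i + 1] * frac

# Tabulate a Coulomb-like radial term 1/r (illustrative choice only).
R_MIN, R_MAX, N = 1.0, 10.0, 1001
table, step = build_table(lambda r: 1.0 / r, R_MIN, R_MAX, N)

for r in (1.0, 2.5, 7.3):
    approx = lookup_linear(table, R_MIN, step, r)
    print(f"r={r}: table {approx:.6f}, exact {1.0 / r:.6f}")
```

The appeal of the approach is that an expensive radial function is evaluated once per table entry, after which each use costs one table fetch and one interpolation.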
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Thanks for pointing that out! The paper is from 2009; I suspect the code has been enhanced since then, but not radically changed. Matt, can you comment briefly (or at whatever length your time allows) on the usage of non-shader blocks in GPUs? And regarding the current question: are you still using the TMUs for table lookup? (what a neat trick! :) And does the reduced number of TMUs in Maxwell affect performance? I suspect not, unless you're constantly hammering the TMUs with requests. MrS Scanning for our furry friends since Jan 2002 |
©2025 Universitat Pompeu Fabra