GPU performance chart
Joined: 26 Mar 14 | Posts: 101 | Credit: 0 | RAC: 0
I've just included a new chart in the "Performance" tab, featuring the long WU return time per GPU type. Please check it out and tell me if the ranking is consistent with the expected performance.
Retvari Zoltan | Joined: 20 Jan 09 | Posts: 2380 | Credit: 16,897,957,044 | RAC: 0
> I've just included a new chart in the "Performance" tab, featuring the long WU return time per GPU type. Please check it out and tell me if the ranking is consistent with the expected performance.

Wow! I really appreciate the work you've done.

The first inconsistency shows up right at the first three GPUs:

| GPU | N | avg (h) | min (h) | max (h) |
|---|---|---|---|---|
| GTX TITAN X | 40 | 8.07 | 6.9 | 13.3 |
| GTX 980 Ti | 290 | 8.99 | 4.0 | 18.1 |
| GTX 980 | 747 | 9.89 | 4.8 | 16.5 |

It's even more evident if you look at the bars, as the 'average' box of the GTX 980 Ti is lower than that of the GTX TITAN X. It's misleading to suggest that the GTX 980 and the GTX 980 Ti could be faster than the GTX TITAN X, because the only reason for this is that no one with a non-WDDM OS has a GTX TITAN X. (Could someone please send me one, just to make this graph more consistent :) )

Also, mixing very different long runs in a graph like this will result in inconsistency, especially when one batch outnumbers the others (SDOERR_ntl9evSSXX ~4.1h, GERARD_FXCXCL12_LIG ~5.5h, GERARD_VACXCL12_LIG ~5.7h on my GTX 980 Ti).

Mixing OSes which have the WDDM overhead (Windows Vista & newer) with OSes which don't (Windows XP, Linux) also results in these inconsistencies (other such factors are the use of SWAN_SYNC and overclocking). If you want to avoid this, I can only suggest making two charts: one for WDDM OSes and one for the others (or four, or eight - no, that makes no sense).

But despite all of these inconsistencies, please don't take this chart off. Maybe a brief note about different OSes and SWAN_SYNC under the chart would do.
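For illustration only, here is a minimal Python sketch of how per-GPU N/avg/min/max figures like the ones above could be aggregated; the record layout and field names are assumptions, not the actual site code. It also shows the point being made: the average lumps together whatever OSes and WU batches happen to be in each GPU's sample.

```python
# Minimal sketch (not the actual site code): aggregate long-WU return times
# per GPU model into the candlestick statistics shown in the chart.
# Each record is (gpu_model, os_family, return_time_hours) - hypothetical layout.
from collections import defaultdict

def candlestick_stats(records):
    """Group return times by GPU model and compute N, avg, min, max."""
    times = defaultdict(list)
    for gpu, os_family, hours in records:
        times[gpu].append(hours)
    stats = {}
    for gpu, values in times.items():
        stats[gpu] = {
            "N": len(values),
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }
    return stats

# Toy data: the WDDM and non-WDDM results end up in the same average.
sample = [
    ("GTX TITAN X", "WDDM", 8.1), ("GTX TITAN X", "WDDM", 7.9),
    ("GTX 980 Ti", "WDDM", 9.5), ("GTX 980 Ti", "Linux", 6.8),
]
print(candlestick_stats(sample))
```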
Joined: 26 Mar 14 | Posts: 101 | Credit: 0 | RAC: 0
If the WU assignment is random (as I assume it is), a large number of samples (N) would diminish the effect of time differences related to the batch, at least in terms of the average. The question is: what should be the minimum number of sample WUs (N) in order to say the batch effect is small enough and therefore the data is more reliable? For now I chose to show graphics cards with N>=30, which is fairly low but allows plotting the whole spectrum of GPU cards. I could raise it to 100...
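As a rough illustration of that trade-off, here is a toy Python simulation of how much the per-GPU average can wander for a given sample size N when the WUs are a random mix of two batches. The 4.1 h and 5.5 h batch times are taken from the post above; everything else (the 50/50 mix, the number of trials) is an assumption.

```python
# Toy model: spread of the observed per-GPU average when each WU is drawn
# at random from a ~4.1 h batch or a ~5.5 h batch.
import random
import statistics

def spread_of_mean(n, trials=2000):
    """Std deviation of the per-GPU average over many random samples of size n."""
    means = []
    for _ in range(trials):
        sample = [random.choice((4.1, 5.5)) for _ in range(n)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)

for n in (10, 30, 100, 300):
    print(f"N={n:4d}  spread of the average ~ {spread_of_mean(n):.2f} h")

# The spread shrinks roughly as 1/sqrt(N), so N>=30 already tames much of the
# batch effect, and raising the cutoff to 100 helps only moderately more.
```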
Retvari Zoltan | Joined: 20 Jan 09 | Posts: 2380 | Credit: 16,897,957,044 | RAC: 0
Sorry, new thoughts keep popping into my mind after I press the "post reply" button.

> The question is: what should be the minimum number of sample WUs (N) in order to say the batch effect is small enough and therefore the data is more reliable?

The reliability is independent of the number of GPUs when there are empty areas in the matrix of WDDM, GPU, SWAN_SYNC, WU batch and overclocking.

> For now I chose to show graphics cards with N>=30, which is fairly low but allows plotting the whole spectrum of GPU cards. I could raise it to 100...

I think you should not raise it.
Retvari Zoltan | Joined: 20 Jan 09 | Posts: 2380 | Credit: 16,897,957,044 | RAC: 0
Ok, this time I won't edit my post; instead I'll reply to myself.

> The question is: what should be the minimum number of sample WUs (N) in order to say the batch effect is small enough and therefore the data is more reliable?

As GPUs get faster, it becomes more and more evident that the GPUGrid client on a WDDM OS will never be as fast as on a non-WDDM OS. With the GTX 980 Ti, WDDM has become the main divide regarding performance, and that won't change unless NVidia integrates an (ARM) CPU into their GPUs. (I expected them to do it in the Maxwells, but apparently they didn't.) That's why I suggested two GPU performance graphs (WDDM vs non-WDDM OS).
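A minimal sketch of how such a WDDM/non-WDDM split could be keyed, assuming each result carries an OS string; the OS names and the record layout here are illustrative guesses, not the real database schema.

```python
# Sketch: classify each host OS as WDDM or non-WDDM and build one data set
# per group, so two separate charts could be drawn from the same records.
NON_WDDM_PREFIXES = ("Windows XP", "Windows Server 2003", "Linux")

def os_group(os_name):
    """Return 'non-WDDM' for XP/2003/Linux hosts, 'WDDM' for Vista and newer."""
    if any(os_name.startswith(p) for p in NON_WDDM_PREFIXES):
        return "non-WDDM"
    return "WDDM"

def split_by_wddm(records):
    """records: iterable of (gpu_model, os_name, hours) -> dict of two lists."""
    groups = {"WDDM": [], "non-WDDM": []}
    for gpu, os_name, hours in records:
        groups[os_group(os_name)].append((gpu, hours))
    return groups

data = [("GTX 980 Ti", "Windows 10", 9.5), ("GTX 980 Ti", "Linux", 6.8)]
print({group: len(rows) for group, rows in split_by_wddm(data).items()})
```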
Joined: 25 Sep 13 | Posts: 293 | Credit: 1,897,601,978 | RAC: 0
> I've just included a new chart in the "Performance" tab, featuring the long WU return time per GPU type. Please check it out and tell me if the ranking is consistent with the expected performance.

Thank you for the new performance chart: this will be helpful to rookie and seasoned crunchers alike in determining their GPUs' overall performance. Would it be possible to also include GPU performance quartiles on the account page, alongside the already implemented personal-records workunit quartiles?
Joined: 28 Mar 09 | Posts: 490 | Credit: 11,731,645,728 | RAC: 57
I would move the GPU set to the vertical axis, add a scroll bar, and move the time to the horizontal axis. That way you can put the full card names on the left side without squeezing them. I agree that we should break up each card into WDDM and non-WDDM operating systems and stop there; it gets too cumbersome after that.

I noticed you don't have the GTX 690, 590, 580 and 480 cards listed. I am still using one 690 card in my Windows 7 computer, even though my computer view page says the computer has 3 GTX 980 Ti's. I find that amusing!

You should also switch the axes on the WU return time distribution chart and add a scroll bar to the WU batch name axis, so that the batch names are not squeezed either.
Joined: 26 Mar 14 | Posts: 101 | Credit: 0 | RAC: 0
I thought about horizontal charts; the problem is that Google Charts has not implemented this feature for candlestick charts yet, so I'm afraid we will have to stick with the present layout (remember that if you hover over the chart you get further information about each candlestick). I'll try to make two plots, separating WDDM and non-WDDM.

EDIT: after looking at the database, I'm afraid we can't make two plots because there's not enough data for non-WDDM. Almost all of the data comes from Microsoft OSes.
Joined: 5 Jan 09 | Posts: 670 | Credit: 2,498,095,550 | RAC: 0
Hi Gerard,

Nice work on the performance page, but there is room for improvement: the page currently requires horizontal scrolling. Any chance of getting rid of this by changing the font or spacing? Sideways scrolling is something to be avoided on web pages.
Joined: 26 Mar 14 | Posts: 101 | Credit: 0 | RAC: 0
You are right... circumstances have brought us to a scenario where the batch names are way too long. I'll do something about it.
Joined: 9 Nov 12 | Posts: 51 | Credit: 522,101,722 | RAC: 0
I run two long-run tasks per GPU on my two SLIed Titan Black cards (I use SLI for other applications) on Windows 7, so four of them are running simultaneously. While this significantly increases the amount of time a single task takes to run, it also somewhat increases the total amount of work done per unit of time. Sorry if that's throwing off the results.

Oh, also, Titan Blacks have 6144 MB of memory. I'm not sure why the memory is so often listed incorrectly, even by DirectX. I know it's there, because I occasionally use nearly that much of it.

My BOINC Cruncher, Minecraft Multiserver, Mobile Device Mainframe, and Home Entertainment System/Workstation: http://www.overclock.net/lists/display/view/id/4678036#
Joined: 5 Jan 09 | Posts: 670 | Credit: 2,498,095,550 | RAC: 0
> I run two long-run tasks per GPU on my two SLIed Titan Black cards (I use SLI for other applications) on Windows 7, so four of them are running simultaneously. While this significantly increases the amount of time a single task takes to run, it also somewhat increases the total amount of work done per unit of time.

All your Titan Blacks seem to be doing at the moment is throwing errors, and that doesn't increase the total amount of work done. Am I missing something?
Joined: 9 Nov 12 | Posts: 51 | Credit: 522,101,722 | RAC: 0
> I run two long-run tasks per GPU on my two SLIed Titan Black cards (I use SLI for other applications) on Windows 7, so four of them are running simultaneously. While this significantly increases the amount of time a single task takes to run, it also somewhat increases the total amount of work done per unit of time.

I'm not sure how that's relevant. My computer crashed today while I had it open to clean it, corrupting all the tasks in progress. The majority of the work it has done for both BOINC and GPUGrid has been with this configuration.

... Actually, it just crashed again. This is the first time I've used it since moving, so something may have been damaged in transit, or the new drivers and tuning software I updated to aren't agreeing with something. Either way, it's still irrelevant to my previous post, but thanks for your concern.

My BOINC Cruncher, Minecraft Multiserver, Mobile Device Mainframe, and Home Entertainment System/Workstation: http://www.overclock.net/lists/display/view/id/4678036#
skgiven | Joined: 23 Apr 09 | Posts: 3968 | Credit: 1,995,359,260 | RAC: 0
I know there are few of them, but as well as Linux systems, Windows XP and 2003 servers do not incur a WDDM overhead. There are many factors that need to be considered:

- I found differences in the WDDM overhead on Windows 2008+ servers, a 5.5% to ~8% loss rather than the ~11% loss, albeit a while ago. There are probably not too many such hosts attached, but it could skew the results slightly.
- I also noticed, as have others, that the bigger (faster) the card, the greater the WDDM overhead. This is probably a consequence of multiple other factors.
- I observed a noticeable performance variation between DDR3 and DDR2 systems, and would expect some difference between DDR4 and DDR3 systems (especially with older RAM).
- CPU architecture (what's on the die [bus]) and frequency differences (2 GHz vs 3.9 GHz) are a factor too.
- Different manufacturers/release versions/bins, even of the same GPU model, cause some variation (usually via clock rates); no two GPUs are identical.
- Settings applied via software such as Afterburner can also impact performance, sometimes via temperatures (clock throttling) or the GPU bus.
- PCIe width and generation are a factor, again more noticeable with bigger cards and exacerbated by the other factors, which are multiple (0.94 * 0.98 * 0.96 = 0.88, or a 12% loss; see the sketch after this post).
- Some people run 2 (or more) WUs at a time on one card. Do the stats account for this? 2 WUs are definitely run simultaneously on GTX TITAN X cards; it would not make sense to use them otherwise!
- Other factors such as drive speed, what else is running on the system (HD video, defrags, drive scans or searches), CPU availability, SWAN_SYNC and BOINC settings can impact performance.
- Occasionally drivers/app versions (CUDA dependent) bring improvements in speed or stability. Some of these were exacerbated by GPU architecture bottlenecks (bus/bandwidth), which are generation dependent.
- Are mobile versions separated out or not? They are (mostly) the same architectures, but with different clock settings and weaker hardware backing them up: downclocking (by the OS/driver or the user), and lots of suspend and resume due to user activity.
- Some of the above variations vary themselves by WU type (GPU RAM, bus or CPU dependency especially), and everything is subject to app changes.
- BOINC may still misrepresent all cards as being of the GPU0 type (though I think it's more complex than that); however, the app knows better!
- I see that the GTX 750 Ti is listed twice based on GDDR amount: 2048 and 2047 MB. Ditto for the 960, 780 Ti… (driver/OS?).

FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
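To make the compounding arithmetic from the list above explicit: several small, independent performance factors multiply rather than add. The 0.94, 0.98 and 0.96 values are the ones used in the post; the labels attached to them here are illustrative only.

```python
# Compounded performance loss: each factor multiplies the remaining throughput.
factors = {"WDDM overhead": 0.94, "PCIe width/gen": 0.98, "CPU/RAM": 0.96}

combined = 1.0
for name, factor in factors.items():
    combined *= factor
    print(f"after {name:<16} -> {combined:.3f}")

# Prints roughly 0.88, i.e. about a 12% loss overall.
print(f"combined: {combined:.2f} of full performance ({(1 - combined) * 100:.0f}% loss)")
```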