Message boards :
Server and website :
Performance Tab still broken
Joined: 21 Feb 20 · Posts: 1112 · Credit: 40,764,483,595 · RAC: 7,379,314
From my testing with my cards (RTX 2070 and faster), I've determined that PCIe 3.0 x4 is the slowest link you should use; preferably PCIe 3.0 x8 or better for full-speed crunching. I noticed no difference between 3.0 x8 and 3.0 x16.

When comparing link speeds, make sure you're accounting for both link generation and link width. Saying just x4 or x8 or x16 alone is rather meaningless: PCIe 2.0 is half the speed of 3.0 at the same width, and 1.0 is half the speed of 2.0 (a quarter the speed of 3.0).

Also, if you're using a 3.0 x4 slot on an Intel board, that slot is likely fed from the chipset. In that case you will likely have LESS than 3.0 x4 actually available to the device, depending on what other devices are being serviced by the chipset. The DMI link between the chipset and the CPU only has the equivalent of PCIe 3.0 x4 available for ALL devices in total (other PCIe slots, SATA devices, networking, etc.), so you really won't get full speed from a chipset-based slot.

Don't forget to account for CPU load as well. If your CPU is maxed out, you'll see slow and inconsistent speeds.
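The gen-times-width arithmetic above can be sketched with approximate per-lane rates. This is a rough model for comparing links, not a benchmark; the per-lane values are the commonly published figures after encoding overhead:

```python
# Approximate usable bandwidth per lane, in GB/s, for each PCIe generation
# (after 8b/10b or 128b/130b encoding overhead).
PER_LANE_GBPS = {1: 0.25, 2: 0.50, 3: 0.985}

def link_bandwidth(gen: int, width: int) -> float:
    """Effective one-direction bandwidth of a PCIe link in GB/s."""
    return PER_LANE_GBPS[gen] * width

# "x4" alone is meaningless: generation matters as much as width.
print(link_bandwidth(3, 4))  # ~3.94 GB/s
print(link_bandwidth(2, 4))  # 2.0 GB/s, roughly half the 3.0 figure
print(link_bandwidth(1, 4))  # 1.0 GB/s, roughly a quarter of the 3.0 figure
```

So a "gen 2.0 x8" slot and a "gen 3.0 x4" slot end up in the same ballpark, which is why quoting width without generation tells you very little.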
Joined: 8 Aug 19 · Posts: 252 · Credit: 458,054,251 · RAC: 0
Is anyone else seeing these?

0_2-GERARD_pocket_discovery_08522dfc_f243_41c4_b232_587af6264fe7-0-3-RND1589

Lara and I ran one in tandem. One of my Dell GTX 1650 cards (x8 slot) finished in 28,980.25 seconds. Lara's RTX 2080 Ti finished in 6,761.39 seconds. These look like fun WUs to track; they're direct comparisons of machines.
Joined: 13 Dec 17 · Posts: 1404 · Credit: 8,898,646,190 · RAC: 7,548,451
Good point about emphasizing which component services the slot. The card at the bottom of my motherboard is in a PCIe Gen2 x4 slot serviced by the chipset; it can only manage 5.0 GT/s, compared to the CPU-fed slots that can do 8.0 GT/s.
Joined: 22 Oct 20 · Posts: 4 · Credit: 34,434,982 · RAC: 0
...
Joined: 4 Aug 14 · Posts: 266 · Credit: 2,219,935,054 · RAC: 0
I have captured data from the top 1200 hosts (Volunteer / Hosts tab).

Breakdown of the top 1200 hosts' data:
- 213 hosts have multiple GPUs (excluded from further analysis)
- 108,000 completed tasks captured from the remaining 987 hosts
- 1,322 completed tasks are ADRIA (1.2% of total tasks captured)

Script runtime: 2 hours 43 minutes (a bit longer than anticipated, but still reasonable considering the volume of data captured)
Scan started: 11th November 2020, 23:20 UTC

Below is a summary of ADRIA task runtimes for each GPU type.
NOTE: Sorted by fastest Average Runtime. All runtimes are in seconds.

                                 No.      Min      Max  Average
Rank  GPU                      Tasks  Runtime  Runtime  Runtime
---------------------------------------------------------------
   1  Quadro RTX 8000              1     8691     8691     8691
   2  TITAN RTX                    1    10834    10834    10834
   3  RTX 2080 Ti                119     8168    37177    11674
   4  Quadro RTX 5000              1    11972    11972    11972
   5  RTX 2080                    81     9841    17570    12288
   6  RTX 2080 SUPER              43    10774    16409    12290
   7  RTX 2070 SUPER              86    10690    16840    12828
   8  TITAN V                     11    12473    14216    12983
   9  TITAN Xp COLLECTORS          2    12620    14501    13560
  10  RTX 2060 SUPER              40    11855    22760    14268
  11  TITAN X Pascal               2    14348    15043    14696
  12  RTX 2070                    83    11488    46588    15198
  13  GTX 1080 Ti                 87    11011    56959    16527
  14  RTX 2060                    68    12676    31984    16992
  15  RTX 3090                     3    17081    17914    17383
  16  Quadro RTX 4000              1    17509    17509    17509
  17  GTX 1080                   113    13192   107431    17552
  18  GTX 1660 Ti                 58    14892    28114    17783
  19  GTX 1070 Ti                 33    14612    28911    18301
  20  GTX 1660 SUPER              41    15903    30930    18664
  21  Tesla P100-PCIE-12GB         3    18941    19283    19064
  22  GTX 1660                    24    17349    26014    19430
  23  GTX 1070                   103    15787    57960    20168
  24  RTX 2070 with Max-Q          4    17142    29194    20429
  25  GTX 1660 Ti with Max-Q       2    19940    20928    20434
  26  GTX 1650 SUPER              21    18364    25123    20799
  27  Quadro M6000                 1    23944    23944    23944
  28  Quadro P4000                 8    21583    26749    24702
  29  GTX 980                      9    23214    29135    25218
  30  Tesla M60                    5    25897    26480    26153
  31  GTX 1060 6GB                62    21259    54456    26329
  32  GTX 980 Ti                  11    19789    44804    26637
  33  GTX 1650                    34    24514    38937    27715
  34  GTX 1060 3GB                63    13035    55834    28907
  35  GTX TITAN Black              2    30945    30951    30948
  36  GTX 780                      1    32439    32439    32439
  37  GTX 970                     31    27366    82557    33367
  38  GTX TITAN X                  4    21600    45713    33522
  39  Quadro P2000                 3    33018    37158    34444
  40  Quadro K6000                 1    34626    34626    34626
  41  Tesla K20Xm                  1    39713    39713    39713
  42  GTX 960                     16    37297    47822    40582
  43  GTX 1050 Ti                 24    36387    66552    41365
  44  GTX TITAN                    1    41409    41409    41409
  45  P104-100                     1    43979    43979    43979
  46  GTX 1050                     6    44597    47854    46514
  47  Quadro P1000                 1    48555    48555    48555
---------------------------------------------------------------
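For anyone curious how a table like this can be produced, the aggregation step is straightforward once the tasks are scraped. A minimal sketch, with hypothetical input shape (the real script pulls each host's task pages from the website):

```python
from collections import defaultdict

def summarize(tasks):
    """tasks: iterable of (gpu_model, runtime_seconds) pairs.

    Returns (gpu, count, min, max, average) rows ranked by
    fastest average runtime, like the table above."""
    buckets = defaultdict(list)
    for gpu, runtime in tasks:
        buckets[gpu].append(runtime)
    rows = [(gpu, len(t), min(t), max(t), sum(t) // len(t))
            for gpu, t in buckets.items()]
    rows.sort(key=lambda r: r[4])  # rank by average runtime, fastest first
    return rows

sample = [("RTX 2080 Ti", 8168), ("RTX 2080 Ti", 15180), ("GTX 1080", 17552)]
for rank, row in enumerate(summarize(sample), 1):
    print(rank, *row)
```

Integer-dividing the sum keeps the averages in whole seconds, matching the table's units.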
Joined: 4 Aug 14 · Posts: 266 · Credit: 2,219,935,054 · RAC: 0
Same as the previous post, this time for GERARD tasks.

Only 129 GERARD tasks were completed in the last 10 days for the hosts sampled. (Hosts keep their task list for the last 10 days.)

NOTE: Sorted by fastest Average Runtime. All runtimes are in seconds.

                                No.      Min      Max  Average
Rank  GPU                     Tasks  Runtime  Runtime  Runtime
---------------------------------------------------------------
   1  RTX 2080 Ti                12     6220    11696     7554
   2  TITAN V                     2     7741     9502     8622
   3  RTX 2070 SUPER              5     8336     9698     8920
   4  RTX 2080                   11     7836    11529     9180
   5  RTX 2080 SUPER              6     8339    11847     9660
   6  GTX 1080 Ti                12     7917    15769    10656
   7  RTX 2070                    4     9659    11438    10876
   8  RTX 2080 with Max-Q         1    11951    11951    11951
   9  RTX 2060 SUPER              6     9932    16892    12145
  10  RTX 2060                    1    13404    13404    13404
  11  GTX 1080                   11    11478    19816    15457
  12  Quadro RTX 3000             1    15839    15839    15839
  13  GTX 1660 SUPER              4    15502    17127    16005
  14  GTX 1660 Ti                 4    13495    18896    16074
  15  GTX 1070                    8    12329    19960    16089
  16  GTX 1660                    2    16200    16997    16598
  17  GTX 1070 Ti                 7    13928    20027    16640
  18  Quadro P4000                2    17033    18841    17937
  19  Quadro P4200                1    19012    19012    19012
  20  Tesla M60                   1    21491    21491    21491
  21  GTX 1060 6GB                9    20386    29152    23884
  22  GTX 1060 3GB                5    19457    30116    24248
  23  GTX 1650 SUPER              4    20357    28377    24796
  24  GTX 970                     8    24052    36054    28795
  25  GTX 1050 Ti                 1    43300    43300    43300
  26  GTX 950                     1    59844    59844    59844
---------------------------------------------------------------
Joined: 8 Aug 19 · Posts: 252 · Credit: 458,054,251 · RAC: 0
Thanks a million for your efforts here, rod4x4! This IMHO is the most useful analysis yet.
Joined: 4 Aug 14 · Posts: 266 · Credit: 2,219,935,054 · RAC: 0
> Thanks a million for your efforts here, rod4x4!

Thank you for your feedback. I am considering a similar comparison for MDAD tasks. This should be ready by the end of November.
Joined: 24 Sep 10 · Posts: 591 · Credit: 11,738,036,510 · RAC: 10,299,581
> This IMHO is the most useful analysis yet.

+1. My admiration for your work.
Joined: 4 Aug 14 · Posts: 266 · Credit: 2,219,935,054 · RAC: 0
> This IMHO is the most useful analysis yet.

Thanks.
Joined: 8 Aug 19 · Posts: 252 · Credit: 458,054,251 · RAC: 0
Too bad that BOINC doesn't individually identify multiple GPUs the way FAHCore does (slots), so data from those hosts could be used as well. It would give an idea of how much performance multiple cards in a host lose. Unfortunately, F@H has poor statistical reporting compared to BOINC.
Joined: 4 Aug 14 · Posts: 266 · Credit: 2,219,935,054 · RAC: 0
From the same dataset used in the previous ADRIA and GERARD comparisons, here is a comparison of MDAD tasks using average runtime.

Due to the variability of runtimes for the MDAD tasks, this comparison should not be taken too seriously (although the ranking is consistent with expectations).

NOTE: GPU types with fewer than 500 tasks have been excluded. Runtimes are in seconds.

                          No.  Average
Rank  GPU               Tasks  Runtime
---------------------------------------
   1  RTX 2080 Ti       9,423    1,721
   2  RTX 2080 SUPER    3,406    1,785
   3  TITAN V           1,032    1,812
   4  RTX 2080          5,853    1,855
   5  RTX 2070 SUPER    6,298    1,970
   6  RTX 2060 SUPER    3,206    2,250
   7  GTX 1080 Ti       8,766    2,317
   8  RTX 2070          5,254    2,354
   9  RTX 2060          3,423    2,760
  10  GTX 1070 Ti       3,396    2,821
  11  GTX 1080          8,873    2,825
  12  GTX 1660 Ti       3,548    3,012
  13  GTX 1660 SUPER    3,002    3,148
  14  GTX 1070          8,627    3,296
  15  GTX 1660          1,808    3,363
  16  GTX 980 Ti          924    3,738
  17  GTX 1650 SUPER    1,999    3,901
  18  Quadro P4000        645    4,059
  19  GTX 980             811    4,440
  20  GTX 1060 6GB      6,051    4,597
  21  GTX 1060 3GB      5,393    4,864
  22  GTX 1650          2,934    5,035
  23  GTX 970           4,015    5,335
  24  GTX 1050 Ti       1,975    7,220
  25  GTX 960             969    7,365
  26  GTX 1050            566    7,916
---------------------------------------
Joined: 8 Aug 19 · Posts: 252 · Credit: 458,054,251 · RAC: 0
I saw that Toni has responded to ServiEngineIC on the wishlist thread (https://www.gpugrid.net/forum_thread.php?id=5025&nowrap=true#55709), so I trust that we will be placated eventually and our toy restored to educate and amuse us.

Meanwhile, thanks to the loyal, diligent efforts of rod4x4 in this thread, we have a pretty good model of relative performance and efficiency on which to base a GPU buying decision. Three virtual (socially distanced) cheers for rod4x4! 🍺🍺🍺
Joined: 4 Aug 14 · Posts: 266 · Credit: 2,219,935,054 · RAC: 0
> I saw that Toni has responded to ServiEngineIC on the wishlist thread (https://www.gpugrid.net/forum_thread.php?id=5025&nowrap=true#55709), so I trust that we will be placated eventually and our toy restored to educate and amuse us.

Cheers! 🍺🍺🍺
Joined: 4 Aug 14 · Posts: 266 · Credit: 2,219,935,054 · RAC: 0
Summarizing PCIe slot performance
---------------------------------

ServicEnginIC posted earlier in this thread: http://www.gpugrid.net/forum_thread.php?id=5194&nowrap=true#55707

> Graphics card ASUS TUF-GTX1650S-4G-GAMING, based on the GTX 1650 SUPER GPU, running in a PCIe gen 3.0 x16 slot, got three. Execution times (seconds): WU#1: 18628.53; WU#2: 18660.36; WU#3: 18627.83

To support the above observations in a different way: I have two similar GTX 1060 3GB cards, both power-limited to 67 W. The difference is the host CPU and PCIe slot technology.

Host 1: https://www.gpugrid.net/show_host_detail.php?hostid=483378
Fitted with a 10 W fanless CPU and a PCIe 2.0 x16 slot, x2 capable. Rated PCIe throughput is 0.8 GB/s.
GPU RAC: 254,000 (approx.)

Host 2: https://www.gpugrid.net/show_host_detail.php?hostid=483296
Fitted with a 35 W Athlon CPU and a PCIe 3.0 x16 slot, x4 capable (limited by the CPU, not the motherboard). Rated PCIe throughput is 3.94 GB/s.
GPU RAC: 268,000 (approx.)

So the processor and PCIe revision affect the output by 14,000 RAC, a 5% performance loss for the less capable host. The difference is enough to be noticed, but not enough to be disappointing. Obviously, the faster the card, the bigger the loss, so it is best to put low-performance GPUs on low-performance hardware. This comparison highlights the importance of matching the abilities of the host to the GPU. Host 2 is still a very modest build, yet enough for a GTX 1060 3GB.

On a side note (for GPUGRID ACEMD3 work units):

GTX 750 Ti: PCIe throughput 1.2 GB/s (max). PCIe 2.0 x4 capable slot recommended.
GTX 960: PCIe throughput 2.2 GB/s (max). PCIe 3.0 x4 capable slot recommended.
GTX 1060 3GB: PCIe throughput 2.5 GB/s (max). PCIe 3.0 x4 capable slot recommended.
GTX 1650 Super and GTX 1660 Super: PCIe throughput 5.5 GB/s (max). PCIe 3.0 x8 capable slot recommended.

As Ian&Steve C. has stated here: http://www.gpugrid.net/forum_thread.php?id=5194&nowrap=true#55710, high-end cards are also quite happy with PCIe 3.0 x8 capable slots.

As a point of interest, ACEMD2 work unit PCIe throughput was higher, in some cases more than twice the figures quoted above.
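Those recommendations reduce to a quick check of slot bandwidth against measured demand. A small sketch, using the measured ACEMD3 figures from the post above and the usual published per-lane approximations:

```python
# Measured ACEMD3 PCIe demand per card, in GB/s (figures from this thread).
DEMAND_GBPS = {"GTX 750 Ti": 1.2, "GTX 960": 2.2, "GTX 1060 3GB": 2.5,
               "GTX 1650 Super": 5.5, "GTX 1660 Super": 5.5}

# Approximate usable per-lane bandwidth in GB/s for each PCIe generation.
PER_LANE_GBPS = {1: 0.25, 2: 0.50, 3: 0.985}

def slot_ok(card: str, gen: int, width: int) -> bool:
    """True if a gen/width slot covers the card's measured PCIe demand."""
    return PER_LANE_GBPS[gen] * width >= DEMAND_GBPS[card]

print(slot_ok("GTX 1060 3GB", 3, 4))    # True: 3.94 GB/s covers 2.5
print(slot_ok("GTX 1650 Super", 3, 4))  # False: 3.94 GB/s falls short of 5.5
```

This is only the bandwidth side of the story; as noted earlier in the thread, chipset-fed slots share DMI bandwidth and may deliver less than their rated figure.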
Joined: 8 Aug 19 · Posts: 252 · Credit: 458,054,251 · RAC: 0
If anybody has time, look at my host 514522: https://www.gpugrid.net/show_host_detail.php?hostid=514522

It's an ASUS Prime Z-270A motherboard, i7-7700K, G.Skill DDR4-3200 (2x8GB), Samsung Evo 970 1st-gen 500GB PCIe 3 M.2 SSD (Win10 Pro), and (2) ASUS Dual OC GTX 1060 3GB GPUs.

According to Afterburner's hardware monitor, my GPU1 (x16 slot) draws around 90-97 watts depending on the WU. GPU2, an identical (newer) card in the x8 slot below, draws 95-105 W. The onboard Intel graphics (GPU3) handle the Windows display, leaving GPUs 1 and 2 unhindered (or so I have assumed).

The combined RAC of the 3GB GTX 1060s is presently hovering around 589,000 (with 5 threads of WCG apps running on the CPU). I find that encouraging, yet I wonder what factors contributed the most to it. Does the slot have anything to do with the wattage? I know these cards get additional power from the PSU, but what I see here mystifies me. Should I assume that the GPU drawing more power is doing more work?
Joined: 4 Aug 14 · Posts: 266 · Credit: 2,219,935,054 · RAC: 0
> If anybody has time, look at my host 514522: https://www.gpugrid.net/show_host_detail.php?hostid=514522

I have been testing a script for multi-GPU hosts. Random hosts were tested, and it so happens host 514522 was among them. The data below was collected 24th November 2020 at 7:00 UTC.

BOINC     No.  Runtime     Credit  Average  Average
Device  Tasks      TTL        TTL   Credit  Runtime
----------------------------------------------------------
     0    139  603,318  2,020,430  289,342    4,340
     1    142  610,021  2,114,351  299,465    4,296
----------------------------------------------------------

You will need to confirm which card is Device 0 and which is Device 1 by looking in the BOINC Manager.

When two GPUs are fitted, I find the GPU closest to the CPU (generally PCIe slot 1 on consumer motherboards) will run hotter. Nvidia code on the GPU will start to reduce the GPU's max clock by 13 MHz after the GPU reaches 55 degrees, and reduces it by a further 13 MHz every 5 degrees thereafter. This may explain the lower power draw on PCIe slot 1. Have you checked the temperatures of the GPUs?

I have also observed that WUs will have varying power draw at different stages of processing, so it is hard to make a direct comparison from power draw alone.

The stats indicate that the cards are very close in performance. There is only a 3.4% difference in output, which is well within the normal variance that could be caused by WU variations, the silicon lottery, and thermal throttling. So overall, performance is not too bad considering all these factors.
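The stepping rule described above (a first 13 MHz reduction at 55 degrees, then another every 5 degrees) can be written out as a small model. Treat it as an illustration of the observed behaviour described in this thread, not as official Nvidia documentation:

```python
def clock_cap_mhz(max_clock_mhz: float, temp_c: float) -> float:
    """Estimated max boost clock after temperature stepping:
    one 13 MHz reduction at 55 C, plus a further 13 MHz every 5 C."""
    if temp_c < 55:
        return max_clock_mhz
    steps = 1 + int((temp_c - 55) // 5)
    return max_clock_mhz - 13 * steps

# A card boosting to 1900 MHz loses three steps (39 MHz) at 65 C.
print(clock_cap_mhz(1900, 50))  # 1900
print(clock_cap_mhz(1900, 65))  # 1861
```

A 10-degree gap between two otherwise identical cards thus costs only a couple of 13 MHz steps, consistent with the small output difference in the table above.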
Joined: 8 Aug 19 · Posts: 252 · Credit: 458,054,251 · RAC: 0
> Have you checked the temperatures of the GPUs?

My GPU 0 (upper slot) is running a steady 65 C at 77% power, and GPU 1 in the lower slot runs a steady 57 C at 82% power, so the temperature-induced power limiting seems coherent with what you told me. I've tried increasing the fan curves, but it had no effect on lowering temps. The CPUID hardware monitor shows a constant, reliable voltage limit and occasional power limits on both GPUs.

My host 556495 shows a similar scenario for my two Alienware GTX 1650s. This all makes more sense to me now; thanks as always for the trans-global tutoring.
Joined: 4 Aug 14 · Posts: 266 · Credit: 2,219,935,054 · RAC: 0
> My host 556495 shows a similar scenario for my two Alienware GTX 1650s.

Host 556495 data, grabbed at the same time as your other host:

BOINC     No.  Runtime     Credit  Average  Average
Device  Tasks      TTL        TTL   Credit  Runtime
---------------------------------------------------------
     0    117  604,577  1,883,061  269,108    5,167
     1    113  603,400  1,848,528  264,688    5,340
---------------------------------------------------------
Joined: 8 Aug 19 · Posts: 252 · Credit: 458,054,251 · RAC: 0
> Host 556495 data, grabbed at the same time as your other host

Dude, you rule! (Learned that one working on a college campus during the '80s.) 😎 Thanks a million! I should add that device 1 on that machine is the Windows display GPU, so everything jives with what you said.

Incidentally, I took the side panel off the 1060 3GB host and set the fan curve based on what you wrote about frequency throttling. My GPU temps have dropped to 60 (upper) and 55, with the fans around 80% and 65%.
©2025 Universitat Pompeu Fabra