Advice for GPU placement

Jacob Klein

Message 51217 - Posted: 9 Jan 2019, 0:11:45 UTC
Last modified: 9 Jan 2019, 0:20:17 UTC

Hey folks,

I bought an RTX 2080 on Amazon*, which is being delivered tomorrow, and was wondering if you might help confirm my plans for my GPUs.

I have 7 GPUs worth crunching with:
2080, 1050 Ti, 980 Ti, 980 Ti, 970, 660 Ti, 660 Ti.

I have 2 rigs each capable of housing 3 GPUs:
- 3-yr-old Alienware Area-51 R2 main gaming rig
- 9-yr-old XPS 730x basement cruncher.

My goals are, in this order:
- Use the 2080 for gaming
- Keep a 980 Ti for gaming and testing
- Optimize the ability to crunch GPUGrid tasks on all GPUs, knowing that the 2080 isn't supported and may prevent tasks from being issued to the PC that has it until acemd is updated (right?)
- Use the GPUs to crunch other projects if GPUGrid isn't yet set up to support me

I'm thinking of doing this:

Area-51 R2:
- 2080 (for sure staying in this PC)
- 980 Ti (for sure staying in this PC)
- 660 Ti
* reasoning: game on the 2080, keep the 980 Ti as backup/for testing games, and have all 3 crunch BOINC for non-GPUGrid projects until GPUGrid gets fixed

XPS 730x:
- 1050 Ti
- 980 Ti
- 970
* reasoning: Maximum GPUGrid with remaining GPUs

Shelf:
- 660 Ti
- 460 (gets a friend on the shelf)

Does this sound like it'd "best optimize" my goals? Let me know. Thanks.

* Great deal here, in my opinion: $850, on sale for $50 off at $800, plus a 5% ($40) coupon checkbox, for a final total of $760 pre-tax; includes 2 games, Battlefield V and Anthem.
https://www.amazon.com/gp/product/B07GHVK4KN
Aurum
Message 51218 - Posted: 9 Jan 2019, 3:34:36 UTC

Also consider CUDA GPU Capability:
2080, 7.5
1050 Ti, 6.1
980 Ti, 5.2
970, 5.2
660 Ti, 3.0
460, 2.1
https://developer.nvidia.com/cuda-gpus
I'd put them:
980 Ti + 970
2080 + 1050 Ti
Capability is one of the factors the work server can consider in assigning WUs. This may get more of the same kind of work to all cards.
Just a thought.

BTW, I donate my legacy cards, etc., to the New-2-U charity here.
mmonnin

Message 51221 - Posted: 9 Jan 2019, 18:48:29 UTC
Last modified: 9 Jan 2019, 18:51:12 UTC

Power supply/cooling capability/spacing would be a bigger concern for me if that was my setup.

You can tell BOINC to ignore GPUGrid for just the 2080 until Turing cards work with GPUGrid. There's no need to skip GPUGrid on all 3 cards just because one of them doesn't work.
Jacob Klein

Message 51222 - Posted: 9 Jan 2019, 19:50:52 UTC - in response to Message 51221.  
Last modified: 9 Jan 2019, 19:51:31 UTC

Hmm, I thought that wouldn't work because BOINC reports to the projects the "biggest card", so GPUGrid would think I have 3 2080 GPUs and thus wouldn't give me any work. Are you sure your proposal would work and still allow me to do BOINC work from other projects on the 2080? If so, how?
Aurum
Message 51224 - Posted: 9 Jan 2019, 20:56:53 UTC

Do you have this line in your cc_config.xml ???
<use_all_gpus>1</use_all_gpus>

If 1, use all GPUs (otherwise only the most capable ones are used). Requires a client restart.
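
For reference, a minimal cc_config.xml sketch with that flag in place (the surrounding structure follows the standard client configuration layout):

<cc_config>
   <options>
      <use_all_gpus>1</use_all_gpus>
   </options>
</cc_config>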
Retvari Zoltan
Message 51225 - Posted: 9 Jan 2019, 20:58:50 UTC - in response to Message 51222.  
Last modified: 9 Jan 2019, 20:59:06 UTC

Hmm, I thought that wouldn't work because BOINC reports to the projects the "biggest card", so GPUGrid would think I have 3 2080 GPUs and thus wouldn't give me any work.
Actually, GPUGrid does give work to 20x0 cards, but the tasks will fail on the host, because the app does not contain code for CC 7.5 (>6.1) cards.

Are you sure your proposal would work and still allow me to do BOINC work from other projects on the 2080?
Yes.

If so, how?
You should put the following into the <options> section of your c:\ProgramData\BOINC\cc_config.xml:
<exclude_gpu>
   <url>http://gpugrid.net</url>
   <device_num>0</device_num>
</exclude_gpu>
You should check the device number of your RTX 2080 in the first ~20 lines of the BOINC client's event log, and put that number there (I guessed that it will be device 0).
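
For example, a complete cc_config.xml combining this exclusion with <use_all_gpus> (assuming the 2080 really is device 0) would look like:

<cc_config>
   <options>
      <use_all_gpus>1</use_all_gpus>
      <exclude_gpu>
         <url>http://gpugrid.net</url>
         <device_num>0</device_num>
      </exclude_gpu>
   </options>
</cc_config>

Remember that changes to cc_config.xml take effect after a client restart or "Options / Read config files" in BOINC Manager.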
mmonnin

Message 51226 - Posted: 9 Jan 2019, 21:26:28 UTC - in response to Message 51222.  

Hmm, I thought that wouldn't work because BOINC reports to the projects the "biggest card", so GPUGrid would think I have 3 2080 GPUs and thus wouldn't give me any work. Are you sure your proposal would work and still allow me to do BOINC work from other projects on the 2080? If so, how?


Yes, the BOINC exclusion goes by vendor and GPU index. If there is only one vendor, it's just the index, as Retvari Zoltan has suggested. It doesn't care what card it is. I have excluded a 980 Ti in a system (running FAH) and allowed just the 970 to crunch on several projects.
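
If you ever do have more than one vendor in a box, the exclusion also takes an optional <type> element to pin down which vendor's device numbering the index refers to. A sketch, with values per the client configuration docs:

<exclude_gpu>
   <url>http://gpugrid.net</url>
   <type>NVIDIA</type>
   <device_num>0</device_num>
</exclude_gpu>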
Jacob Klein

Message 51227 - Posted: 9 Jan 2019, 21:40:01 UTC - in response to Message 51226.  

:) Did you guys know that I'm responsible for <exclude_gpu> being included in BOINC? I know how it works and how to use it.

I didn't know that GPUGrid was giving work to 20-series GPUs, even though it would fail on them. That ends up being good for me, in a way, because I can get GPUGrid work for the other 2 GPUs in the system, and BOINC should get work from other projects for the 2080.

Thanks for helping to clarify that for me. I'll definitely be adding the GPU exclusion.
Keith Myers
Message 51230 - Posted: 10 Jan 2019, 3:01:12 UTC

I'm currently employing <exclude_gpu> statements for both GPUGrid and Einstein on one of my 4-card hosts with an RTX 2080. It works fine at preventing that card from being used.

But that causes issues with a project_max_concurrent statement for Seti: it prevents all Seti CPU tasks from running, leaving only the four GPU tasks running.

I have a thread in the Linux/Unix section of the Questions and Answers forum at Seti.

https://setiathome.berkeley.edu/forum_thread.php?id=83645

For now I have to remove the project_max_concurrent statement from app_config.xml and use the CPU limitation in Local Preferences to limit the number of cores to 16.
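
For reference, project_max_concurrent lives in a project's app_config.xml; a minimal sketch (the limit of 4 here is just illustrative):

<app_config>
   <project_max_concurrent>4</project_max_concurrent>
</app_config>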
Jacob Klein

Message 51233 - Posted: 10 Jan 2019, 12:19:06 UTC

Why do you have an exclude for Einstein?
Zalster
Message 51240 - Posted: 10 Jan 2019, 15:52:58 UTC - in response to Message 51233.  

Why do you have an exclude for Einstein?


Certain types of GPU work on Einstein fail. I believe the short-running work units do fine and the long-running ones fail, though it might be the reverse. Anyway, the terminology used to describe the work units (as defined by the users, not the scientists) is the wrong nomenclature, so I stopped paying attention to the discussion. Keith can fill you in on the specifics.
Jacob Klein

Message 51241 - Posted: 10 Jan 2019, 16:09:08 UTC
Last modified: 10 Jan 2019, 16:09:23 UTC

Thanks. I confirmed that at least one of the Einstein task types failed immediately on my 2080, so I added Einstein to my GPU exclusion list for that card as well. Pity, really.
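
For anyone curious, the relevant fragment of my <options> section now looks roughly like this (assuming the 2080 is device 0, and assuming Einstein@Home's project URL; check the event log for the exact URL your client uses):

<exclude_gpu>
   <url>http://gpugrid.net</url>
   <device_num>0</device_num>
</exclude_gpu>
<exclude_gpu>
   <url>http://einstein.phys.uwm.edu</url>
   <device_num>0</device_num>
</exclude_gpu>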
Beyond
Message 51244 - Posted: 10 Jan 2019, 16:57:34 UTC - in response to Message 51227.  

:) Did you guys know that I'm responsible for <exclude_gpu> being included in BOINC? I know how it works and how to use it.

I remember this. I'd like to thank you, as I use <exclude_gpu> extensively. Some machines are running GPUGrid, Amicable Numbers and Enigma on dedicated GPUs. Also, thanks for all the great work you do debugging BOINC and testing new features.
Jacob Klein

Message 51245 - Posted: 10 Jan 2019, 17:07:16 UTC - in response to Message 51244.  
Last modified: 10 Jan 2019, 17:08:03 UTC

:) I'm a rock star at breaking things, for sure!
I am happy to hear that you find the feature as useful as I do!
Beyond
Message 51248 - Posted: 10 Jan 2019, 18:39:38 UTC - in response to Message 51221.  
Last modified: 10 Jan 2019, 18:43:36 UTC

Power supply/cooling capability/spacing would be a bigger concern for me if that was my setup.

Cooling is a major consideration in GPU placement for me. All my Ryzen 7 boxes run 3 GPUs, usually 3 x 1060 cards. The top GPU is flexible, the middle GPU is a blower, and the bottom card, which sits up against the blower, is a short card that leaves the blower fan uncovered. If the machine still runs hotter than I like, I sometimes put a 1050 Ti in the lower position.

Another consideration is bus width. On X370/X470 boards, for instance, the 2 top PCIe slots run at PCIe 3.0 x8 (if both are used), while the bottom slot is PCIe 2.0 x4. The bottom slot handles a 1060 at full speed on a Ryzen 7, but not always on machines with slower processors. For example, I have an ITX box with a Celeron and PCIe 2.0 x4, and it constricts a 1060, but a 1050 Ti runs at full speed.

BTW, the Ryzen 7 machines use far less CPU to run 3 1060 cards at full blast. My slower boxes take a lot more CPU allocation to run 2 1060 cards than the Ryzens do to handle 3. In this regard I've also found that SWAN_SYNC helps noticeably on all my machines except the Ryzens, which seem to feed the GPUs fully without it.

BTW, the new Ryzens coming out mid-year will be PCIe 4.0, again doubling the speed of PCIe 3.0. You'll need a 500-series MB for PCIe 4.0; on the older boards they'll still run at PCIe 3.0.

Of course, performance per watt is another major consideration. I recently retired my pre-10xx NV GPUs. The 750 Ti cards use about the same power as a 1050 Ti, but the 1050 Ti is ~60% faster (it depends somewhat on the project). My 670 is still viable (24hr-deadline-wise), but I replaced it because it's slower than a 1060, uses much more power, and produces much more heat. You might find that good used GPUs are selling inexpensively now, as disillusioned miners seem to be fleeing the mines. Perhaps black lung disease? ;-)
Jacob Klein

Message 51250 - Posted: 10 Jan 2019, 18:49:02 UTC - in response to Message 51248.  

:)

I'm not too concerned with cooling. My cramped GPUs run hot, and I know it. For my main rig, I just don't do GPU work during the day, and instead let it run at night. Even with a max fan curve, it routinely runs around 75-80°C overnight, and the small office becomes very well heated. But I already stress-tested the overclocks and know it is 100% stable, and the GPU fans have proven to be very durable too.

What can I say? I like it hot and I like it loud. ;)
Beyond
Message 51253 - Posted: 10 Jan 2019, 19:24:37 UTC - in response to Message 51250.  

There's something to be said for white noise...
Jacob Klein

Message 51257 - Posted: 10 Jan 2019, 22:18:46 UTC

Power connectors.

The basement cruncher, my old XPS 730x, has 4 6-pin connectors, and I previously made 2 more using Molex adapters, for a total of six 6-pin connectors, but zero 8-pin connectors.

My GPU additional power requirements are:
6+8 : EVGA GeForce RTX 2080 XC ULTRA GAMING
None: EVGA GeForce GTX 1050 Ti SSC GAMING
8+8 : EVGA GeForce GTX 980 Ti FTW GAMING ACX 2.0+
6+8 : Dell GTX 980 Ti
6+6 : EVGA GeForce GTX 970 FTW ACX 2.0
6+6 : EVGA GeForce GTX 660 Ti FTW+ 3GB w/Backplate
6+6 : MSI GTX 660 Ti TwinFrozr III OC 3GB

So, I think this means I'm going to:
Area 51 R2: 2080, 980 Ti, 980 Ti
XPS 730x: 970, 1050 Ti, 660 Ti
Shelf: 660 Ti

Fun!
Keith Myers
Message 51264 - Posted: 11 Jan 2019, 3:06:36 UTC - in response to Message 51245.  
Last modified: 11 Jan 2019, 3:12:55 UTC

:) I'm a rock star at breaking things, for sure!
I am happy to hear that you find the feature as useful as I do!

I'm about to join you as a "rock star" for breaking things too, apparently.

The client code commit that DA wrote to fix my original problem is going to cause major problems for anyone using a max_concurrent or project_max_concurrent statement.

The unintended consequence of the code change is that it prevents work-fetch requests for replacement tasks until the host's cache is empty.

Only then does the host report all its finished work and ask for more work to refill the cache. So that's the end of keeping your cache topped up at every 5-minute scheduler connection.

The PR2918 commit is close to being accepted into the master branch.

I have voiced my displeasure, but since usually only DA authorizes pull requests into the master branch, that decision is up to him. Richard Haselgrove has also voiced his concerns.
Keith Myers
Message 51265 - Posted: 11 Jan 2019, 3:18:18 UTC - in response to Message 51248.  

BTW, the new Ryzens coming out mid year will be PCIe 4.0, so again double the speed of PCIe 3.0. You'll need a 500 series MB for PCIe 4.0, on the older boards they'll still run at PCIe 3.0.

There's talk from CES that PCIe 4.0 spec cards would still work in the first PCIe slot, closest to the CPU, on existing X370/X470 motherboards, as the signaling requirements for PCIe 4.0 devices limit the signal path to 6 inches without redrivers.