Message boards : Graphics cards (GPUs) : PCI-e Bandwidth Usage
**hiigaran** (Joined: 25 Aug 12, Posts: 3, Credit: 52,783,413, RAC: 0)
Copying this from the BOINC forums: I've been having some discussions on several sites regarding GPUs and bandwidth usage for distributed-computing projects, and I wanted to broaden things by hopefully getting some BOINC experts in on the matter. If anyone is able to help out by posting some data from their rigs, it would really help.
**Beyond** (Joined: 23 Nov 08, Posts: 1112, Credit: 6,162,416,256, RAC: 0)
I've been discussing this over on the Folding@Home forums, and to my disappointment, anything less than PCI-e 3.0 x4 or PCI-e 2.0 x8 results in bandwidth saturation, and thus a performance loss, because the GPUs never reach full load. That matches what I've seen here: PCI-e 2.0 x8 gives full speed with my now-aging 750 Ti cards, while PCI-e 2.0 x4 bottlenecks somewhat. Of course, the faster the GPU, the more likely the bottleneck.
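A quick way to check what link a card has actually negotiated while crunching (assuming an NVIDIA card and a reasonably recent driver) is nvidia-smi; the slot's wiring often differs from its physical size:

```
# Report the PCIe generation and lane width currently negotiated,
# plus GPU load, for every NVIDIA card in the system.
nvidia-smi --query-gpu=name,pcie.link.gen.current,pcie.link.width.current,utilization.gpu --format=csv
```

Note that cards drop to a lower PCIe generation at idle to save power, so run the query while a task is actually loaded.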
**hiigaran** (Joined: 25 Aug 12, Posts: 3, Credit: 52,783,413, RAC: 0)
I was afraid of that. Really wanted to drop a good $10K on a dedicated F@H/BOINC rig and use PCI-e splitters to multiply the GPU capacity. Would have saved me a lot of money buying extra mobos and CPUs. You think every other GPU project on BOINC would be the same?
**Beyond** (Joined: 23 Nov 08, Posts: 1112, Credit: 6,162,416,256, RAC: 0)
> You think every other GPU project on BOINC would be the same?

I'm sure some would be better, and perhaps even unaffected if they do all the processing on the GPU. It's been so long (years) since I tested this on other projects that I won't hazard a guess. There should be someone who's checked this behavior on other projects more recently. Sorry I can't be of more help.
**Retvari Zoltan** (Joined: 20 Jan 09, Posts: 2380, Credit: 16,897,957,044, RAC: 0)
> I was afraid of that. Really wanted to drop a good $10K on a dedicated F@H/BOINC rig and use PCI-e splitters to multiply the GPU capacity. Would have saved me a lot of money buying extra mobos and CPUs.

You don't have to buy a very expensive MB and CPU for GPU crunching, provided that you put only 1 GPU in every MB and you don't crunch for CPU projects on that host. Even a recent Celeron can feed a GTX 980 Ti in a cheap m-ATX MB (however, I would recommend at least an i3). You can gain 10-15% performance by using a non-WDDM OS like Linux or Windows XP.

> You think every other GPU project on BOINC would be the same?

Surely they are, to some extent. Any calculation modelling an N-body process is much more complex than a hashing algorithm (it may need double-precision calculations, or extra "forces" applied depending on the state of the given system), so it needs to be controlled by the CPU, and it needs PCIe bandwidth. There's variation in the PCIe bandwidth requirement between different workunit batches of the GPUGrid app, and it could be the same for other projects. The algorithms of "purely mathematical" projects (like PrimeGrid, Collatz or maybe SETI@home) are more like hashing algorithms, so they could need less PCIe bandwidth than GPUGrid, Einstein@Home or MilkyWay@home, but that could change over time. This situation comes from the fact that the GPUs we use for these projects are made for gaming, so their computing capabilities are "crippled" (disabled or absent double-precision FPUs in the cores); but even the "professional" GPUs are still just co-processors, and they can't do everything on their own (although their development is heading that way).
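To see that batch-to-batch variation yourself, nvidia-smi has a device-monitor mode that samples PCIe throughput once per second (whether the PCIe metric group is available depends on the GPU and driver generation):

```
# Sample PCIe Rx/Tx throughput (MB/s) once per second while a task runs;
# -s t selects the PCIe throughput metric group.
nvidia-smi dmon -s t
```

Watching the Rx/Tx columns across different workunit batches is a direct way to confirm the variation described above.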
**hiigaran** (Joined: 25 Aug 12, Posts: 3, Credit: 52,783,413, RAC: 0)
Thanks for the info. Guess my ultimate system vision might not be exactly what I'd wanted. I should be able to get away with multiple triple- or quad-card setups though, given the right chipset. So if the DC projects are to some extent similar to each other in their secondary hardware requirements, what kind of CPU would be recommended if each separate system had three or four cards?
(Joined: 1 Jun 16, Posts: 1, Credit: 17,942,449, RAC: 0)
Some of the work units require the use of a CPU core in addition to a process running on the GPU. With SETI, for example, some of the work units require only very light CPU usage, and a single CPU core can quite easily keep many parallel processes running across multiple GPUs. Some of the work units, however, require a dedicated CPU core or thread. A quick look in Task Manager or CPUID HWMonitor (a very useful little app if you haven't already got it) will show what is going on. As far as I am aware, it is not currently possible to automatically select one type of work unit or refuse another. This has also resulted in a reduction in RAC, as the CPU-dependent work units don't necessarily yield higher credits; they just take several times longer to run.

With SETI, I have found peak performance on my system (3770K and 980 Ti) occurs when crunching 4 work units at the same time on the GPU and no more than 3 at a time on the CPU, leaving 5 CPU threads free, 4 of which are used as required by the work units that need both GPU and CPU (see the config sketch after this post). The older work units that do not require much of the CPU complete in about 15 minutes each, so I crunch about 16 per hour. The CPU-intensive units take an hour each, so I only crunch 4 per hour. The credit received for either is broadly the same.

The situation with GPUGRID is the opposite, as it is designed to run almost exclusively within the GPU. I am sure I could happily fit 4 x 980s in my PC case and work away at GPUGRID, with the added bonus of not having to turn the heating on in my house during winter, providing I am sitting in the same room as the computer!
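For anyone wanting to reproduce the 4-tasks-per-GPU setup above: BOINC reads an app_config.xml from the project's directory. Below is a minimal sketch; the app name setiathome_v8 is an assumption, so check the actual `<name>` entries in your client_state.xml, which vary by project and application version.

```xml
<!-- app_config.xml, placed in the SETI@home project folder
     (e.g. .../projects/setiathome.berkeley.edu/).
     Runs 4 tasks per GPU, budgeting a quarter of a CPU core each. -->
<app_config>
  <app>
    <name>setiathome_v8</name>  <!-- assumed app name; verify in client_state.xml -->
    <gpu_versions>
      <gpu_usage>0.25</gpu_usage>  <!-- 1/4 of a GPU per task, i.e. 4 tasks per GPU -->
      <cpu_usage>0.25</cpu_usage>  <!-- CPU budget reserved per GPU task -->
    </gpu_versions>
  </app>
</app_config>
```

After saving the file, use "Read config files" in BOINC Manager's Options menu so the change takes effect without restarting the client.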
(Joined: 3 Nov 15, Posts: 38, Credit: 6,768,093, RAC: 0)
> The situation with GPUGRID is the opposite, as it is designed to run almost exclusively within the GPU.

I do not know how the ACEMD app is designed, but on my system it uses 3-10% of a CPU core, and it often utilizes the PCIe x4 bus at up to 60%. The CPU usage, however small, has a significant impact on performance: running the app with realtime (RR) priority increased GPU usage from 80% to 96% (see the command sketch after this post).

> I am sure I could happily fit 4 x 980s in my PC case and work away at GPUGRID, with the added bonus of not having to turn the heating on in my house during winter, providing I am sitting in the same room as the computer!

You think eco :) I use this waste heat to dry my powders and papercraft.
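On Linux, the realtime round-robin (SCHED_RR) scheduling described above can be applied with chrt. This is a sketch under assumptions: that the GPUGrid science app shows up as a process named acemd (the name is an assumption; check with ps), and that you accept that realtime scheduling can starve other processes, which is why the lowest realtime priority is used here:

```
# Move the running GPUGrid app (assumed process name: acemd) to
# SCHED_RR (round-robin realtime) at priority 1, the lowest RT level.
# pidof -s returns a single PID even if several instances are running.
sudo chrt -r -p 1 "$(pidof -s acemd)"
```

If GPU usage climbs the way the poster reports, the task was being starved of timely CPU service rather than raw CPU cycles.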