acemdlong application 815 updated for Maxwell
| Author | Message |
|---|---|
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
So now it seems established that SWAN_SYNC reserves a whole CPU core. But is it any faster? If so, how much? In my case (GTX660Ti, 335.23, Win 8.1, CPU not completely saturated) I am not seeing any performance increase due to setting SWAN_SYNC, whereas something like 3% should have been visible during the 4 WUs I've crunched with this setting now. Switching back. Generally the benefit should increase if CPU interaction is needed more often, which happens for smaller molecules / systems and for faster cards. If anyone profits from this it's going to be high-end GK110 users first. MrS Scanning for our furry friends since Jan 2002 |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
So now it seems established that SWAN_SYNC reserves a whole CPU core. But is it any faster? If so, how much? I've experimented with some settings, and it seems that with the latest drivers the CUDA 6.0 tasks are pretty fast even without SWAN_SYNC set. The gain depends on many factors: 1. The GPU: high-end GPUs (GTX 660Ti, 670, 680, 760, 770, 780, 780Ti, Titans) can gain a little, lesser GPUs can gain less. 2. Operating system: Windows XP is faster than other versions of Windows, but it can still be up to 3% faster with SWAN_SYNC set. 3. The type of the workunit: some workunits use more CPU and utilize the GPU less, and these can gain more from setting SWAN_SYNC (I'm using WinXPx64). 4. The speed and saturation of the CPU cores: the lower the CPU usage, the higher the GPU utilization. It also depends on the CPU app. It is good to know that hyperthreading means that 1 core can handle 2 threads, but these 2 threads avoid holding each other up only as long as they don't try to access the same resource (the FPU) of the core they are running on at the same time. 5. The CPU affinity of the tasks: the GPUGrid application can gain up to 3% if it runs on the same CPU thread all the time (and no other application is using the same core). |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
So, would you say that the following is true, regarding GPUGrid SWAN_SYNC: - If you are after absolute maximum GPUGrid throughput, then use it - If you are only working on the GPUGrid project, then use it - If you are also working on other CPU projects, then do not use it Those are the guidelines I'd recommend, at least. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
It really depends on what you want to prioritise and on your system. The number of GPUs you have is important, as the setting applies to all of them, and of course so does their type. Using SWAN_SYNC means one full CPU thread (or core on AMD CPUs) will be allocated to each GPUGrid app, so if you have 3 low-end GPUs and a high-end CPU then you could be losing most of 3 threads. If you don't want this then just don't use the variable. Conversely, if you have 3 high-end cards then you probably want to get the best out of them. At least we get to decide! FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"So, would you say that the following is true, regarding GPUGrid SWAN_SYNC:" Yes. "- If you are only working on the GPUGrid project, then use it" Yes. "- If you are also working on other CPU projects, then do not use it" I would say you can use it, if you reduce the number of usable CPUs by at least the number of GPUs in the system. "Those are the guidelines I'd recommend, at least." |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
- If you are also working on other CPU projects, then do not use it I've always hated that advice, because if GPUGrid runs out of work, then you're hopefully working on some other GPU project, but now you've unnecessarily taken out one or more CPUs. It's much better (in my opinion) to define an appropriate <cpu_usage> value in a GPUGrid app_config.xml file, instead of changing the "X% of the processors" setting (which is admittedly easier). To each their own. Options are indeed good. Your approach is good, though -- if you are going to use SWAN_SYNC, then you should somehow make sure that, for each GPUGrid task that is actively running, you "budget" a full core to it. :) |
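For readers who want to try the `<cpu_usage>` approach, here is a minimal Python sketch that writes out the XML for a BOINC `app_config.xml`. The app name `acemdlong` is taken from the thread title and is only an assumption; it has to match the short application name the project actually uses on your host. The helper `build_app_config` is a hypothetical name used for illustration, and the values of 1.0 simply budget one full core per running GPU task.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, gpu_usage: float, cpu_usage: float) -> str:
    """Return app_config.xml content that budgets CPU and GPU per task."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    # Assumed short app name; check the name your BOINC client reports.
    ET.SubElement(app, "name").text = app_name
    versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(versions, "cpu_usage").text = str(cpu_usage)
    # Pretty-print when the running Python supports it (3.9 and later).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# One GPU and one full CPU core per GPUGrid task.
config_xml = build_app_config("acemdlong", gpu_usage=1.0, cpu_usage=1.0)
print(config_xml)
```

The printed text would be saved as `app_config.xml` in the GPUGrid project directory, after which BOINC needs to re-read its config files or be restarted.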
|
Joined: 13 Apr 13 Posts: 61 Credit: 726,605,417 RAC: 0
|
Waited until I had a couple of different WUs to compare the SWAN_SYNC impact. Wanted a low and a high utilization WU to review. Did the testing the same way as in http://www.gpugrid.net/forum_thread.php?id=3634&nowrap=true#35730 To fill the additional threads, SETI and Einstein units were running on the CPU. No reboots were needed between tests. I only had to stop BOINC, change the environment variable, and restart. These were running under the 8.41 application (cuda60). Below is the average difference in GPU utilization percentage, SWAN_SYNC set minus not set. So depending on the WU and the number of threads, it will vary. GPU1 was running one of the more intense Nathan_RPS1 while GPU2 was on a lower utilization GERARD_A2ART4E. Num_Threads: GPU1, GPU2. 2: 2.0, 6.1; 3: 2.3, 5.3; 4: 2.6, 5.4; 5: 2.9, 4.4; 6: 3.1, 5.1; 7: 2.7, 3.7; 8: 2.3, 2.5. I was surprised to see higher variation and larger impacts on utilization with a WU which starts with a much lower utilization. |
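To summarise the table in the post above, the following Python sketch averages the reported utilization deltas per GPU over all thread counts. The numbers are copied from the post; the variable names are made up for illustration, and the unweighted mean is just one simple way to condense the table, not something the poster computed.

```python
from statistics import mean

# (number of CPU threads in use, GPU1 delta, GPU2 delta), as posted.
# GPU1 ran a Nathan_RPS1 task, GPU2 a GERARD_A2ART4E task.
deltas = [
    (2, 2.0, 6.1),
    (3, 2.3, 5.3),
    (4, 2.6, 5.4),
    (5, 2.9, 4.4),
    (6, 3.1, 5.1),
    (7, 2.7, 3.7),
    (8, 2.3, 2.5),
]

# Unweighted average of the utilization gain across all thread counts.
gpu1_mean = mean(row[1] for row in deltas)
gpu2_mean = mean(row[2] for row in deltas)

print(f"GPU1 average gain: {gpu1_mean:.2f}")
print(f"GPU2 average gain: {gpu2_mean:.2f}")
```

The averages hide the trend visible in the raw rows: on GPU2 the gain shrinks as more CPU threads are loaded.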
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Thanks, Jeremy! For beasts such as GTX780Ti anyone should set SWAN_SYNC. For smaller cards it IMO depends on GPU speed, CPU speed and personal preference (as Jacob said). Assuming at least a half-decent CPU I'd recommend the following: - Always use SWAN_SYNC on GTX780Ti or higher - Don't use SWAN_SYNC on GT640 or slower - The transition point between these clear cases should be somewhere between GTX660 and GTX680/770, depending on CPU speed and personal preference. Edit: a 2.x% higher GPU utilization sounds very good on fast cards, if it translates into equally faster completion times. Do we have any further measurements on this yet? MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
|
SWAN_SYNC on means setting it to 1? Greetings from TJ |
|
Joined: 15 Feb 07 Posts: 134 Credit: 1,349,535,983 RAC: 0
|
Doesn't matter what it is set to. It just needs to be set. Matt |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
What Matt means is: It doesn't matter what value it has; it only matters that the variable exists. To use it, just create a system variable called SWAN_SYNC, set it to some value (like 1, doesn't matter, may not even need a value, but just set it to 1 to be sure), then restart BOINC. |
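The "only existence matters" behaviour described above can be illustrated with a small Python sketch. This is not the GPUGrid application's actual code, just an assumed model of a presence-only check; the function name `swan_sync_is_set` is hypothetical.

```python
import os


def swan_sync_is_set(environ=os.environ) -> bool:
    """Return True if SWAN_SYNC exists, whatever its value is."""
    # Presence-only test: "1", "0" and "yes" all count as set.
    return "SWAN_SYNC" in environ


# The value is ignored; only the presence of the variable counts.
print(swan_sync_is_set({"SWAN_SYNC": "1"}))  # variable present
print(swan_sync_is_set({"SWAN_SYNC": "0"}))  # still present
print(swan_sync_is_set({}))                  # variable absent
```

Calling `swan_sync_is_set()` without arguments inspects the environment of the current process, which is a quick way to confirm the variable is visible after restarting BOINC from the same environment.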
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Jeremy Zimmerman, more good work. Actual runtimes will probably reflect your finding, but may enhance/augment them. Thanks, FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
1. The GPU: high-end GPUs (GTX 660Ti, 670, 680, 760, 770, 780, 780Ti, Titans) can gain a little, lesser GPUs can gain less. I tried SWAN_SYNC on a couple of slower cards with a 128-bit memory bus: a 650Ti and a 750Ti. No noticeable difference in WU completion time even though the GPU utilization increased by a percent or so. The only real-world difference on those cards in Win7-64 was that they now grabbed a whole CPU core, so that one less CPU WU could be run. SWAN_SYNC was a losing proposition, at least on those GPUs and Win7-64. Edit: Decided to try SWAN_SYNC on a box with a 750Ti in a PCIe 2.0 x4 slot. Will report back with results. |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
1. The GPU: high-end GPUs (GTX 660Ti, 670, 680, 760, 770, 780, 780Ti, Titans) can gain a little, lesser GPUs can gain less. Did an extended SWAN_SYNC test on 3 machines. Two showed no improvement and one yielded a 1 to 1.5% decrease in run time. All machines are also running an AMD GPU in PCIe slot 0 and 3-4 CPU WUs on Phenom X6 CPUs. SWAN_SYNC, at least on these machines, is definitely a waste of resources IMO. |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
Did an extended SWAN_SYNC test on 3 machines. Two showed no improvement and one yielded a 1 to 1.5% decrease in run time. All machines also are running an AMD GPU in PCIe slot 0 and 3-4 CPU WUs on Phenom X6 CPUs. SWAN_SYNC at least on these machines is definitely a waste of resources IMO. I do agree. SWAN_SYNC can make the crunching a little bit faster only under Windows XP. (My previous post wasn't that straightforward about this.) I assume you did your tests on your computers under Windows 7 (x64). It is known that OSes more recent than Windows XP have a new Windows Display Driver Model which makes the OS more stable, but it comes with an overhead which makes the crunching slower on the GPU, and this overhead makes the gain from SWAN_SYNC negligible. However, the recent Windows 7 (8, Vista) drivers are faster than the older (CUDA 3.1) versions. One of my hosts (using Windows XP x64) did (and still does) some unintended testing of SWAN_SYNC, as this host sometimes receives CUDA 4.2 tasks, which don't use SWAN_SYNC. This comparison is not fully adequate, as I'm comparing CUDA 6.0 tasks to CUDA 4.2 tasks, and the CUDA 6.0 app is a little bit faster on its own. This host has two GTX780Ti's: the faster one (3500MHz RAM clock) is in a PCIe 3.0 x16 slot, and the slower one (2700MHz RAM clock) is in a PCIe 2.0 x4 slot. 
NOELIA_BI_3 workunits: Faster GPU: without SWAN_SYNC: 17.353, 17.243 (+5.26%); with SWAN_SYNC: 16.483, 16.384. Slower GPU: without SWAN_SYNC: 18.435, 18.426, 18.382, 18.373 (+8.56%); with SWAN_SYNC: 16.992, 16.958, 16.925, 16.935. SDOERR_BARNA5 workunits: Faster GPU: without SWAN_SYNC: 16.041, 16.060 (+6.5%); with SWAN_SYNC: 15.104, 15.045. Slower GPU: without SWAN_SYNC: 16.980, 16.975 (+9.2%); with SWAN_SYNC: 15.545, 15.550. GERARD_A2ARNUL_adapt3 workunits: Slower GPU: without SWAN_SYNC: 15.685, GERARD_A2ART4E_adapt workunits: Faster GPU: without SWAN_SYNC: 10.977, 10.966 (+6.2%); with SWAN_SYNC: 10.328, 10.324. SANTI_marsalWTbound2 workunits: Slower GPU: without SWAN_SYNC: 18.686 (+11.3%); with SWAN_SYNC: 16.781. Faster GPU: with SWAN_SYNC: 15.586. NATHAN_RPS1_adapt5 workunits: Slower GPU: without SWAN_SYNC: 14.415 (+6.9%); with SWAN_SYNC: 13.484. Faster GPU: without SWAN_SYNC: 13.387 (+4%); with SWAN_SYNC: 12.862. |
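The percentages quoted in the post above appear to be the extra run time without SWAN_SYNC relative to the run time with it, using the mean of each group of tasks. That is an inference from the posted numbers rather than something the author states. A small Python sketch under that assumption, with the NOELIA_BI_3 run times copied from the post and a hypothetical helper name:

```python
from statistics import mean


def extra_time_without_sync(without_sync, with_sync):
    """Percent by which the mean run time without SWAN_SYNC exceeds
    the mean run time with SWAN_SYNC."""
    return (mean(without_sync) / mean(with_sync) - 1.0) * 100.0


# NOELIA_BI_3 run times as posted (units as in the post).
noelia_fast = extra_time_without_sync([17.353, 17.243], [16.483, 16.384])
noelia_slow = extra_time_without_sync(
    [18.435, 18.426, 18.382, 18.373],
    [16.992, 16.958, 16.925, 16.935],
)

print(f"Faster GPU: +{noelia_fast:.2f}%")
print(f"Slower GPU: +{noelia_slow:.2f}%")
```

Under this reading, the card in the PCIe 2.0 x4 slot gains more from SWAN_SYNC than the one in the PCIe 3.0 x16 slot, which matches the pattern in the other workunit types listed.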
©2025 Universitat Pompeu Fabra