acemdlong application 815 updated for Maxwell

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 36614 - Posted: 24 Apr 2014, 15:14:47 UTC - in response to Message 36608.  

So now it seems established that SWAN_SYNC reserves a whole CPU core. But is it any faster? If so, how much?

In my case (GTX660Ti, 335.23, Win 8.1, CPU not completely saturated) I am not seeing any performance increase due to setting SWAN_SYNC, whereas something like 3% should have been visible during the 4 WUs I've crunched with this setting now. Switching back.

Generally the benefit should increase if CPU interaction is needed more often, which happens for smaller molecules / systems and for faster cards. If anyone profits from this it's going to be high-end GK110 users first.

MrS
Scanning for our furry friends since Jan 2002

Retvari Zoltan
Message 36651 - Posted: 25 Apr 2014, 19:15:37 UTC - in response to Message 36608.  

So now it seems established that SWAN_SYNC reserves a whole CPU core. But is it any faster? If so, how much?

I've experimented with some settings, and it seems that with the latest drivers, CUDA 6.0 tasks are pretty fast even without SWAN_SYNC set.
The gain depends on many factors:
1. The GPU: high-end GPUs (GTX 660Ti, 670, 680, 760, 770, 780, 780Ti, Titans) can gain a little; lesser GPUs gain even less.
2. Operating system: Windows XP is faster than other versions of Windows, but it can still be up to 3% faster with SWAN_SYNC set.
3. The type of workunit: some workunits use more CPU and utilize the GPU less; these can gain more from setting SWAN_SYNC (I'm using WinXP x64).
4. The speed and saturation of the CPU cores: the lower the CPU usage, the higher the GPU utilization. It also depends on the CPU app. Keep in mind that with hyperthreading 1 core can handle 2 threads, but the 2 threads avoid holding each other up only as long as they don't try to access the same resource (e.g. the FPU) of the core they are running on at the same time.
5. The CPU affinity of the tasks: the GPUGrid application can gain up to 3% if it runs on the same CPU thread all the time (and no other application is using the same core).

Jacob Klein
Message 36652 - Posted: 25 Apr 2014, 19:23:05 UTC - in response to Message 36651.  

So, would you say that the following is true, regarding GPUGrid SWAN_SYNC:
- If you are after absolute maximum GPUGrid throughput, then use it
- If you are only working on the GPUGrid project, then use it
- If you are also working on other CPU projects, then do not use it

Those are the guidelines I'd recommend, at least.

skgiven
Volunteer moderator
Volunteer tester
Message 36653 - Posted: 25 Apr 2014, 19:43:35 UTC - in response to Message 36652.  

It really depends on what you want to prioritise and on your system. The number of GPUs you have is important, as the setting will apply to all of them, and of course so is what type they are.
Using SWAN_SYNC means one full CPU thread (or core on AMD CPUs) will be allocated to each GPUGrid app, so if you have 3 low-end GPUs and a high-end CPU then you could be losing most of 3 threads. If you don't want this then just don't use the variable. Conversely, if you have 3 high-end cards then you probably want to get the best out of them.
At least we get to decide!
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Retvari Zoltan
Message 36655 - Posted: 25 Apr 2014, 22:48:42 UTC - in response to Message 36652.  

So, would you say that the following is true, regarding GPUGrid SWAN_SYNC:
- If you are after absolute maximum GPUGrid throughput, then use it

Yes.

- If you are only working on the GPUGrid project, then use it

Yes.

- If you are also working on other CPU projects, then do not use it

I would say you can use it, if you reduce the number of usable CPUs at least by the number of the GPUs in the system.

Those are the guidelines I'd recommend, at least.
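
A rough sketch of the arithmetic behind the "reduce the number of usable CPUs by the number of GPUs" suggestion: reserve one CPU thread per GPU and express what is left as a percentage, which is what BOINC's "use at most X% of the processors" preference expects. The function name and the example host (8 threads, 2 GPUs) are illustrative assumptions, not something taken from the posts.

```python
def cpu_percent_for_boinc(total_threads: int, num_gpus: int) -> float:
    """Percentage of CPU threads left for CPU projects after
    reserving one full thread per GPU (the SWAN_SYNC budget)."""
    if total_threads <= 0:
        raise ValueError("total_threads must be positive")
    # Never reserve fewer than 0 or more than all available threads.
    reserved = max(0, min(num_gpus, total_threads))
    return 100.0 * (total_threads - reserved) / total_threads


if __name__ == "__main__":
    # Hypothetical host: 8 CPU threads, 2 GPUs crunching GPUGrid with SWAN_SYNC set.
    print(cpu_percent_for_boinc(8, 2))  # prints 75.0
```

With 8 threads and 2 GPUs this leaves 6 threads, i.e. 75%, for CPU work.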

Jacob Klein
Message 36656 - Posted: 25 Apr 2014, 22:58:27 UTC - in response to Message 36655.  
Last modified: 25 Apr 2014, 23:01:12 UTC

- If you are also working on other CPU projects, then do not use it

I would say you can use it, if you reduce the number of usable CPUs at least by the number of the GPUs in the system.


I've always hated that advice, because if GPUGrid runs out of work, then you're hopefully working on some other GPU project, but now you've unnecessarily taken out one or more CPUs.

It's much better (in my opinion) to define an appropriate <cpu_usage> value in a GPUGrid app_config.xml file, instead of changing the "X% of the processors" setting (which is admittedly easier).

To each their own. Options are indeed good.

Your approach is good, though -- if you are going to use SWAN_SYNC, then you should somehow make sure that, for each GPUGrid task that is actively running, you "budget" a full core to it. :)
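
For illustration only, here is a minimal sketch that generates an app_config.xml budgeting one full CPU core per GPUGrid task. It assumes the application is named "acemdlong" (taken from the thread title, so check the real name in your client) and that a cpu_usage of 1.0 with a gpu_usage of 1.0 is what you want; the helper name build_app_config is made up for this example.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, cpu_usage: float = 1.0, gpu_usage: float = 1.0) -> str:
    """Return the text of a BOINC app_config.xml for one GPU application."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    # gpu_usage 1.0 = one task per GPU; cpu_usage 1.0 = one full core per task.
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    if hasattr(ET, "indent"):  # pretty-printing is only available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "acemdlong" is an assumed app name; adjust it to match your project.
    print(build_app_config("acemdlong"))
```

The printed text would be saved as app_config.xml in the GPUGrid project folder, after which the client has to re-read its config files or be restarted.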

Jeremy Zimmerman
Message 36682 - Posted: 26 Apr 2014, 22:32:16 UTC - in response to Message 36656.  

Waited until I had a couple of different WUs to compare the SWAN_SYNC impact. Wanted a low and a high utilization WU to review.

Did the testing the same way as in http://www.gpugrid.net/forum_thread.php?id=3634&nowrap=true#35730. To fill the additional threads, SETI and Einstein units were running on the CPU. No reboots were needed between tests this time. Only had to stop BOINC, change the environment variable, and restart.

These were running under the 8.41 Application Cuda60.

Below is the difference in average GPU utilization in percent, SWAN_SYNC Yes minus No. It varies depending on the WU and the number of threads. GPU1 was running one of the more intense Nathan_RPS1 WUs while GPU2 was on a lower utilization GERARD_A2ART4E.

Num_Threads    GPU1    GPU2
2              2.0     6.1
3              2.3     5.3
4              2.6     5.4
5              2.9     4.4
6              3.1     5.1
7              2.7     3.7
8              2.3     2.5


I was surprised to see higher variation and larger impacts to utilization with a WU which starts with a much lower utilization.
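
To put a single number on each column, the table can be averaged. This is just a quick sketch over the posted figures; the variable names are made up for the example and nothing here is new measurement data.

```python
from statistics import mean

# Utilization deltas (SWAN_SYNC Yes minus No) copied from the table,
# keyed by Num_Threads: (GPU1, GPU2).
deltas = {
    2: (2.0, 6.1),
    3: (2.3, 5.3),
    4: (2.6, 5.4),
    5: (2.9, 4.4),
    6: (3.1, 5.1),
    7: (2.7, 3.7),
    8: (2.3, 2.5),
}

gpu1_mean = mean(d[0] for d in deltas.values())
gpu2_mean = mean(d[1] for d in deltas.values())

print(f"GPU1 (Nathan_RPS1) mean delta:    {gpu1_mean:.2f}")
print(f"GPU2 (GERARD_A2ART4E) mean delta: {gpu2_mean:.2f}")
```

Across all thread counts the lower-utilization WU on GPU2 gains roughly twice as much as the one on GPU1.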

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 36685 - Posted: 27 Apr 2014, 11:40:40 UTC
Last modified: 27 Apr 2014, 11:42:06 UTC

Thanks, Jeremy!

For beasts such as GTX780Ti anyone should set SWAN_SYNC. For smaller cards it IMO depends on GPU speed, CPU speed and personal preference (as Jacob said). Assuming at least a half-decent CPU I'd recommend the following:

- Always use SWAN_SYNC on GTX780Ti or higher
- Don't use SWAN_SYNC on GT640 or slower
- The transition point between these clear cases should be somewhere between GTX660 and GTX680/770, depending on CPU speed and personal preference

Edit: a 2.x% higher GPU utilization sounds very good on fast cards, if it translates into equally faster completion times. Do we have any further measurements on this yet?

MrS
Scanning for our furry friends since Jan 2002

TJ
Message 36692 - Posted: 27 Apr 2014, 15:28:38 UTC - in response to Message 36685.  

SWAN_SYNC on means setting it to 1?
Greetings from TJ

GPUGRID Role account
Message 36694 - Posted: 27 Apr 2014, 15:45:29 UTC - in response to Message 36692.  

Doesn't matter what it is set to. It just needs to be set.

Matt

Jacob Klein
Message 36695 - Posted: 27 Apr 2014, 15:59:18 UTC - in response to Message 36694.  
Last modified: 27 Apr 2014, 15:59:51 UTC

What Matt means is: It doesn't matter what value it has; it only matters that the variable exists. To use it, just create a system variable called SWAN_SYNC, set it to some value (like 1, doesn't matter, may not even need a value, but just set it to 1 to be sure), then restart BOINC.
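
As a small illustration of "only the existence matters", this sketch checks whether SWAN_SYNC is present in an environment, regardless of its value. It is a check you could run yourself, not the way the GPUGrid app actually reads the variable (that is an assumption on my side), and the function name is made up.

```python
import os
from typing import Mapping, Optional


def swan_sync_is_set(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if SWAN_SYNC exists in the environment, whatever its value."""
    env = os.environ if env is None else env
    return "SWAN_SYNC" in env


if __name__ == "__main__":
    # Checks the environment of this Python process only; BOINC itself
    # sees the variable only after it has been restarted.
    print("SWAN_SYNC set:", swan_sync_is_set())
```

Both a value of 1 and a value of 0 count as "set" here, which matches what Matt describes.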

skgiven
Volunteer moderator
Volunteer tester
Message 36701 - Posted: 27 Apr 2014, 18:31:08 UTC - in response to Message 36695.  

Jeremy Zimmerman, more good work.
Actual runtimes will probably reflect your findings, but may enhance/augment them.
Thanks,
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Beyond
Message 36741 - Posted: 30 Apr 2014, 15:13:55 UTC - in response to Message 36651.  
Last modified: 30 Apr 2014, 15:48:56 UTC

1. The GPU: high-end GPUs (GTX 660Ti, 670, 680, 760, 770, 780, 780Ti, Titans) can gain a little, lesser GPUs can gain less.

I tried SWAN_SYNC on a couple slower cards with a 128-bit memory bus: a 650TI and a 750TI. No noticeable difference in WU completion time even though the GPU utilization increased by a percent or so. The only real world difference on those cards in Win7-64 was that they now grabbed a whole CPU core so that one less CPU WU could be run. SWAN_SYNC was a losing proposition at least on those GPUs and Win7-64.

Edit: Decided to try SWAN_SYNC on a box with a 750Ti in a PCI 2.0 X4 slot. Will report back with results.

Beyond
Message 36882 - Posted: 21 May 2014, 14:33:47 UTC - in response to Message 36741.  

1. The GPU: high-end GPUs (GTX 660Ti, 670, 680, 760, 770, 780, 780Ti, Titans) can gain a little, lesser GPUs can gain less.

I tried SWAN_SYNC on a couple slower cards with a 128-bit memory bus: a 650TI and a 750TI. No noticeable difference in WU completion time even though the GPU utilization increased by a percent or so. The only real world difference on those cards in Win7-64 was that they now grabbed a whole CPU core so that one less CPU WU could be run. SWAN_SYNC was a losing proposition at least on those GPUs and Win7-64.

Edit: Decided to try SWAN_SYNC on a box with a 750Ti in a PCI 2.0 X4 slot. Will report back with results.

Did an extended SWAN_SYNC test on 3 machines. Two showed no improvement and one yielded a 1 to 1.5% decrease in run time. All machines also are running an AMD GPU in PCIe slot 0 and 3-4 CPU WUs on Phenom X6 CPUs. SWAN_SYNC at least on these machines is definitely a waste of resources IMO.

Retvari Zoltan
Message 36887 - Posted: 21 May 2014, 20:13:42 UTC - in response to Message 36882.  

Did an extended SWAN_SYNC test on 3 machines. Two showed no improvement and one yielded a 1 to 1.5% decrease in run time. All machines also are running an AMD GPU in PCIe slot 0 and 3-4 CPU WUs on Phenom X6 CPUs. SWAN_SYNC at least on these machines is definitely a waste of resources IMO.

I do agree.
SWAN_SYNC can make crunching a little bit faster only under Windows XP. (My previous post wasn't that straightforward about this.)
I assume you did your tests on computers running Windows 7 (x64). It is known that OSes more recent than Windows XP have a new Windows Display Driver Model, which makes the OS more stable but comes with an overhead that slows down crunching on the GPU, and this overhead makes the gain from SWAN_SYNC negligible. However, the recent Windows 7 (8, Vista) drivers are faster than the older (CUDA 3.1) versions.

One of my hosts (using Windows XP x64) did (and still does) some unintended testing of SWAN_SYNC, as this host sometimes receives CUDA4.2 tasks, which don't use SWAN_SYNC. The comparison is not entirely fair, as I'm comparing CUDA6.0 tasks to CUDA4.2 tasks, and the CUDA 6.0 app is a little bit faster on its own.
This host has two GTX780Ti's:
the faster one (3500MHz RAM clock) is in a PCIe3.0x16 slot, and
the slower one (2700MHz RAM clock) is in a PCIe2.0x4 slot.

NOELIA_BI_3 workunits:
Faster GPU:
without SWAN_SYNC: 17.353, 17.243 (+5.26%)
with SWAN_SYNC: 16.483, 16.384
Slower GPU:
without SWAN_SYNC: 18.435, 18.426, 18.382, 18.373 (+8.56%)
with SWAN_SYNC: 16.992, 16.958, 16.925, 16.935

SDOERR_BARNA5 workunits:
Faster GPU:
without SWAN_SYNC: 16.041, 16.060 (+6.5%)
with SWAN_SYNC: 15.104, 15.045
Slower GPU:
without SWAN_SYNC: 16.980, 16.975 (+9.2%)
with SWAN_SYNC: 15.545, 15.550

GERARD_A2ARNUL_adapt3 workunits:
Slower GPU:
without SWAN_SYNC: 15.685,

GERARD_A2ART4E_adapt workunits:
Faster GPU:
without SWAN_SYNC: 10.977, 10.966 (+6.2%)
with SWAN_SYNC: 10.328, 10.324

SANTI_marsalWTbound2 workunits:
Slower GPU:
without SWAN_SYNC: 18.686 (+11.3%)
with SWAN_SYNC: 16.781
Faster GPU:
with SWAN_SYNC: 15.586

NATHAN_RPS1_adapt5 workunits:
Slower GPU:
without SWAN_SYNC: 14.415 (+6.9%)
with SWAN_SYNC: 13.484
Faster GPU:
without SWAN_SYNC: 13.387 (+4%)
with SWAN_SYNC: 12.862
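
The post does not say how the percentages were derived. One formula that reproduces the NOELIA_BI_3 figures is the mean run time without SWAN_SYNC divided by the mean run time with it, minus one. The sketch below assumes that formula; the function name is made up, and the run times are used in whatever unit they were posted in.

```python
from statistics import mean


def extra_time_without_sync_pct(without_sync, with_sync):
    """How much longer (in percent) the runs without SWAN_SYNC took,
    relative to the mean of the runs with SWAN_SYNC."""
    return (mean(without_sync) / mean(with_sync) - 1.0) * 100.0


# NOELIA_BI_3 run times copied from the post.
faster_gpu = extra_time_without_sync_pct([17.353, 17.243], [16.483, 16.384])
slower_gpu = extra_time_without_sync_pct(
    [18.435, 18.426, 18.382, 18.373],
    [16.992, 16.958, 16.925, 16.935],
)

print(f"Faster GPU: +{faster_gpu:.2f}%")
print(f"Slower GPU: +{slower_gpu:.2f}%")
```

The same function can be applied to the other workunit types to check the remaining percentages.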