
Message boards : Wish list : suggestion: "adaptive Swan Sync"

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 30750 - Posted: 9 Jun 2013 | 14:36:40 UTC

Abstract: I'm going to describe the current CPU usage of GPU-Grid, make a few assumptions about how things work behind the scenes, and then suggest a way to reduce CPU usage while keeping the performance of current GPUs constant and optimizing the performance of older GPUs.

Current situation: the new Kepler GPUs use a full CPU core for most WUs (only half a core for some WUs). I remember someone (probably GDF) saying this was done to ensure they perform as fast as they can.
On older GPUs a mechanism called Swan Sync is used by default and keeps CPU usage rather low, e.g. ~8% on a GTX 570 if I remember correctly. On these cards performance suffered more the faster they became, so it became standard practice with older apps to disable Swan Sync by setting the environment variable "Swan_Sync=0". This way an entire core was used and optimal GPU performance was achieved.

Assumptions:
- Using a full core does not involve any magic or additional calculations, it's just polling the GPU as fast as possible, in order to avoid the GPU running dry.
- Swan Sync itself seems to work pretty well, as evidenced by the slower cards.
- Swan Sync predicts how long the GPU will need for a given time step (or whatever the chunk of work is that the GPU processes without CPU intervention) and puts the CPU thread to sleep for about that long. After waking up, the CPU thread continuously polls the GPU for the results.
- Swan Sync starts to "fall apart" when the time needed per time step approaches the time granularity of the OS scheduler, i.e. for fast cards.
- Each time step requires approximately the same time.
- Swan Sync is not yet adaptive.
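Under these assumptions, the sleep-then-poll mechanism could be sketched roughly like this. This is a toy simulation only, not actual GPU-Grid or SWAN code; `gpu_step_time_s` is a stand-in for however the real app checks kernel completion:

```python
import time

def wait_for_step(sleep_estimate_s, gpu_step_time_s):
    """Toy model of one time step: sleep for the predicted duration,
    then busy-poll until the (simulated) GPU work is done.
    Returns the number of polls performed after waking up."""
    start = time.monotonic()
    time.sleep(sleep_estimate_s)  # blocking wait: low CPU usage
    polls = 0
    while time.monotonic() - start < gpu_step_time_s:
        polls += 1                # busy polling: burns a CPU core
    return polls
```

If the sleep estimate is too short, the polling loop runs for the rest of the step (high CPU use); if it's too long, the loop never runs but the GPU sat idle waiting for the CPU.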

Intermediate conclusion: we'd actually want to switch between Swan Sync and constant polling based on GPU performance rather than GPU generation. One could approximate GPU speed from the clock speed, number of shaders and compute capability... but let's take this further.

Suggestion: I'm about to describe an algorithm which is based on Swan Sync, but introduces a correction to the time prediction calculated by Swan Sync. This correction is determined empirically "on the fly" from readily available data, is continuously updated, and hence automatically accounts for all factors influencing a machine's performance (both static and temporary ones).

Here's how to do it: wiggle the CPU's sleep time up and down. We start with whatever sleep time Swan Sync calculates. For the next time step we reduce this time by a small amount, maybe 5%. After completing this time step we check whether the GPU finished the work earlier than our initial prediction forecast. If so, we apply a small correction to the time predicted by Swan Sync and repeat this two-step cycle. This mechanism should ensure the sleep time is always short enough that the GPU is never left waiting because we polled too late. If this results in practically constant polling for very fast cards - fine, we're not losing anything compared to the current situation.
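That downward probe could be as simple as the following sketch (the function name, the boolean flag and the 5% step are purely illustrative, not anything from the actual app):

```python
def probe_down(correction, gpu_done_early, wiggle=0.05):
    """One wiggle-down update for the correction factor applied to
    the base Swan Sync estimate. We slept `wiggle` less than the
    corrected prediction; if the GPU was already done when we woke,
    the prediction is too long, so shorten the correction."""
    if gpu_done_early:
        return correction * (1.0 - wiggle)
    return correction
```

Repeated oversleeping drives the correction (and hence the sleep time) down toward zero, which for very fast cards degenerates into near-constant polling - exactly the current full-core behaviour.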

And there's another case to consider: both the predicted sleep time and the wiggled-down one are too short, i.e. in both cases some continuous polling of the GPU happened before the time step completed. In this case we can increase the sleep time (i.e. the correction to the Swan Sync estimate). However, I'd do this carefully, as it can cost GPU performance. Maybe try sleeping 5% longer every 50 time steps or so, and then check whether GPU performance suffered (the GPU was ready before polling started, so we keep the current sleep time) or whether it's still fine (still some polling before completion, so we can increase the sleep time).
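The cautious upward direction, tried only every ~50 steps, might look like this (again an illustrative sketch with made-up names and parameters):

```python
def maybe_probe_up(correction, step_index, polls_before_done,
                   wiggle=0.05, interval=50):
    """Every `interval` steps, if we are still busy-polling before the
    GPU finishes (sleep too short), cautiously lengthen the sleep.
    If the GPU finished before polling even started, an increase would
    cost GPU performance, so keep the current correction."""
    if step_index % interval != 0:
        return correction
    if polls_before_done > 0:
        return correction * (1.0 + wiggle)  # still room to sleep longer
    return correction                        # GPU was ready first: keep as-is
```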

Obviously some fine-tuning of the wiggle step size and frequency would be needed, and I'd also keep a history of the last 100 or 1000 time steps, which should help make an even better prediction. Make this window small to keep the algorithm agile - it will react quickly to changes. Make it larger to smooth things out, so the occasional odd value doesn't throw the timing completely off balance.
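The history idea could be as simple as a sliding-window average of observed step times (window size and all names here are illustrative, assuming each step takes roughly the same time as per the assumptions above):

```python
from collections import deque

class StepHistory:
    """Rolling window of recent step durations. A small window reacts
    quickly to changes; a large one smooths out occasional outliers."""

    def __init__(self, window=100):
        self.samples = deque(maxlen=window)  # oldest samples drop out

    def record(self, step_time_s):
        self.samples.append(step_time_s)

    def predict(self):
        """Predicted duration of the next step: mean over the window."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)
```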

Summary: by constantly wiggling the sleep time down and carefully up, it should be possible to achieve optimum GPU performance at little additional CPU load, independent of GPU generation.

Let me know what you think!
MrS
____________
Scanning for our furry friends since Jan 2002

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 31850 - Posted: 6 Aug 2013 | 18:33:28 UTC - in response to Message 30750.

I like the idea of using the "polling history" to determine the appropriate level of polling for each card.

I have 3 generations of GPUs in my system, and some people may have even more! The application should be able to perform best on any GPU, regardless of generation or GPU speed.

Monitoring (and adjusting) the polling makes sense, assuming that is how GPU work is actually done. I hope GPUGrid.net will consider an idea such as this, in order to increase performance.

MJH
Project administrator
Project developer
Project scientist
Joined: 12 Nov 07
Posts: 696
Credit: 27,266,655
RAC: 0
Message 31916 - Posted: 9 Aug 2013 | 15:32:34 UTC - in response to Message 30750.

Thanks for your suggestion! The CPU use is something I hope to improve soon. Somewhere along the line the blocking synchronisation stopped working as well as it once did, and I need to investigate why.

MJH

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 31919 - Posted: 9 Aug 2013 | 20:52:48 UTC - in response to Message 31916.

Good to hear you can afford some time to look into this!

MrS
____________
Scanning for our furry friends since Jan 2002
