GPU Task Performance (vs. CPU core usage, app_config, multiple GPU tasks on 1 GPU, etc.)

Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 30401 - Posted: 26 May 2013, 10:02:27 UTC - in response to Message 30399.  
Last modified: 26 May 2013, 10:03:44 UTC

tomba:

I don't see the portion where the task was reported.
You said you knew the time the task was reported, but... is that correct?
Maybe you were off by a few hours, ie: maybe it used UTC time, but you're in a different timezone?

In the logs, we should see something that says:
"[GPUGRID] Sending scheduler request: To report completed tasks."
and
"[GPUGRID] Reporting 1 completed tasks"

Along with it should be a work fetch sequence.
That's the sequence we need to see.
Can you find it?
ID: 30401
skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 30403 - Posted: 26 May 2013, 11:07:22 UTC - in response to Message 30400.  

Running two GPUGrid tasks at a time is NOT beneficial to the project or most crunchers - overall, it is detrimental to both.

Presently there is only 1 known circumstance where it's faster for some WU types, and that doesn't factor in the observed increase in error rates:

    On GPUs with larger amounts of memory (3GB or more), running on WDDM systems that are poorly optimized for GPU crunching (high CPU usage by CPU projects) - e.g., Jacob's 3GB GTX660Ti


I would advise against running the present Short WUs two at a time, because of their runtime and the fact that there is more credit to be had from running Long WUs. On a GTX660Ti (W7) it takes ~5.5h to complete a short WU. Even if it only took 10h to complete two WUs, you would still be much better off running Long WUs because of the obvious credit increase. Also, generally speaking, the contribution of running Long WUs over any extended period of time is more important to the project than running Short WUs. That's why the credit system exists in its present form.


FAQs

HOW TO:
- Opt out of Beta Tests
- Ask for Help
ID: 30403
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 30410 - Posted: 26 May 2013, 12:46:24 UTC - in response to Message 29297.  
Last modified: 26 May 2013, 13:04:53 UTC

Getting back on topic... it occurs to me that an investigation into CPU Time does need to happen.

On POEM tasks, as well as GPUGrid tasks, although overall task time is improved when running 2-at-a-time (from my testing), the overall CPU time is higher when running 2-at-a-time.

It appears that things might work this way, for a GPU task that also utilizes a full core:
- When a task is being processed on its own, the CPU core is fully utilized, and the GPU Load is a certain % (say 85%).
- When the task is being processed in tandem with another task on the GPU, aka 2-per-GPU, the CPU core is still fully utilized, but the GPU Load for the task is something akin to (98% / 2 = 49%). So, it will take less-than-double the time to complete, but during that time, the CPU is being fully used.
- I'm not sure if actual computational work is being done in the CPU process, or if it's just a "feeder" to feed/poll the GPU, to keep the GPU busy.
- The results indicate that, though the tasks are completing faster overall at 2-per-GPU, more CPU is being used to get those results.

This is a concern for any cruncher that also crunches CPU projects; that CPU time may be wasted.

So, the "benefits" of running 2-at-a-time may actually be dependent upon the user's preference of sacrificing some CPU time (up to a full core per additional GPU task) to achieve the increased task throughput.

Note: I'm not talking about changing the BOINC preference for "use at most x% of the processors". I'm talking about per-task CPU time for the GPU tasks that are completed 2-per-GPU.

This has caused me to re-evaluate my current 5-per-GPU strategy for POEM (whose GPU tasks always use a full core), and my current 2-per-GPU strategy for GPUGrid (whose GPU tasks only sometimes use a full core). In this re-evaluation, I believe I am going to have to come up with a personal tolerance level for how much CPU I'm willing to sacrifice. That is: I don't think there's an objective way to approach this that doesn't depend on user preference... is there?

Hope this makes sense to somebody. :)
Logical input would be appreciated.
ID: 30410
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 30580 - Posted: 31 May 2013, 10:29:28 UTC

Moderation notice: flames went a bit high in this thread, so it was hidden while things burned down. By now 2 posts are missing, but all the relevant information should still be there.

Back to topic:
Jacob wrote:
Won't it only possibly increase WU turn-around time if a given application is out of tasks to allocate to computers requesting work? I'm not sure how often that happens, but even then, the server-side scheduler can be set up to handle it gracefully, I believe (possibly sending tasks to additional hosts in case the new host completes it first).

I don't see this limit-increase-request as detrimental.
I see it as logical and beneficial.

You're right that the change in WU turnaround time does no harm as long as the number of parallel searches in progress is greater than or equal to the number of attached cards times 2 (now) or times 3 (increased limit). I don't know how close we are to this limit, but recently there seems to have been plenty of work available, so we might be safe. The project staff would have to monitor and decide on this issue, possibly even adjusting things back if it stops working out.

Well, actually it wouldn't be necessary to go to a straight "3 WUs per GPU"; I think "2*nGPU + 1" would suffice in all but extreme cases (multiple GPUs, very slow upload).

That seems like a reasonable change to me, but it would have to be communicated appropriately (at least in the news, maybe also pushing it via the BOINC message system). The point I'd be afraid of is this: people running GPU-Grid probably set their cache sizes according to their CPU projects, as long as they still get the credit bonus. At such "typical" settings BOINC would go for a straight 2-WU cache, which might make them miss the bonus credits. We'd need to be careful to avoid this, otherwise there'll be far more harm done by annoying crunchers than throughput gained by having the 3rd WU around.

We might want to run some further tests before pushing for the 3-WU-cache. To begin, quantifying the throughput increase for some long-runs would be nice. GPU utilization sure goes up, so there must be some increase. Ah, if only those SMX's could work on entirely different tasks! But that's not available below Titan.

Some numbers from me:
By now I've changed my app_config to 0.5 for the long-runs as well (see the config sketch after the list below). Let's see how well it goes. I'm a special case, though, since I actually want to run as much POEM as I can, so occasionally I'm running those WUs. That means I was already missing the deadline for the credit bonus in the last few days (since BOINC caches a 2nd GPU-Grid WU far too early for me), and now I'll have widely varying configurations running, depending on how many POEMs I get:

- 5 or more POEMs: up to 8 POEMs run
- 1 to 4 POEMs: these POEMs and 1 GPU-Grid task
- 0 POEMs: 2 GPU-Grid tasks
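For reference, a minimal sketch of what that app_config.xml might look like, placed in the GPUGrid project directory. The app name acemdlong is an assumption - take the real application names from client_state.xml on your own host:

    <app_config>
       <app>
          <name>acemdlong</name>              <!-- assumed app name; check client_state.xml -->
          <gpu_versions>
             <gpu_usage>0.5</gpu_usage>       <!-- lets BOINC schedule 2 tasks per GPU -->
             <cpu_usage>1.0</cpu_usage>       <!-- reserve a full core per task; adjust to taste -->
          </gpu_versions>
       </app>
    </app_config>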

So far I have observed the following: 1 KIDc22_SOD on its own would use ~85% GPU; with 2 POEMs alongside (which by themselves would lead to pretty low utilization), the result is 95-97% utilization. This must be better, although it will be hard for me to quantify anything.

Alright, now together with a regular KIDc22: GPU usage 98% (wow, I haven't seen anything like this before over here!), overall GPU memory used 760 MB (fine - but why so low? Shouldn't it roughly double?) and GPU power consumption is slightly up (~62% -> 68%, again the highest I have seen here).

@SK: I can't remember any report of an increased error rate due to running 2 WUs in parallel. The bug discovered by Jacob some time ago was totally unrelated to this.

MrS
Scanning for our furry friends since Jan 2002
ID: 30580
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 30582 - Posted: 31 May 2013, 10:41:58 UTC - in response to Message 30410.  
Last modified: 31 May 2013, 10:47:31 UTC

... it occurs to me that an investigation into CPU Time does need to happen.

On POEM tasks, as well as GPUGrid tasks, although overall task time is improved when running 2-at-a-time (from my testing), the overall CPU time is higher when running 2-at-a-time.

...

This is a concern for any cruncher that also crunches CPU projects; that CPU time may be wasted.

Due to the much larger CPU times involved (and possibly wasted) when running 2-at-a-time for GPUGrid... I think 2-at-a-time should be used only if you are okay with giving GPUGrid preference to use the CPU (up to a full core, per task) over any attached CPU projects.

For me, since GPUGrid already gets a lot of my machine's resources (usually 2 GPUs) and credits (approximately 75%), I am not okay with this. I want CPU resources to be available for my other 12 CPU projects.

So, I have personally decided to stick with 1-at-a-time, for GPUGrid. This will also allow me to use both my GTX 660 Ti, and my GTX 460, to do GPUGrid work (which I would prefer), rather than forcing me to allocate the GTX 460 to some other GPU project (which I would not prefer).

Also, as a side note, because of the larger CPU times involved with running x-at-a-time for POEM@Home... I have also adjusted it: from doing 5-at-a-time, to doing 3-at-a-time. I had been keeping track of POEM task times, including per-task run time and per-task CPU time, as run on my GTX 660 Ti, and I think I finally have enough data to justify my decision to move from 5 to 3. Details are below.

POEM@Home x-at-a-time task times, all on the GTX 660 Ti alongside other tasks at full load (times in seconds; "complete every" = run time / number of concurrent tasks):

Tasks | Attempt | Notes               | Run time | CPU time | Complete every
------+---------+---------------------+----------+----------+---------------
  1   |    1    |                     |   929.80 |   902.30 |   929.80
  1   |    2    | 5/27/2013, 1045 MHz | 1,127.50 |   960.76 | 1,127.50
  1   |    3    | 5/27/2013, 1045 MHz | 1,082.05 |   955.13 | 1,082.05
  2   |    1    |                     | 1,062.62 | 1,021.05 |   531.31
  2   |    2    | 5/26/2013           | 1,234.60 | 1,056.06 |   617.30
  2   |    3    | 5/27/2013           | 1,201.19 | 1,036.07 |   600.60
  2   |    4    | 5/27/2013, 1241 MHz | 1,190.03 | 1,027.76 |   595.02
  3   |    1    |                     | 1,405.38 | 1,337.66 |   468.46
  3   |    2    |                     | 1,295.70 | 1,205.33 |   431.90
  3   |    3    |                     | 1,233.04 | 1,197.50 |   411.01
  3   |    4    |                     | 1,345.84 | 1,207.16 |   448.61
  3   |    5    |                     | 1,584.40 | 1,383.26 |   528.13
  3   |    6    | 5/26/2013           | 1,412.46 | 1,190.23 |   470.82
  3   |    7    | 5/26/2013           | 1,348.02 | 1,142.40 |   449.34
  3   |    8    | 5/27/2013           | 1,417.43 | 1,194.49 |   472.48
  3   |    9    | 5/27/2013           | 1,361.78 | 1,162.97 |   453.93
  4   |    1    |                     | 1,464.20 | 1,364.09 |   366.05
  4   |    2    |                     | 1,596.06 | 1,378.56 |   399.02
  4   |    3    |                     | 1,542.45 | 1,308.56 |   385.61
  4   |    4    | 5/27/2013, 1241 MHz | 1,670.58 | 1,340.23 |   417.65
  5   |    1    |                     | 1,801.34 | 1,580.75 |   360.27
  5   |    2    |                     | 1,752.97 | 1,535.52 |   350.59
  5   |    3    |                     | 1,822.53 | 1,574.04 |   364.51
  6   |    1    |                     | 2,200.69 | 1,988.87 |   366.78
  6   |    2    |                     | 2,138.92 | 1,817.86 |   356.49
ID: 30582
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 30584 - Posted: 31 May 2013, 10:54:40 UTC - in response to Message 30582.  

Right, I forgot to comment on the CPU usage. For GPU-Grid on Keplers this is mostly polling the GPU, I think, since the CPU usage on older cards is so much lower. POEM does some actual number crunching on the CPU as well (which would be far slower on the GPU).

And you're right that this is a general tradeoff everyone has to decide for themselves: how much CPU time am I willing to give up in order to feed my GPU better? One criterion would be overall RAC, where "feeding the GPU" should normally win. On the other hand, one could argue that CPU credits are worth more than GPU credits. I can't see any better rule here than "do what you want".

Regarding POEM: back when there was still enough work, I ran the tests on my system and found maximum throughput at 867.9k RAC with 8 concurrent tasks. I didn't write the other numbers down, but the progression was rather flat at the top, maybe a few thousand credits per day more for an additional thread. Hence I view any CPU work being done on this host as a bonus (since with a full POEM supply it wouldn't do any at all), so I'm fine sacrificing another core if it helps overall throughput (my team needs it, badly ;)

MrS
Scanning for our furry friends since Jan 2002
ID: 30584
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 30586 - Posted: 31 May 2013, 11:04:37 UTC - in response to Message 30580.  
Last modified: 31 May 2013, 11:25:07 UTC

Well, actually it wouldn't be necessary to go to a straight "3 WUs per GPU"; I think "2*nGPU + 1" would suffice in all but extreme cases (multiple GPUs, very slow upload).


Well.. I am positive some people do in fact have really slow upload speeds (where it could take an hour to upload a task's result). And, if the user was running 2-at-a-time, the worst case would be if both tasks complete at the same time. Ideally, it would actually be nice to have 2 tasks on-hand to start up, while the 2 results are being uploaded (so, server max-in-progress-of-4-per-GPU)... but if only 1 task was available (max-in-progress-of-3-per-GPU), then the GPU could still be worked 1-at-a-time until a 2nd task became available.

Regarding implementation, it COULD be implemented as a project web preference, but then what if a user has lots of computers with various combinations of GPUs, and only wants to increase the limit for one specific computer? This is why I don't like the idea of having this as an option within web preferences, or even location-specific (work/school/etc.) web preferences.

The point I'd be afraid of is this: people running GPU-Grid probably set their cache sizes according to their CPU projects, as long as they still get the credit bonus. At such "typical" settings BOINC would go for a straight 2-WU cache, which might make them miss the bonus credits. We'd need to be careful to avoid this, otherwise there'll be far more harm done by annoying crunchers than throughput gained by having the 3rd WU around.

I agree that it should be implemented in a way that doesn't cause harm. It's unfortunate that people probably DO rely on the max-x-in-progress to yield them bonus credits, while also keeping large cache settings. So, I'm not sure what the answer is yet.

We might want to run some further tests before pushing for the 3-WU-cache. To begin, quantifying the throughput increase for some long-runs would be nice. GPU utilization sure goes up, so there must be some increase. Ah, if only those SMX's could work on entirely different tasks! But that's not available below Titan.

I did document, towards the beginning of this thread, some results where I showed increased task completion throughput. Not much faster, but within the ballpark of 3%-20% faster.

Regarding your [POEM + GPUGrid] settings, I too run a similar config. I noticed that, while POEM tasks ran alongside a GPUGrid task all on the same GPU, the POEM tasks did not complete in a timely manner at all. And because those POEM tasks take a full CPU core to run, a lot of CPU time was spent completing them. Unfortunately, there's no good way to say "run x-at-a-time for Project A, run x-at-a-time for Project B, but don't let them run together on the same GPU" unless you specify a hard limit using BOINC GPU exclusions, which we do not want to do. So, my move to 1-at-a-time for GPUGrid solves this POEM-CPU-usage problem for me. I'm definitely interested in your findings.
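(For reference, that kind of hard exclusion lives in BOINC's cc_config.xml. A minimal sketch, assuming you wanted to keep GPUGrid off device 1 - exactly the blunt instrument we'd rather avoid:)

    <cc_config>
       <options>
          <exclude_gpu>
             <url>http://www.gpugrid.net/</url>
             <device_num>1</device_num>   <!-- device number as shown in the BOINC startup log -->
          </exclude_gpu>
       </options>
    </cc_config>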

My current recommendations, for anyone that wants to try 2-at-a-time for GPUGrid, are:
- only do it if all GPUs involved have 2GB GPU RAM or more (due to failures adding a task when not enough GPU RAM is available)
- only do it if you have reasonably fast upload speeds (I'd say capable of uploading a result within 15 minutes. Note: the max-2-in-progress-per-GPU server limit does mean that there is a window where a GPU could be entirely non-utilized by GPUGrid, and faster upload speeds help to close that window)
- only do it if you don't mind GPUGrid tasks spending more CPU time than they normally would
- only do it if you are okay with the possibility of BOINC running multiple GPU projects on a single GPU, which could slow down throughput for them
ID: 30586
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 30589 - Posted: 31 May 2013, 12:21:26 UTC

skgiven:

I have recently also made an additional change to my system, which I think you might find interesting.

Previously, you recommended freeing a core through the BOINC preference "Use at most x% of the processors", but I did not want to do that, since GPUGrid tasks sometimes use less than a full core and tolerate an overloaded CPU well, because they run at a higher Windows priority than the CPU tasks. So I still have that preference set at "Use at most 100% of the processors", and I believe that's the correct setting for me, since I always want full CPU utilization.

BUT... on my system, running 1-task-per-GPU, GPUGrid can potentially be running on both my Kepler GTX 660 Ti and my Fermi GTX 460. Because GPUGrid tasks generally use a full core on Keplers, I knew "for sure" that a CPU core was being fully utilized whenever 2 GPUGrid tasks were running.

So, I found a way to take advantage of that logic, to better accommodate my system and prevent overloading while still ensuring full CPU load: I changed GPUGrid's app_config.xml file to use a <cpu_usage> of 0.5 for all the applications. That way, if only 1 GPUGrid task is running, BOINC will not "allocate a CPU"; but if 2 GPUGrid tasks are running, 0.5 + 0.5 = 1.0, and BOINC will "allocate a CPU" - which I want, because it means I know a task is running on the Kepler, and I'd be unnecessarily overloaded if I didn't allocate it.
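A minimal sketch of such an app_config.xml, in the GPUGrid project folder. The application names here are assumptions - use the ones listed in client_state.xml:

    <app_config>
       <app>
          <name>acemdlong</name>           <!-- assumed app name -->
          <gpu_versions>
             <gpu_usage>1.0</gpu_usage>    <!-- still 1 task per GPU -->
             <cpu_usage>0.5</cpu_usage>    <!-- 2 running tasks together allocate 1 full core -->
          </gpu_versions>
       </app>
       <app>
          <name>acemdshort</name>          <!-- assumed app name -->
          <gpu_versions>
             <gpu_usage>1.0</gpu_usage>
             <cpu_usage>0.5</cpu_usage>
          </gpu_versions>
       </app>
    </app_config>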

I think it's helping, too, based on my initial results. You were right, unnecessary overloading is detrimental to task throughput, thanks for keeping me thinking about that.

Have a good day,
Jacob
ID: 30589
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 30596 - Posted: 31 May 2013, 20:42:00 UTC - in response to Message 30586.  
Last modified: 31 May 2013, 20:43:38 UTC

My current recommendations, for anyone that wants to try 2-at-a-time for GPUGrid, are:
- only do it if all GPUs involved have 2GB GPU RAM or more (due to failures adding a task when not enough GPU RAM is available)
- only do it if you have reasonably fast upload speeds (I'd say capable of uploading a result within 15 minutes. Note: the max-2-in-progress-per-GPU server limit does mean that there is a window where a GPU could be entirely non-utilized by GPUGrid, and faster upload speeds help to close that window)
- only do it if you don't mind GPUGrid tasks spending more CPU time than they normally would
- only do it if you are okay with the possibility of BOINC running multiple GPU projects on a single GPU, which could slow down throughput for them

I'll add to this list: only do so if you've got an otherwise stable system.

I got 2 computation errors on GPU-Grid WUs running the config mentioned above while I was away for some sports. This is unusual for my system, but I've been changing my config too much recently, so I can't really blame running 2 GPU-Grids concurrently.

For now I'll be sticking with this until I'm sure I've got the rest sorted out: GPU-Grid long runs at 0.51 GPUs, POEM at 0.12 GPUs. This way up to 4 POEMs run alongside GPU-Grid, but I avoid 2 GPU-Grids for now. The POEMs do take longer this way, but I crunch all they give me, and average GPU utilization is higher, so overall throughput must be higher. This will be next to impossible to quantify, though, as the number of POEMs I run is restricted by supply, and the WU times will depend on how many POEMs I get.
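As a sketch, that split could look like the following pair of files, one per project directory (directory and application names are assumptions - take the real ones from client_state.xml). With these fractions, 0.51 + 4 x 0.12 = 0.99 GPUs, so up to 4 POEMs fit beside one GPU-Grid task, while 2 x 0.51 = 1.02 keeps a second GPU-Grid task off the card:

    <!-- projects/www.gpugrid.net/app_config.xml -->
    <app_config>
       <app>
          <name>acemdlong</name>           <!-- assumed app name -->
          <gpu_versions>
             <gpu_usage>0.51</gpu_usage>   <!-- 2 x 0.51 > 1: never 2 GPU-Grids at once -->
             <cpu_usage>1.0</cpu_usage>
          </gpu_versions>
       </app>
    </app_config>

    <!-- projects/boinc.fzk.de/app_config.xml (POEM@Home; URL assumed) -->
    <app_config>
       <app>
          <name>poemcl</name>              <!-- assumed app name -->
          <gpu_versions>
             <gpu_usage>0.12</gpu_usage>   <!-- 0.51 + 4 x 0.12 = 0.99: up to 4 POEMs fit -->
             <cpu_usage>1.0</cpu_usage>
          </gpu_versions>
       </app>
    </app_config>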

MrS
Scanning for our furry friends since Jan 2002
ID: 30596
Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 30606 - Posted: 1 Jun 2013, 12:36:36 UTC

Wow, so much angst over such a small thing. Guess that's how WW1 started too. My 1 cent: keep the limit at 2...
ID: 30606
Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 30609 - Posted: 1 Jun 2013, 13:34:23 UTC - in response to Message 30379.  
Last modified: 1 Jun 2013, 13:46:56 UTC

I have concluded my test (where I had only 1 active GPU, which was processing 2-tasks-at-once, with a 1.5 day min buffer, and wanted to see when the 3rd new task gets started). Note, I'm using BOINC v7.1.1 alpha, which includes a major work fetch tweaking as compared to the v7.0.64 public release.
...
There was a ~14 minute "layover" where BOINC was only allowed to run 1 task on the GPU, due to GPUGrid's server-side limitation. But it did gracefully handle the scenario, did eventually get the 3rd task, and started it promptly. It worked as I expected it to work, given the server-side limitation, but it's not optimal, because we should be allowed to keep the GPU continuously fully loaded with 2 tasks. :(

I wonder if we can convince GPUGrid to relax the limit, to max-3-in-progress-per-GPU, instead of max-2-in-progress-per-GPU? In theory, that should close this gap, as it would allow the 3rd task to be downloaded/ready whenever the min-buffer says the client needed it (earlier than Task 2 completion).

I do appreciate your efforts to squeeze all of the computing power your system has by tweaking your existing software environment, and I think you've done a great job on your system!
But...
1. If you want a dedicated cruncher computer for GPUGrid (meaning that every part of it - hardware and software - is chosen to be optimal for GPUGrid crunching and built with crunching purposes in mind), it can have a different operating system (Linux, or WinXP), which helps its hardware perform like your (over)tweaked Win8, without the (over)tweaking.
2. It's quite possible that future GPUGrid (long) tasks will use more than 1GB GPU memory (maybe as much as the GPU has), and this will make your 2 tasks at the same time tweak obsolete.
3. I think that eliminating these 14 minutes of suboptimal crunching in every 10-16 hours of optimal crunching via server-side changes (affecting every cruncher, and the whole GPUGrid workflow) is not worth the effort from the GPUGrid (or any project) staff's point of view. (Taking into consideration items 1 and 2, and the effort needed to eliminate the unexpected detrimental side-effects of such changes.)
ID: 30609
tomba

Joined: 21 Feb 09
Posts: 497
Credit: 700,690,702
RAC: 0
Message 30615 - Posted: 1 Jun 2013, 17:32:58 UTC - in response to Message 30609.  

I think that eliminating these 14 minutes of suboptimal crunching...

Out in the sticks, with a pedestrian upload max, trying 2X, I lost 90 minutes of crunch time before a new WU arrived, not 14. Below is a graph of my experience, FWIW. The black line is the period I was running 2X, the blue line is 1X.

ID: 30615
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 30616 - Posted: 1 Jun 2013, 18:02:31 UTC - in response to Message 30609.  
Last modified: 1 Jun 2013, 18:03:53 UTC

Retvari,
Thanks for your response.
I do appreciate your efforts to squeeze all of the computing power your system has by tweaking your existing software environment, and I think you've done a great job on your system!

Thanks. I'm doing it for the community as well, not just me. If we can get more science done, as a community, then let's do it! Plus, I like to test and I like to push performance limits :)

But...
1. If you want a dedicated cruncher computer for GPUGrid (meaning that every part of it - hardware and software - is chosen to be optimal for GPUGrid crunching and built with crunching purposes in mind), it can have a different operating system (Linux, or WinXP), which helps its hardware perform like your (over)tweaked Win8, without the (over)tweaking.

I do understand that getting the absolute best performance would involve choosing a specific hardware and OS combination. For me, though, and likely for others, we're just running BOINC on PCs that we use for work or for play. So... I'm just trying to make the absolute best out of it, using the hardware and OS that I would normally otherwise use without BOINC.

2. It's quite possible that future GPUGrid (long) tasks will use more than 1GB GPU memory (maybe as much as the GPU has), and this will make your 2 tasks at the same time tweak obsolete.

If the tasks' execution was changed to use more memory by default, then it would trigger a change in the minimum specifications for 2-at-a-time, for sure. Perhaps 3GB would become the new minimum-recommended GPU RAM for 2-at-a-time processing. But I wouldn't call the plan obsolete; it would just have to change along with the new task requirements... unless they totally change the task structure to use all of the GPU's RAM, in which case 2-at-a-time would become infeasible. You're right about that case.

3. I think that eliminating these 14 minutes of suboptimal crunching in every 10-16 hours of optimal crunching via server-side changes (affecting every cruncher, and the whole GPUGrid workflow) is not worth the effort from the GPUGrid (or any project) staff's point of view. (Taking into consideration items 1 and 2, and the effort needed to eliminate the unexpected detrimental side-effects of such changes.)

It depends on what their priorities are, for sure. I hope they consider making it an option, since their current policy is too restrictive for some. At any rate, this server-side-limitation of max-2-in-progress-per-GPU... is the only "variable" in this performance testing that a user has no control over. And if they decide not to accommodate, then that's the way it would have to be, and I'd then recommend against 2-at-a-time (unless you happen to have another GPU in the system that isn't doing GPUGrid work, such that you could work around the problem by getting GPUGrid to give you additional tasks for the GPU that is doing GPUGrid work).

It is what it is.
We'll see if they change their policy on this.
I'm not holding my breath.

Regards,
Jacob
ID: 30616
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 30658 - Posted: 4 Jun 2013, 20:48:02 UTC

I can now say for certain that the problem I was seeing (2 task failures) cannot be attributed to running 2-at-once. However, I cannot state anything more useful yet. I had upgraded my memory to DDR3-2400 some time ago and thought I'd settled on good settings, but this may not have been the case: I got some serious instability over the last few days. I wonder why it took so long to surface, but I'm positive I'll find the issue soon.

After that I'll tweak Collatz on my HD4000 before I continue testing here... I won't forget about it, even if it takes some time!

MrS
Scanning for our furry friends since Jan 2002
ID: 30658
Vagelis Giannadakis

Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Message 30665 - Posted: 5 Jun 2013, 9:05:50 UTC - in response to Message 30658.  

I don't know where you're located, but if it's in the northern hemisphere and not close to the pole, maybe it is the summer and rising temperatures that made these errors appear?
ID: 30665
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 30674 - Posted: 5 Jun 2013, 19:23:23 UTC

Thanks, but temperatures are barely climbing over 20°C here in Germany. I stepped back from BCLK 104.5 MHz to 104.0 MHz (plus a cold start in between) and this seems to have done the trick.. for now. I want to be more careful now with further changes.. but am already OC'ing the HD4000 for Collatz.. :D

MrS
Scanning for our furry friends since Jan 2002
ID: 30674
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 30833 - Posted: 13 Jun 2013, 19:58:56 UTC
Last modified: 14 Jun 2013, 15:28:14 UTC

I do have some results by now which are worth sharing. I switched GPU-Grid to 0.51 GPUs, so that only one of them runs at a time (still being careful...) and some room is left for another project - in my case POEM. And I actually got enough work from them to test a few cases. On to the results, always using a Nathan KIDc22_SODcharge, with measurements averaged over a few minutes each:

#GPU-Grid WUs | #POEM WUs | GPU load (%) | GPU power (%) | Memory controller load (%)
      1       |     0     |     85.4     |     63.0      |           35.2
      1       |     1     |     94.0     |     63.3      |           33.0
      1       |     2     |     96.0     |     62.3      |           31.2
      1       |     3     |     96.9     |     63.6      |           29.6
      1       |     4     |     97.9     |     61.6      |           27.4
      0       |     8     |     94.8     |     51.2      |            5.0

Obviously the WUs take longer this way, but I can't really quantify it since the supply of POEMs is scarce. However, what's clear is the higher average GPU load running in this configuration. GPU power consumption and memory controller load drop with an increasing number of POEMs, because the fractional runtime of POEM on the GPU increases. And POEM itself stresses the GPU significantly less than GPU-Grid.

I have not had any failures running this configuration (although the actual run time with POEMs wasn't that long) and will continue to use it at least in this way.

Edit: GPU clock was a constant 1.23 GHz in all cases (maximum boost clock), GPU temperature was always below 70°C.

MrS
Scanning for our furry friends since Jan 2002
ID: 30833
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 31085 - Posted: 28 Jun 2013, 8:12:11 UTC
Last modified: 28 Jun 2013, 8:13:14 UTC

I've got a test result: running 2 long-run NATHAN_KIDc22_full WUs (133,950 credits) on a GTX660Ti. A single WU needed 36.46 ks, whereas 2 concurrent ones needed 80.88 ks - that's 40.44 ks per WU, about 11% slower than running them one after the other. While I didn't run this test myself, it seems pretty clear that an 11% performance loss is not what we're looking for, despite the increased GPU utilization.

Edit: it's a quite fresh install of Win 7 or 8, current driver.

MrS
Scanning for our furry friends since Jan 2002
ID: 31085