Message boards : Graphics cards (GPUs) : Any reason to not run cpu-heavy projects on a computer optimized for GPUGrid?

Ken Florian
Joined: 4 May 12
Posts: 56
Credit: 1,832,989,878
RAC: 0
Message 25168 - Posted: 19 May 2012 | 22:34:46 UTC

Is there any reason to not run cpu-heavy tasks (such as WCG) on a computer optimized for GPUGrid?

Would leaving the CPU free give any marginal performance improvement on GPU tasks?

Two NVIDIA GTX 570s running with a 2G AMD Athlon 64 X2 3800+ CPU

Ken Florian

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 25170 - Posted: 20 May 2012 | 6:41:26 UTC - in response to Message 25168.

If you do crunch for WCG on that system, I would suggest you only use one CPU core. Your system's balance is a bit GPU-heavy. Some CPU processing is required to feed your GPUs, and GTX 570s are powerful cards. If you tried to use the system and crunch two GPU tasks and two CPU tasks, there is a fair chance you would encounter task failures or even system instability. If you are using SWAN_SYNC, don't crunch on the CPUs (but using it is not really necessary).
You would need to run GPUGrid tasks first with one CPU core crunching and then with none to work out whether it makes any difference on your system. In the end, what you crunch is down to personal choice.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Dagorath
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Message 25171 - Posted: 20 May 2012 | 6:45:24 UTC - in response to Message 25168.

If your two CPU cores are crunching other projects they won't have time to service the two GPUs which will leave your GPUs sitting idle. GPUs need a CPU to feed them data and collect the results of operations on the data on a nearly continuous basis.

There is a forum here titled Frequently Asked Questions. If you look at it you will probably find a thread that already answers your question as well as other questions new GPU crunchers frequently ask. I think your question about running other projects alongside GPUgrid is answered there.

Also in the Frequently Asked Questions section, there is a thread about which cards are recommended for this project and which are not. I think you'll find 9800s are not useful at this project and if you look at the results your two 9800s are returning you can see why they are not useful here.

Ken Florian
Message 25174 - Posted: 20 May 2012 | 12:38:50 UTC - in response to Message 25171.

I'll certainly review all the FAQ items. My question isn't about my single running GTX 550 Ti. It is about reconstituting a system, with to-be-purchased GPUs and a PSU, explicitly for GPUGrid (and only GPUGrid) tasks. Last week I disabled GPUGrid tasks on all my cards except for the GTX 550 Ti, per input from GPUGrid.

Thank you for the help.

Ken Florian

Paul Raney
Joined: 26 Dec 10
Posts: 115
Credit: 416,576,946
RAC: 0
Message 25175 - Posted: 20 May 2012 | 13:07:19 UTC - in response to Message 25174.

I run Rosetta at home on all of my GPUGrid computers to make sure the CPUs and GPUs stay nice and hot! Rosetta at home is a CPU only project so it never attempts to use the GPUs at all.

Many posts warn against this type of configuration, but with thousands of work units completed successfully, my experience is that it is safe to run both CPU-intensive and GPU-intensive tasks on the same machine. The GPUGrid applications usually take priority over the Rosetta apps when necessary, which keeps data flowing to my GPUs.

The Intel Q6600 systems (overclocked, of course) will keep data flowing to a GTX 570 or a GTX 580. I have never tried dual cards in these systems, but I am looking for a GTX 590, and that should be about the same load as two GTX 570s.

Let us know how it goes.
____________
Thx - Paul

Note: Please don't use driver version 295 or 296! Recommended versions are 266 - 285.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 25179 - Posted: 20 May 2012 | 17:12:54 UTC - in response to Message 25175.

Running CPU tasks alongside GPU-Grid shouldn't crash anything software-wise. That's what we've got multi-threaded OSes for.

However, I suspect you'd see a performance drop at GPU-Grid if you run as many CPU tasks as you've got logical CPUs, especially using a high end GPU and SWAN_SYNC=1 (default).

MrS
____________
Scanning for our furry friends since Jan 2002

Paul Raney
Message 25181 - Posted: 21 May 2012 | 1:55:21 UTC - in response to Message 25179.

The conventional wisdom is clear that running both CPU intensive and GPU intensive apps on the same computer should cause a decrease in performance.

With Rosetta at home, I have one CPU task for each core (this includes the virtual cores on hyperthreading CPUs like the Core i7). On a Q6600, I have 4 Rosetta processes + a GPU process. On a Core i7, there are 8 Rosetta processes running + 2 GPU tasks (2 GTX 570s).

My experience is that the GPU tasks take priority over the CPU tasks. If you look at my computers, projects and results, you should notice that all systems are overclocked and that my stats are a little better than expected on both CPU and GPU projects.

It would be good to know if others have similar or different results.

ExtraTerrestrial Apes
Message 25182 - Posted: 21 May 2012 | 10:47:56 UTC - in response to Message 25181.

This works because GPU-Grid runs at higher priority than CPU tasks, SWAN_SYNC=1 works quite well, and GPU-Grid needs little CPU support. However, to get really meaningful numbers one would have to average over a representative set of WUs, then switch configuration and measure again. I can't do this, as I no longer have a GPU capable of running GPU-Grid.

MrS

Dagorath
Message 25185 - Posted: 21 May 2012 | 16:52:50 UTC - in response to Message 25182.


You're right. Theory and conjecture are fun to bandy about in a forum, but the proof is in the results, so we need an experiment along the lines of what ET Apes mentions. A representative set of WUs shouldn't be too difficult to obtain and averaging is not a problem; the hard part is keeping all the other influences and factors constant while changing only the variable(s) under test. When I think about the activity on my own crunch box, it's mayhem; the only constant is that it's chaos. If you remove the chaos then it's not a real-life test, and then you're back to bandying about theory and conjecture concerning what the meaningful numbers actually mean. What to do?

ExtraTerrestrial Apes
Message 25189 - Posted: 21 May 2012 | 19:18:11 UTC - in response to Message 25185.

One could consider only WUs which ran overnight, or whenever the machine was not being used interactively. If this is not possible (due to the long run times), a different machine may be needed, preferably a dedicated cruncher.

MrS

Dagorath
Message 25193 - Posted: 21 May 2012 | 21:51:59 UTC - in response to Message 25189.

OK, but how does one evaluate performance? How about recording % GPU utilization every minute or so for each task and when the task is done record the median % usage as one measure of performance. Another measure could be the run time. Are there any other data that might be useful? Obviously I'm talking about employing a script to collect the data and another to process/analyze it later.
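A minimal sketch of that collection idea, assuming an NVIDIA driver whose `nvidia-smi` supports the `--query-gpu` option and prints one utilization percentage per GPU. The function names are hypothetical, and tying the samples to the start and end of a particular task is left out; cards that report "N/A" for utilization would also need extra handling.

```python
import statistics
import subprocess
import time

# Assumes nvidia-smi is on the PATH and supports --query-gpu.
QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"]

def parse_utilization(output):
    """Parse nvidia-smi output (one integer percentage per line, one line per GPU)."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]

def sample_once():
    """Read the current utilization of every GPU in the machine."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_utilization(result.stdout)

def summarize(samples):
    """Return the median utilization per GPU from a list of per-sample readings."""
    per_gpu = zip(*samples)  # regroup so each tuple holds one GPU's readings
    return [statistics.median(readings) for readings in per_gpu]

def collect(duration_s, interval_s=60):
    """Sample every interval_s seconds for duration_s seconds, then summarize."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append(sample_once())
        time.sleep(interval_s)
    return summarize(samples)
```

Calling `collect(3600)` would sample once a minute for an hour and return one median per GPU. The median is used rather than the mean so that short dips, for example at checkpoints, do not dominate the figure.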

ExtraTerrestrial Apes
Message 25195 - Posted: 21 May 2012 | 22:14:17 UTC - in response to Message 25193.

I'm assuming that the project team knows how much work each WU contains. Given that, all we need is the run time and the credits for each WU (factoring in the early return bonus at GPU-Grid). From these one could derive different numbers. What I like to compare is credits per day, as it equals the RAC the machine would reach if it could run this way undisturbed for about a month.
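As a rough illustration of the credits-per-day figure, here is a small sketch. It assumes the credit value you feed in is the credit actually granted for the WU, so any early return bonus is already included; the function names and the numbers in the usage note are made up for the example.

```python
SECONDS_PER_DAY = 86400

def credits_per_day(credit, run_time_s):
    """Credits one GPU would earn per day running WUs like this one back to back."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return credit * SECONDS_PER_DAY / run_time_s

def average_credits_per_day(results):
    """results: iterable of (granted_credit, run_time_s) pairs.

    Total credit divided by total run time, so long WUs weigh more than short ones.
    """
    results = list(results)
    total_credit = sum(credit for credit, _ in results)
    total_time = sum(run_time for _, run_time in results)
    return credits_per_day(total_credit, total_time)
```

For instance, a hypothetical WU granting 50,000 credits after 12 hours (43,200 s) works out to 100,000 credits per day. Computing the same figure for a set of WUs with CPU tasks running and for a set without gives two numbers that can be compared directly.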

MrS

Ken Florian
Message 25197 - Posted: 21 May 2012 | 22:16:23 UTC - in response to Message 25189.

I've got one machine with a GTX 550 ti.

The only thing it does is GPUGrid and, for now, WCG. It is not used for any other automated or human-generated computing purpose.

It completes a long run in about 30 hours.

I could accumulate a set with WCG running, then pull WCG out of the mix for another same-sized set.

Dagorath
Message 25199 - Posted: 22 May 2012 | 1:00:01 UTC - in response to Message 25197.
Last modified: 22 May 2012 | 1:03:21 UTC

I would say go for it, kflorian, but I wonder if a same sized set is the best way to get meaningful data. There is only 1 science app in use at this project but there are several researchers submitting tasks and each type of task uses the GPU and CPU differently. The more experienced hands here can advise best but I think you/we have to make sure all types of tasks are represented equally in both sets. In other words, if the set with WCG has 10 type A, 20 type B and 16 type C tasks then the set without WCG should have the same numbers of each type of task.
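One way to handle the task-type problem without matching the counts exactly is to compare each task type only against itself. The sketch below assumes each result is recorded as a hypothetical `(task_type, run_time_s)` pair; how the task type is read from a WU name is not covered here.

```python
from collections import defaultdict
from statistics import mean

def group_by_type(results):
    """results: iterable of (task_type, run_time_s). Returns {task_type: [run times]}."""
    groups = defaultdict(list)
    for task_type, run_time in results:
        groups[task_type].append(run_time)
    return dict(groups)

def compare_by_type(with_cpu_load, without_cpu_load):
    """Relative change in mean run time per task type found in both sets.

    A positive value means the task type ran slower with CPU tasks running.
    Task types that appear in only one set are skipped.
    """
    loaded = group_by_type(with_cpu_load)
    free = group_by_type(without_cpu_load)
    report = {}
    for task_type in sorted(loaded.keys() & free.keys()):
        mean_loaded = mean(loaded[task_type])
        mean_free = mean(free[task_type])
        report[task_type] = (mean_loaded - mean_free) / mean_free
    return report
```

With this layout a set that happens to contain more type A tasks than the other does not skew the result, because type A run times are never mixed with type B or C run times.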

I also think it might be informative if GPU usage percentage were recorded too. What I propose is far too much work to do manually so I propose a script that will gather the data for us and allow anyone who wants to use the script to accumulate data for hundreds of tasks with next to 0 effort. Once the data is recorded, stored and made available to whoever wants it, it can be analyzed at leisure any number of ways.

I can write the script to collect the data, probably in Python for ease of cross-platform compatibility, but I want to discuss what the script should do first and whether or not it's really needed.

In the meantime you can gather whatever data you want, of course, and I'm sure we'll all be interested in your report.

TheFiend
Joined: 26 Aug 11
Posts: 99
Credit: 2,500,112,138
RAC: 0
Message 25203 - Posted: 22 May 2012 | 9:44:25 UTC

I run GPUGRID and Docking on my 1055T based system. Up until now I have used all 6 cores for crunching Docking. Today I have set docking to only use 5 cores to see what difference it will make to crunching times. The 6th core has been freed to service my GTX460 and GTX550Ti.

Paul Raney
Message 25204 - Posted: 22 May 2012 | 11:32:45 UTC - in response to Message 25203.

First, with GPUGrid and Rosetta running at the same time, I monitor GPU utilization for 60 seconds.

Then I suspend all Rosetta tasks and monitor GPU utilization for another 60 seconds.

Results after several iterations over 2 days: no increase in GPU utilization when the Rosetta tasks are suspended. I tested on multiple computers with different CPUs and GPUs.

This is not a scientific test, but it provides some interesting data points. Perhaps the GPUGrid tasks take higher priority than Rosetta. These two projects coexist well.



skgiven
Message 25210 - Posted: 23 May 2012 | 10:55:01 UTC - in response to Message 25204.

Dagorath, your proposed Python script sounds like an excellent idea for those of us who like to optimize across projects.

It's definitely the case that you need to look at several types of tasks here. Most GPUGrid tasks of the same type have almost identical run times, so the number of work units you would need to process would not be very large.
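To check how tightly the run times of one task type actually cluster, something like the following could be used. It is only a sketch: it assumes you already have a list of run times for a single task type, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def run_time_spread(run_times):
    """Return (mean, relative spread, standard error of the mean) for one task type.

    relative spread = sample standard deviation / mean
    standard error  = sample standard deviation / sqrt(number of run times)
    """
    if len(run_times) < 2:
        raise ValueError("need at least two run times")
    average = mean(run_times)
    deviation = stdev(run_times)
    return average, deviation / average, deviation / sqrt(len(run_times))
```

If the relative spread is small compared with the difference you are trying to detect between the two configurations, a handful of WUs per task type should be enough; if it is not, more WUs are needed before the averages mean much.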

It's also worth noting that different CPU projects use the CPU in different ways, so one CPU project would tax the CPU differently than another. Which CPU project you are contributing to could therefore influence the performance of the GPU task. WCG itself hosts several sub-projects. Most are x86 (32-bit) but one is x64 (64-bit). Some perform better on W7, others on Linux. Most are better on an x64 platform, but not necessarily all. I would suggest that anyone testing to this extent begin by running one task type, and then look at a preferred mix of tasks. Should anyone be running climate models, it's likely that these would cause more variation due to their high level of disk usage.

ExtraTerrestrial Apes
Message 25211 - Posted: 23 May 2012 | 19:47:22 UTC - in response to Message 25204.

@Paul: your data looks good and suggests there's not much to worry about here.

As I said before: [Running GPU-Grid with the CPU completely loaded with other BOINC tasks] works because GPU-Grid runs at higher priority than CPU tasks, SWAN_SYNC=1 works quite well and GPU-Grid needs little CPU support.

However, my main critique was that GPU utilization does not necessarily equal performance. At POEM I can get almost a 10% difference in throughput at nearly the same GPU utilization (the utilization differs by less than 2%). That's why I suggested looking at actual run times.

MrS

dskagcommunity
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Message 25222 - Posted: 24 May 2012 | 16:06:37 UTC

I was running a GTX 285 on an old P4 CPU that was fully loaded, including HT, and there was no problem with GPUGrid. I only needed to free the CPU when the FAX4 units came along, because they used a lot of CPU power in parallel and I wanted to make the 24h bonus, including the nearly 1-hour upload ^^ So I would say that most of the time you can use the full CPU. But 570s are more powerful than the 285 (which is not the slowest card either), so it could make a difference. On my C2D 8400 @ 3.6 GHz with a 560 Ti, though, I can run FAX4 with the CPU fully loaded without much difference.

I would say it could be better to keep one core free if you want to get the maximum out of the older X2 CPU. On the other hand, it seems the FAX4 units are done and I can reactivate CPU work now, so you can do it too :)
____________
DSKAG Austria Research Team: http://www.research.dskag.at


