Message boards : Graphics cards (GPUs) : Problem running on both GPU's in a system

Author Message
Richard Mitnick
Joined: 8 Feb 12
Posts: 60
Credit: 17,816,440
RAC: 0
Level
Pro
Message 26225 - Posted: 8 Jul 2012 | 16:11:31 UTC

I am looking for the experiences of others trying to run on both GPUs in a system with two NVIDIA GTX 580 GPUs, driver 301.42, CUDA 4.2.1, and SLI.



The system is Win 7 Home Premium 64-bit, with a hyper-threaded six-core CPU and 16 gigs DRAM.
Both CPU and GPU's have sealed liquid cooling systems.

BOINC is set to run 9 CPU threads @80%.

BOINC is set to not use GPUs while computer is in use and to not start up until 3 minutes after machine has been idle.
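
For anyone who prefers editing files to using the BOINC Manager dialogs, the CPU and GPU preferences described above correspond roughly to the global_prefs_override.xml sketch below, placed in the BOINC data directory. This is an illustration under assumptions, not the actual file from this machine: 75% is simply 9 of 12 threads, and the tag names should be checked against the documentation for the installed client version.

```xml
<!-- global_prefs_override.xml: illustrative values only -->
<global_preferences>
   <!-- 9 of 12 logical CPUs = 75% of the processors -->
   <max_ncpus_pct>75</max_ncpus_pct>
   <!-- use at most 80% of CPU time on those threads -->
   <cpu_usage_limit>80</cpu_usage_limit>
   <!-- 0 = do not use the GPUs while the computer is in use -->
   <run_gpu_if_user_active>0</run_gpu_if_user_active>
   <!-- minutes without mouse/keyboard input before the computer counts as idle -->
   <idle_time_to_run>3</idle_time_to_run>
</global_preferences>
```

Settings in this file take precedence over the web or Manager preferences, so it is worth removing the file again if the preferences are later changed elsewhere.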

cc_config.xml was set to <use_all_gpus>1</use_all_gpus>.
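
That tag belongs inside the <options> section of cc_config.xml in the BOINC data directory. A minimal file containing only this option would look something like the sketch below; any other options already in the file would sit alongside it.

```xml
<!-- cc_config.xml: minimal sketch with a single option -->
<cc_config>
   <options>
      <!-- 1 = use every detected GPU, not only the most capable one -->
      <use_all_gpus>1</use_all_gpus>
   </options>
</cc_config>
```

The client needs to re-read the file (or be restarted) before the change takes effect.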

GPU Temp monitor shows both GPU cores at 94% when BOINC is running. Temps are fine, in the 50's C.

With this configuration, I brought up GPU Temp monitor. Sure enough, BOINC GPU activity stopped. Good.

So, now I started a video. The video ran, GPU Temp showed BOINC not running GPU (the temperature dropping down significantly). But, then, soon after I shut off the video, the machine crashed.

The machine is relatively new, but has already had DRAM and a hard drive replaced based on crash data which may have actually been from BOINC problems.

I have learned already that, especially on the newer, hotter CPUs, one cannot run BOINC CPU on 100% of the threads at 100% usage. There is no config setting to allow for less than 100% usage of a GPU in use as there is for CPUs.

Is it also possible that one cannot run BOINC GPU work units on 100% of the GPUs 100% of the time?


____________
Please check out my blog
http://sciencesprings.wordpress.com
http://facebook.com/sciencesprings

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 26229 - Posted: 8 Jul 2012 | 19:10:57 UTC - in response to Message 26225.

Sounds like you are using lots of settings that are not recommended for crunching here.

Firstly, SLI causes problems - don't use it. There are many posts about this.

By "9 CPU threads @80%" I think you mean you set 9/12 threads, and for those threads to be used 80% of the time?
Using fewer threads makes sense on such a CPU, but I would not set the system to use the CPU 80% of the time; that really has no purpose unless you have temp issues, and if you don't use LAIM (leave applications in memory) it can cause problems. It's definitely far from optimal. I suggest you use 8 threads 100% of the time, and don't use SWAN_SYNC (if you ever did).

What are your CPU core temps like?

Most people here run GPUGrid tasks 100% of the time.

Watching video might be an issue, or not - probably depends on the player, codecs, HD, and the tasks being crunched...

What are you crunching on the CPU?
I suggest you set the write-to-disk interval to a high number and use a secondary drive (if possible) for the BOINC directory.
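
The CPU and disk suggestions in this post could be expressed in a global_prefs_override.xml along the lines of the sketch below. The numbers are assumptions for illustration: 67% is chosen slightly above 8/12 (66.7%) so that the thread count does not fall below 8, and 600 seconds is just an example of a "high number" for the disk interval, not a recommended value.

```xml
<!-- global_prefs_override.xml: illustrative values only -->
<global_preferences>
   <!-- 12 logical CPUs x 0.67 is about 8.04, i.e. 8 threads -->
   <max_ncpus_pct>67</max_ncpus_pct>
   <!-- run those threads 100% of the time -->
   <cpu_usage_limit>100</cpu_usage_limit>
   <!-- LAIM: keep suspended applications in memory -->
   <leave_apps_in_memory>1</leave_apps_in_memory>
   <!-- write to disk at most every 600 seconds (example value) -->
   <disk_interval>600</disk_interval>
</global_preferences>
```

Moving the BOINC directory to a secondary drive is done at install time or by relocating the data directory, and is not covered by this file.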

Constantly monitoring the GPU (when running tasks) is known to cause task and system failures; monitoring tools poll the GPU continuously while the app is using it!

Most of these questions are covered in the FAQs, especially the last post in the "FAQ - Best configurations for GPUGRID" thread.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Richard Mitnick
Joined: 8 Feb 12
Posts: 60
Credit: 17,816,440
RAC: 0
Level
Pro
Message 26230 - Posted: 8 Jul 2012 | 20:15:10 UTC - in response to Message 26229.

Thanks for the help. You have given me much to think about. I read down through the whole FAQ several times. I am like a babe in the woods on so much of the technical material.

While I am trying to work through it, I have cc_config set to ignore GPU 1.
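
For reference, ignoring one GPU in cc_config.xml looks roughly like the sketch below. The exact option name is an assumption that depends on the client version: older 6.x clients are understood to use ignore_cuda_dev, while newer clients use ignore_nvidia_dev, so check the client configuration documentation for the installed version.

```xml
<!-- cc_config.xml: sketch for excluding one NVIDIA GPU -->
<cc_config>
   <options>
      <!-- skip device 1; device numbers are those shown in the client's startup messages -->
      <!-- on newer clients the equivalent tag is ignore_nvidia_dev -->
      <ignore_cuda_dev>1</ignore_cuda_dev>
   </options>
</cc_config>
```

This option is normally read only when the client starts, so a restart of BOINC is likely needed after editing.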

The 9/12 threads at 80% was the best compromise I could find for temperature.

Running just one GPU work unit will surely cut down on my production; but it should avoid any conflicts.

On GPU I am running GPUGrid, SETI, Einstein, and Milkyway.

On CPU I am running Rosetta@home, LHC@home, Mersenne@home, SAT@home, and WCG, along with the CPU tasks for Einstein@home, SETI@home, and Milkyway@home.

CPU temps are in the 60's C.

How would I shut off SLI? Maingear recommended I keep BOINC to GPU 0. "Jord" said that I could run BOINC on GPU 1; but that just did not work out well. If I am running BOINC on just GPU 0, is it still advised to shut down SLI?

Thanks again.



____________
Please check out my blog
http://sciencesprings.wordpress.com
http://facebook.com/sciencesprings

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 26231 - Posted: 8 Jul 2012 | 20:33:44 UTC
Last modified: 8 Jul 2012 | 20:34:46 UTC

When it comes to running the config file I am not the best at that stuff.

BUT what I can say FOR SURE, and this will be one of the VERY FEW times I am 100% positive, is that you do not need SLI enabled, nor will you ever.

SLI connects your two GPUs together in order to act as one. This is only used for gaming. As long as you do not play computer games, I would strongly suggest opening up the side cover of your case (while the machine is off, of course) and unplugging your SLI cable. This is the little strip that connects the two GPUs together. It should come right off.

With this disconnected, you will be able to run whichever GPU you like for BOINC since your computer will see two independent GPUs.

When SLI is enabled, the computer detects two, but uses them as if they were dependent on each other (which they are).

This SHOULD fix your problem. Whichever GPU is driving your monitor (movies, etc.) should be the GPU that is disabled for BOINC. This way, the other GPU will only be running BOINC and will never be running any of your media applications.

So in short:

Disable SLI by disconnecting your SLI cable.

Whichever GPU is connected to your monitor by the HDMI (or DVI) cable, do not run BOINC on that GPU. This will ensure that GPU is ALWAYS ready for your own personal needs, while the other GPU is ONLY being used for BOINC.

Pretty positive all of this is correct. If anybody else has any suggestions or corrections on my reply, they're always appreciated.

Cheers

Richard Mitnick
Joined: 8 Feb 12
Posts: 60
Credit: 17,816,440
RAC: 0
Level
Pro
Message 26233 - Posted: 8 Jul 2012 | 20:46:13 UTC - in response to Message 26231.

5pot

Thanks for the advice. You are saying that I am better off running BOINC GPU WUs on just one card, leaving the other card free for the monitor.
____________
Please check out my blog
http://sciencesprings.wordpress.com
http://facebook.com/sciencesprings

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 26237 - Posted: 8 Jul 2012 | 21:17:42 UTC

I'm only running a single-GPU system, but in my experience frequent pause/unpause cycles due to "don't use GPU while computer is in use" can cause errors. Personally I like the GPU crunching all the time, if the lag isn't too bad (that's where a small GPU integrated into the CPU comes in handy).

And I like to run at 100% CPU usage. If the cooling system can't handle this, I would reduce the clock speed for 100% usage and lower the voltage. That's more efficient than running fewer threads.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 26240 - Posted: 8 Jul 2012 | 22:52:47 UTC - in response to Message 26237.
Last modified: 8 Jul 2012 | 23:07:11 UTC

I have also seen that start, stop, start, stop causes errors, but even if it does not it slows the task down. On average 2.5min per start/stop. So, far from ideal.

Snooze GPU can be used if you want to use the system without any lag.

Alas that CPU doesn't have a tiny little GPU built in, unlike others. Unfortunately getting the others to work isn't easy, especially on Linux!

By using 8/12 threads the turbo is 100 MHz higher than with 9/12, and 8*1.00/12 (about 0.667) is 11% greater than 9*0.80/12 (0.600). About 14% better overall for CPU tasks, and ever so slightly better for GPU tasks too.

What hasn't been mentioned is that running a GPUGrid task, then interrupting it to run another GPU task (from another project), will cause GPUGrid failures. Increasing the "switch between tasks" time and reducing the cache (to a bare minimum) helps a lot if you really must run multiple GPU projects (which makes no sense as this is clearly the best project for Fermi cards). You can also set up program exceptions, to stop crunching when a specified program/application (video player) starts.
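
As a rough sketch, the task-switch time and cache size mentioned above map onto the following entries in global_prefs_override.xml. The specific numbers are assumptions chosen only to illustrate "long switch interval, small cache"; suitable values depend on task lengths and on how reliable the network connection is.

```xml
<!-- global_prefs_override.xml: illustrative values only -->
<global_preferences>
   <!-- "switch between tasks every N minutes": a long interval avoids interrupting a running GPU task -->
   <cpu_scheduling_period_minutes>600</cpu_scheduling_period_minutes>
   <!-- keep the work cache small so few tasks from other projects queue up -->
   <work_buf_min_days>0.1</work_buf_min_days>
   <work_buf_additional_days>0.1</work_buf_additional_days>
</global_preferences>
```

A small cache also helps tasks get returned promptly, since less work sits waiting in the queue.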
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Richard Mitnick
Joined: 8 Feb 12
Posts: 60
Credit: 17,816,440
RAC: 0
Level
Pro
Message 26242 - Posted: 8 Jul 2012 | 23:58:36 UTC

Well, thanks to everyone for the exceptionally pointed advice. For now, here is what I am doing.

This machine was purchased for the express purpose of working on projects running on GPU processors. While GPUgrid was a prime mover for me, I do have other interests. I really would like to get credited with a pulsar over at Einstein@home. I believe that every cruncher, even the diehard "WCG only" people owe a debt to SETI@home. Milkyway@home continues an interest that I express in the ScienceSprings blog.

So, I am crunching GPU on both cards, for all four projects, but have unchecked the box for using the GPU while the computer is in use, and am using the 3 minute delay to start GPU crunching after the computer was last used. It was probably checking that box that got me in trouble in the first place. I really do not even need the Statistics tab on BOINC Manager. I can see what I have done in BOINCStats.

This machine took me from 7,000,000 credits in March of this year, gathered since 2007, to over 24,000,000 credits now, only 4 months later. It is not the credits which interest me, but the amount of work they represent.

I am heavy on motivation, but very thin on technical skills. So, a lot of the material on the FAQ was too far over my head.

I have seven machines in the digiteria, so I am not trying to get everything done on one machine. The only thing that I really give up if I just crunch on the machine in question is my 23" monitor. I have a 19" I could put on that machine, but my other machines are laptops and they do not have a DVI port. Even the business of videos is in my case petty. All of my videos are available to me on my WDTV Live box at my big screen TV. There is no reason but laziness for me to watch anything up in the digiteria.

I will still be going over the FAQ and trying to learn more about what I am doing.

Thanks again.
____________
Please check out my blog
http://sciencesprings.wordpress.com
http://facebook.com/sciencesprings

5pot
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Message 26243 - Posted: 9 Jul 2012 | 0:27:30 UTC

If you're using BOINC version 7 I would honestly just set an exclusion for your movie player app. This way, when you're using the computer it will not suspend but whenever you start that particular application it will.

This way you can still do whatever you want on your computer and have boinc run, but it will suspend once that app is started
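
An exclusion of this kind can also be written by hand in cc_config.xml, roughly as in the sketch below. The executable name vlc.exe is only a placeholder for whatever player is actually used, and whether a given client version honours exclusive_gpu_app is something to verify in its configuration documentation.

```xml
<!-- cc_config.xml: sketch of an application exclusion -->
<cc_config>
   <options>
      <!-- suspend GPU computing while this program is running (placeholder name) -->
      <exclusive_gpu_app>vlc.exe</exclusive_gpu_app>
      <!-- alternatively, exclusive_app suspends all computing, CPU and GPU -->
      <!-- <exclusive_app>vlc.exe</exclusive_app> -->
   </options>
</cc_config>
```

The name should be the executable's file name as it appears in the process list, without a path.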

Still, make sure you disable that SLI

Cheers

Richard Mitnick
Joined: 8 Feb 12
Posts: 60
Credit: 17,816,440
RAC: 0
Level
Pro
Message 26244 - Posted: 9 Jul 2012 | 0:45:38 UTC - in response to Message 26243.

I am using 6.12.34.
____________
Please check out my blog
http://sciencesprings.wordpress.com
http://facebook.com/sciencesprings

dave34
Joined: 17 Jan 09
Posts: 1
Credit: 378,554,757
RAC: 0
Level
Asp
Message 26265 - Posted: 10 Jul 2012 | 8:35:56 UTC

Running two GTS 450s in SLI locks up my system. I left the SLI cable in for gaming but disabled SLI in the NVIDIA setup with a click of a button. Worked like a charm. If my machine slows while editing a full-screen HD movie I just pause BOINC until I am done, and crunch the other 99% of the time, 24/7.

I have found the program "TThrottle" to be a life saver. It will throttle your CPU/GPU usage based on temperature. Rules are easily added also. Before I had A/C in my home my CPU temp would jump to 90°C with an ambient temp of 75°F, and that is with a clean Corsair H60 CPU cooler. I set my i7 processor to a 72°C limit and the GPUs to a 90°C limit (they could run hotter, but running 24/7 I am going for longevity). No more worrying about overheating no matter what the ambient temperature!

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 26275 - Posted: 10 Jul 2012 | 19:15:38 UTC - in response to Message 26265.

SLI has been known to cause problems for years. Don't use it.

90°C is too hot and using TThrottle is defeatist.
We recommend using fan/temperature controlling software such as EVGA Precision or MSI Afterburner, and creating a fan-to-temperature profile for the GPU.

If your CPU is too hot, improve case cooling, get a better heatsink or use fewer cores.

Turning threads on and off is likely to increase GPU task failure rates, and it's just the wrong way to keep the card cool. If you must, downclock, but don't use TThrottle.

Some GPUs will run at 93°C if you let them. Don't; this will reduce the life expectancy of the card and increase failures. You need to turn the fans up.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
