2x GPU - One GPU Errors Frequently

Redirect Left

Message 50395 - Posted: 5 Sep 2018, 0:59:46 UTC
Last modified: 5 Sep 2018, 1:18:18 UTC

Hi there.

I am currently running two GPUs, a GTX 760 and a GTX 670. Neither is amazing for number crunching, but they're decent for games and my PC is on 24/7, so spare time is donated to various BOINC projects and it doesn't matter if tasks take a while.

Anyhow, tasks on the GTX 670 have a habit of erroring out for no obvious reason. An example of one of these tasks is here: http://www.gpugrid.net/result.php?resultid=18676941 - the error is related to the simulation becoming unstable.

It isn't a PSU / power issue: the draw on the PSU is less than 65% of its output on the rails that feed the graphics cards. I've tried reinstalling the drivers, which didn't fix anything. Both GPUs are 2GB GDDR5 VRAM editions.

Is it possible the GTX 670 isn't properly supported by GPUGrid anymore? If this is the case, is it possible to exclude the 670 from GPUGrid, but continue using the 760?

- Cheers!
tullio

Message 50396 - Posted: 5 Sep 2018, 8:02:27 UTC - in response to Message 50395.  

I have a GTX 750 Ti on a Linux box and a GTX 1050 Ti on a Windows 10 PC, neither overclocked. On the Linux box the GPU temperature reaches at most 63 C; on the Windows PC it reaches 80 C and then the task crashes with an error message similar to yours.
Tullio
Keith Myers
Message 50410 - Posted: 5 Sep 2018, 20:34:32 UTC

First thing I would do is to increase the fan speed on the 670 to 100% and see if the errors reduce.

The other thing would be to run another BOINC client and exclude the 670 GPU in the cc_config.xml file.

<ignore_nvidia_dev>N</ignore_nvidia_dev>
Ignore (don't use) a specific NVIDIA GPU. You can ignore more than one. Replaces <ignore_cuda_dev/>. Requires a client restart.
Example: <ignore_nvidia_dev>0</ignore_nvidia_dev> will ignore the first NVIDIA GPU in the system.
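
Note that <ignore_nvidia_dev> hides the card from every project. If the goal is only to keep GPUGrid off the 670 while other projects still use it, the <exclude_gpu> option in cc_config.xml is probably the closer fit. A minimal sketch follows. The device number 1 is an assumption (check the BOINC event log at startup to see which number your 670 gets), and the URL is assumed to match the project URL exactly as your client lists it:

```xml
<cc_config>
  <options>
    <!-- Keep one GPU away from a single project; other projects still use it. -->
    <exclude_gpu>
      <!-- Assumed project URL; it must match what the client shows for GPUGrid. -->
      <url>http://www.gpugrid.net/</url>
      <!-- Assumed device number for the GTX 670; verify it in the event log. -->
      <device_num>1</device_num>
      <type>NVIDIA</type>
    </exclude_gpu>
  </options>
</cc_config>
```

As with <ignore_nvidia_dev>, a client restart may be needed before the change takes effect.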
3de64piB5uZAS6SUNt1GFDU9dRhY
Message 50411 - Posted: 5 Sep 2018, 21:02:19 UTC

In addition to the other messages and suggestions: what if you let MSI Afterburner limit the GPU temperature to 70°C max, provided that the temperature of this card shows up in there? If not, reduce both the GPU and memory clocks manually by maybe 50MHz and see where the temperature ends up.
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
Retvari Zoltan
Message 50515 - Posted: 14 Sep 2018, 21:24:37 UTC - in response to Message 50395.  

... tasks on the GTX 670 have a habit of erroring, with an unknown reason. An example of one of these tasks is here; http://www.gpugrid.net/result.php?resultid=18676941 - the error is related to the simulation becoming unstable.
This is the typical error message when the GPU clocks are too high for the given GPU temperature.
You should reduce the GPU clock speed of your GTX 670 (or its power target in MSI Afterburner).
Judging by the stderr.txt of your tasks, your other GPU goes up to 93°C, which is dangerously high. This surely reduces the lifetime of your card.
You should increase the airflow to that card. If the two cards are next to each other, I strongly recommend that you physically remove the older card.

©2025 Universitat Pompeu Fabra