2x GPU - One GPU Errors Frequently
| Author | Message |
|---|---|
|
Joined: 8 Dec 12 Posts: 23 Credit: 182,017,044 RAC: 0
|
Hi there. I am currently running two GPUs, a GTX 760 and a GTX 670. Neither is amazing for number crunching, but they're decent with games and my PC is on 24/7, so spare time is donated to various BOINC projects, and it doesn't matter if things take a while. Anywho, tasks on the GTX 670 have a habit of erroring for an unknown reason. An example of one of these tasks is here: http://www.gpugrid.net/result.php?resultid=18676941 - the error is related to the simulation becoming unstable. It isn't a PSU / power issue; the draw on the PSU is less than 65% of its output on the rails related to graphics. I've tried re-installing the drivers, which didn't fix anything. Both GPUs are 2GB GDDR5 VRAM editions. Is it possible the GTX 670 isn't properly supported by GPUGrid anymore? If this is the case, is it possible to exclude the 670 from GPUGrid, but continue using the 760? - Cheers! |
|
Joined: 8 May 18 Posts: 190 Credit: 104,426,808 RAC: 0
|
I have a GTX 750 Ti on a Linux box and a GTX 1050 Ti on a Windows 10 PC, neither overclocked. On the Linux box the GPU temperature reaches at most 63 °C; on the Windows PC it reaches 80 °C and then the task crashes with an error message similar to yours. Tullio |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
The first thing I would do is increase the fan speed on the 670 to 100% and see if the errors decrease. The other option would be to run another BOINC client and exclude the 670 GPU in the cc_config.xml file, using this option: `<ignore_nvidia_dev>N</ignore_nvidia_dev>`: Ignore (don't use) a specific NVIDIA GPU. You can ignore more than one. Replaces `<ignore_cuda_dev/>`. Requires a client restart. Example: `<ignore_nvidia_dev>0</ignore_nvidia_dev>` will ignore the first NVIDIA GPU in the system. |
|
Joined: 20 Apr 15 Posts: 285 Credit: 1,102,216,607 RAC: 0
|
In addition to the other messages and suggestions: what if you let MSI Afterburner limit the GPU temperature to 70°C max, provided that the temperature of this card shows up in there? If not, reduce both the GPU and memory clocks manually by maybe 50 MHz and see where the temperature ends up. I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday. |
|
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"... tasks on the GTX 670 have a habit of erroring, with an unknown reason. An example of one of these tasks is here; http://www.gpugrid.net/result.php?resultid=18676941 - the error is related to the simulation becoming unstable." This is the typical error message when the GPU clocks are too high for the given GPU temperature. You should reduce the GPU clock speed of your GTX 670 (or its power target in MSI Afterburner). Judging by the stderr.txt of your tasks, your other GPU goes up to 93°C, which is dangerously high. This surely reduces the lifetime of your card. You should increase the airflow to that card. If the two cards are next to each other, I strongly recommend physically removing the older card. |
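
On the original question (keeping the GTX 760 on GPUGrid while excluding the GTX 670): `<ignore_nvidia_dev>` hides a GPU from every project on that client. BOINC's cc_config.xml also accepts an `<exclude_gpu>` block that applies to one project only. A minimal sketch, assuming the GTX 670 is device 1 and that the project URL is the one your client lists for GPUGrid; both values are assumptions, not taken from the poster's system, so check the device numbers in the client's startup messages first.

```xml
<cc_config>
  <options>
    <!-- Per-project exclusion: GPUGrid will not use this device,
         other projects can still use it. -->
    <exclude_gpu>
      <!-- Assumed project URL; it must match the URL your client shows for GPUGrid. -->
      <url>http://www.gpugrid.net/</url>
      <!-- Assumed device number of the GTX 670; on your system it may be 0. -->
      <device_num>1</device_num>
      <type>NVIDIA</type>
    </exclude_gpu>
  </options>
</cc_config>
```

Save this as cc_config.xml in the BOINC data directory and restart the client, as with `<ignore_nvidia_dev>`. If the 670 still errors on other projects afterwards, the cooling and clock suggestions above are the more likely fix.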
©2025 Universitat Pompeu Fabra