Message boards : Graphics cards (GPUs) : Consecutive failures

Author Message
Profile The King's Own
Joined: 25 Apr 12
Posts: 32
Credit: 945,543,997
RAC: 0
Level
Glu
Message 34941 - Posted: 8 Feb 2014 | 3:00:38 UTC

My rig (1 x GTX 660 Ti and 1 x GTX 580), which had been crunching happily, has just failed 64 times in a row. I've tried the usual fixes.
____________

Profile Stoneageman
Joined: 25 May 09
Posts: 224
Credit: 34,057,374,498
RAC: 11
Level
Trp
Message 34943 - Posted: 8 Feb 2014 | 10:58:48 UTC

Looking at the temperatures of your cards, I would say checking the fan(s) on the 580 is a good place to start. Being at that level for a while may have stuffed it for good.

<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3203M] VERSION [55]
# SWAN Device 1 :
# Name : GeForce GTX 660 Ti
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:02:00.0
# Device clock : 980MHz
# Memory clock : 3004MHz
# Memory width : 192bit
# Driver version : r331_82 : 33221
# GPU 0 : 97C
# GPU 1 : 74C
# GPU 1 : 75C
# GPU 0 : 98C
# GPU 1 : 76C
# GPU 1 : 77C
# GPU 1 : 78C
# GPU 1 : 79C
# GPU 1 : 80C
# GPU 1 : 81C
# GPU 1 : 82C
# Time per step (avg over 9500000 steps): 3.649 ms
# Approximate elapsed time for entire WU: 34661.101 s
12:34:52 (4364): called boinc_finish

</stderr_txt>
]]>
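For anyone who wants to check their own task output the same way, here is a minimal sketch that pulls the per-GPU temperature lines out of a stderr dump like the one above and reports each GPU's peak. The function names and the 90C threshold are assumptions for illustration only; they are not part of BOINC or the GPUGrid app.

```python
import re

# Matches stderr lines of the form "# GPU 0 : 97C".
# The header line "# GPU [GeForce ...]" is skipped because no digit follows "GPU".
TEMP_LINE = re.compile(r"#\s*GPU\s+(\d+)\s*:\s*(\d+)C")

def peak_temps(stderr_text):
    """Return {gpu_index: highest temperature seen} from a stderr dump."""
    peaks = {}
    for match in TEMP_LINE.finditer(stderr_text):
        gpu, temp = int(match.group(1)), int(match.group(2))
        peaks[gpu] = max(temp, peaks.get(gpu, temp))
    return peaks

def overheating(peaks, limit_c=90):
    """List GPU indices whose peak is at or above limit_c (an assumed threshold)."""
    return sorted(gpu for gpu, temp in peaks.items() if temp >= limit_c)

# A few lines copied from the log above.
sample = """
# GPU [GeForce GTX 660 Ti] Platform [Windows] Rev [3203M] VERSION [55]
# GPU 0 : 97C
# GPU 1 : 74C
# GPU 0 : 98C
# GPU 1 : 82C
"""

peaks = peak_temps(sample)
print(peaks, overheating(peaks))
```

On the sample lines, GPU 0 is the only one flagged, which is the card worth checking first.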

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 34946 - Posted: 8 Feb 2014 | 14:09:31 UTC - in response to Message 34943.
Last modified: 8 Feb 2014 | 14:12:05 UTC

Once, I found that Precision-X had (somehow) been set to a low fan speed, without "automatic" selected. While a GPUGrid task was running on it, I noticed that the GPU was at 99°C and its speed was throttled way down. I promptly stopped BOINC and fixed the fan setting (turned on "automatic", with a fan curve that keeps the temperature below 70°C), which solved the problem.

So, if I were you, I'd investigate whether a similar issue (a fan software setting) is causing your GPU to hit its 100°C thermal limit.

I think we're both lucky our GPUs are still working.
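To illustrate what a fan curve does, here is a rough sketch of piecewise-linear interpolation between temperature/fan-speed points. The curve points and the function name are hypothetical; this is not Precision-X's API or its actual curve, and a curve alone cannot guarantee a temperature ceiling if the cooler is undersized or clogged.

```python
# Hypothetical curve points: (temperature in C, fan duty in %).
# The fan reaches 100% at 70C, the target ceiling mentioned above.
CURVE = [(30, 30), (50, 50), (65, 85), (70, 100)]

def fan_percent(temp_c, curve=CURVE):
    """Interpolate linearly between curve points, clamping at both ends."""
    if temp_c <= curve[0][0]:
        return float(curve[0][1])
    if temp_c >= curve[-1][0]:
        return float(curve[-1][1])
    for (t0, f0), (t1, f1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0)

for temp in (25, 40, 60, 70, 99):
    print(temp, fan_percent(temp))
```

A fixed low fan speed is the degenerate case of this: a flat line that never ramps up, which is how a card ends up at 99°C under load.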

Profile The King's Own
Joined: 25 Apr 12
Posts: 32
Credit: 945,543,997
RAC: 0
Level
Glu
Message 34947 - Posted: 8 Feb 2014 | 15:01:33 UTC

Thanks for the input. I suspect the 580 is near the end of its useful life.

I addressed the fan speed 2 years ago: in 2012 I updated the BIOS to unlock the fan restriction and allow it to reach 100%. It always runs much hotter than the 660ti. I need to get a water block if it is to continue.

Most of the errors were:

<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
]]>


____________

Dagorath
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Level
Ile
Message 34948 - Posted: 8 Feb 2014 | 15:48:11 UTC - in response to Message 34947.

I need to get a water block if it is to continue.


Not necessarily. If you can reduce the case air temperature, or just reduce the temperature of the air feeding into the 580's cooling fan, you can substantially reduce its operating temp. If its fan is a radial fan you might be able to accomplish that with less than $10 in parts (some glue, tape, an empty beer box and a few screws). Compare that with a minimum $100 investment for a decent water cooling system. You mention a water block alone, though, so maybe you already have your 660ti on water and just need to tie the 580 into the system?

If you don't want to invest anything in a card that is already 2 generations old and soon to be 3 gens old, I might be interested in buying it. I want a reasonably priced card to run Asteroids@home GPU tasks on. They need DP and the 580 is said to not be DP crippled. I can run that 580 at under 50C and full bore quite easily, no waterblock required.

____________
BOINC <<--- credit whores, pedants, alien hunters

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 34980 - Posted: 11 Feb 2014 | 15:55:05 UTC - in response to Message 34948.
Last modified: 11 Feb 2014 | 16:06:11 UTC

The King's Own, did you actually check that both GPU fans were turning and that the profile was indeed still being applied?

With time, GPUs deteriorate. The fix is to downclock slightly (usually 5 or 10% is sufficient).


Always good to see another GPU app, but the performance of most GeForce cards at Asteroids@home is poor. My GTX770 performed about half as well as my i7 CPU.

The double precision of a GTX580 is 1/8th that of SP, so 1581/8 = 198GFlops.

For reference a GTX770 is 134, a GTX780 is 166, a GTX780Ti is 210 GFlops.
A Titan however is 1500GFlops (DP)!
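As a quick sanity check of that arithmetic, here is a small sketch that derives the GTX580 DP figure from the SP figure and the 1/8 ratio, then ranks it against the other cards. All numbers are the ones quoted in this post, not independent measurements, and the variable names are just for illustration.

```python
# SP figure and DP:SP ratio for the GTX580, as quoted above.
GTX580_SP_GFLOPS = 1581
GTX580_DP_RATIO = 1 / 8

gtx580_dp = GTX580_SP_GFLOPS * GTX580_DP_RATIO  # 197.625, which rounds to 198

# DP throughput in GFlops, as quoted above (assumed, not measured here).
dp_gflops = {
    "GTX770": 134,
    "GTX780": 166,
    "GTX580": round(gtx580_dp),
    "GTX780Ti": 210,
    "Titan": 1500,
}

# Rank cards from slowest to fastest in double precision.
ranking = sorted(dp_gflops, key=dp_gflops.get)
print(ranking)
print(dp_gflops["Titan"] / dp_gflops["GTX580"])
```

On these figures the Titan comes out at roughly seven and a half times the DP throughput of the GTX580.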

So even though a GTX580 would sit between a GTX780 and a GTX780Ti in DP performance, it's still not any better than a full CPU. IMO the GF range isn't a good fit for that project. You really need Titans or Teslas, which can do the work of five or six i7 processors. In terms of purchase cost and running costs, Titans would be much better than i7 processors, albeit a bit rich for my blood.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Dagorath
Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Level
Ile
Message 34983 - Posted: 11 Feb 2014 | 17:03:40 UTC - in response to Message 34980.

The double precision of a GTX580 is 1/8th that of SP, so 1581/8 = 198GFlops.


Thanks for that. I mistakenly thought it was 1/2 of SP. For A@H the answer is clearly Titan but, as you said, too rich for my blood.

____________
BOINC <<--- credit whores, pedants, alien hunters

Profile The King's Own
Joined: 25 Apr 12
Posts: 32
Credit: 945,543,997
RAC: 0
Level
Glu
Message 35021 - Posted: 13 Feb 2014 | 18:21:51 UTC

Guys,
Thanks for the input but the massive number of failures was across both cards and not IMHO cooling related.
____________
