gtx295 returning nearly constant errors

Message boards : Graphics cards (GPUs) : gtx295 returning nearly constant errors

madas91
Joined: 22 Apr 09
Posts: 21
Credit: 8,119,831
RAC: 0
Message 10014 - Posted: 21 May 2009, 8:46:15 UTC

I had a GTX 295 running GPUGRID with only the occasional WU erroring out.

Something changed on the 18th, and now I get nearly constant errors.
I'm running BOINC 6.6.28 with NVIDIA 185.85 drivers on Vista HP 64-bit.

I can't work out what changed, as all automatic updates are turned off and I update everything manually every weekend. Any suggestions after looking through my task history would be much appreciated.
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 10024 - Posted: 21 May 2009, 10:24:34 UTC - in response to Message 10014.  

Hmm, you still finish a WU occasionally. And it looks like you already tried a project reset? Otherwise, have you tried a reboot, and switching the power off and removing the power cord for more than 10 minutes?

MrS
Scanning for our furry friends since Jan 2002
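A project reset can also be triggered from the command line with BOINC's boinccmd tool instead of through BOINC Manager. The Python sketch below only wraps that call; the helper names are hypothetical, and GPUGRID_URL is an assumption that should match the URL BOINC Manager shows for the attached project. Keep in mind that a reset discards the project's tasks and files on this host.

```python
import shutil
import subprocess
from typing import List

# Assumption: the project URL exactly as attached in the BOINC client.
GPUGRID_URL = "http://www.gpugrid.net/"

def reset_command(project_url: str) -> List[str]:
    """Build the boinccmd invocation that resets one attached project."""
    return ["boinccmd", "--project", project_url, "reset"]

def reset_project(project_url: str) -> bool:
    """Run the reset if boinccmd is on PATH; return True on a zero exit code.

    boinccmd talks to the local BOINC client over GUI RPC, so it may need to be
    run from the BOINC data directory (or given --passwd) to authenticate.
    """
    if shutil.which("boinccmd") is None:
        return False
    return subprocess.run(reset_command(project_url)).returncode == 0
```

Resetting only clears the project's local state; it does not touch the NVIDIA driver, which is why the full power-off is suggested as a separate step.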
madas91
Joined: 22 Apr 09
Posts: 21
Credit: 8,119,831
RAC: 0
Message 10031 - Posted: 21 May 2009, 11:58:05 UTC - in response to Message 10024.  

I've just gone back to basics :) I uninstalled everything NVIDIA- and BOINC-related, restarted, and cleaned out all traces of NVIDIA anywhere. Then a fresh download of 185.85 and a fresh download of the BOINC client.
I've reattached to the project and am now running the first 2 WUs. Fingers crossed it was just a bit of corruption somewhere amongst all the files I just replaced.

2 hours into the WUs so far, and all looks good.

Will update when they finish.
madas91
Joined: 22 Apr 09
Posts: 21
Credit: 8,119,831
RAC: 0
Message 10032 - Posted: 21 May 2009, 12:22:33 UTC - in response to Message 10031.  

And there goes the first error.
Bender10
Joined: 3 Dec 07
Posts: 167
Credit: 8,368,897
RAC: 0
Message 10035 - Posted: 21 May 2009, 14:21:54 UTC - in response to Message 10032.  

Try going back to the 182.50 driver...


Consciousness: That annoying time between naps......

Experience is a wonderful thing: it enables you to recognize a mistake every time you repeat it.
popandbob
Joined: 18 Jul 07
Posts: 67
Credit: 43,351,724
RAC: 0
Message 10043 - Posted: 21 May 2009, 18:55:22 UTC
Last modified: 21 May 2009, 18:56:41 UTC

Since the reinstall, the errors are only on GPU 1 (meaning the second GPU of the 295).
Make sure you've got it configured correctly.

Bob
madas91
Joined: 22 Apr 09
Posts: 21
Credit: 8,119,831
RAC: 0
Message 10049 - Posted: 21 May 2009, 20:46:16 UTC - in response to Message 10043.  

popandbob wrote:
  Since the reinstall, the errors are only on GPU 1 (meaning the second GPU of the 295).
  Make sure you've got it configured correctly.

Not sure what there is to configure? It's not in SLI mode, or GPU 1 would not exist.
PhysX is on, as it always has been. Am I missing something else? I thought that was pretty much it.

Bender10 wrote:
  Try going back to the 182.50 driver...

That's what I'm now trying. I just got it ready and started 2 more WUs.
popandbob
Joined: 18 Jul 07
Posts: 67
Credit: 43,351,724
RAC: 0
Message 10057 - Posted: 22 May 2009, 0:46:50 UTC

For some people, the new drivers made a dummy plug/monitor on the second GPU unnecessary: as long as PhysX is enabled on the second GPU, it should be detected. Others have found they still need the dummy plug/monitor on the card.
The desktop still needs to be extended onto the second GPU.

The new drivers have changed the rules for CUDA and have caught a few people off guard. Of course, for some people the new drivers don't work either.

Bob
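One way to check which GPUs the CUDA driver itself exposes, independent of BOINC, is to ask the CUDA driver API directly. Below is a minimal Python sketch using ctypes; it assumes the driver library is named nvcuda.dll on Windows and libcuda.so.1 on Linux, and list_cuda_devices is a hypothetical helper, not part of any NVIDIA tool. If both halves of a GTX 295 are visible to CUDA, it should return two entries.

```python
import ctypes
import sys
from typing import List

CUDA_SUCCESS = 0  # CUresult value meaning success in the CUDA driver API

def list_cuda_devices() -> List[str]:
    """Return "ordinal: name" for each GPU the CUDA driver reports,
    or [] if the driver library is missing or fails to initialise."""
    # Assumed library names: nvcuda.dll on Windows, libcuda.so.1 on Linux.
    if sys.platform == "win32":
        candidates, loader = ["nvcuda.dll"], ctypes.WinDLL  # driver API is __stdcall on Windows
    else:
        candidates, loader = ["libcuda.so.1", "libcuda.so"], ctypes.CDLL
    lib = None
    for name in candidates:
        try:
            lib = loader(name)
            break
        except OSError:
            continue
    if lib is None or lib.cuInit(0) != CUDA_SUCCESS:
        return []
    count = ctypes.c_int(0)
    if lib.cuDeviceGetCount(ctypes.byref(count)) != CUDA_SUCCESS:
        return []
    names = []
    for ordinal in range(count.value):
        device = ctypes.c_int(0)  # CUdevice is an int handle
        if lib.cuDeviceGet(ctypes.byref(device), ordinal) != CUDA_SUCCESS:
            continue
        buf = ctypes.create_string_buffer(256)
        if lib.cuDeviceGetName(buf, len(buf), device) == CUDA_SUCCESS:
            names.append(f"{ordinal}: {buf.value.decode(errors='replace')}")
    return names

if __name__ == "__main__":
    for entry in list_cuda_devices():
        print(entry)
```

This only shows whether CUDA can see a device; a GPU can be listed here and still fail work units.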
madas91
Joined: 22 Apr 09
Posts: 21
Credit: 8,119,831
RAC: 0
Message 10064 - Posted: 22 May 2009, 11:34:32 UTC - in response to Message 10057.  

I never needed a plug and have never had to extend the desktop onto the 2nd GPU.
No drivers or settings changed between it working and it not working.
In the NVIDIA Control Panel, PhysX is enabled and SLI is set to not use multi-GPU mode, as it always has been.


popandbob
Joined: 18 Jul 07
Posts: 67
Credit: 43,351,724
RAC: 0
Message 10067 - Posted: 22 May 2009, 18:50:24 UTC

The amazing thing about the drivers is that sometimes they change their mind...

For this next batch of work, please don't abort any. You keep aborting the work on card 0 (the first of the 2 GPUs in the 295), so we haven't seen whether it will error out or not.

Bob
madas91
Joined: 22 Apr 09
Posts: 21
Credit: 8,119,831
RAC: 0
Message 10073 - Posted: 22 May 2009, 22:29:33 UTC - in response to Message 10067.  

popandbob wrote:
  For this next batch of work, please don't abort any. You keep aborting the work on card 0 (the first of the 2 GPUs in the 295), so we haven't seen whether it will error out or not.

OK, I'm now running 2 WUs and will be leaving it overnight, so both cards should be on their second WUs by the time I get up :)

We'll see if my hard work today paid off or not. I've been a busy bee indeed.

That's the reason I cancelled all jobs and dropped out for a fresh start.

Fingers crossed now, or it's definitely the card!!
popandbob
Joined: 18 Jul 07
Posts: 67
Credit: 43,351,724
RAC: 0
Message 10081 - Posted: 23 May 2009, 5:11:10 UTC

I'm sure it's not the card.
I see the new host (37017) currently has 4 errors, all on the second GPU of the 295.
Nothing yet from the first GPU...
It's sounding more like a config issue, as there would be errors from the first GPU as well if something were wrong with the card...

Bob
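Bob's reasoning (all errors on one GPU, none on the other) can be sketched as a small tally over the host's task list. This is a hypothetical Python helper: the (gpu_index, outcome) record format is an assumption, for example transcribed by hand from the task pages, and is not something BOINC exports in this form. It deliberately returns None until the other GPU has actually completed some work, which matches the point that nothing has come back from the first GPU yet.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical record format: (gpu_index, outcome), outcome is "success" or "error".
Task = Tuple[int, str]

def errors_by_gpu(tasks: Iterable[Task]) -> Counter:
    """Count failed tasks per GPU index."""
    return Counter(gpu for gpu, outcome in tasks if outcome == "error")

def faulty_gpu(tasks: Iterable[Task]) -> Optional[int]:
    """Return the GPU index that holds every error, but only if some other GPU
    has completed work successfully; otherwise None (not enough evidence)."""
    tasks = list(tasks)
    errors = errors_by_gpu(tasks)
    if len(errors) != 1:
        return None
    (gpu,) = errors
    other_gpu_succeeded = any(g != gpu and o == "success" for g, o in tasks)
    return gpu if other_gpu_succeeded else None

four_errors_on_gpu1 = [(1, "error")] * 4
print(faulty_gpu(four_errors_on_gpu1))                     # None: GPU 0 has no results yet
print(faulty_gpu(four_errors_on_gpu1 + [(0, "success")]))  # 1
```

Note that this only localises the errors to one GPU; it cannot by itself tell a configuration problem from a hardware fault on that GPU.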
madas91
Joined: 22 Apr 09
Posts: 21
Credit: 8,119,831
RAC: 0
Message 10085 - Posted: 23 May 2009, 10:24:10 UTC - in response to Message 10081.  

I was thinking the fault might be on the 2nd GPU only, and that's why it is failing.

I might try 2 SETI WUs and see if both finish.

madas91
Joined: 22 Apr 09
Posts: 21
Credit: 8,119,831
RAC: 0
Message 10086 - Posted: 23 May 2009, 10:28:11 UTC - in response to Message 10085.  

I have reinstalled everything, including Vista, so everything was a clean install with freshly downloaded drivers.
Seeing as both GPUs get configured with the same settings, how can it be a settings issue? If I plug the monitor into the 2nd GPU instead of the first and it still fails, it must be the card. However, if the first GPU starts failing tasks, then it must be the settings. If that's how it works, anyway.
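The monitor-swap test described above can be written as a small decision rule. This is a minimal Python sketch; the function name and the "inconclusive" case (failures that appear or vanish with the swap) are illustrative additions, not part of the original post.

```python
from typing import Optional

def diagnose_monitor_swap(failing_gpu_monitor_on_0: Optional[int],
                          failing_gpu_monitor_on_1: Optional[int]) -> str:
    """Each argument is the physical GPU index (0 or 1) that fails work units
    with the monitor plugged into GPU 0 or GPU 1 respectively, or None if
    nothing fails in that configuration."""
    if failing_gpu_monitor_on_0 is None and failing_gpu_monitor_on_1 is None:
        return "no failures"
    if failing_gpu_monitor_on_0 is None or failing_gpu_monitor_on_1 is None:
        return "inconclusive"  # failures appeared or vanished with the swap
    if failing_gpu_monitor_on_0 == failing_gpu_monitor_on_1:
        return "card"          # the same physical GPU fails either way
    return "settings"          # the failure moved when the display moved

print(diagnose_monitor_swap(1, 1))  # card
print(diagnose_monitor_swap(1, 0))  # settings
```

The rule assumes the display connection is the only thing that changes between the two runs; any other change (driver, fan setting, work unit type) would muddy the comparison.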
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 10092 - Posted: 23 May 2009, 14:59:38 UTC - in response to Message 10086.  

Agreed. What about other software? 3DMark or FurMark in SLI mode? Any artefacts?

MrS
Scanning for our furry friends since Jan 2002
madas91
Joined: 22 Apr 09
Posts: 21
Credit: 8,119,831
RAC: 0
Message 10096 - Posted: 23 May 2009, 17:11:50 UTC - in response to Message 10092.  
Last modified: 23 May 2009, 18:00:05 UTC

OK, tasks are still failing on the 2nd GPU regardless of which socket the monitor is plugged into.
I'm going to get in touch with the card's vendor on Tuesday and see what they say. I refuse to overclock something that cost so much in the first place, lmao, so at least my warranty is still valid.
EDIT 1: In the meantime, I'm going to run SETI Enhanced WUs, as they don't seem to fail.

EDIT 2: SETI Enhanced WUs for CUDA fly through without a problem on both cores. I'm still going to try for a replacement, as it worked and then stopped working within a few weeks of use. I might even be inside the old 28-day mark :)
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 10102 - Posted: 23 May 2009, 19:41:26 UTC - in response to Message 10096.  

Did you watch the temperatures on both projects? I suppose they're rather similar under automatic fan control?

MrS
Scanning for our furry friends since Jan 2002
Curt Timmerman
Joined: 24 Dec 08
Posts: 13
Credit: 17,931,283
RAC: 0
Message 10114 - Posted: 23 May 2009, 22:45:51 UTC - in response to Message 10102.  

ExtraTerrestrial Apes wrote:
  Did you watch the temperatures on both projects? I suppose they're rather similar under automatic fan control?

I had a similar problem with my GTX 295. When I set the fan to a 100% duty cycle, I stopped getting most of the WU errors.

Curt
popandbob
Joined: 18 Jul 07
Posts: 67
Credit: 43,351,724
RAC: 0
Message 10118 - Posted: 24 May 2009, 0:21:22 UTC

So it's quite clear there is something wrong somewhere with the second GPU.

A list of things to try:

1) Plug monitors into both monitor sockets (if you have a second monitor). If not, you could try switching the monitor over while the PC is running (that sometimes tricks it into seeing 2 GPUs).
2) Ensure the desktop is extended onto the second GPU.
3) Try disabling PhysX on the first card while enabling it on the second (I think that's possible).
4) Try disabling PhysX on both.
5) Run the CUDA memory tester (see sticky) on the second GPU.

Bob

P.S. If they'll replace it, that may or may not work. It's worth a try if they'll go for it.
madas91
Joined: 22 Apr 09
Posts: 21
Credit: 8,119,831
RAC: 0
Message 10120 - Posted: 24 May 2009, 7:20:13 UTC - in response to Message 10118.  

My temps while crunching on the GTX have never been over 80 °C. I have the fan set to 75%, which keeps it down around 72 °C.

I've tried without PhysX on both cards.
My system doesn't need to be tricked into "seeing 2 GPUs": there are 2, or it couldn't be starting and failing work on the second one.

Are you copying and pasting this from elsewhere? You've given the same advice time and time again.
That's not me having a go, I appreciate everyone's help. It's just that you keep telling me to do stuff so the second GPU is detected!! That's the same GPU that must already be detected, because it's failing work units.

I have run the CUDA memory test a couple of times, but the readme states: "In our testing, we have found that even "problematic" cards may only fail sporadically (e.g., once every 50,000 test iterations). Like other stress testing tools, to properly verify stability MemtestG80 should be run for an extended period of time."

This test throws up no errors when I run it.
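The readme's point about sporadic failures can be put in numbers. Assuming each test iteration fails independently with a fixed probability p, the chance of seeing at least one failure in n iterations is 1 - (1 - p)^n. A short Python sketch using the readme's example rate of one failure per 50,000 iterations; the independence assumption and the 95% target are illustrative choices:

```python
import math

def detection_probability(p_fail: float, iterations: int) -> float:
    """Chance of at least one failure in `iterations` independent iterations,
    assuming each iteration fails with probability `p_fail`."""
    return 1.0 - (1.0 - p_fail) ** iterations

def iterations_needed(p_fail: float, confidence: float) -> int:
    """Smallest iteration count whose detection probability reaches `confidence`."""
    # Solve 1 - (1 - p)^n >= c  ->  n >= ln(1 - c) / ln(1 - p).
    # log1p keeps ln(1 - p) accurate when p is tiny.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_fail))

p = 1 / 50_000  # example failure rate quoted in the MemtestG80 readme
print(round(detection_probability(p, 50_000), 3))  # 0.632
print(iterations_needed(p, 0.95))                  # just under 150,000
```

So a 50,000-iteration run would catch such a card only about 63% of the time, and roughly 150,000 iterations are needed for a 95% chance, which is why a few short runs without errors do not clear the card.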

©2025 Universitat Pompeu Fabra