gtx295 returning nearly constant errors
madas91 · Joined: 22 Apr 09 · Posts: 21 · Credit: 8,119,831 · RAC: 0

I had a GTX 295 running GPUGRID with only the occasional WU that errored out. Something changed on the 18th, and now I get nearly constant errors. I'm running BOINC 6.6.28 with NVIDIA 185.85 drivers on Vista HP 64-bit. I can't work out what changed, as all automatic updates are turned off and I update everything manually every weekend. Any suggestions after looking through my task history would be much appreciated.

Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0

Hmm, occasionally you still finish a WU, and it looks like you already tried a project reset? Otherwise, have you tried a reboot, and switching the power off and removing the power cord for more than 10 minutes?

MrS
Scanning for our furry friends since Jan 2002

madas91 · Joined: 22 Apr 09 · Posts: 21 · Credit: 8,119,831 · RAC: 0

I've just gone back to basics :) I uninstalled everything NVIDIA- and BOINC-related, restarted, and cleaned out all traces of NVIDIA everywhere. Then a fresh download of 185.85 and a fresh download of the BOINC client. I reattached to the project and am now running the first 2 WUs. Fingers crossed it was just a bit of corruption somewhere among all the files I just replaced. 2 hours into the WUs so far and all looks good. Will update when they finish.

madas91 · Joined: 22 Apr 09 · Posts: 21 · Credit: 8,119,831 · RAC: 0

And there goes the first error.

Bender10 · Joined: 3 Dec 07 · Posts: 167 · Credit: 8,368,897 · RAC: 0

Joined: 18 Jul 07 · Posts: 67 · Credit: 43,351,724 · RAC: 0

Since the reinstall, the errors are only on GPU1 (meaning the second GPU of the 295). Make sure you've got it configured correctly.

Bob

madas91 · Joined: 22 Apr 09 · Posts: 21 · Credit: 8,119,831 · RAC: 0

> Since the re-install the errors are only on GPU1 (meaning the second GPU of the 295)

Not sure what there is to configure? It's not in SLI mode, or GPU 1 would not exist. PhysX is on, as it always has been. Am I missing something? I thought that was pretty much it.

> Try going back to the 182.50 driver....

That's what I'm now trying. Just got it ready and started 2 more WUs.

Joined: 18 Jul 07 · Posts: 67 · Credit: 43,351,724 · RAC: 0

For some people, the new drivers made a dummy plug or monitor on the second GPU unnecessary: as long as PhysX is enabled on the second GPU, it should be detected. Others have found they still need the dummy plug or monitor on the card, and the desktop still needs to be extended onto the second GPU. The new drivers changed the rules for CUDA and have caught a few people off guard. Of course, for some people the new drivers didn't work either.

Bob

madas91 · Joined: 22 Apr 09 · Posts: 21 · Credit: 8,119,831 · RAC: 0

I never needed a plug and have never had to extend the desktop onto the second GPU. No drivers or settings changed between it working and it not working. In the NVIDIA Control Panel, PhysX is enabled and SLI is set to not use multi-GPU mode, as it always has been.

Joined: 18 Jul 07 · Posts: 67 · Credit: 43,351,724 · RAC: 0

The amazing thing about the drivers is that sometimes they change their mind...

For this next batch of work, please don't abort any. You keep aborting the work on card 0 (the first of the 2 in the 295), so we haven't seen whether it will error out or not.

Bob

madas91 · Joined: 22 Apr 09 · Posts: 21 · Credit: 8,119,831 · RAC: 0

> For this next batch of work please dont abort any as you keep aborting the work on card 0 (the first of the 2 in the 295) and we haven't seen if it will error out or not.

OK, I'm now running 2 WUs and will leave it overnight, so both cards should be on their second WUs by the time I get up :) We'll see whether my hard work today paid off or not. Been a busy bee indeed. That's the reason I cancelled all jobs and dropped out for a fresh start. Fingers crossed now, or it's definitely the card!!

Joined: 18 Jul 07 · Posts: 67 · Credit: 43,351,724 · RAC: 0

I'm sure it's not the card. I see the new host (37017) currently has 4 errors, all on the second GPU of the 295, and nothing yet from the first GPU. It's sounding more like a config issue, as there would be errors from the first GPU as well if something were wrong...

Bob

madas91 · Joined: 22 Apr 09 · Posts: 21 · Credit: 8,119,831 · RAC: 0

I was thinking the fault might be on the second GPU only, and that's why it is failing. I might try 2 SETI WUs and see if both finish.

madas91 · Joined: 22 Apr 09 · Posts: 21 · Credit: 8,119,831 · RAC: 0

I have reinstalled everything, including Vista, so everything was a clean install with freshly downloaded drivers. Seeing as both GPUs get configured with the same settings, how can it be a settings issue? If I plug the monitor into the second GPU instead of the first and it still fails, it must be the card. However, if the first GPU starts failing tasks, then it must be the settings. If that's how it works, anyway.

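madas91's monitor-swap reasoning can be written down as a small decision procedure. The sketch below is a hypothetical Python helper, not a tool anyone in this thread used. It assumes each test run is recorded as the physical GPU driving the monitor plus the set of GPUs that returned errored work units, and it follows the thread's own rules of thumb: failures that stay on one GPU wherever the monitor is point at the card, while failures that move with the monitor, or hit both GPUs, point at software or configuration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One test run (hypothetical record format, for illustration only)."""
    monitor_on: int          # physical GPU (0 or 1) the monitor was plugged into
    failing: frozenset[int]  # physical GPUs that returned errored work units


def diagnose(runs: list[Run]) -> str:
    """Hypothetical helper: does the failure follow the card or the setup?"""
    if not any(r.failing for r in runs):
        return "no failures observed"
    if len({r.monitor_on for r in runs}) < 2:
        # The swap test needs runs with the monitor on each socket.
        return "inconclusive: move the monitor to the other socket and retest"
    # GPUs that failed in every run, whichever socket drove the monitor.
    always_failing = set.intersection(*(set(r.failing) for r in runs))
    if always_failing == {0, 1}:
        return "both GPUs fail: software/configuration suspected"
    if always_failing:
        return f"card suspected: GPU {min(always_failing)} fails wherever the monitor is"
    # Failures that move with the monitor: only the headless GPU fails.
    if all(r.failing == {1 - r.monitor_on} for r in runs):
        return "configuration suspected: only the GPU without the monitor fails"
    return "inconclusive"
```

For example, errors staying on the second GPU whichever socket drives the monitor correspond to `diagnose([Run(0, frozenset({1})), Run(1, frozenset({1}))])`, which returns the "card suspected" verdict. It is only a heuristic: a thermal or power problem affecting one GPU would look the same.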
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0

Agreed. What about other software? 3DMark or FurMark in SLI mode? Any artefacts?

MrS

madas91 · Joined: 22 Apr 09 · Posts: 21 · Credit: 8,119,831 · RAC: 0

OK, tasks are still failing on the second GPU regardless of which socket the monitor is plugged into. I'm going to get in touch with the card's vendor on Tuesday and see what they say. I refuse to overclock something that cost so much in the first place, lmao, so at least my warranty is still valid.

Edit 1: In the meantime I'm going to run SETI Enhanced WUs, as they don't seem to fail.

Edit 2: SETI Enhanced WUs for CUDA fly through without a problem on both GPUs. I'm still going to try for a replacement, as it worked and then stopped working within a few weeks of use. I might even be inside the old 28-day mark :)

Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0

Did you watch the temperatures on both projects? I suppose they're rather similar under automatic fan control?

MrS

Joined: 24 Dec 08 · Posts: 13 · Credit: 17,931,283 · RAC: 0

> Did you watch the temperatures on both projects? I suppose they're rather similar at automatic fan control?

I had a similar problem with my GTX 295. When I set the fan to a 100% duty cycle, most of the WU errors stopped.

Curt

Joined: 18 Jul 07 · Posts: 67 · Credit: 43,351,724 · RAC: 0

So it's quite clear there is something wrong somewhere with the second GPU. A list of things to try:

1) Plug monitors into both monitor sockets (if you have a second monitor). If not, you could try switching the monitor over while the PC is running (that sometimes tricks it into seeing 2 GPUs).
2) Ensure the desktop is extended onto the second GPU.
3) Try disabling PhysX on the first card while enabling it on the second (I think that's possible).
4) Try disabling PhysX on both.
5) Run the CUDA memory tester (see sticky) on the second GPU.

Bob

P.S. If they'll replace it, that may or may not work. It's worth a try if they'll go for it.

madas91 · Joined: 22 Apr 09 · Posts: 21 · Credit: 8,119,831 · RAC: 0

My temperatures while crunching on the GTX have never been over 80. I have the fan set to 75%, which keeps it down around 72 degrees. I've tried without PhysX on both cards. My system doesn't need to be tricked into "seeing 2 GPUs": there are 2, or it couldn't be starting or failing work on the second one. Are you copying and pasting this from elsewhere? You've given the same advice time and time again. That's not me having a go, and I appreciate everyone's help; it's just that you keep telling me to do things so the second GPU gets detected!! That's the same GPU that must already be detected, because it's failing work units.

I have run the CUDA memory test a couple of times, but the readme states: "In our testing, we have found that even 'problematic' cards may only fail sporadically (e.g., once every 50,000 test iterations). Like other stress testing tools, to properly verify stability MemtestG80 should be run for an extended period of time." The test throws up no errors when I run it.
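The readme's point about sporadic failures can be quantified. Assuming, purely for illustration, that a faulty card fails each test iteration independently at the readme's example rate p = 1/50,000, the chance of seeing at least one error in n iterations is 1 - (1 - p)^n. The Python sketch below computes that probability and the number of iterations needed for a chosen confidence; the function names are hypothetical, and the independence assumption is a simplification (real faults may depend on temperature, for instance).

```python
import math


def detection_probability(p_fail: float, iterations: int) -> float:
    """Chance of at least one failure in `iterations` independent iterations."""
    return 1.0 - (1.0 - p_fail) ** iterations


def iterations_needed(p_fail: float, confidence: float) -> int:
    """Smallest n whose detection probability reaches `confidence`."""
    # Closed form: n = log(1 - confidence) / log(1 - p_fail), rounded up.
    n = math.ceil(math.log1p(-confidence) / math.log1p(-p_fail))
    # Nudge n to absorb any floating-point rounding in the closed form.
    while detection_probability(p_fail, n) < confidence:
        n += 1
    while n > 1 and detection_probability(p_fail, n - 1) >= confidence:
        n -= 1
    return n


if __name__ == "__main__":
    p = 1 / 50_000  # example failure rate quoted in the MemtestG80 readme
    print(f"P(detect) after 50,000 iterations: {detection_probability(p, 50_000):.3f}")
    print(f"Iterations for 95% confidence: {iterations_needed(p, 0.95):,}")
```

Under these assumptions, a single 50,000-iteration run catches such a card only about 63% of the time (1 - e^-1), and roughly 150,000 iterations are needed for 95% confidence, so a short clean run is weak evidence that a GPU is healthy.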
©2025 Universitat Pompeu Fabra