GTX 295 and nothing but errors
| Author | Message |
|---|---|
|
Joined: 28 Jan 09 Posts: 19 Credit: 15,297,622 RAC: 0
|
The machine's here: http://www.gpugrid.net/results.php?hostid=24557 Originally I could run workunits, though some would still fail. After upgrading to any driver later than 182.02, nothing will run: every workunit runs for several seconds before erroring out. I don't believe it's hardware related, as I've been able to run units before. I've also tried dropping the clock rates by about 20%, with the same result. No games show any errors or instability either. Any suggestions would be very welcome, as a lot of potential work is going undone. |
Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
"The machine's here" First suggestion: down-level the drivers to the ones that used to work ... see if they still work ... |
|
Joined: 28 Jan 09 Posts: 19 Credit: 15,297,622 RAC: 0
|
"First suggestion, down-level the drivers to the ones that used to work ... see if they still work ..." I've tried going back to 181.20, which worked somewhat before, but I still get only errors. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
This is really tough, as you already tried quite a few things. - It looks like it's not hardware-related, as you already downclocked and the tasks fail immediately at the beginning, so it can't be heat either. Just to make sure: did you downclock everything: core, shader and memory? - If it was only the new drivers, downgrading them should help. Maybe try using "Driver cleaner" before reinstalling the one that is known to work? Or do you have a system restore point from before the new drivers? - And just to be sure: did you try powering off and removing the power cord for >10 mins? - Does 3D Mark still run? MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 28 Jan 09 Posts: 19 Credit: 15,297,622 RAC: 0
|
"This is really tough, as you already tried quite a few things." The lot was backed off. Rather than failing in 4 seconds, the units ran for 7 seconds before failing, which suggests either slower processing before the failure or at least some response to the change. I can't believe it's heat either: the case is well ventilated, and the card temperature starts at 55C and only rises to about 64C or so under full load.
Downgrading didn't help.
The machine's been reset, but admittedly not fully powered off. I'll try it later; it can't hurt.
All games are completely fine so far: no artifacts, no instability. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
It really sounds like the new driver changed something permanently, which is not undone by the driver downgrade. Now I remember, I think we had a similar report before.. not sure though, if the guy used a 295 or a 9800GX2. And I can't remember if the problem was solved.. You could try this or something similar to remove the NV driver completely (more "completely" than by just overwriting things with the older version). MrS |
|
Joined: 28 Jan 09 Posts: 19 Credit: 15,297,622 RAC: 0
|
"It really sounds like the new driver changed something permanently, which is not undone by the driver downgrade. Now I remember, I think we had a similar report before.. not sure though, if the guy used a 295 or a 9800GX2. And I can't remember if the problem was solved.." I'm inclined to think it's something in the software environment as well. I'm going to drain my work cache, clean out the BOINC installation completely and install afresh. As it stands now, the current folder has been inherited from an older XP 64 machine, then a Vista 64 machine, then the current machine, not to mention various development versions. |
Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
"I'm inclined to think it's something in the software environment as well. I'm going to drain my work cache, and clean out the BOINC installation completely and install afresh. As it stands now, the current folder has been inherited from an older XP 64 machine, then a Vista 64 machine, then the current machine, not to mention various development versions." If you are going to that extent, you might want to consider a fresh install of the OS. In a case like this I would copy off the BOINC folder, install the OS, and before doing updates copy back the BOINC folder and install the version of BOINC I wanted ... then, as I did updates and driver installs, I would be running BOINC in the background ... since XP Pro takes 3-6 hours to do all the SP updates and drivers, that is a way to not lose productive time ... |
|
Joined: 28 Jan 09 Posts: 19 Credit: 15,297,622 RAC: 0
|
"If you are going to that extent, you might want to consider a fresh install of the OS." Only a minor hassle. With Seti, Cosmology and Einstein all suffering outages recently, it won't take long; I just need to wait a day or two. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
This plan is OK, but I don't think it's the BOINC install. First argument: BOINC only launches GPU-Grid.. there's not much it could do to make the tasks error out, even if it wanted to. And the 2nd: it happened when you upgraded the vid driver, not when you upgraded BOINC, right? MrS |
|
Joined: 28 Jan 09 Posts: 19 Credit: 15,297,622 RAC: 0
|
"This plan is OK, but I don't think it's the BOINC install. First argument: BOINC only launches GPU-Grid.. there's not much it could do to make the tasks error out, even if it wanted to. And the 2nd: it happened when you upgraded the vid driver, not when you upgraded BOINC, right?" Agreed, it won't be the only possibility I'll be following. I'll still be trying the driver cleaner before that as well. |
|
Joined: 28 Jan 09 Posts: 19 Credit: 15,297,622 RAC: 0
|
I've uninstalled anything nVidia related and run the driver cleaner. I reinstalled afterwards, and still no improvement. I did find this small app that performs CUDA-based memory checks on the card: http://www.softpedia.com/progDownload/CUDA-MemTest-Download-121066.html My card passes the tests, though. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
That's too bad. Now there are 2 options left: clean and reinstall BOINC, or reinstall Windows. Or could you test the card in some other computer, maybe even one which is known to work with GPU-Grid? Or test some single-GPU card in your PC? MrS |
|
Joined: 28 Jan 09 Posts: 19 Credit: 15,297,622 RAC: 0
|
I'm not too sure it's something inherent to the operating system. I can run the Distributed.net CUDA client without issue. |
|
Joined: 28 Jan 09 Posts: 19 Credit: 15,297,622 RAC: 0
|
Removed BOINC completely, including wiping the folders and all the cruft they'd accumulated. I also turned off SLI. And now I've two units running simultaneously and correctly. When I tried without SLI before, they would still break. So I'm not sure whether it's the combination of reinstall and non-SLI that fixed it. |
|
Joined: 28 Jan 09 Posts: 19 Credit: 15,297,622 RAC: 0
|
The last post that suggested it was working was premature. It actually errored on one core, though this time after more than a minute, rather than the prior 8 seconds. I do now have it fixed. A combination of 6.6.25 and the newest 185.81 drivers works. |
Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
"The last post that suggested it was working was premature. It actually errored on one core, though this time after more than a minute, rather than the prior 8 seconds. I do now have it fixed. A combination of 6.6.25 and the newest 185.81 drivers works." OK, be advised that 6.6.25 still has the "debt" bug, and in a day or two (more if you are lucky) you will see your normal cache of 4 GPU Grid tasks shrink. You have to set the reset debts flag in cc_config and stop and restart BOINC. You will get a fresh load of tasks; rinse and repeat. On my i7 I have to do this as often as every 24 hours, but more typically every 36-48 hours ... on my Q9300 it took a couple of weeks to get totally snarled up. {edit} Other people have had the opposite problem: they can't keep the right amount of CPU work on hand. Not sure why one or the other ... |
X1900AIW Joined: 12 Sep 08 Posts: 74 Credit: 23,566,124 RAC: 0
|
"I did find this small app to perform CUDA based memory checks on the card." There is another CUDA test program (MemtestG80) from the folding forum, but I fear testing won't solve your problem. Announcing release: standalone memory tester for NVIDIA GPUs More details in the readme.txt. (Up to now you must register to download the software.) |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
"Not sure why one or the other ..." Could be related to resource share and the number of attached projects [which get either their CPU or GPU debts whacked]. MrS |
Paul D. Buck Joined: 9 Jun 08 Posts: 1050 Credit: 37,321,185 RAC: 0
|
"Not sure why one or the other ..." Riotous night. I submitted what seemed like 20 posts to the lists this night, hopefully with enough detail that they will finally start to correct some of the issues. I did a long tacking session starting about 6 last night and finally proved that the code I told them was wrong, is wrong. There are two ways to fix it; let's see what happens with that. If they opt for the change, it should make machines with 4 or more CPUs more stable in scheduling work, so they stop starting a task and working on it for only a few seconds before switching to another task that was just downloaded. Also found at least two issues with IBERCIVIS tasks, one of which is going to have to be corrected by the project; the other is a flaw in BOINC when the DCF is 100 or more ... Oh, and found more evidence of the debt imbalance problems ... which is related to what we are talking about here in this note. |
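
For anyone trying the debt-reset workaround mentioned earlier in the thread (set the reset debts flag in cc_config, then stop and restart BOINC), here is a rough sketch of generating such a file. The thread never names the flag; this assumes it is the `zero_debts` option inside `<options>` in `cc_config.xml`, as documented for BOINC 6.x clients, so please check the client configuration docs for your version before relying on it. The function name `build_cc_config` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_cc_config(zero_debts: bool = True) -> str:
    """Return the text of a minimal cc_config.xml.

    Assumption: the "reset debts" flag from the thread is the
    <zero_debts> option under <options>; the thread does not name it.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    flag = ET.SubElement(options, "zero_debts")
    # BOINC config flags are written as 1 (on) or 0 (off).
    flag.text = "1" if zero_debts else "0"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML; save it as cc_config.xml in the BOINC data
    # directory yourself, then stop and restart BOINC.
    print(build_cc_config())
```

If the assumption holds, the flag is probably best set back to 0 (or removed) once the debts have been cleared, otherwise they would be zeroed on every client start.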
©2025 Universitat Pompeu Fabra