Message boards : Graphics cards (GPUs) : cuda driver error 719

a1kabear
Message 34584 - Posted: 6 Jan 2014 | 17:10:24 UTC

Hi!

I got two new 780 cards, but I am getting a few "cuda driver error 719" errors, although the work units seem to complete successfully and I get credited. Is this error something I should worry about and try to fix, maybe by underclocking?

http://www.gpugrid.net/result.php?resultid=7625460
http://www.gpugrid.net/result.php?resultid=7624300

A couple recent example tasks.

Thanks for any help :)

Dagorath
Message 34585 - Posted: 6 Jan 2014 | 19:01:57 UTC - in response to Message 34584.

Made URLs clickable.

http://www.gpugrid.net/result.php?resultid=7625460
http://www.gpugrid.net/result.php?resultid=7624300
____________
BOINC <<--- credit whores, pedants, alien hunters

TJ
Message 34586 - Posted: 6 Jan 2014 | 19:27:11 UTC - in response to Message 34584.

As this happened on your Windows machine, you can check whether the GPU clock has downclocked (by half). I guess that this has happened, as I have had these fatal cuda driver errors as well. Tasks still finish successfully, but take much longer to complete. Rebooting is the only way to get the GPU clock back to its normal operating speed.
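One way to spot the half-speed state described above is to compare each GPU's reported clock with the clock you normally see under load. The following is a minimal Python sketch, not something posted in this thread. It assumes the readings come from `nvidia-smi --query-gpu=clocks.sm --format=csv,noheader,nounits` (one MHz value per GPU per line), which some GeForce cards and driver versions may not support. The function name, the expected clock, and the 0.6 threshold are illustrative assumptions.

```python
def find_downclocked(smi_output, expected_mhz, ratio=0.6):
    """Return indices of GPUs whose clock is below ratio * expected_mhz.

    smi_output: text with one MHz reading per line, one line per GPU.
    expected_mhz and ratio are assumptions; set them for your own cards.
    """
    slow = []
    lines = [ln.strip() for ln in smi_output.splitlines() if ln.strip()]
    for index, line in enumerate(lines):
        try:
            clock = float(line)
        except ValueError:
            # Reading not available for this GPU (non-numeric output); skip it.
            continue
        if clock < ratio * expected_mhz:
            slow.append(index)
    return slow


# Hypothetical readings: GPU 0 at full speed, GPU 1 at about half speed.
sample = "980\n490\n"
print(find_downclocked(sample, expected_mhz=980))
```

If the list is non-empty after a task has reported error 719, that would be consistent with the downclocking described above.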
____________
Greetings from TJ

skgiven
Volunteer moderator
Volunteer tester
Message 34587 - Posted: 6 Jan 2014 | 19:56:00 UTC - in response to Message 34586.

a1kabear, use a tool such as MSI Afterburner to control the GPU fan speeds and temperatures. Try to keep the temperature below 70C.

a1kabear
Message 34588 - Posted: 7 Jan 2014 | 2:55:05 UTC
Last modified: 7 Jan 2014 | 2:55:27 UTC

Thanks for the suggestions, and sorry about the non-clickable links. I'm gonna try it all on Linux today and see how it does. If I still get the errors/poor performance I will go back to Windows and try to keep temps down, maybe underclock, maybe overvolt a bit.

Dagorath
Message 34589 - Posted: 7 Jan 2014 | 8:04:54 UTC - in response to Message 34588.
Last modified: 7 Jan 2014 | 8:07:01 UTC

It sounds like you might have one of the Kamikaze Kards that wants to die an early death by allowing the temp to hit 80C before it ramps up the fan speed. I have one of those. I don't know if it's the fault of the BIOS on the video card, the driver or some combination of the 2 but other than flashing the BIOS, the only way to manage temps on those nasty cards is to configure COOLBITS which allows manual fan control. I see you have a few Linux machines already so you're probably familiar with COOLBITS already but I thought I would just check and make sure.

a1kabear
Message 34591 - Posted: 7 Jan 2014 | 10:15:31 UTC - in response to Message 34589.

Yes, I tried Coolbits, but it seems to only activate on the first card. I read that I need to connect either another display or a virtual display (complicated maybe) to the second card to get its fan controls. However, the fan was already running quite high (I am in Thailand, and the ambient temperature here is high), and pushing it up more only got me an extra 3°C off, and it's noticeably louder.

Tomorrow I will probably put Windows on another drive in the machine and back up the BIOS, then use NiBiTor to lower the stock clocks to the reference card speeds (I'm using 780 Lightning cards), flash both cards and jump back to Linux.

I am also considering changing the BCLK on the motherboard to lower the speed of PCI Express/CPU/memory, which should also reduce the work on the cards and help them run cooler. This may be a lot easier for me than installing Windows and flashing. However, with a non-standard BCLK maybe it won't be stable, no idea.

From this point on I won't be buying any "OC" cards; they are too much trouble for folding :P

Carlesa25
Message 34592 - Posted: 7 Jan 2014 | 12:02:52 UTC - in response to Message 34591.

a1kabear wrote: "Yes I tried coolbits but it seems to only activate on the first card. I read I need to either connect another display to the second card or a virtual display (complicated maybe) to get the fan controls for the second card."

Hi. As you say, controlling the fan on more than one GPU in Linux using Coolbits requires that each GPU has an associated real or virtual screen.

It's not complicated, just annoying. I think there is a site that already explains how to connect a second (or further) virtual monitor, but I don't have the exact link.

If you are interested, I can post the setup I've used to activate a second monitor and control the fans of my two GPUs in Ubuntu.

Dagorath
Message 34593 - Posted: 7 Jan 2014 | 16:53:54 UTC - in response to Message 34591.

As for controlling temps after you get COOLBITS configured, I am interested in Carlesa's solution. I know it involves gkrellm and lsensors (?), but I've never been able to figure it out, so I wrote my own GPU temp control app in Python. It's still in development; get it at https://github.com/Dagorath/gpu_d.
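For readers wondering what a temperature control app of this kind has to decide, the core is usually a fan curve: a mapping from GPU temperature to fan speed. The sketch below is a hypothetical illustration in Python and is not taken from gpu_d. The curve points and the function name are assumptions, and actually applying the speed (through nvidia-settings once Coolbits is enabled) is left out.

```python
# Hypothetical fan curve: (temperature in C, fan speed in percent) points,
# chosen only for illustration. Tune them for your own card and climate.
FAN_CURVE = [(40, 30), (60, 50), (70, 80), (80, 100)]


def fan_speed_for(temp_c, curve=FAN_CURVE):
    """Linearly interpolate a fan speed (percent) for a GPU temperature."""
    if temp_c <= curve[0][0]:
        return curve[0][1]      # below the curve: minimum speed
    if temp_c >= curve[-1][0]:
        return curve[-1][1]     # above the curve: maximum speed
    for (t0, s0), (t1, s1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return round(s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0))


for temp in (35, 65, 85):
    print(temp, fan_speed_for(temp))
```

A curve like this ramps the fan well before 80C, which is the behaviour the stock fan control on some cards reportedly lacks.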

a1kabear wrote: "Yes I tried coolbits but it seems to only activate on the first card. I read I need to either connect another display to the second card or a virtual display (complicated maybe) to get the fan controls for the second card. However the fan was already running quite high (I am in Thailand, ambient temperature here is high) and pushing it up more only got me an extra 3c off and its noticeably louder."


Downclocking is one way to handle high ambient but IMHO a better way is to duct the heat from the computer to the outdoors, in other words get it out of the house/office ASAP and that means not allowing it to mix in with the air in the room, collect it at the back of the computer and push/suck it outside immediately. It's not hard to do.

There are two solutions for getting the card to initialize without a monitor plugged in. I prefer the VGA dummy plug hardware solution (blog.zorinaq.com?e=11) over the software solution because I always manage to somehow undo the software solution, and then I have to search for the link that explains how to fix what I broke. The hardware solution is permanent and safe if you do it properly. By properly I mean making sure the resistors can't slip out and can't accidentally touch ground. To do that I cut the resistor leads short enough that they don't stick out far, then bent them over and taped them down. When you bend them over, make sure the lead(s) inserted into the c1/c2/c3 holes don't touch the leads inserted into the ground hole or the metal surrounding the plug. In the blog article one of the posters claims shorting c1/c2/c3 to ground works and doesn't cause any damage, but I would avoid shorts as that is not the specification the card is designed to operate at. 75 ohms is the ideal resistance, but 75 ohm resistors can be hard to find. 68 ohm and 100 ohm resistors are common and are close enough to the 75 ohm ideal.

The software solution is to use the nvidia-xconfig tool shipped with the driver.

sudo nvidia-xconfig --help
sudo nvidia-xconfig --advanced-help | less
sudo nvidia-xconfig --enable-all-gpus
sudo nvidia-xconfig --cool-bits=5

The last two commands do what you want; run them in the order shown. Some say use cool-bits=5 while others say cool-bits=4. Both work, but 5 supposedly unlocks the clocks as well. I say supposedly because it used to unlock the clocks on my GTX 570 with older drivers, but it doesn't work on 6xx cards with newer drivers. Or maybe there is now a second lock that must be bypassed, I dunno. I use 5 in case I come across a way to get around that second lock, though I think it's going to take a driver hack to do it; still working on that.
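The difference between 4 and 5 is easier to see once the value is read as a bitmask, with 5 being 4 + 1. Here is a minimal Python sketch, assuming the bit meanings given in the NVIDIA Linux driver README (bit value 1 for clock controls on older GPUs, bit value 4 for manual fan control). The set of bits has changed between driver versions, so treat the table as an assumption and check the README for your driver.

```python
# Assumed bit meanings, per the NVIDIA Linux driver README. Other bits exist
# in some driver versions and are deliberately left out of this sketch.
COOLBITS_FLAGS = {
    1: "clock controls (older GPUs)",
    4: "manual fan control",
}


def decode_coolbits(value):
    """List the features enabled by a Coolbits value, lowest bit first."""
    return [name for bit, name in sorted(COOLBITS_FLAGS.items()) if value & bit]


print(decode_coolbits(4))
print(decode_coolbits(5))
```

Both values include the fan control bit, which is why either one works for controlling the fans.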

a1kabear wrote: "I am also considering changing the BCLK on the motherboard to lower the speed of pci-express/cpu/memory which should also reduce the work on the cards helping them run cooler. This may be a lot easier for me than installing windows and flashing. However with a non-standard BCLK maybe it wont be stable, no idea."


"Whatever works" they say but IMHO that's the unnecessarily complicated approach and it defeats the purpose to some extent. Deal with the culprit directly. The culprit is ambient temperature. Deal with that first and if that isn't enough then augment it with other strategies. I mean if your car's brakes don't work and you don't like getting injured in smash ups then you could install an elaborate system of rubber bumpers mounted on shock absorbers and more airbags but the best solution is to just fix the damn brakes :-)


Carlesa25
Message 34594 - Posted: 7 Jan 2014 | 17:38:14 UTC - in response to Message 34593.
Last modified: 7 Jan 2014 | 17:39:44 UTC

Hello. gkrellm + lm-sensors only gives you information, not control: the temperatures of the CPU and GPUs (each one independently), the fan speeds, and configurable alarms, plus other interesting information.

It is all easy to install from the Ubuntu repositories. First install "lm-sensors" and run "sudo sensors-detect" in a terminal. Watch out for the final question, "Do you want to add these lines to /etc/modules automatically? (yes/NO)": the default is NO, but you should answer YES so the lines are added automatically.

Let it detect all the board sensors; then, once gkrellm (another useful tool) is installed, we can view this information.

For the virtual monitor on the GPU, the hardware solution is what I recommend; it works perfectly.

Retvari Zoltan
Message 34596 - Posted: 7 Jan 2014 | 19:16:15 UTC - in response to Message 34591.

a1kabear wrote: "(im using 780 lightning)"

The cooler on this card blows the hot air into the case, so if you have two such cards they are going to heat each other (and the MB, CPU, etc.). You have to blow cool air into the case with case fans, or take the side panel off.

a1kabear wrote: "I am also considering changing the BCLK on the motherboard to lower the speed of pci-express/cpu/memory which should also reduce the work on the cards helping them run cooler. This may be a lot easier for me than installing windows and flashing. However with a non-standard BCLK maybe it wont be stable, no idea."

Do not lower (or raise) the PCIe frequency; it has a strict nominal working frequency (100 MHz). Changing it will make your GPUs unreliable.
It's much better to lower the frequency of your GPUs (by flashing their BIOS, or by using a 3rd party utility).
