Please upgrade to DRIVER 334.21 or NEWER [closed]

Message boards : News : Please upgrade to DRIVER 334.21 or NEWER [closed]

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 36556 - Posted: 22 Apr 2014, 21:02:17 UTC - in response to Message 36555.  

Yeah - scheduling on the Cuda capability reported by the driver is insufficient - the 331s say they do, but they don't. We've reverted to giving cuda60s out only to 334+

Matt

Thanks, Matt. Highly appreciated, especially as it is late evening. I now have two tasks running again at 90% GPU load at 1150 MHz on the 780Ti.
Greetings from TJ

Stoneageman
Joined: 25 May 09
Posts: 224
Credit: 34,057,374,498
RAC: 0
Message 36557 - Posted: 22 Apr 2014, 21:11:59 UTC - in response to Message 36555.  

Yeah - scheduling on the Cuda capability reported by the driver is insufficient - the 331s say they do, but they don't. We've reverted to giving cuda60s out only to 334+

Matt

Mmmm... it's not working so far with the 331.38 driver on Linux.

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 36560 - Posted: 22 Apr 2014, 21:24:07 UTC - in response to Message 36557.  

Hi Stoneageman,

This is a Linux-specific problem - it turns out the BOINC client isn't reporting the driver version to the server, so the scheduler can't make the right allocation.

I am working on a patch (to the client).

Matt
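The rule described here (hand out cuda60 on the basis of the driver version rather than the CUDA level the driver claims, and cope with clients that send no version at all) could look roughly like this. It is a hypothetical sketch, not GPUGRID's scheduler code: the function names, the version-string format and the choice of "cuda42" as the fallback are assumptions; only the 334.21 cutoff comes from this thread.

```python
from typing import Optional, Tuple

# Hypothetical sketch; not GPUGRID's actual scheduler code.
# Cutoff taken from this thread: cuda60 only for driver 334.21 or newer.
CUDA60_MIN_DRIVER = (334, 21)


def parse_driver_version(raw: Optional[str]) -> Optional[Tuple[int, int]]:
    """Turn a string such as '337.50' into (337, 50); None if missing or malformed."""
    if not raw:
        return None
    try:
        major, minor = raw.strip().split(".", 1)
        return int(major), int(minor)
    except ValueError:
        return None


def choose_app_version(reported_driver: Optional[str]) -> str:
    """Pick the app flavour from the driver version, ignoring the CUDA level the driver claims."""
    version = parse_driver_version(reported_driver)
    if version is None:
        # No usable version (e.g. a client that does not report it):
        # fall back to the conservative build instead of trusting the claim.
        return "cuda42"
    return "cuda60" if version >= CUDA60_MIN_DRIVER else "cuda42"
```

Under this sketch a 331.82 host, or a client that reports no driver version, gets cuda42, while 334.21 and newer get cuda60.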

GPUGRID Role account
Joined: 15 Feb 07
Posts: 134
Credit: 1,349,535,983
RAC: 0
Message 36561 - Posted: 22 Apr 2014, 21:25:31 UTC - in response to Message 36560.  

PS Stoneage - hope you don't think I'm mucking you about; I'm just trying to get the #1 slot off you! :-)

Stoneageman
Joined: 25 May 09
Posts: 224
Credit: 34,057,374,498
RAC: 0
Message 36563 - Posted: 22 Apr 2014, 21:50:19 UTC

Ha! On that host I've changed to the 337.12 beta driver (after some hassle), so cuda60 tasks are now running OK there. I just don't have the time to do the rest of the farm right now.
PS, don't you ever sleep?

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 36617 - Posted: 24 Apr 2014, 17:10:13 UTC

Okay, I have now updated to the latest driver (337.50). The 780Ti boosts to 1158 MHz without any program like Afterburner running. Problem: the temperature quickly climbs to 83°C when crunching GPUGRID. I don't like such high values, so I started Afterburner; the card still runs at boost speed, but the temperature is 76°C with 91% fan speed. What I notice now is that the GPU load is 66-67%, which is less than it was with the 331.82 driver. I have no SWAN_SYNC settings yet.

No statistical data yet, but others have mentioned that with drivers after 331.82 the GTX 780Ti and higher cards were slightly hampered on Win7. I will let it run overnight and see what happens. However, the temperature is now a steady 76°C, whereas it was 69-72°C until this afternoon, when I updated. Later tomorrow I will also try the stock clock (875 MHz) to see what the temperature does.
Greetings from TJ

Jeremy Zimmerman
Joined: 13 Apr 13
Posts: 61
Credit: 726,605,417
RAC: 0
Message 36620 - Posted: 24 Apr 2014, 17:55:41 UTC - in response to Message 36617.  

TJ,

With some of the new WUs actually taking advantage of all those cores/SMUs/shader units of the 780Ti, the air cooling is getting a wee bit strained, especially with spring and warmer temperatures in the house. This is why I am using Precision and prioritizing temperature over power. It cuts voltage and throttles the GPU frequency as needed to hold the desired temperature. That is, of course, only after the custom fan profile has reached 100%.
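As a rough mental model of "prioritize temperature over power", here is a toy control step that uses the fan first and only sheds clock once the fan is at 100%. This is an assumption-laden illustration, not how Precision X or NVIDIA's GPU Boost are actually implemented: voltage is not modelled, and the 13 MHz clock step and the 875 MHz floor (the stock clock TJ mentions) are assumed values.

```python
from dataclasses import dataclass

# Toy model only: not how Precision X or GPU Boost are implemented.


@dataclass(frozen=True)
class GpuState:
    temp_c: float
    fan_pct: int
    clock_mhz: int


def control_step(state: GpuState, temp_target_c: float,
                 fan_step_pct: int = 10, clock_step_mhz: int = 13,
                 min_clock_mhz: int = 875) -> GpuState:
    """One step of a toy 'prioritize temperature' policy: fan first, then clock.

    Voltage is not modelled; the step sizes and the clock floor are assumed values.
    """
    if state.temp_c <= temp_target_c:
        return state  # at or below target: leave everything alone
    if state.fan_pct < 100:
        # Fan headroom left: spin up before touching the clock.
        return GpuState(state.temp_c, min(100, state.fan_pct + fan_step_pct), state.clock_mhz)
    # Fan already at 100%: shed one clock step, never going below the floor.
    return GpuState(state.temp_c, state.fan_pct,
                    max(min_clock_mhz, state.clock_mhz - clock_step_mhz))
```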

I am enjoying the 87-90% utilization I have seen, but those cards are sweating heavily. :) Work harder there, ye poor silicon, I say... work harder! Dear researchers, no easy WUs, make my silicon work hard! But do not crash them. :)

Regards,
Jeremy

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 36623 - Posted: 24 Apr 2014, 18:26:56 UTC - in response to Message 36620.  

Thank you, Jeremy, that gives me some relief.

It is now a Gerard WU that runs at 66% and 77°C at 1055.6MHz.
This morning same WU type ran at 88% at 70°C and 875MHz.
And with an ambient temperature of 28°C and warmer weather on the way, I prefer the morning's values despite the lower clock.
I will wait and see how much faster it is tomorrow, and then either throttle the card with PrecisionX or revert to the 331.82 drivers.
Greetings from TJ

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 36625 - Posted: 24 Apr 2014, 22:06:58 UTC

After some experimenting with the core clock, I see that if I set it to 1060 MHz with Afterburner, it stays at 1060 and the GPU load is 66%. If I set the core clock to 875 MHz, the GPU load increases to 71%. I have not seen this before, but I can imagine that it works like this.
I have set the fan to max (100%), but the card stays at 75°C with the clock at 1060 MHz.

The first WU the 780Ti did with the 337.50 driver was about 2000 seconds faster than with 331.82, but the core clock was higher too. So was the temperature, and that bothers me the most, as I know my attic will get warmer in the coming weeks.
Greetings from TJ

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 36640 - Posted: 25 Apr 2014, 12:22:32 UTC

Hello Jeremy, one more question if I may.
I have now set the temperature target at 72°C and the power target at 88%. When a new WU starts, the card boosts, but after a few minutes the temperature rises and the clock goes down. This is of course what I want. However, with the fan at 90% the temperature is a steady 74°C, with a GPU load of 83-84%.

What have you set in PrecisionX for the temperature and power targets?
By the way, I am now running the 337.50 beta driver.
Greetings from TJ

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Message 36648 - Posted: 25 Apr 2014, 17:39:40 UTC - in response to Message 36550.  

Try cuda60 version 841

(By the way, this reintroduces SWAN_SYNC - if you set that to 1 you should find you get improved performance)

Matt

Since this version has SWAN_SYNC, I've upgraded my drivers, and this version is working fine.
However, I had a strange sequence of task allocations, which led to my decision to upgrade the drivers:
My hosts had the 332.21 (and the 326.80) driver, and they received and completed CUDA 5.5 tasks normally (CPU time = run time).
Then my hosts received a couple of CUDA 6.0 tasks, which all failed.
Then my hosts received CUDA 4.2 tasks, which all completed successfully.
Then I upgraded my drivers to 337.50, and now my hosts are receiving and completing CUDA 6.0 tasks normally.

After that, I checked nvcuda32.dll and nvcuda64.dll in both drivers (332.21 and 337.50), and all four DLLs state that they are "NVIDIA CUDA 6.0.1 drivers" (right click -> Properties -> Version tab -> Product name field), but they have different file sizes. So the CUDA 6 included in drivers older than 334.21 is not working (to put it politely).

Yeah - scheduling on the Cuda capability reported by the driver is insufficient - the 331s say they do, but they don't. We've reverted to giving cuda60s out only to 334+

My experience confirms this.
Why haven't they at least increased the last digit of the driver's version number? (You don't have to answer this.)
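Because the Product name field is identical, comparing file size plus a content hash is a quick way to confirm that two nvcuda DLLs really differ. A minimal sketch; the helper names are hypothetical, and the paths are wherever you copied each driver's DLL:

```python
import hashlib
from pathlib import Path
from typing import Tuple

# Hypothetical helper names; paths are whatever copies of the DLLs you made.


def fingerprint_bytes(data: bytes) -> Tuple[int, str]:
    """Size in bytes plus SHA-256 digest: two files match only if both agree."""
    return len(data), hashlib.sha256(data).hexdigest()


def file_fingerprint(path: Path) -> Tuple[int, str]:
    # Driver DLLs are a few MB, so reading the whole file at once is fine.
    return fingerprint_bytes(Path(path).read_bytes())


def same_binary(a: Path, b: Path) -> bool:
    """True only if both files have identical size and content hash."""
    return file_fingerprint(a) == file_fingerprint(b)
```

For example, same_binary(Path("332.21/nvcuda64.dll"), Path("337.50/nvcuda64.dll")) (hypothetical copy locations) returns False whenever the two files differ in size, as they do here.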

Jeremy Zimmerman
Joined: 13 Apr 13
Posts: 61
Credit: 726,605,417
RAC: 0
Message 36662 - Posted: 26 Apr 2014, 3:53:30 UTC - in response to Message 36640.  
Last modified: 26 Apr 2014, 4:11:07 UTC

TJ,

Here are my Precision settings for the machine with the 780Ti cards.

Power Target = 105%
Temp Target = 72°C
Power and Temp are NOT linked
Prioritize Temp Target (click on arrow to point down) ***EDIT***
GPU Clock Offset = +38 (not much good now that winter is over)
MEM Clock Offset = 0
Fan Curve = Auto
Under Fan Curve:
  Fan Speed Update = 5000 msec
  Temperature hysteresis (in °C) = 2
  Force Fan Speed on each Period is not checked
  35% at 30°C
  40% at 50°C
  45% at 60°C
  60% at 65°C
  100% at 70°C
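To see what fan speed these points give between the listed temperatures, here is a sketch that assumes straight-line interpolation between points, clamping outside 30-70°C, and a hysteresis that re-reads the curve only after the temperature has moved by at least 2°C. Those interpolation and hysteresis semantics are assumptions, not documented Precision X behaviour.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Jeremy's custom curve from the settings above: (temperature in °C, fan %).
FAN_CURVE: List[Tuple[float, float]] = [(30, 35), (50, 40), (60, 45), (65, 60), (70, 100)]


def fan_speed(temp_c: float, curve: List[Tuple[float, float]] = FAN_CURVE) -> float:
    """Fan % for a temperature, interpolating linearly between curve points (assumed)."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return float(curve[0][1])   # below the first point: hold the minimum
    if temp_c >= temps[-1]:
        return float(curve[-1][1])  # above the last point: pin at the maximum
    i = bisect_right(temps, temp_c)  # temps[i - 1] <= temp_c < temps[i]
    (t0, f0), (t1, f1) = curve[i - 1], curve[i]
    return f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0)


class HysteresisFan:
    """Re-reads the curve only after the temperature moves by >= hysteresis_c (assumed semantics)."""

    def __init__(self, hysteresis_c: float = 2.0) -> None:
        self.hysteresis_c = hysteresis_c
        self.last_temp: Optional[float] = None
        self.speed = 0.0

    def update(self, temp_c: float) -> float:
        if self.last_temp is None or abs(temp_c - self.last_temp) >= self.hysteresis_c:
            self.last_temp = temp_c
            self.speed = fan_speed(temp_c)
        return self.speed
```

With these points, 67.5°C maps to 80% fan, and from 70°C upward the fan is pinned at 100%.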

Since I have the EVGA ACX cooling cards, which just blow the hot air inside the case, I am using the Cooler Master HAF 932 case, which can circulate air in and out pretty quickly. I built a duct to bring outside 'cool' air directly to the CPU, and then there are 4x120mm fans on the side of the case blowing on the GPUs and 3x120mm exhausting at the top. Those fans are all linked to CPU temperature, so the fan profiles in Asus AI Suite 3 are set to run them roughly where I want. I could also just run them straight from 12V, 7V, or 5V, but I like AI Suite for CPU/case fan control.

I have a smaller HAF case for another machine, and neither the video card nor the CPU could run at full clock while crunching until a side fan was installed. Side fans or open cases are critical for non-exhausting GPUs. It is really surprising what 100 CFM of outside air through a side panel, blowing onto a GPU, can do.

Regards,
Jeremy

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 36672 - Posted: 26 Apr 2014, 15:24:50 UTC - in response to Message 36662.  

Thank you very much, Jeremy.

Some of your settings are different, which could explain my too-high temperatures.
I will go with your settings and let it run for a day to see how that goes.
I have one 20cm fan in the top and a 14cm one at the back. No side fan.


Greetings from TJ

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 36693 - Posted: 27 Apr 2014, 15:34:54 UTC - in response to Message 36662.  

Hello Jeremy,

I have used your settings, and indeed the card never went above 72°C, and GPU use is ~86% for a Gianni Ligand WU with driver 331.82.

So perhaps I will update to the latest driver again and then, with your settings still in place, see the results. But I'd like a few more WUs finished first, to compare.
I am very glad that I now know how to keep the GPU at 72°C. So thanks again for your help.
Kindest regards.


Greetings from TJ

Jeremy Zimmerman
Joined: 13 Apr 13
Posts: 61
Credit: 726,605,417
RAC: 0
Message 36696 - Posted: 27 Apr 2014, 17:00:58 UTC - in response to Message 36693.  

TJ,

Glad to hear it replicates. The only issue I am facing now is the GPU downclocking.

* Only happening to GPU1 and not GPU2 (identical cards/BIOS).
* Downclocks to 548 MHz.
* Shutting down/restarting BOINC will not reset the speed.
* I must shut down and restart the computer to get the speed reset, and it may go 0-3 days before downclocking again.
* Is not a thermal issue, since the card stays at 72°C (actually ticks to 73 on occasion).
* Happens with both 335.23 and 337.50 drivers.
* Was not happening on the 331.82 drivers.
* Does not happen to the GTX680/GTX460 XP machines with the 331 or 335 drivers.
* I have not tried Jacob Klein's forced Max Boost speed yet because I like the temperature control of Precision. The 780Ti sits at max boost at 68°C on the low-utilization WUs and climbs to the upper 70s on high-utilization WUs (without temperature control set in Precision). So max boost would be a little rough with my current cooling setup.
So, I set for Max Boost, and it works wonderfully, even when the drivers would otherwise stupidly downclock due to supposed low utilization. Forcing Max Boost works wonders.
http://www.gpugrid.net/forum_thread.php?id=3647&nowrap=true#36320
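If you want to catch the 548 MHz downclock without watching the machine, one option is to poll nvidia-smi and flag any GPU whose SM clock drops below a threshold. A minimal sketch: the 600 MHz threshold is an assumption, field support on GeForce cards varies by driver (values reported as N/A are skipped here), and an idle GPU also shows a low clock, so only check while tasks are running.

```python
import subprocess
from typing import Dict, List


def parse_sm_clocks(csv_text: str) -> Dict[int, int]:
    """Parse nvidia-smi CSV lines such as '0, 1084' into {gpu_index: sm_clock_mhz}."""
    clocks: Dict[int, int] = {}
    for line in csv_text.strip().splitlines():
        index, mhz = (field.strip() for field in line.split(",", 1))
        if mhz.isdigit():  # skip '[N/A]' / '[Not Supported]' on cards that hide the value
            clocks[int(index)] = int(mhz)
    return clocks


def downclocked_gpus(clocks: Dict[int, int], floor_mhz: int = 600) -> List[int]:
    """GPU indices whose SM clock is below floor_mhz (600 MHz is an assumed threshold)."""
    return sorted(i for i, mhz in clocks.items() if mhz < floor_mhz)


def query_sm_clocks() -> Dict[int, int]:
    """Ask nvidia-smi for the current SM clock of every GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,clocks.sm", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sm_clocks(out)
```

Calling downclocked_gpus(query_sm_clocks()) periodically while crunching would return the index of any card stuck at a clock like the 548 MHz reported above.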


So I will be going back to 331.82 today, since I will not be able to watch the systems closely. I think the new app 8.41 is now scheduling WUs correctly, so I should be OK.

Regards,
Jeremy

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 36699 - Posted: 27 Apr 2014, 17:58:12 UTC - in response to Message 36696.  
Last modified: 27 Apr 2014, 18:03:02 UTC

Indeed, Jeremy, with the 331.82 drivers you will get app 8.41 and cuda42. I have not had any errors since I reverted to those drivers, but that is only about 30 hours. My 780Ti now runs at 888 MHz at 72°C. I am happy with that.
Good luck downgrading the driver on your system!

Edit: with those "old" drivers we cannot use SWAN_SYNC, which should give our big cards some extra performance.
Greetings from TJ

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 326,008
Message 36793 - Posted: 4 May 2014, 19:23:14 UTC - in response to Message 36550.  

Try cuda60 version 841

(By the way, this reintroduces SWAN_SYNC - if you set that to 1 you should find you get improved performance)

Matt

I've had a small problem twice with cuda60 v8.41:

Task 9763720
Task 9784050

Both times, I noticed that the task had 'stalled' - had counted up an unusually large elapsed time, and was making no progress.

No message on screen, no crash. I simply suspended the task for a few seconds, then resumed it, and it started from the last checkpoint without any fuss.
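A stall like this can be spotted from progress alone: elapsed time keeps rising while fraction done does not. A minimal sketch, assuming you already sample (elapsed seconds, fraction done) for the task from somewhere such as the BOINC Manager display; the 30-minute window and the progress threshold are arbitrary assumptions.

```python
from typing import List, Tuple

# Window and threshold defaults are arbitrary assumptions; tune to your tasks.


def is_stalled(samples: List[Tuple[float, float]], window_s: float = 1800.0,
               min_progress: float = 1e-4) -> bool:
    """True if fraction done rose by less than min_progress over the last window_s.

    samples: (elapsed seconds, fraction done 0..1) pairs in time order.
    """
    if not samples:
        return False
    latest_t, latest_f = samples[-1]
    # Newest sample that is at least window_s older than the latest one.
    older = [(t, f) for t, f in samples if latest_t - t >= window_s]
    if not older:
        return False  # not enough history yet to judge
    _, base_f = older[-1]
    return latest_f - base_f < min_progress
```

Once a stall is flagged, the remedy that worked here was a manual suspend and resume, which restarts the task from its last checkpoint.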

Stderr has this error logged:

<stderr_txt>
# GPU [GeForce GTX 670] Platform [Windows] Rev [3301M] VERSION [60]
# SWAN Device 1 :
# Name : GeForce GTX 670
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:08:00.0
# Device clock : 1084MHz
# Memory clock : 3054MHz
# Memory width : 256bit
# Driver version : r334_89 : 33523
# GPU 0 : 78C
# GPU 1 : 57C
# GPU 1 : 58C
# GPU 1 : 59C
# GPU 1 : 60C
SWAN : FATAL : Cuda driver error 719 in file 'swanlibnv2.cpp' in line 1965.
# SWAN swan_assert 0
# GPU [GeForce GTX 670] Platform [Windows] Rev [3301M] VERSION [60]
# SWAN Device 1 :
# Name : GeForce GTX 670
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:08:00.0
# Device clock : 1084MHz
# Memory clock : 3054MHz
# Memory width : 256bit
# Driver version : r334_89 : 33523
# GPU 0 : 69C
# GPU 1 : 38C

(same both times)

Note how far the temperature of GPU 1 has fallen - the tasks were probably stalled for several hours before I noticed.

Also, that 'SWAN swan_assert 0' on restart is new - I don't have SWAN_SYNC set.
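To check many failed tasks at once, the relevant lines can be pulled out of the stderr text with two regular expressions. A minimal sketch: reading the second driver field (33523) as version 335.23 is an assumption that matches the 335.23 driver discussed in this thread, and the patterns only cover the line formats shown in this log. For reference, 719 is CUDA_ERROR_LAUNCH_FAILED in the CUDA driver API.

```python
import re
from typing import List, Optional

# Line formats copied from the stderr output above; other app versions may differ.
DRIVER_RE = re.compile(r"^# Driver version : (\S+) : (\d+)[ \t\r]*$", re.MULTILINE)
FATAL_RE = re.compile(r"SWAN : FATAL : Cuda driver error (\d+) in file '([^']+)' in line (\d+)")

# Only the code seen in this thread; extend as needed.
CUDA_DRIVER_ERRORS = {719: "CUDA_ERROR_LAUNCH_FAILED"}


def driver_version(stderr: str) -> Optional[str]:
    """'# Driver version : r334_89 : 33523' -> '335.23' (assumed major*100 + minor)."""
    match = DRIVER_RE.search(stderr)
    if match is None:
        return None
    number = int(match.group(2))
    return f"{number // 100}.{number % 100:02d}"


def fatal_errors(stderr: str) -> List[str]:
    """Describe each SWAN FATAL line as 'code (name) at file:line'."""
    found = []
    for code, src_file, line in FATAL_RE.findall(stderr):
        name = CUDA_DRIVER_ERRORS.get(int(code), "unknown")
        found.append(f"{code} ({name}) at {src_file}:{line}")
    return found
```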

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 36812 - Posted: 7 May 2014, 21:28:59 UTC - in response to Message 36793.  
Last modified: 7 May 2014, 21:30:46 UTC

I would predominantly be worried about the message,
    SWAN : FATAL : Cuda driver error 719 in file 'swanlibnv2.cpp' in line 1965.

Apparent in both failures.
However, I would also be inclined to put the temps down to the buggy 335.23 driver; which not only fails to boost but on occasion downclocks, at least in my experience.
FWIW I suggest trying a 337.x driver...


Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 326,008
Message 36814 - Posted: 8 May 2014, 10:28:51 UTC - in response to Message 36812.  

I would predominantly be worried about the message,
    SWAN : FATAL : Cuda driver error 719 in file 'swanlibnv2.cpp' in line 1965.

Apparent in both failures.
However, I would also be inclined to put the temps down to the buggy 335.23 driver; which not only fails to boost but on occasion downclocks, at least in my experience.
FWIW I suggest trying a 337.x driver...


I had another one yesterday:

Task 10212213

Only seems to happen on Gerard's tasks, though this is a slightly different sub-type.

Yes, I'm primarily concerned about the SWAN FATAL - that should trigger a boinc_temporary_exit, but doesn't.

No, a 20 degree drop in temperature isn't the result of a drop in boost - it's a complete cessation of processing, for several hours. I caught yesterday's much sooner, and it only had time to cool down by 2 degrees.

337.50 is still in Beta - I'll leave that to the rest of you, thanks.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 36817 - Posted: 8 May 2014, 13:39:23 UTC - in response to Message 36814.  
Last modified: 8 May 2014, 13:44:25 UTC

Only seems to happen on Gerard's tasks, though this is a slightly different sub-type.

I had a BSOD a few days ago (very unusual), which I put down to a Windows update. Something similar happened on another PC, which I thought was connected to the most recent security update to Internet Explorer. However, looking through BoincTasks, it seems that I was running Gerards on both of the GTX 660s at the time (WinXP, 335.28 driver).

There is nothing in the Stderr output to indicate anything other than a shutdown to install the updates, but now I am beginning to wonder.
http://www.gpugrid.net/result.php?resultid=9785157
http://www.gpugrid.net/result.php?resultid=9786033

©2025 Universitat Pompeu Fabra