Please upgrade to DRIVER 334.21 or NEWER [closed]
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
> Yeah - scheduling on the Cuda capability reported by the driver is insufficient - the 331s say they do, but they don't. We've reverted to giving cuda60s out only to 334+

Thanks, Matt. Highly appreciated, especially as it is late evening. I now have two tasks running again at 90% GPU load at 1150 MHz on the 780Ti.

Greetings from TJ
Joined: 25 May 09 Posts: 224 Credit: 34,057,374,498 RAC: 0
> Yeah - scheduling on the Cuda capability reported by the driver is insufficient - the 331s say they do, but they don't. We've reverted to giving cuda60s out only to 334+

Mmm... it's not working so far with the 331.38 driver on Linux.
Joined: 15 Feb 07 Posts: 134 Credit: 1,349,535,983 RAC: 0
Hi Stoneageman,

This is a Linux-specific problem: it turns out the BOINC client isn't reporting the driver version to the server, so the scheduler can't make the right allocation. I am working on a patch (to the client).

Matt
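Matt's two points (cuda60 work goes only to drivers 334+, and the Linux client wasn't sending its driver version) amount to gating the app version on the reported driver version rather than on the CUDA level the driver claims. Below is a minimal Python sketch of that idea, not GPUGRID's or BOINC's actual scheduler code: `parse_driver_version`, `pick_app_version`, the 334.21 threshold (taken from the thread title) and treating a missing version as ineligible are all assumptions for illustration. The cuda42 fallback mirrors what TJ reports later in the thread for 331.82.

```python
from typing import Optional

# Assumed threshold, taken from the thread title: CUDA 6.0 ("cuda60") work
# only for drivers 334.21 or newer.
MIN_CUDA60_DRIVER = (334, 21)


def parse_driver_version(text: Optional[str]) -> Optional[tuple[int, int]]:
    """Hypothetical helper: turn "337.50" into (337, 50); None if missing."""
    if not text:
        return None  # e.g. a Linux client that never reported its driver version
    major, _, minor = text.partition(".")
    return int(major), int(minor or 0)


def pick_app_version(driver: Optional[str]) -> str:
    """Hypothetical plan choice: only hand out cuda60 to known-good drivers.

    The CUDA capability the driver reports is not trusted (the 331 series
    claims CUDA 6.0 but fails), so the driver version itself is checked,
    and an unknown version falls back to the older app.
    """
    version = parse_driver_version(driver)
    if version is not None and version >= MIN_CUDA60_DRIVER:
        return "cuda60"
    return "cuda42"
```

With these assumptions, `pick_app_version("331.82")` and `pick_app_version(None)` both return `"cuda42"`, while `pick_app_version("337.50")` returns `"cuda60"`. Comparing `(major, minor)` integer tuples avoids the pitfalls of comparing version strings or floats.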
Joined: 15 Feb 07 Posts: 134 Credit: 1,349,535,983 RAC: 0
PS Stoneage - hope you don't think I'm mucking you about just to get the #1 slot off you! :-)
Joined: 25 May 09 Posts: 224 Credit: 34,057,374,498 RAC: 0
Ha! On that host I've changed to the 337.12 beta driver (after some hassle), so cuda60 tasks are now running OK on it. I just don't have the time to do the rest of the farm right now. PS: don't you ever sleep?
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
Okay, I have now updated to the latest driver (337.50). The 780Ti boosts to 1158 MHz without any program like AfterBurner running. Problem: the temperature quickly goes to 83°C when crunching GPUGRID. I don't like such high values, so I started AfterBurner; the card still runs at boost speed, but the temperature is 76°C with 91% fan speed. What I now notice is that the GPU load is 66-67%, which is less than it was with the 331.82 driver. I have no SWAN_SYNC settings yet. No statistical data yet, but others have mentioned that with drivers after 331.82 the GTX 780Ti and higher were hampered a little on Win7. I will let it run overnight and see what happens. However, the temperature is now a steady 76°C, and it was 69-72°C until this afternoon, when I updated. Later tomorrow I will also try the stock clock (875 MHz) to see what the temperature does.

Greetings from TJ
Joined: 13 Apr 13 Posts: 61 Credit: 726,605,417 RAC: 0
TJ,

With some of the new WUs actually taking advantage of all those cores/SMUs/shader units of the 780Ti, the air cooling is getting a wee bit strained, especially with spring and warmer temperatures in the house. This is why I am using Precision and prioritizing temperature over power. It cuts voltage and throttles the GPU frequency as needed to hold the desired temperature. That is, of course, after the custom fan profile has reached 100%.

I am enjoying the 87-90% utilization that I have seen, but those cards are sweating heavily. :) Work harder there, ye poor silicon, I say... work harder! Dear researchers, no easy WUs, make my silicon work hard! But do not crash them. :)

Regards,
Jeremy
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
Thank you, Jeremy, that gives me some relief. It is now a Gerard WU that runs at 66% load and 77°C at 1055.6 MHz. This morning the same WU type ran at 88% load at 70°C and 875 MHz. With an ambient temperature of 28°C and warmer weather on the way, I prefer the latter values despite the lower clock. I will wait and see how much faster it is tomorrow, and then either throttle the card with PrecisionX or revert to the 331.82 drivers.

Greetings from TJ
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
After some experimenting with the core clock, I see that if I set it to 1060 MHz with AfterBurner it stays at 1060 MHz and the GPU load is 66%. If I set the core clock to 875 MHz, the GPU load increases to 71%. I have not seen this before, but I can imagine that it works like this. I have set the fan to max (100%), but the temperature stays at 75°C with the clock at 1060 MHz. The first WU the 780Ti completed with the 337.50 driver was about 2000 seconds faster than with 331.82, but the core clock was higher too. So was the temperature, and that bothers me the most, as I know that my attic will become warmer in the coming weeks.

Greetings from TJ
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
Hello Jeremy, one more question if I may. I have now set the temperature target at 72°C and the power target at 88. When a new WU starts the card boosts, but after a few minutes the temperature rises and the clock goes down. This is of course what I want. However, with the fan at 90% the temperature is a steady 74°C with a GPU load of 83-84%. What have you set in PrecisionX for the temperature and power targets? By the way, I am now running the 337.50 beta driver.

Greetings from TJ
Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 1
> Try cuda60 version 841

Since this version has SWAN_SYNC, I've upgraded my drivers, and this version is working fine. However, I had a strange period of receiving tasks, which led to the decision to upgrade the drivers: my hosts had the 332.21 (and the 326.80) driver, and they received and completed CUDA 5.5 tasks normally (CPU time = RUN time). Then my hosts received a couple of CUDA 6.0 tasks, which all failed. Then my hosts received CUDA 4.2 tasks, which all completed successfully. Then I upgraded my drivers to 337.50, and now my hosts are receiving and completing CUDA 6.0 tasks normally. After that, I checked nvcuda32.dll and nvcuda64.dll in both drivers (332.21 and 337.50), and all four DLLs state that they are "NVIDIA CUDA 6.0.1 drivers" (right click -> properties -> version tab -> Product name field), but they have different file sizes. So the CUDA 6 included in drivers older than 334.21 is not working (to be polite).

> Yeah - scheduling on the Cuda capability reported by the driver is insufficient - the 331s say they do, but they don't. We've reverted to giving cuda60s out only to 334+

My experiences confirm this. Why haven't they increased at least the last digit of the driver's version number? (You don't have to answer this.)
Joined: 13 Apr 13 Posts: 61 Credit: 726,605,417 RAC: 0
TJ,

Here are my Precision settings for the machine with the 780Ti cards:

- Power Target = 105%
- Temp Target = 72°C
- Power and Temp are NOT linked
- Prioritize Temp Target (click on the arrow so it points down)
- ***EDIT*** GPU Clock Offset = +38 (not much good now that winter is over)
- MEM Clock Offset = 0
- Fan Curve = Auto

Under Fan Curve:

- Fan Speed Update = 5000 msec
- Temperature hysteresis = 2°C
- "Force Fan Speed on each Period" is not checked
- 35% at 30°C
- 40% at 50°C
- 45% at 60°C
- 60% at 65°C
- 100% at 70°C

Since I have the EVGA ACX cooling cards, which just blow the hot air inside the case, I am using the Cooler Master HAF 932 case, which can circulate air in and out pretty quickly. I built a duct to take outside 'cool' air directly to the CPU, then there are 4x120mm fans on the side of the case blowing on the GPUs, and 3x120mm exhausting at the top. Those fans are all linked to CPU temperature, so the fan profiles in Asus AI Suite 3 are set to run them roughly where I want. I could also just run them straight from 12V, 7V, or 5V, but I like AI Suite for CPU/case fan control.

I have a smaller HAF case for another machine, and neither the video card nor the CPU could run at full clock while crunching until a side fan was installed. Side fans or open cases are critical for the non-exhausting GPUs. It is really surprising what 100 CFM of outside air from a side panel blowing onto a GPU can do.

Regards,
Jeremy
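To show roughly how a fan curve like the one above turns a temperature into a fan speed, here is a minimal Python sketch. It assumes Precision interpolates linearly between the curve points and holds 35% below 30°C, and it models the 2°C hysteresis as "ignore temperature changes smaller than 2°C"; these behaviours are assumptions, not documented Precision internals, and `fan_speed` and `FanController` are hypothetical names.

```python
# Curve points (temperature in °C, fan speed in %) from the settings above.
FAN_CURVE = [(30, 35), (50, 40), (60, 45), (65, 60), (70, 100)]


def fan_speed(temp_c: float, curve=FAN_CURVE) -> float:
    """Assumed behaviour: linear interpolation between curve points."""
    if temp_c <= curve[0][0]:
        return float(curve[0][1])  # hold the lowest point below 30 °C
    if temp_c >= curve[-1][0]:
        return float(curve[-1][1])  # pinned at 100% from 70 °C up
    for (t0, s0), (t1, s1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0)
    raise ValueError("curve points must be sorted by temperature")


class FanController:
    """Hypothetical controller, polled once per update period (5000 ms above)."""

    def __init__(self, hysteresis_c: float = 2.0):
        self.hysteresis_c = hysteresis_c
        self.last_temp = None
        self.speed = None

    def update(self, temp_c: float) -> float:
        # Only recompute when the temperature has moved by at least the
        # hysteresis band, so the fan does not hunt on 1 °C wobbles.
        if self.last_temp is None or abs(temp_c - self.last_temp) >= self.hysteresis_c:
            self.last_temp = temp_c
            self.speed = fan_speed(temp_c)
        return self.speed
```

Under these assumptions, 62.5°C gives 52.5% and anything from 70°C up gives 100%, so at the 72°C temp target the fan is already maxed out. That fits Jeremy's earlier remark that Precision only throttles clocks once the custom fan profile has reached 100%.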
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
Thank you very much, Jeremy. Some of your settings differ from mine, which could explain my too-high temperatures. I will go with your settings and let it run for a day to see how that goes. I have one 20 cm fan in the top and a 14 cm fan at the back. No side fan.

Greetings from TJ
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
Hello Jeremy,

I have used your settings, and indeed the card never went above 72°C, with GPU use at ~86% for a Gianni Ligand with driver 331.82. So perhaps I will update to the latest driver again and, with your settings still in place, see the results. But I'd like a few more WUs to finish first so I can compare. I am very glad that I now know how to keep the GPU at 72°C. So thanks again for your help.

Kindest regards.
Greetings from TJ
Joined: 13 Apr 13 Posts: 61 Credit: 726,605,417 RAC: 0
TJ,

Glad to hear it is replicated. The only issue I am facing now is the GPU downclocking.

* Only happening to GPU1 and not GPU2 (identical cards/BIOS).
* Downclocks to 548 MHz.
* Shutdown/restart of BOINC will not reset the speed.
* Must shut down and restart the computer to get the speed reset, and it may go 0-3 days before downclocking again.
* Is not a thermal issue, since the card is staying at 72°C (actually ticks to 73 on occasion).
* Happens with both the 335.23 and 337.50 drivers.
* Was not happening on the 331.82 drivers.
* Does not happen to the GTX680/GTX460 XP machines with the 331 or 335 drivers.
* I have not tried Jacob Klein's force Max Boost speed yet because I liked the temperature control of Precision. The 780Ti will sit at max boost at 68°C on the low-utilization WUs and climbs to the upper 70s on high-utilization WUs (without temperature control set in Precision). So max boost would be a little rough with my current cooling setup.

> So, I set for Max Boost, and it works wonderfully, even when the drivers would otherwise stupidly downclock due to supposed low utilization. Forcing Max Boost works wonders.

So I will be going back to 331.82 today, since I will not be able to watch the systems closely. I think the new app 8.41 is now scheduling WUs correctly, so I should be OK.

Regards,
Jeremy
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
Indeed Jeremy, with the 331.82 drivers you will get app 8.41 and cuda42. I have not had any errors since I reverted to those drivers, but that is only about 30 hours. My 780Ti now runs at 888 MHz at 72°C. I am happy with that. Good luck downgrading the driver on your system!

Edit: with those "old" drivers we cannot use SWAN_SYNC, which should give our big cards some extra performance.

Greetings from TJ
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172
> Try cuda60 version 841

I've had a small problem twice with cuda60 v8.41:

Task 9763720
Task 9784050

Both times, I noticed that the task had 'stalled': it had counted up an unusually large elapsed time and was making no progress. No message on screen, no crash. I simply suspended the task for a few seconds, then resumed it, and it started from the last checkpoint without any fuss. Stderr has this error logged:

<stderr_txt>

(same both times)

Note how far the temperature of GPU 1 has fallen - the tasks were probably stalled for several hours before I noticed. Also, that 'SWAN swan_assert 0' on restart is new - I don't have SWAN_SYNC set.
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
I would predominantly be worried about the message,
Apparent in both failures.
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172
> I would predominantly be worried about the message,

I had another one yesterday: Task 10212213. It only seems to happen on Gerard's tasks, though this is a slightly different sub-type.

Yes, I'm primarily concerned about the SWAN FATAL - that should trigger a boinc_temporary_exit, but doesn't. No, a 20-degree drop in temperature isn't the result of a drop in boost - it's a complete cessation of processing, for several hours. I caught yesterday's much sooner, and it had only had time to cool down by 2 degrees. 337.50 is still in beta - I'll leave that to the rest of you, thanks.
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0
> Only seems to happen on Gerard's tasks, though this is a slightly different sub-type.

I had a BSOD a few days ago (very unusual), which I put down to performing a Windows update. Something similar happened on another PC, which I thought was connected to the most recent security update to Internet Explorer. However, on looking through BoincTasks, it seems that I was running Gerards on both of the GTX 660s at the time (WinXP, 335.28 driver). There is nothing in the Stderr output to indicate anything other than a shutdown to install the updates, but now I am beginning to wonder.

http://www.gpugrid.net/result.php?resultid=9785157
http://www.gpugrid.net/result.php?resultid=9786033
©2025 Universitat Pompeu Fabra