Message boards : News : monitor suspend/resume bug in 295/296 drivers
Author | Message |
---|---|
There are some reports of bugs concerning the latest NVIDIA drivers (failures when monitor goes to sleep). GPUGRID may not be immune to the bug. If it occurs to you, either | |
ID: 23636 | Rating: 0 | rate: / Reply Quote | |
The following driver sets are bugged for me: | |
ID: 23637 | Rating: 0 | rate: / Reply Quote | |
There are some reports of bugs concerning the latest NVIDIA drivers (failures when monitor goes to sleep). GPUGRID may be immune to the bug, but if it occurs to you, rollback to previous drivers. I wrote the BOINC version of the GeneferCUDA app over at PrimeGrid, and the diagnostics it's spitting out indicate that the CUDA subsystem is completely unavailable when the 295 drivers put a monitor into sleep mode. As far as I can tell, no CUDA program at all, from any project, or even non-BOINC CUDA programs, will be able to work under these circumstances. I don't know yet which platforms it affects (Windows/Linux/Mac), and I don't know if OpenCL is affected, but I'd be very surprised if the GPUGRID apps worked. We're advising people to either use an earlier driver, or make sure they've configured their system to never turn the monitors off. ____________ Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG. | |
ID: 23638 | Rating: 0 | rate: / Reply Quote | |
Does anyone know if nVIDIA is aware of/working on this issue? | |
ID: 23640 | Rating: 0 | rate: / Reply Quote | |
nVidia has been informed, but there has been no response. | |
ID: 23642 | Rating: 0 | rate: / Reply Quote | |
Thanks, Michael and Jacob, for the details. | |
ID: 23643 | Rating: 0 | rate: / Reply Quote | |
The message threads over at SETI seem to indicate it's the Windows driver that has the issue. It has been reported by people using a DVI-connected monitor; I'm not sure whether a VGA-connected monitor also has the problem. It depends on the card and on whether they are using a DVI-to-VGA adaptor. | |
ID: 23645 | Rating: 0 | rate: / Reply Quote | |
The message threads over at SETI seem to indicate it's the Windows driver that has the issue. It has been reported by people using a DVI-connected monitor; I'm not sure whether a VGA-connected monitor also has the problem. It depends on the card and on whether they are using a DVI-to-VGA adaptor. I use the HDMI connector on my GTX 580s, and the issue affected me with both the 295.51 beta and 295.73 WHQL drivers. I have configured power settings to never turn off the monitor and have since completed 4 tasks in a row successfully. | |
ID: 23681 | Rating: 0 | rate: / Reply Quote | |
I've rolled back to the previous drivers, thanks. Three days of nothing but errors on Milkyway, SETI, GPUGRID and Einstein. What a mess! | |
ID: 23682 | Rating: 0 | rate: / Reply Quote | |
I thought I'd chime in with some more information. | |
ID: 23798 | Rating: 0 | rate: / Reply Quote | |
Thanks Jacob, I amended a post in the FAQ - Best configurations for GPUGRID thread to reflect your findings. | |
ID: 23806 | Rating: 0 | rate: / Reply Quote | |
I rolled back to the 285.62 driver and still no work. What do I do now? | |
ID: 23931 | Rating: 0 | rate: / Reply Quote | |
What does "still no work" mean? | |
ID: 23948 | Rating: 0 | rate: / Reply Quote | |
There is a 296.10 WHQL driver out. According to the SETI guys it still has the sleep mode bug. | |
ID: 23954 | Rating: 0 | rate: / Reply Quote | |
Did not see anything CUDA-related in the changelog. | |
ID: 23956 | Rating: 0 | rate: / Reply Quote | |
Did not see anything CUDA-related in the changelog. We couldn't see anything either, though we had a good chuckle over some of them. A new bug ticket has been raised by a SETI developer and acknowledged by a named NVidia staffer. Einstein are also now in active engagement with NVidia: http://einstein.phys.uwm.edu/forum_thread.php?id=9307&nowrap=true#116397 | |
ID: 23963 | Rating: 0 | rate: / Reply Quote | |
The 266.58 are the last drivers that seem to be problem-free, no downclocking bug and obviously no sleep mode bug. AFAIK they support everything up through the GTX 580. Unless you have a game or other software that requires the newer drivers, I would suggest rolling back to those. You will have to do a clean install though and be absolutely sure that no Nvidia software remains on your system before installing them. Otherwise certain core files will remain and you might still get the same issues. | |
ID: 23964 | Rating: 0 | rate: / Reply Quote | |
266.58 doesn't work well on Ubuntu with Albert&Einstein and DistrRTgen tasks. | |
ID: 23965 | Rating: 0 | rate: / Reply Quote | |
I am running 296.10 NVidia (WHQL) drivers. Screen saver is set never to turn monitor off or "sleep" system. GPUGRID tasks yield computation errors immediately. SETI and Einstein are functioning without errors. So what's up? | |
ID: 23998 | Rating: 0 | rate: / Reply Quote | |
Again, avoid using 295 and 296 drivers. | |
ID: 24001 | Rating: 0 | rate: / Reply Quote | |
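As a rough illustration of that advice, a script could flag the affected series from the driver version string. This is a minimal sketch under stated assumptions: the names `AFFECTED_SERIES`, `driver_series` and `is_affected_driver` are hypothetical, the version is assumed to look like `295.73`, and the set of affected series reflects only the reports in this thread. It is not how BOINC or any project actually detects these drivers, and matching on the series alone is coarse (a later post reports a 296.31 OEM build that did not reproduce the failure).

```python
# Driver series reported as affected in this thread. This is an assumption
# drawn from the posts, not an official NVIDIA or BOINC list.
AFFECTED_SERIES = frozenset({295, 296})


def driver_series(version: str) -> int:
    """Return the number before the first dot, e.g. "295.73" -> 295.

    Raises ValueError if that part is not an integer.
    """
    major, _, _ = version.strip().partition(".")
    return int(major)


def is_affected_driver(version: str) -> bool:
    """True if the version belongs to a series listed in AFFECTED_SERIES."""
    return driver_series(version) in AFFECTED_SERIES
```

With this sketch, `is_affected_driver("295.73")` and `is_affected_driver("296.10")` are flagged, while `is_affected_driver("285.62")` is not.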
The 266.58 are the last drivers that seem to be problem-free, no downclocking bug and obviously no sleep mode bug. 285.62 doesn't have problems. ____________ Radio Caroline, the world's most famous offshore pirate radio station. Great music since April 1964. Support Radio Caroline Team - Radio Caroline | |
ID: 24050 | Rating: 0 | rate: / Reply Quote | |
OK, so I went back step by step to the 285.72 drivers. All CUDA tasks have performed without errors. I have not been able to test with GPUGRID, as I am waiting for a new WU. | |
ID: 24132 | Rating: 0 | rate: / Reply Quote | |
Win 7 64-bit (SP1) | |
ID: 24138 | Rating: 0 | rate: / Reply Quote | |
There are many, many people out there with the 295.73 driver, and they must be causing thousands of errors on the GPUGRID projects. | |
ID: 24200 | Rating: 0 | rate: / Reply Quote | |
Do what? | |
ID: 24201 | Rating: 0 | rate: / Reply Quote | |
Do what? Has anyone investigated why the tasks fail with some drivers? On a global scale, Nvidia will not change their drivers to pander to a relatively small group. Therefore, shouldn't GPUGRID be looking at rewriting the code required to undertake the tasks under the newer drivers? | |
ID: 24202 | Rating: 0 | rate: / Reply Quote | |
Do what? UPDATE: I have just done a casual check on some of the top performers on GPUGRID, and note that the majority of them are experiencing multiple failures of tasks. Even some of those with 285.xx drivers. Maybe there is something else wrong here? I am also active with Seti@home, and, with one unrelated exception, have no failures on those tasks... | |
ID: 24203 | Rating: 0 | rate: / Reply Quote | |
Do what? That is a question EVERY project is asking themselves. This is what I did with PrimeGrid's GeneferCUDA application. Ken had previously done something similar with PrimeGrid's other CUDA applications. If a CUDA API call returns CUDA_ERROR_NO_DEVICE, GeneferCUDA prints a warning to stderr saying that under Windows, using RDP or using the 295/296 Nvidia driver causes the GPU to not work. Stderr is visible in the BOINC task webpage, so there's a chance the user might read it. After printing the message, Genefer goes to sleep for 10 minutes. It's still active, and doesn't return to BOINC, but it's not doing anything. This intentionally ties up the GPU, since no other BOINC task is going to be able to run on the GPU anyway. After 10 minutes, it tries again. This continues until either the program can run successfully, or one hour elapses. After an hour, Genefer gives up, declares a computation error, and exits. This approach has two benefits. First, this error is transient and may go away while Genefer is still waiting, if either the RDP session is closed, or the monitor comes out of sleep mode. Second, in the more likely event that the problem doesn't go away, we're only failing one WU per hour instead of several per minute. This certainly doesn't solve the problem, but it does mitigate its effect on the project somewhat. | |
ID: 24204 | Rating: 0 | rate: / Reply Quote | |
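The wait-and-retry behaviour described in the post above can be sketched roughly as follows. This is not Genefer's source code: `wait_for_gpu`, `gpu_available`, `sleep` and `warn` are hypothetical names, the probe is passed in as a callable rather than making a real CUDA call, and whether the real application makes one last attempt at the one-hour mark is an assumption. Only the 10-minute interval and the one-hour limit come from the post.

```python
import sys
import time
from typing import Callable

RETRY_INTERVAL_S = 10 * 60  # sleep 10 minutes between attempts (from the post)
GIVE_UP_AFTER_S = 60 * 60   # give up after one hour of waiting (from the post)


def _warn_stderr(message: str) -> None:
    # Stderr is what ends up on the BOINC task page.
    print(message, file=sys.stderr)


def wait_for_gpu(
    gpu_available: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
    warn: Callable[[str], None] = _warn_stderr,
) -> bool:
    """Return True once the probe succeeds, False after an hour of failures.

    gpu_available is a hypothetical probe standing in for "the CUDA call
    did not return CUDA_ERROR_NO_DEVICE". The caller reports a computation
    error when this function returns False.
    """
    waited = 0
    while True:
        if gpu_available():
            return True
        if waited >= GIVE_UP_AFTER_S:
            return False
        warn("GPU unavailable (RDP session or 295/296 driver with the "
             "monitor asleep?); retrying in 10 minutes.")
        # Hold on to the task instead of failing immediately, so that at
        # most one work unit per hour is lost.
        sleep(RETRY_INTERVAL_S)
        waited += RETRY_INTERVAL_S
```

Passing `sleep` and `warn` as parameters keeps the loop testable without actually waiting an hour; a real application would simply use the defaults.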
On a global scale, Nvidia will not change their drivers to pander to a relatively small group. On the contrary, Nvidia told Einstein: This bug is considered as release critical (show-stopper) for the next NVIDIA driver release that's due in 2-4 weeks. Thus a fix will be available by that time. We are only a 'relatively small group' if we hide away in our separate corners and try to sort out problems like this 'one project at a time'. There are times when collective action is necessary, and if 'BOINC Central' isn't proactively co-ordinating it, then projects which have adopted the BOINC platform should go and bang on their doors until they do. | |
ID: 24205 | Rating: 0 | rate: / Reply Quote | |
The latest version of the BOINC client software + manager, version 7.0.25, has taken it upon itself to no longer recognize the GPU if the user has one of those 'incompatible drivers'. | |
ID: 24358 | Rating: 0 | rate: / Reply Quote | |
Sorry to double post, but it seems important: | |
ID: 24359 | Rating: 0 | rate: / Reply Quote | |
Doesn't it even seem to matter to you that BOINC's new client and manager package is overriding the individual projects' policies on the subject? | |
ID: 24363 | Rating: 0 | rate: / Reply Quote | |
It does, but since many people rely on auto-update or always get the latest driver, this may become a moot point after a while. From what I can tell, only people who know what they're doing didn't use those drivers anyway, and since BOINC blacklisted them it won't matter when NVIDIA releases WHQL. I mean, it prevents failed WUs, and even Einstein quit allowing people to use those drivers (they stopped sending WUs to hosts with those drivers anyway). Even if you fixed it yourself it wouldn't work. It should actually help projects in the long run, even if it's rude to the users. | |
ID: 24364 | Rating: 0 | rate: / Reply Quote | |
The latest version of the BOINC client software + manager, version 7.0.25, has taken it upon itself to no longer recognize the GPU if the user has one of those 'incompatible drivers'. I probably didn't dig deep enough, but where in the release notes does it say this? I couldn't find any mention of it. Assuming it's true, I've got very mixed feelings about it. On the one hand, from the project side of things, this driver bug is a huge pain in the posterior. Thousands and thousands of errors, and I've got WUs over at PrimeGrid that are hitting the "too many tasks" limits because of this. From a user's perspective, it's not so nice -- but the user has the option of upgrading the driver to 301, downgrading the driver to 285, or reverting BOINC to 6.12.34 after clearing their work queue. All in all, the benefits probably outweigh the disadvantages. ____________ Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG. | |
ID: 24367 | Rating: 0 | rate: / Reply Quote | |
@Michael Goetz: It does, but since many people rely on auto-update or always get the latest driver, this may become a moot point after a while. From what I can tell, only people who know what they're doing didn't use those drivers anyway, and since BOINC blacklisted them it won't matter when NVIDIA releases WHQL. In my opinion there are two errors here. 1) Updating your graphics driver and system software is not a trivial task. When I asked Windows Update to do it first, Windows Update updated and left me with an improper install. I could no longer open my nVidia Control Panel after that. So I had to do a manual upgrade afterward, my icons were all displaced and so on... I don't think that users who simply have their computers on auto-pilot experience that. 2) The other people who chose the 296.10 driver have other things to do with their computers than BOINC Work Units. We only run BOINC on the side. I'm into game development, PhysX etc. I'd say that ~BOINC is my screensaver~, but in fact mine is the 3D Text Screensaver, with BOINC running in the background. You can't convince me to reinstall, and then re-reinstall, my graphics drivers. Dirk | |
ID: 24368 | Rating: 0 | rate: / Reply Quote | |
@Michael Goetz: I'm pretty sure that's a feature that's been in BOINC for many years now, and certainly isn't driver specific. It definitely was in the 6.12.34 client, and possibly in all of the 6.x.x clients. ____________ Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG. | |
ID: 24369 | Rating: 0 | rate: / Reply Quote | |
I'm not trying to convince anyone of anything. My point is that if you are using the WHQL driver for game development and the like (I play games), then you would upgrade to the newest WHQL in order to have NVIDIA's latest software. In this case you would be upgrading to the 300 series whenever the WHQL is released, if I'm not mistaken. This was my point: if you're using the latest currently, then why not upgrade when the newest is released? I personally use the NVIDIA website so I can do a clean install. When I made my comment I was merely saying that if BOINC is currently being run on the side, then I would ASSUME you would want the latest. This being 300, which is a good thing for everyone all around. | |
ID: 24370 | Rating: 0 | rate: / Reply Quote | |
My apologies. I thought that I was being urged to upgrade to the beta driver, etc. I can upgrade to the 300.xx driver as soon as it becomes WHQL, since my current setup will continue to work for now. | |
ID: 24371 | Rating: 0 | rate: / Reply Quote | |
Quite all right. I've been trying to spread the word to various sites, because as Michael stated, it has been a HUGE problem from the projects' standpoint. MANY MANY errors have been caused by this monitor sleep bug, and when I said, "only the people that know what they're doing don't use it anyways" I should have been more clear. I meant the BOINC-only crowd. Since that can be a rather small percentage on some sites, many users who aren't BOINC-only (they attach a project and leave it, without ever checking results) just keep producing errors without knowing it. | |
ID: 24372 | Rating: 0 | rate: / Reply Quote | |
@Michael Goetz: Trying to eliminate some confusion here: 'Service mode' and 'Protected Application Execution' are the same thing. In Windows Vista and Windows 7 GPUs can NOT be used in Service/PAE mode - in any version of BOINC (it's an OS restriction). In Windows XP, GPUs CAN be used in Service/PAE mode up to and including BOINC v6.12.34 - but not in the new BOINC v7.0.25 | |
ID: 24380 | Rating: 0 | rate: / Reply Quote | |
Toshiba has now released a new display driver for notebooks that have an NVIDIA card installed. | |
ID: 26322 | Rating: 0 | rate: / Reply Quote | |
Toshiba has now released a new display driver for notebooks with an NVIDIA card installed. To answer my own question, in case anyone might have the same problem... I had to install the latest driver from the manufacturer of my laptop for reasons other than BOINCing, and since I had the driver installed I decided to test with two short WUs (1 Nathan and 1 Noelia). I replicated the conditions under which WUs failed with 295.xx and 296.xx, as they were explained above by Jacob Klein: for anyone trying to reproduce the problem, I have found that the problem occurs when Windows powers off the monitor first, and then BOINC tries to start or resume a CUDA task while the monitor is off. This means that, if you try to reproduce it using tasks that are already running before Windows powers down the monitor, those tasks will not fail. But any tasks that try to start or resume, while the monitor is off, will fail... according to my testing. None of the WUs failed, which might mean that either NVIDIA solved the problem in the 296.31 version, or Toshiba did that for them. | |
ID: 29317 | Rating: 0 | rate: / Reply Quote | |