monitor suspend/resume bug in 295/296 drivers

Betting Slip

Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 24050 - Posted: 20 Mar 2012, 8:01:00 UTC - in response to Message 23964.  

The 266.58 are the last drivers that seem to be problem-free, no downclocking bug and obviously no sleep mode bug.


285.62 doesn't have problems.



Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

Matman

Joined: 3 Oct 10
Posts: 2
Credit: 34,005,977
RAC: 0
Message 24132 - Posted: 24 Mar 2012, 16:17:16 UTC

OK, so I went back step by step to the 285.72 drivers. All CUDA tasks have performed without errors. I have not been able to test with GPUGRID, as I am waiting for a new WU.

Matman

coldFuSion

Joined: 22 May 10
Posts: 20
Credit: 85,355,427
RAC: 0
Message 24138 - Posted: 24 Mar 2012, 21:58:12 UTC

Win 7 64-bit (SP1)
Dual GTX 580s
Driver: 295.73
Power Control Panel -> Turn off the display: Never

Result: no errors


Bob Harris

Joined: 10 Jun 11
Posts: 6
Credit: 70,330,451
RAC: 0
Message 24200 - Posted: 1 Apr 2012, 9:56:45 UTC

There are many, many people out there with the 295.73 driver, and they must be causing thousands of errors on the GPUGRID projects.

People in general will be reluctant to roll back drivers to earlier versions, because GPUGRID is not the primary reason to own or use a computer.


GPUGRID is wasting valuable data at this moment, because of many thousands of errors.

Therefore, shouldn't the GPUGRID team do something themselves, instead of asking every member to change drivers?

skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 24201 - Posted: 1 Apr 2012, 10:22:11 UTC - in response to Message 24200.  

Do what?

GPUGrid has to rely on the system to deal with these issues. Users that continuously fail tasks will stop getting tasks.
If you ban users with specific drivers, unless the drivers universally fail, you will be banning users that complete tasks successfully too.
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Bob Harris

Joined: 10 Jun 11
Posts: 6
Credit: 70,330,451
RAC: 0
Message 24202 - Posted: 1 Apr 2012, 11:59:50 UTC - in response to Message 24201.  

Do what?

GPUGrid has to rely on the system to deal with these issues. Users that continuously fail tasks will stop getting tasks.
If you ban users with specific drivers, unless the drivers universally fail, you will be banning users that complete tasks successfully too.



Has anyone investigated why the tasks fail with some drivers?

On a global scale, Nvidia will not change their drivers to pander to a relatively small group.

Therefore, shouldn't GPUGRID be looking at rewriting the code required to undertake the tasks under the newer drivers?

Bob Harris

Joined: 10 Jun 11
Posts: 6
Credit: 70,330,451
RAC: 0
Message 24203 - Posted: 1 Apr 2012, 12:07:22 UTC - in response to Message 24202.  

UPDATE:
I have just done a casual check on some of the top performers on GPUGRID, and note that the majority of them are experiencing multiple failures of tasks. Even some of those with 285.xx drivers.

Maybe there is something else wrong here?

I am also active with Seti@home, and, with one unrelated exception, have no failures on those tasks...

Michael Goetz

Joined: 2 Mar 09
Posts: 124
Credit: 124,873,744
RAC: 39
Message 24204 - Posted: 1 Apr 2012, 12:09:46 UTC - in response to Message 24201.  

Do what?


That is a question EVERY project is asking themselves. This is what I did with PrimeGrid's GeneferCUDA application. Ken had previously done something similar with PrimeGrid's other CUDA applications.

If a CUDA API call returns CUDA_ERROR_NO_DEVICE, GeneferCUDA prints a warning to stderr saying that under Windows, using RDP or using the 295/296 Nvidia driver causes the GPU to not work.

Stderr is visible in the BOINC task webpage, so there's a chance the user might read it.

After printing the message, Genefer goes to sleep for 10 minutes. It's still active and doesn't return to BOINC, but it's not doing anything. This intentionally ties up the GPU, since no other BOINC task is going to be able to run on the GPU anyway.

After 10 minutes, it tries again. This continues until either the program can run successfully, or one hour elapses. After an hour, Genefer gives up, declares a computation error, and exits.

This approach has two benefits. First, this error is transient and may go away while Genefer is still waiting, if either the RDP session is closed, or the monitor comes out of sleep mode. Second, in the more likely event that the problem doesn't go away, we're only failing one WU per hour instead of several per minute.

This certainly doesn't solve the problem, but it does mitigate its effect on the project somewhat.
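
For anyone who wants to see the shape of that logic, here is a minimal Python sketch of the wait-and-retry loop described above. It is not Genefer's actual code: `device_available` is a hypothetical callable standing in for whatever CUDA call would return CUDA_ERROR_NO_DEVICE, the `sleep` parameter exists only so the loop can be exercised without really waiting, and the warning text is paraphrased. Only the 10-minute interval and the one-hour limit come from the description.

```python
import sys
import time

RETRY_INTERVAL_S = 10 * 60   # wait 10 minutes between attempts
GIVE_UP_AFTER_S = 60 * 60    # give up after one hour of waiting


def run_with_retry(device_available, sleep=time.sleep,
                   retry_interval=RETRY_INTERVAL_S,
                   give_up_after=GIVE_UP_AFTER_S):
    """Return True once the GPU is usable, False if the time budget runs out.

    device_available is a hypothetical stand-in for the real CUDA check.
    """
    waited = 0
    while True:
        if device_available():
            return True          # the transient error cleared; carry on
        if waited >= give_up_after:
            return False         # caller reports a computation error
        # stderr shows up on the BOINC task webpage, so the user may see this
        print("Warning: no usable GPU (RDP session or 295/296 driver "
              "with a sleeping monitor?). Retrying later.", file=sys.stderr)
        sleep(retry_interval)    # hold the GPU slot while doing nothing
        waited += retry_interval
```

With the default values this makes one attempt every 10 minutes and fails at most one WU per hour, which is the mitigation described above.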

Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,053,468,649
RAC: 1,308,024
Message 24205 - Posted: 1 Apr 2012, 15:39:45 UTC - in response to Message 24202.  

On a global scale, Nvidia will not change their drivers to pander to a relatively small group.

On the contrary, Nvidia told Einstein:

This bug is considered as release critical (show-stopper) for the next NVIDIA driver release that's due in 2-4 weeks. Thus a fix will be available by that time.

We are only a 'relatively small group' if we hide away in our separate corners and try to sort out problems like this 'one project at a time'. There are times when collective action is necessary, and if 'BOINC Central' isn't proactively co-ordinating it, then projects which have adopted the BOINC platform should go and bang on their doors until they do.


dirkmittler

Joined: 13 Mar 12
Posts: 21
Credit: 8,773,573
RAC: 0
Message 24358 - Posted: 10 Apr 2012, 19:46:01 UTC

The latest version of the BOINC client software + manager, version 7.0.25, has taken it upon itself not to recognize the GPU anymore if the user has one of those 'incompatible drivers'.

I think that this represents a mistake, because now that I've programmed my own Windows 7 Pro, x64 computer never to switch off the monitor, I am handing in work units successfully again, and I do think it's unrealistic thinking from BOINC, that users will downgrade their drivers, for the sake of BOINC.

I'm using driver version 296.10 successfully now.

Actually, one reason I had for upgrading from my outdated drivers, was my concern that 'the old method' of implementing "PhysX", would have used a discrete "PPU" (Physics Processing Unit) on my graphics card, and I wanted to /make sure/ that since this approach has been abandoned by nVidia, in favor of using the "GPGPU" itself, my own graphics card should also be using the GPGPU.

Especially since the instructions for downgrading, now tell us to remove ALL nVidia software from our computers, this has become a totally infeasible thing for me to do, with PhysX and "CUDA" SDKs all installed and working.

I have to add something to the advice, for how to prevent the monitor from sleeping though. Well enough, one would set the general Power Settings, accessible through Screensaver preferences. But then it can happen that some other process tries to give the command anyway, to put the monitor to sleep, especially since /some of us/ have sundry programs installed.

The stronger setting I would recommend would be (in addition to the standard setting):

Start Menu
Type in "Edit Group Policy" into the search field and hit Enter
Computer Configuration
Administrative Templates
System
Power Management
Video And Display Settings
Turn Off Display (Plugged In AND On Battery)
--> Disabled

What this does, on Windows 7 Pro at least, is take away the privileges that processes we might not have kept track of would otherwise have to put the monitor to sleep.

I think that by simply banning all up-to-date device drivers, the new version of BOINC client software will kill off one major source of contributed work for you. The main reason my own GC did crash at one point, was simply the fact that I had not researched the subject (in the forums), and it's not likely to happen to me again.

Dirk

5pot

Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Message 24359 - Posted: 10 Apr 2012, 20:03:45 UTC - in response to Message 24358.  
Last modified: 10 Apr 2012, 20:53:19 UTC

Sorry to double post, but it seems important:

(not my words)
You'll be happy to know that 301.24 fixes the sleeping monitor bug, although I haven't tried it on PrimeGrid yet, only Einstein & Seti so far.

Claggy (Some user on PG)

First post
I did 7 Setiathome offline benches last night and couldn't get it to fail (but I'm using a different monitor from the one on which I could get it to fail with the 295.xx drivers).
Before I upgraded I grabbed some BRP4Cuda work and have done some of it this morning with no problem.
In a little while I'll downgrade to 295.73 and check whether I can get offline benches to fail on this monitor.

Second:

I downgraded my i7-2600K/GTX460/HD5770 host to 295.73, ran a Setiathome offline bench, proved that the CUDA apps do fail with this monitor, then upgraded back up to 301.24.

EDIT: Decided to check for myself using 301.10, and after letting the monitor sleep for a while, I resumed and saw that GPU usage remained steady. Still have to wait for validation from my wingman.

dirkmittler

Joined: 13 Mar 12
Posts: 21
Credit: 8,773,573
RAC: 0
Message 24363 - Posted: 10 Apr 2012, 20:52:13 UTC

Doesn't it even seem to matter to you, that BOINC's new client and manager package, is overriding the individual projects' policies on the subject?

Dirk

5pot

Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Message 24364 - Posted: 10 Apr 2012, 20:56:28 UTC
Last modified: 10 Apr 2012, 20:59:35 UTC

It does, but since many people rely on auto-update or always get the latest driver, this may become a moot point after a while. From what I can tell, only people who know what they're doing didn't use those drivers anyway, and since BOINC blacklisted them it won't matter once NVIDIA releases the WHQL driver. I mean, it prevents failed WUs, and even Einstein quit allowing people to use those drivers (they blocked WUs from going to hosts with those drivers anyway, so even if you fixed it yourself it wouldn't work). It should actually help projects in the long run, even if it's rude to the users.

EDIT: Letting BOINC do the blacklisting would have allowed me to use my 680 on Einstein: even though my driver (301) is above their 290 limit, they would have known they could send me WUs. Arrogant? Kind of, yes, but MANY WUs are failing everywhere because of it, especially here, I believe.

Michael Goetz

Joined: 2 Mar 09
Posts: 124
Credit: 124,873,744
RAC: 39
Message 24367 - Posted: 10 Apr 2012, 21:19:03 UTC - in response to Message 24358.  
Last modified: 10 Apr 2012, 21:43:30 UTC

The latest version of the BOINC client software + manager, version 7.0.25, has taken it upon itself not to recognize the GPU anymore if the user has one of those 'incompatible drivers'.


I probably didn't dig deep enough, but where in the release notes does it say this? I couldn't find mention of this.

Assuming it's true, I've got very mixed feelings about it. On the one hand, from the project side of things, this driver bug is a huge pain in the posterior. Thousands and thousands of errors, and I've got WU's over at PrimeGrid that are hitting the "too many tasks" limits because of this.

From a user's perspective, it's not so nice -- but the user has the option of upgrading the driver to 301, downgrading the driver to 285, or reverting BOINC to 6.12.34 after clearing their work queue.

All in all, the benefits probably outweigh the disadvantages.
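
As a rough illustration of what a driver check along these lines could look like, here is a small Python sketch. It is only an assumption about how one might encode what this thread reports (295.xx and 296.xx affected, 285.xx and 301.xx fine); it is not taken from BOINC or from any project's scheduler, and the name `is_affected` and the "major.minor" version format are made up for the example.

```python
# Major driver versions reported in this thread as having the
# monitor suspend/resume bug. Anything else is assumed to be fine.
AFFECTED_MAJOR_VERSIONS = {295, 296}


def is_affected(driver_version):
    """Return True if a version string like '295.73' is in the affected series."""
    major = int(driver_version.split(".")[0])
    return major in AFFECTED_MAJOR_VERSIONS


for version in ("285.62", "295.73", "296.10", "301.24"):
    status = "affected" if is_affected(version) else "ok"
    print(version, status)
```

Note that a check like this would also turn away hosts on 295/296 that never let the monitor sleep and therefore complete tasks successfully, which is exactly the trade-off being argued about here.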
Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.


dirkmittler

Joined: 13 Mar 12
Posts: 21
Credit: 8,773,573
RAC: 0
Message 24368 - Posted: 10 Apr 2012, 21:25:40 UTC - in response to Message 24364.  
Last modified: 10 Apr 2012, 21:33:41 UTC

@Michael Goetz:

It's possible that I misread the issues with the newer BOINC Manager and Client.
What they wrote, is that when we install BOINC as a Service, OR in Protected Execution, GPU detection won't work anymore.

http://boinc.berkeley.edu/wiki/Release_Notes#Protected_Application_Execution_.28Service.29_Installation.2C_GPU_detection_and_Windows_XP

I was under the impression that 'as a service' is the opposite of 'in protected execution'.

If in fact they are one and the same thing, then I got it wrong. In that case, BOINC installed 'in User Mode' will still recognize the GPUs (without problem)... If that's so, you might want to make the text just a tad more clear about it. How does it address the malfunctions?


It does, but since many people rely on auto-update or always get the latest driver, this may become a moot point after a while. From what I can tell, only people who know what they're doing didn't use those drivers anyway, and since BOINC blacklisted them it won't matter once NVIDIA releases the WHQL driver.


In my opinion there are two errors here.

1) Updating your graphics driver and system software, is not a trivial task. When I asked Windows Update to do it first, Windows Update updated and left me with an improper install. I could no longer open my nVidia Control Panel from that. So I had to do a manual upgrade afterward, my icons were all displaced and so on...

I don't think that users who simply have their computers on auto-pilot experience that.

2) The other people who chose the 296.10 driver have other things to do with their computers than BOINC work units. We only run BOINC on the side. I'm into game development, PhysX, etc.

I'd say that ~BOINC is my screensaver~, but in fact mine is the 3D Text Screensaver, with BOINC running in the background.

You can't convince me to reinstall, and then re-reinstall my graphics drivers.

Dirk

Michael Goetz

Joined: 2 Mar 09
Posts: 124
Credit: 124,873,744
RAC: 39
Message 24369 - Posted: 10 Apr 2012, 21:42:39 UTC - in response to Message 24368.  

@Michael Goetz:

It's possible that I misread the issues with the newer BOINC Manager and Client.
What they wrote, is that when we install BOINC as a Service, OR in Protected Execution, GPU detection won't work anymore.



I'm pretty sure that's a feature that's been in BOINC for many years now, and certainly isn't driver specific. It definitely was in the 6.12.34 client, and possibly in all of the 6.x.x clients.
Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.


5pot

Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Message 24370 - Posted: 10 Apr 2012, 21:45:58 UTC

I'm not trying to convince anyone of anything. My point is that if you are using the WHQL driver for game development and the like (I play games), then you would upgrade to the newest WHQL driver in order to have NVIDIA's latest software. In this case you would be upgrading to the 300 series whenever the WHQL driver is released, if I'm not mistaken. This was my point: if you're using the latest currently, then why not upgrade when the newest is released? I personally use the NVIDIA website so I can do a clean install. When I made my comment I was merely saying that if BOINC is currently being run on the side, then I would ASSUME you would want the latest. This being 300, which is a good thing for everyone all around.

dirkmittler

Joined: 13 Mar 12
Posts: 21
Credit: 8,773,573
RAC: 0
Message 24371 - Posted: 10 Apr 2012, 21:46:23 UTC - in response to Message 24369.  
Last modified: 10 Apr 2012, 22:07:23 UTC

My apologies. I thought that I was being urged to upgrade to the beta driver, etc. I can upgrade to the 300.xx driver as soon as it becomes WHQL, since my current setup will continue to work for now.

And I did just upgrade my client software to 7.0.25, as requested.

Dirk

5pot

Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Message 24372 - Posted: 10 Apr 2012, 22:27:51 UTC

Quite all right. I've been trying to spread the word to various sites because, as Michael stated, it has been a HUGE problem from the projects' standpoint. MANY, MANY errors have been caused by this monitor sleep bug. When I said "only the people that know what they're doing don't use it anyways", I should have been more clear: I meant the BOINC-ONLY crowd. Since that can be a rather small percentage on some sites, many users who aren't BOINC-ONLY (they attach a project and leave it without ever checking results) just keep producing errors without knowing it.

No need to go to beta if you already prevent the monitor from sleeping, but not everyone does this, and this was the problem. People who play games etc. in their spare time want/need the latest drivers in order for their system to function properly (whether it's WHQL or not).

All in all, it's great news for the BOINC community as a whole, because now everyone's happy (or will be soon, anyway, when the WHQL driver is released), both from the projects' side (valid WUs) and for users who want the latest and greatest PhysX etc.

As always Happy Crunching

Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,053,468,649
RAC: 1,308,024
Message 24380 - Posted: 11 Apr 2012, 13:34:16 UTC - in response to Message 24369.  

@Michael Goetz:

It's possible that I misread the issues with the newer BOINC Manager and Client.
What they wrote, is that when we install BOINC as a Service, OR in Protected Execution, GPU detection won't work anymore.

I'm pretty sure that's a feature that's been in BOINC for many years now, and certainly isn't driver specific. It definitely was in the 6.12.34 client, and possibly in all of the 6.x.x clients.

Trying to eliminate some confusion here:

'Service mode' and 'Protected Application Execution' are the same thing.

In Windows Vista and Windows 7 GPUs can NOT be used in Service/PAE mode - in any version of BOINC (it's an OS restriction).

In Windows XP, GPUs CAN be used in Service/PAE mode up to and including BOINC v6.12.34 - but not in the new BOINC v7.0.25
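
To make the three rules above easier to check at a glance, here they are again as a small Python sketch. The function name and the Windows labels are made up for illustration. BOINC versions between 6.12.34 and 7.0.25 aren't covered by the statements above, so treating them as "not usable" is purely an assumption of this sketch.

```python
def parse_version(text):
    """Turn a version string like '6.12.34' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def gpu_usable_in_service_mode(windows, boinc_version):
    """Can BOINC use the GPU when installed in Service/PAE mode?

    windows is one of 'XP', 'Vista' or '7' (labels chosen for this sketch).
    """
    if windows in ("Vista", "7"):
        return False  # OS restriction, regardless of BOINC version
    if windows == "XP":
        # Works up to and including v6.12.34, not in v7.0.25.
        # Versions in between are assumed unusable (not stated above).
        return parse_version(boinc_version) <= (6, 12, 34)
    raise ValueError("unknown Windows version: " + windows)


print(gpu_usable_in_service_mode("XP", "6.12.34"))
print(gpu_usable_in_service_mode("XP", "7.0.25"))
print(gpu_usable_in_service_mode("7", "6.12.34"))
```

None of this applies to a normal user-mode installation, which is not restricted in this way.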

©2025 Universitat Pompeu Fabra