Message boards : Number crunching : Monitor sometimes becomes black while crunching GPUGRID
On the same machine with two GTX 980 Ti cards, on which I have been crunching GPUGRID under Windows XP for two and a half years, I recently installed Windows 10.
ID: 50139 | Rating: 0
Driver crashes are a common problem when a GPU is overheated or overclocked.
ID: 50140 | Rating: 0
You can install TThrottle and record the GPU temperatures to find out. After a crash, you can reboot and check the temperature graphs of the last 24 hours. In TThrottle you can also set a maximum temperature at which the PC shuts down automatically before the GPU gets damaged; I normally set it to 85 °C.
ID: 50141 | Rating: 0
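TThrottle is a GUI tool, so its internals can't be shown here. As a rough stand-in for the same idea, here is a minimal Python sketch, assuming the `nvidia-smi` utility that ships with the NVIDIA driver is on PATH: it reads each GPU's temperature and flags any card at or above the 85 °C limit mentioned above. The function names are hypothetical. Note that `temperature.gpu` is the GPU core reading only, so memory chips and voltage regulators are not covered.

```python
import subprocess

# Limit in °C; 85 is the value the post above uses in TThrottle (an assumption, adjust as needed).
LIMIT_C = 85

# Real nvidia-smi query options: one "index, temperature" line per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_temps(csv_text: str) -> dict[int, int]:
    """Turn output such as '0, 61\\n1, 62' into {0: 61, 1: 62}."""
    temps = {}
    for line in csv_text.strip().splitlines():
        index, temp = (field.strip() for field in line.split(","))
        temps[int(index)] = int(temp)
    return temps


def gpus_over_limit(temps: dict[int, int], limit: int = LIMIT_C) -> list[int]:
    """Return the sorted indices of all GPUs at or above the limit."""
    return sorted(i for i, t in temps.items() if t >= limit)


def read_gpu_temps() -> dict[int, int]:
    """Ask the driver for live readings (requires nvidia-smi on PATH)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_temps(result.stdout)
```

Called periodically (for example once a minute from a loop with `time.sleep`), `gpus_over_limit(read_gpu_temps())` returns the cards that need attention; whether to log, alert or shut down is left to you.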
If you find that this problem is not heat-related, it could be that you are not using a good driver. You mentioned that you installed Windows 10, which implies a clean (new) install rather than an upgrade.
ID: 50142 | Rating: 0
> Driver crashes are a common problem when a GPU is overheated or overclocked.

Heat and/or overclocking should not be the problem. As before under Windows XP, the GPU temperature is around 61-62 °C and the clock is around the default value. I would rather guess that it has to do with what is described here:

> If you don't find this problem is heat related it could be that you are not using a good driver. You mentioned that you installed Windows 10. That implies that you did a clean or new install rather than an upgrade.

I used the driver that originally came with the new install of Windows 10; it was version 388.. The driver I have now downloaded from NVIDIA is 398.36. Installation worked without problems, so I restarted BOINC / GPUGRID and will see what happens (even the tasks that I interrupted for the new driver installation continued normally).
ID: 50144 | Rating: 0
> ... The driver I now downloaded from NVIDIA is 398.36. Installation worked without problems, so I restarted BOINC / GPUGRID and will see what happens (even the tasks which I interrupted for the new driver installation were continued normally)

The problem still exists, despite the new driver :-((( Again, I looked at the Windows Event Log (System), and it shows the warning cited above ("Display driver nvlddmkm stopped responding and has successfully recovered") many times from 10:19 a.m. on, at exactly 4-second intervals, until 10:52, when I pressed the "off" button. Does anyone have an idea what the reason could be? What can I do to get GPUGRID working properly?
ID: 50147 | Rating: 0
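A warning logged every 4 seconds from 10:19 to 10:52 adds up to roughly 33 × 60 / 4 = 495 entries, too many to check by eye. Here is a minimal Python sketch for confirming such a cadence, assuming the event times have been exported from Event Viewer (for example with "Save All Events As..." to CSV) and reduced to plain `HH:MM:SS` strings from a single day; the function names are hypothetical.

```python
from datetime import datetime


def gaps_in_seconds(stamps: list[str], fmt: str = "%H:%M:%S") -> list[float]:
    """Seconds between consecutive events. Event Viewer lists newest first, so sort first."""
    times = sorted(datetime.strptime(s, fmt) for s in stamps)
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


def is_regular(stamps: list[str], period: float = 4.0, tolerance: float = 0.5) -> bool:
    """True if there are at least two events and every gap is within period ± tolerance."""
    gaps = gaps_in_seconds(stamps)
    return bool(gaps) and all(abs(gap - period) <= tolerance for gap in gaps)
```

A `True` result only confirms the pattern; it does not by itself explain why the driver keeps being reset.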
As I wrote, even if the GPU temperatures are OK, that unfortunately does not mean much, as the temperatures of the memory chips and voltage regulators don't show up. I would take the 980 Ti GPUs out one after another to see whether the problem is related to a particular card. If the system works properly with only one GPU installed, then you know. You may also want to move the 980 Tis into another PC to see whether the error moves along with one card or the other.

____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
ID: 50148 | Rating: 0
> The problem still exists, despite of the new driver :-(((

In Windows, the only proper way to install a new driver is to first uninstall the old one. And it has to be a clean uninstall using Display Driver Uninstaller (DDU), to get rid of all traces of the old one (choose the option to reboot into Safe Mode).
https://www.wagnardsoft.com/forums/viewtopic.php?f=5&t=1174&sid=38069867de013db1e7c3bd469b98c82a
You might think that Nvidia would do that themselves, but they don't.
ID: 50149 | Rating: 0
> In Windows, the only proper way to install a new driver is to first uninstall the old driver. And it has to be a clean uninstall using Display Driver Uninstaller (DDU), to get rid of all the traces of the old one (choose the option to reboot into Safe Mode).

That's exactly what I did, anyway. I am now trying various methods to narrow down the problem:
- right now I am crunching SETI@home tasks, so I'll see whether the problem occurs there as well. If so, I might install
- Folding@Home, which uses OpenCL (in contrast to GPUGRID and SETI, both of which use CUDA).

If the problem persists in both of the above cases, I will go back to Windows XP (on the same machine, with dual boot), which I have used for the past two and a half years, and see whether the problem also occurs there. If it does, then I am afraid that JoergF may be right in assuming some kind of hardware failure :-(((
ID: 50150 | Rating: 0
I had the same problem; it was a power-saving issue with the BIOS and Windows 10.
ID: 50151 | Rating: 0
> I had the same problem, it was a power saving issue with the BIOS and Windows 10

How did you resolve it?
ID: 50152 | Rating: 0
> I had the same problem, it was a power saving issue with the BIOS and Windows 10

The strange thing, though, is that the problem did NOT occur within the first 2-3 days after installing Windows 10, but only after I started crunching GPUGRID (before that, I was only crunching LHC tasks on the CPU). Still, your answer to Zoltan's question "How did you resolve it?" would be very interesting.
ID: 50153 | Rating: 0
On my Windows 10 PC, SETI@home uses opencl_nvidia_SoG. On GPUGRID the app uses cuda80 and all tasks fail.
Tullio
On my Linux box, SETI@home uses opencl_nvidia_sah and Einstein@home uses FGRPopenclK-nvidia. GPUGRID used cuda80, and it worked on Linux. The GPU boards are a GTX 1050 Ti on Windows and a GTX 750 Ti on Linux.
ID: 50154 | Rating: 0
> the strange thing is, though, that the problem did NOT occur within the first 2-3 days after the installation of Windows 10; but only after I started GPUGRID crunching (before, I was only crunching LHC tasks via the CPU).

Windows was always very erratic for me when I was running it with a screensaver, for example. Have you disabled all screensavers, sleep and power-down features? Also, in the BIOS I would disable the various power control modes; they are known to be problematic. But I like JoergF's idea of removing the cards one at a time. I think you could just unplug the PCIe power cables one at a time to do a simple test.
ID: 50155 | Rating: 0
> I think you could just unplug the PCIe power cables one at a time to do a simple test.

This is a very bad idea.
ID: 50156 | Rating: 0
> I think you could just unplug the PCIe power cables one at a time to do a simple test.
>
> This is a very bad idea.

I agree. For a test, you should remove the card completely. If you leave a card in the mainboard slot without the additional 6/8-pin supply, it will not be powered properly and the PC will likely not power up. If you are lucky, you'll hear the usual BIOS beep codes and nothing more, but it could also result in further hardware damage, as some components, e.g. gates, would be operated in undefined states or even oscillate, resulting in local excess current. DON'T try that.

____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
ID: 50157 | Rating: 0
> On my Windows 10 PC SETI@home uses opencl_nvidia_SoG. On GPUGRID the app uses cuda80 and all tasks fail.

I just noticed that SETI has tasks with opencl_Nvidia_SoG as well as with Cuda42 and Cuda50. (I also tried to test Einstein, but besides GPU tasks it downloads CPU tasks as well, which fill up all 12 of my CPU cores; I don't want that to happen. I am sure this can be controlled somehow, but I haven't found out how yet.)
ID: 50158 | Rating: 0
> For a test, you should remove the card completely. Because If you leave one in the Mainboard slot without further 6/8pin supply, it will not get supplied properly and (likely) the PC not power up. If you are lucky you'll hear the common BIOS beep codes and no more, but it also could result in further hardware damage, as some components e.g. gates will be operated at undefined states or even oscillate, resulting in local excess current. DONT try that.

My experience is that the card won't power up at all, and won't draw much power for anything.
ID: 50160 | Rating: 0
My understanding is that NVidia cards are designed to power up using the 75W available from the PCIe slot, detect that the additional power cables are unconnected, and refuse to move out of a protective low-power state.
ID: 50162 | Rating: 0
I don't think that any of the signal inputs are left "floating", if that is the concern. They will all be tied to a supply voltage, and clamped in a known state. Nvidia would not leave that situation unprotected by any means.
ID: 50163 | Rating: 0
Earlier today I wrote:

> I am trying now various methods to "delimit" the problem:

I have now run first SETI and then Einstein for several hours, and no failure occurred. So a minute ago I switched from Win10 to WinXP, and I am now crunching two GPUGRID tasks overnight, plus a few LHC tasks (CPU only), as I have done for a long time. I am curious what I will see tomorrow morning.
ID: 50165 | Rating: 0
> I had the same problem, it was a power saving issue with the BIOS and Windows 10

It had something to do with multiple cards and the UEFI BIOS: if the monitor was set to turn off after a time limit, mine wouldn't come back on. I turned off that feature until a new BIOS came out; that and the "Fall Creators Update" fixed it for me. It seems to me that that update and the April 2018 update fixed a lot of issues with Windows 10; I know they never mention everything that is fixed, because of interdependencies.
ID: 50166 | Rating: 0
> I turned off that feature until a new BIOS came out, that and the "Fall creators update" fixed it for me.

Which means: in the Windows power settings you switched "turn off monitor" to "never"? So your monitor was on 24 hours a day? Also: did the problem occur only when crunching GPUGRID, or at other times too?
ID: 50167 | Rating: 0
Whenever I was using all the cards. I dug into it far enough to find that it had to do with utilization being high and no SLI bridge being enabled or present.
ID: 50168 | Rating: 0
> I turned off that feature until a new BIOS came out, that and the "Fall creators update" fixed it for me.

Only if you leave the monitor on. I just turn off the monitor. A couple of things: find the power plans and change to maximum performance. I turn off the screen saver, set "turn off monitor" to never and "blank screen" to never, and disable hard-drive spin-down (if you are still using a hard drive rather than an SSD). Make sure you have not set the machine to sleep after x minutes of inactivity anywhere. Those error messages you are getting come from the computer shutting down the GPUs, which makes the drivers crash. You need to figure out why the GPUs are being put to sleep.

As for keeping Einstein from using both CPU and GPU, that is in the preference settings on your account page: Preferences > Project. Select a location, change "Use CPU" to no and "Use NVIDIA GPU" to yes. Save, and then make sure your computer is assigned to that location.
ID: 50169 | Rating: 0
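Most of the power settings listed above can also be applied from a command prompt with Windows' built-in `powercfg` tool. Here is a minimal Python sketch that wraps the relevant `powercfg` calls, assuming a desktop on mains power (the `-ac` settings; laptops on battery use the `-dc` variants). `SCHEME_MIN` is powercfg's alias for the High performance plan, and a timeout of 0 means "never". The screen saver is configured elsewhere and is not covered. The helper name is hypothetical, and it only returns the commands unless called with `dry_run=False`.

```python
import subprocess
import sys

# Order matters: /change edits the currently active plan, so activate High performance first.
POWER_COMMANDS = [
    ["powercfg", "/setactive", "SCHEME_MIN"],             # High performance plan
    ["powercfg", "/change", "monitor-timeout-ac", "0"],    # never turn off the display
    ["powercfg", "/change", "disk-timeout-ac", "0"],       # never spin down hard disks
    ["powercfg", "/change", "standby-timeout-ac", "0"],    # never go to sleep
    ["powercfg", "/change", "hibernate-timeout-ac", "0"],  # never hibernate
]


def apply_power_settings(dry_run: bool = True) -> list[str]:
    """Return the commands as text; run them only when dry_run is False (Windows only)."""
    lines = [" ".join(cmd) for cmd in POWER_COMMANDS]
    if not dry_run:
        if not sys.platform.startswith("win"):
            raise OSError("powercfg is only available on Windows")
        for cmd in POWER_COMMANDS:
            subprocess.run(cmd, check=True)
    return lines
```

Printing `apply_power_settings()` first lets you review exactly what would change before running it for real.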
> ... and no SLI bridge being on or present.

This reminds me: after re-installing the NVIDIA driver yesterday morning, I got some kind of warning in the lower right corner of the screen that the SLI bridge is missing. I don't remember whether the warning came from NVIDIA or from Windows. Whether this (also) has to do with my problem or not, I have no idea...

Anyway, all last night I crunched GPUGRID and LHC on Windows XP with no problem at all, and it's still running fine. Conclusion:
1) there does not seem to be any defective hardware;
2) the problem clearly has to do with Windows 10, and maybe (although I am not 100% sure yet) only when GPUGRID is running.

So today I will take a closer look at the power-saving settings, and I'll again run either Einstein or SETI tasks for a lengthy period of time.
ID: 50170 | Rating: 0
You don't want to run your GPUs in SLI mode while crunching WUs, but having the bridge installed doesn't cause problems. Just make sure SLI is turned off while running GPUGRID.
ID: 50171 | Rating: 0
> ...but I'm pretty sure it is a power saving issue with your monitor/GPUs.

As I said, I need to do more testing with SETI and/or Einstein. But if my first impression from yesterday's testing is right, the problem might come up only when crunching GPUGRID and NOT when crunching the other projects. This is not certain yet, though; I'll probably know more during the course of today.
ID: 50172 | Rating: 0
> ... probably I'll know more during the course of today.

Well, by now it seems pretty certain that the problem comes up only when crunching GPUGRID, not with other projects like SETI and Einstein. I saw an interesting thread in the AnandTech forum:
https://forums.anandtech.com/threads/gpu-tasks-are-causing-win10-machine-to-become-unresponsive-restart-fixed-by-disabling-sli.2526566/page-2
with the title "GPU tasks are causing Win10 machine to become unresponsive/restart (fixed by Disabling SLI)". Well, on my machine SLI is NOT enabled anyway. What I am trying right now is to crunch with only 1 GPU instead of 2, so let's see what happens.
ID: 50196 | Rating: 0
You posted earlier that "I got some kind of warning that the SLI bridge is missing."
ID: 50197 | Rating: 0
> ... I'd turn it off from the 3D settings page in the NVidia Control Panel.

I have looked it up now: it's deactivated there. What I am doing right now is crunching with only 1 GPU instead of 2. Let's see what happens.
ID: 50198 | Rating: 0
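One way to crunch with a single GPU without opening the case is BOINC's client configuration file. The minimal Python sketch below builds a `cc_config.xml` containing the `<ignore_nvidia_dev>` option, which tells the BOINC client not to use the NVIDIA GPU with the given device number. It assumes BOINC's own numbering (0 and 1 for two cards, as listed in the BOINC event log at start-up) and that the file goes into the BOINC data directory (usually `C:\ProgramData\BOINC` on Windows); check both on your system and restart the client afterwards. The function name is hypothetical. Unlike physically removing a card, this leaves the ignored card installed and powered, so it does not fully isolate a hardware fault.

```python
import xml.etree.ElementTree as ET


def cc_config_ignoring(nvidia_devices: list[int]) -> str:
    """Build a minimal cc_config.xml that hides the listed NVIDIA GPUs from BOINC."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for device in nvidia_devices:
        # One <ignore_nvidia_dev> element per GPU to skip; device numbers start at 0.
        ET.SubElement(options, "ignore_nvidia_dev").text = str(device)
    return ET.tostring(root, encoding="unicode")
```

`cc_config_ignoring([1])` yields `<cc_config><options><ignore_nvidia_dev>1</ignore_nvidia_dev></options></cc_config>`, leaving GPU 0 to crunch alone. Saving this as `cc_config.xml` replaces any existing file, so merge it by hand if you already have one.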
> My understanding is that NVidia cards are designed to power up using the 75W available from the PCIe slot, detect that the additional power cables are unconnected, and refuse to move out of a protective low-power state.

It depends on the age or family of the card. Don't try that trick with a Kepler card. I forgot to plug in the PCIe power connectors for my dual 670s, and when I turned on the computer, both fans raced screamingly to full RPM for a few seconds and the computer promptly shut down. Thankfully no damage was done, but it scared me silly.
ID: 50199 | Rating: 0