Annoying reboots
Joined: 27 Mar 09; Posts: 13; Credit: 155,203; RAC: 0
Hi there! When crunching for GPUGRID, my computer sometimes reboots, anywhere from 30 minutes to 2-5 hours in. The computer turns off with all components still running (except the NVIDIA card, I assume) and starts again on its own after about 30 seconds.

- Since this does not happen while crunching SETI CUDA WUs, it might be GPUGRID's fault.
- On BOINC 6.4.x this didn't happen often. Back then I associated the reboots with the times when 4 CPU + 1 GPU (+0.25 CPU) tasks were running, but I'm not sure.
- Now I'm running 3 WUs on the CPU + 1 GPUGRID CUDA WU. One hour, no reboot so far.

My question: if the above does not help, can I expect a solution from any of the following?

- Right now the 181.20 CUDA driver seems to be installed. That seems to be CUDA 2.1, huh?
- I might try the CUDA 2.0 driver 178.08.
- I always wondered why I can't use the 182.50 WHQL driver; I might try that too.
- I'll try to complete the p700000-GIANNI WU (I lost 3 hours of progress due to one of those reboots). Maybe the syP9764-SH2_US WU runs better?
- My 5-month-old NVIDIA card might be broken, so it has problems running GPUGRID CUDA WUs. SETI will be happy if this is the case. ^^
- I won't update the NVIDIA BIOS; the solution must be something different.

You can see my system specs, right? In addition: the CPU is not overclocked (60°C); the GPU is factory overclocked. A reboot also happened after I downclocked it to NVIDIA's reference GTX280 specs (68-78°C). And in case you can't find my system specs: -_-

- Vista Ultimate 64
- AMD Phenom X4 9950BE
- 4GB DDR2 RAM

The reboots happened while browsing the internet and listening to MP3s. No games, no 3D programs, no CPU-intensive processes besides BOINC. If the computer reboots again while running only 3 CPU + 1 CUDA WU, I'll post it. Thanks for reading, and thanks in advance for any help.
Joined: 27 Mar 09; Posts: 13; Credit: 155,203; RAC: 0
Good, it seems I'm the only one with this problem :) I feel so special, hehe. Now, 3 hours later, the p700000-GIANNI_ WU completed, fortunately without errors in the result. For the record, that makes 4 hours without a reboot. I intend to crunch the syP9764-SH2_US_ WU together with 3 CPU WUs. If that works, I hope to get another syP... WU to run alongside 4 CPU WUs, to see whether it behaves differently from the p700000.. WUs. If it reboots even with only 3 CPU WUs, it might be a compatibility issue between GPUGRID and Rosetta, SETI and/or World Community Grid.

- Just checked the PSU: if it is not broken, it should deliver just enough power for my system (650 W, no USB coffee machine in use).
- No Windows update was performed before the reboots happened.
- All Windows updates are installed.
- The screensaver is deactivated.

My Gigabyte MA-790X-DS4 mainboard was cheap, so if it can't cope with the bandwidth GPUGRID needs on a GTX280 card, that would be the most 'appreciated' cause (besides a driver problem).
Joined: 27 Mar 09; Posts: 13; Credit: 155,203; RAC: 0
The syP9764.. WU completed without any reboot with 3 CPU WUs running. Sadly, while crunching p1040000-GIANNI_, the computer rebooted at 97.xx% progress and resumed at 95.xx%.

- I noticed that Windows Update had just installed an update for Windows Defender, and
- System Restore Points were created.
- At almost the same time, iTunes downloaded MP3s.

I'm not sure whether this causes problems; my guess is it doesn't. A kh29119-Jan2 WU is next, also tested alongside 3 CPU WUs.
Joined: 27 Mar 09; Posts: 13; Credit: 155,203; RAC: 0
A reboot in the first 20 minutes of the kh29119-Jan2 WU (running alongside 3 CPU tasks) prompted me to switch to the 182.50 driver. At the same time I moved from NVIDIA System Tools 6.02 to 6.03 (which may or may not be part of the problem). Running GPUGRID + 4 CPU tasks now. If the computer stays stable, it would be the first time in my 9 years of owning a desktop that a new driver fixed a 'Windows issue'. I just wonder why the newest WHQL driver would be better than the 'NVIDIA Driver with CUDA Support' linked from the manual on the Get CUDA website. Btw.: the WHQL driver seems to need more time to complete a WU, IMHO. Not CUDA-optimized but more stable? Since nobody has replied to this topic so far, thanks for providing a public GPUGRID diary for me. lol
Author: GDF; Joined: 14 Mar 07; Posts: 1958; Credit: 629,356; RAC: 0
Sorry nobody replied. I have never heard of reboots, though! gdf
Joined: 27 Mar 09; Posts: 13; Credit: 155,203; RAC: 0
Dear GDF, no problem. Did I use the wrong term? By reboot I mean the computer restarts by itself. The new driver did not fix it; the computer just restarted again. I no longer see a connection between running GPUGRID CUDA WUs and CPU WUs from other projects. I just downloaded 2 more GIANNI WUs and I'm expecting reboots while running them. I have no proof that syP9764 WUs will never cause me reboots, so I've decided NOT to abort every WU I get until a syP9764 WU comes along again. After completing the GIANNI tasks I'm off, and I may come back when I get a new video card (this one is from MSI; I hope those who have the same card don't have such problems) and/or a new CPU/mainboard/RAM. Maybe even if new WHQL drivers are released.. =)
Joined: 27 Mar 09; Posts: 13; Credit: 155,203; RAC: 0
Dear Diary, I just had a sudden restart when I tried to tell you that I called my hardware dealer. They say my card is broken and they want to give me a new one (4.5 years of guarantee left O_o). They didn't even ask about drivers. If they don't have the same card, they'll give me my money back. This happened 4 years ago too, and I came back with a much faster card. Back then, and even when I bought this card 9 months ago, I would never have thought of having a video card replaced because of problems doing scientific research. Sorry@all who don't think this is the right place for a diary. lol Couldn't resist. later, Moabiter

Relying on Boinc to satisfy your need for happiness might result in temporary unhappiness. This can be applied to almost everything.
Joined: 1 Feb 09; Posts: 139; Credit: 575,023; RAC: 0
Well, it's hard to find a solution for matters like this; that's why I was reading but have no solution at hand. Most of the time these problems are either hardware- or driver-related, which I think is why people weren't responding.
Joined: 27 Mar 09; Posts: 13; Credit: 155,203; RAC: 0
Hi uBronan! Well, a new video card might be the only solution left if I want to run GPUGRID ;) Still, it would be great if I didn't have to wait for one. For now, I can buy a cheap card without CUDA support and wait 3 days until a new high-end GPU is sent to me. I'll try to do that next week.

Just to clear things up (sorry in advance for my poor English; I read better than I type or spell):

- outside 19°C (day): video card 71°C, fan speed 75%
- outside 5°C (night): video card 78°C, fan speed 55%

I read these GPU temperatures with SpeedFan 4.37; NVIDIA System Monitor shows the same. So far I have not seen any spikes within the 500 ms display update interval. Before I started running CUDA, I never saw the GPU temperature go above 69°C. Since I started running CUDA, either SETI or GPUGRID (I've never had WUs from both apps available at the same time), I have been keen to keep it below 80°C. The 68-78°C I mentioned earlier is the fluctuation between day (68) and night (78).

CPU: I haven't come across a good free tool to monitor CPU temperatures. SpeedFan shows me a maximum of 62°C on a hot day, but doesn't show the temperature of each core separately. No spikes that I can see.

I haven't tried memtest86 yet. Vista's own memory test reports no errors.

My problem installing a new driver is solved: I had to unplug the TV. When Windows restarted (when I wanted it to restart) after uninstalling the driver, I had to let Vista finish installing its own driver for the video card. Without that, it would say "you're running 32bit uninstaller.. no 64bit OS.." or something like that.

I downclocked the card to 602/1296/1107 MHz to 'undo' the factory overclocking. Downclocking it even further never occurred to me.

Since I've had this computer, Avira AntiVir Premium has been guarding it. No virus was found in the last scan.
Joined: 19 Feb 09; Posts: 37; Credit: 30,657,566; RAC: 0
I'm using RivaTuner to monitor GPU temps. I have a pair of GTX 295s causing me massive issues at the moment. It could be temperature-related, so I ran my fans up to 80%, keeping temps around 67°C; I saw them in the mid 70s when it was crashing most. Under Vista I have a sidebar app that displays RivaTuner's temps, and I use the NVIDIA tool for all controls. Edit: I've had a heap of reboots too. It's usually a BSOD, but one was Windows Update. Try watching the machine to see if you get a BSOD.
Joined: 4 Apr 09; Posts: 450; Credit: 539,316,349; RAC: 0
"good and free tool to monitor CPU temps"

RealTemp: http://www.techpowerup.com/realtemp/
CoreTemp: http://www.alcpu.com/CoreTemp/
Joined: 11 Dec 08; Posts: 43; Credit: 2,216,617; RAC: 0
Where I have had issues before: if CPU temperatures get close to 70°C at peak usage, my computer will shut down or reboot. Curiously enough, this seems to happen more when GPUGRID is downloading and preparing to start up again. A better fan setup seems to have fixed this for me. In the warmer summer months, I may run GPUGRID with no other BOINC projects just to keep CPU temperatures down.
Author: Paul D. Buck; Joined: 9 Jun 08; Posts: 1050; Credit: 37,321,185; RAC: 0
It can be a PSU problem. During a download you also have more disk activity. If the PSU is marginal, the heavy computing load of the GPU plus the increased load of the disk drive may be enough to signal a power-out event, so the motherboard toggles reset and you get a reboot. Room temp plays a part because the increased room temp ups the fan speed for the same loads ... also more draw on the PSU ...
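The PSU explanation above can be turned into a back-of-the-envelope check: add up the worst-case draw of each component and compare it with the PSU rating minus a safety margin. This is a minimal sketch under stated assumptions: the 140 W CPU figure is the Phenom 9950 TDP quoted later in this thread and the 700 W rating is the PSU reported later, but every other wattage, the 20% margin, and the `psu_headroom` helper itself are hypothetical placeholders, not measurements of this machine. A negative result would only hint that the supply may be marginal; intermittent reboots have many other possible causes.

```python
"""Rough PSU headroom check (a sketch; most wattages are placeholders, not measurements)."""


def psu_headroom(psu_watts: float, loads: dict[str, float], margin: float = 0.2) -> float:
    """Return the watts left after reserving `margin` of the PSU rating.

    A negative result suggests the supply may be marginal under full load.
    """
    usable = psu_watts * (1.0 - margin)  # keep part of the rating in reserve
    return usable - sum(loads.values())


# Hypothetical worst-case draws in watts (assumptions for illustration only).
loads = {
    "cpu": 140.0,       # Phenom 9950 TDP quoted in this thread
    "gpu": 236.0,       # placeholder for a GTX 280 under CUDA load
    "board_ram": 60.0,  # placeholder
    "disks": 25.0,      # placeholder; rises with disk activity during downloads
    "fans": 15.0,       # placeholder; rises when room temperature pushes fan speeds up
}

print(psu_headroom(700.0, loads))
```

Raising the `disks` or `fans` entries shows how the extra load during a download or on a hot day eats into the remaining headroom.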
Joined: 27 Mar 09; Posts: 13; Credit: 155,203; RAC: 0
Hi there! I should sheepishly go stand in a corner. I thought I had SpeedFan figured out, but that was not the case. To be honest, I have no good excuse for reading 60°C as the CPU temperature; as I figured out, it was at 71°C most of the time. Now, if something is broken, it might be the CPU. Maybe the Scythe Mugen prevented some serious damage. About 1-2 months ago, the CPU fan was only able to run at 100 rpm. I even wondered how it could keep the CPU cool without running at least 500 rpm. Thanks to everyone who kept pointing at temperatures!!! Now I hope the GPU did not get damaged by the CPU. After all, I'm really happy that I did not keep my diary private. And if the problem still occurs with GPUGRID (though I'd still wonder why nothing else caused reboots), I will look for a new AM2+ CPU instead of claiming the guarantee. Checking: CPU fan = 1200 rpm now; core = 55°C, if I'm not mistaken again. I will take a good look at all the programs you mentioned. And thanks to everyone who kept me on topic with your postings too!
Author: Michael Goetz; Joined: 2 Mar 09; Posts: 124; Credit: 124,873,744; RAC: 0
"Room temp plays a part because the increased room temp ups the fan speed for the same loads ... also more draw on the PSU ..."

I think that, even more than the current drawn by the fans, high temperatures cause more power to be used simply as a physical byproduct of temperature on the electronics. As temperature goes up, electrical resistance drops, causing amperage to go up. Of course, that causes temperatures to rise even more, which causes resistance to drop more, which causes amperage to go up more, which causes temperature to go up even more, and so on.

It's really a shame that there isn't any monitoring built into either power supplies or motherboards to let you know when you have a marginal power situation. The only clue you get is intermittent failures, and an awful lot of things can cause those.

Mike
Joined: 27 Mar 09; Posts: 13; Credit: 155,203; RAC: 0
Not to be too optimistic, but the issue seems to be gone. If I read correctly, even the GPU temperature is 10°C lower than before at the same GPU fan speed. I guess I have to work for GPUGRID a few more days to make sure. =) I hope there is no long-term damage. And maybe I'm wrong, but I think the temperatures inside the case are sometimes an issue too. Airflow in my case:

- My PSU gets fresh air from outside at the bottom of the case.
- A fan behind the PSU sucks in air from the bottom, and another fan provides fresh air at the front to cool the hard disks.
- One fan exhausts warm air at the back behind the CPU cooler, while another fan pulls air from near the video card through the CPU cooler up to the top.
- At the top, a fan exhausts air.
- All of these fans are 120 mm; in addition, a 230 mm side fan takes in air.

I'm not sure whether 2 more optional fans at the top would make the airflow much better.
Joined: 17 Aug 08; Posts: 2705; Credit: 1,311,122,549; RAC: 0
Hi Moabiter, you're very probably seeing CPU-temperature-related problems. Let me tell you a bit:

- The Phenom 9950 is a very hot and power-hungry beast. It has a TDP of 140 W and actually needs that amount of power.
- It also doesn't like high temperatures, and/or its temperature is measured at cooler spots compared to Intel.
- At work I had an X2 6000+ (90 nm, 125 W) and, *cough*, under sustained load the machine would freeze or restart about once a week.
- I switched to a Phenom 9850 (125 W) and the situation remained (I also switched the board for a rather similar one).
- During fall the freeze interval extended to 1-2 months; during winter it disappeared.
- In spring it reappeared and I mounted an Arctic Cooling tower cooler, but surprisingly neither temps nor noise went down significantly.
- When running Prime95 I saw that when the temps reported by Everest went to 60-65°C, I started to get computation errors (max specified: 62°C).
- The problem was that the fan was already running at full blast!
- This story was meant to show you how critical these hot-headed chips can be regarding temperature.

Now on to your case: GPUGRID works the GPU harder than SETI and thus heats the interior of your case even more. This increases CPU temps and likely pushed you over the stability threshold. That's why running 3 instead of 4 CPU tasks helped: the CPU then generated less heat itself.

- What I did: using "K10Stat" I can set power profiles for my CPU. I downclock it and at the same time lower its voltage. This way I can choose how much power it is allowed to consume, so that it stays at ~55°C and the noise stays tolerable. If you like, I could tell you how to use it; you'd just have to invest a few hours into stability testing to avoid buying a new CPU.
- Some values:

| Clock | Voltage | Power | Note |
|---|---|---|---|
| 2.50 GHz | 1.30 V | 125 W | stock setting, much too much power draw! |
| 2.40 GHz | 1.175 V | 98 W | barely manageable |
| 2.30 GHz | 1.150 V | 90 W | works |
| 2.20 GHz | 1.125 V | 82 W | |
| 2.10 GHz | 1.10 V | 75 W | so far didn't have to go lower than this |

MrS
Scanning for our furry friends since Jan 2002
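The wattages in MrS's K10Stat table happen to match the usual first-order rule that a CPU's dynamic (switching) power scales with clock frequency times voltage squared, P ≈ P0 * (f/f0) * (V/V0)^2. Whether the figures were actually derived this way is an assumption; the post does not say. The sketch below starts from the 125 W stock point (2.50 GHz, 1.30 V) and reproduces the other rows to the nearest watt. It ignores leakage power, which also rises with temperature, so treat the results as estimates rather than measurements; the `scaled_power` helper is hypothetical.

```python
"""Estimate CPU power at a new clock/voltage with the first-order rule P ~ f * V^2."""


def scaled_power(p0: float, f0: float, v0: float, f: float, v: float) -> float:
    """Scale reference power p0 (watts at f0 GHz and v0 volts) to f GHz and v volts.

    Only dynamic (switching) power is modelled; leakage is ignored.
    """
    return p0 * (f / f0) * (v / v0) ** 2


# Reference point: the stock setting from the table.
P0, F0, V0 = 125.0, 2.50, 1.30

for f, v in [(2.40, 1.175), (2.30, 1.150), (2.20, 1.125), (2.10, 1.10)]:
    print(f"{f:.2f} GHz | {v:.3f} V | {scaled_power(P0, F0, V0, f, v):.0f} W")
```

Because voltage enters squared, the small voltage cuts save more power than the clock reductions do, which is why undervolting is the main lever in this approach.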
Joined: 27 Mar 09; Posts: 13; Credit: 155,203; RAC: 0
Hi MrS! Thanks for sharing your experience! Very interesting! According to PCGamesHardware, a German computer magazine, the temperature of a Phenom 9950BE should not go above 64°C. I also lied about my PSU. I checked it as I said, but I looked at a wishlist that was not up to date. I had planned on buying a 650 W power supply, but when I had the opportunity to get 'whatever I want' as a gift, I chose a BeQuiet Straight Power 700W. Expecting 8-10 more degrees in summer, I'd rather add more fans to the CPU cooler than lower its clock; up to 4x120 mm fans can be mounted. I'm sure it's great to have K10Stat as an option if nothing else helps. On the other hand, the possibility of doing more damage with this program makes it not really an option for me. For my diary: outside 22°C; CPU 55°C, fan 1257 rpm; GPU 68°C, fan 65%, if I am not mistaken. GPUGRID + 4 CPU WUs have been running since I fixed the CPU fan.
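Since the CPU temperature had been misread earlier in this thread, it may help to check logged readings rather than glance at a live display. A minimal sketch, assuming you have copied CPU temperature samples from whatever monitoring tool you use into a plain Python list (the values below are invented for illustration, and the `over_limit` helper is hypothetical): it reports the peak reading and the share of samples above the 64°C limit quoted above from PCGamesHardware.

```python
"""Flag temperature readings above a limit (a sketch; sample data are hypothetical)."""


def over_limit(samples: list[float], limit: float) -> tuple[float, float]:
    """Return (peak temperature, fraction of samples strictly above `limit`)."""
    if not samples:
        raise ValueError("no samples")
    above = sum(1 for t in samples if t > limit)
    return max(samples), above / len(samples)


# Hypothetical CPU readings in degrees C, e.g. copied from a monitoring tool's log.
cpu_temps = [55.0, 58.0, 63.0, 66.0, 71.0, 70.0, 62.0]

peak, frac = over_limit(cpu_temps, limit=64.0)  # 64 C: limit quoted for the Phenom 9950BE
print(f"peak {peak:.0f} C, {frac:.0%} of samples above limit")
```

Looking at the share of time above the limit, not just the current value, makes it easier to spot the kind of sustained overheating that a failing CPU fan causes.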
Joined: 17 Aug 08; Posts: 2705; Credit: 1,311,122,549; RAC: 0
It's quite possible that fixing the CPU fan speed will be enough to keep you stable. And I'd like to point out that what damages your hardware is voltage and temperature. That's why overclocking with increased voltages shortens a chip's life span, whereas just increasing the clock speed doesn't change much. You can go the reverse way: instead of overclocking, you can lower the CPU voltage, whether underclocked or at stock clock speeds. This is actually better for your hardware and your power bill :) It doesn't matter to me whether you choose to do so or not; I just wanted to make your options clear and show that there's no danger involved here (except failing a stress test + reboot/freeze if you set the voltage too low). MrS
Author: Michael Goetz; Joined: 2 Mar 09; Posts: 124; Credit: 124,873,744; RAC: 0
Interesting. I haven't bought an AMD CPU for nearly two years, ever since the Core2 line came out. The big reason I've been buying only Intel since then is the lower heat/power of the Core2 chips. Especially on laptops, this translates into much longer battery life. It would be interesting to find out whether there's a correlation between the assorted problems some people are having and whether their CPUs are Intel or AMD. It could indeed be that these hard-to-track-down problems with running GPUGRID are related to CPU temperature. Even double-width GPUs, which vent most of their heat out the back, still exhaust some of the cooling air back into the computer case, where it contributes to CPU heating.
©2025 Universitat Pompeu Fabra