Message boards :
Number crunching :
Building a Desktop for GPUGrid
| Author | Message |
|---|---|
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
Thanks for the response, TJ. "I have the back fan connected to CHA_FAN4 as Asus recommend." I missed that. Tomorrow morning I'll check that the rear fan is on CHA_FAN4. "Have you made any settings for the fans in Thermal Radar?" No! I had thought I had to get into the BIOS to do that. Nice to know I can do it in Windows!! "For each connector on the MOBO and thus fan, you can set a scheme." Not so in my version of Thermal Radar, 1.01.29: #4 I can set individually, but the other three are in one group. I set the standard scheme for all three options: fan 4, fans 1-3 and the CPU fan. When I get in there tomorrow I shall also try Dagorath's suggestion (thanks, Dagorath!) to disable the side and top fans so I have an uninterrupted cool air path from front to back. That makes sense since the case came with a funnel that directs air to the GPUs, both of which vent out the back. Then I shall play with Standard vs. Turbo on the CPU and front and rear fans and see if I can optimize CPU core usage (yesterday I went to bed with all eight cores active and this morning I found an ASUS warning message: CPU temperature 65C). |
|
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
|
That warning from Asus might not be cause for concern. There is probably a setting in the Asus software that lets you specify the temperature at which it issues the warning, and it is probably set to an arbitrary default of 65°C by Asus (assuming you haven't adjusted it yourself). 65 may or may not be appropriate for your CPU. What you need to do, since you are now officially a certified system builder :-) is find the specifications for your CPU and see what the official maximum operating temperature is. You don't want to set the Asus alarm at that temperature because that is the temp where serious damage can start to occur. I like to keep my gear running no higher than 30°C below the official max, while others say 10°C is good enough for them. The decision is yours, but if I were you I would set the Asus alarm at 20°C below max and strive to keep the temp 30°C below max. The point is you don't want the alarm going off when the CPU is not really in danger, but you don't want it to wait until the CPU is at the very brink of meltdown either; somewhere in between is best. Also, as flashawk pointed out a while ago, it depends on which sensor(s) the Asus software is reading. Complicated? Yes, but that's the life of a system builder :-) Glad to hear your case has a funnel! That alone will make a big difference. If the side fan is pulling air into that funnel then leave it on. You can buy (or even make) ducts that direct air to your CPU too if you think it's running too close to the max operating temp specified by AMD. Getting good cooling requires a little experimenting on your part, since every case and the components inside have a unique physical geometry that only you can work with directly. So yes, experiment with different fans being on and off and see whether that raises or lowers the temps. 
If some of your fans don't seem to be affected by settings in the software or don't have RPM readings then they might be fans that don't have a tachometer in them and don't allow speed control (they run at one speed). You can tell by counting the wires leading to the fan. Every fan has at least 2 wires for power, black is usually -, red is usually +. If it has only 2 wires then it has no tachometer and no speed control. If it has a third wire (usually white or yellow but could be other colors too) then that wire is usually the PWM speed control wire. If it has a 4th wire (usually blue) that wire is usually the tach wire. If it doesn't have a tach or PWM wire then any numbers you see in the software for that fan are bogus, ignore them. BOINC <<--- credit whores, pedants, alien hunters |
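The rule of thumb in the post above (alarm at 20°C below the rated maximum, aim to run 30°C below it) is simple arithmetic. A minimal sketch, assuming you have looked up the rated maximum yourself; the function name and the 90°C figure are hypothetical and do not come from any Asus or AMD tool:

```python
def alarm_and_target(max_temp_c, alarm_margin_c=20.0, target_margin_c=30.0):
    """Return (alarm_c, target_c) derived from a CPU's rated maximum temperature.

    The default margins follow the suggestion in the post: alarm 20 C below
    the maximum, and try to keep the CPU 30 C below the maximum.
    """
    if target_margin_c < alarm_margin_c:
        raise ValueError("the target should not sit above the alarm threshold")
    return max_temp_c - alarm_margin_c, max_temp_c - target_margin_c


# Hypothetical rated maximum; substitute the figure from your CPU's spec sheet.
alarm_c, target_c = alarm_and_target(90.0)
print(f"alarm at {alarm_c:.0f} C, aim to stay under {target_c:.0f} C")
```

Someone who is comfortable with a tighter margin, as mentioned in the post, could pass smaller values such as `alarm_margin_c=10.0` and `target_margin_c=10.0`.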
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
More reference: I run a program called Core Temp to monitor CPU temps, and it says that my Tj. Max is 100°C. I routinely run all 4 cores at 75-87°C. And a story: About a week ago, Precision-X somehow got set to a hardcoded low fan value while BOINC was running. I had noticed the GPU was severely downclocked, and when I saw why, my jaw dropped to the floor. The 660 Ti was literally at 99°C, and still chugging, despite a ridiculous 40% fan value. I think I got lucky that it survived. |
|
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
|
"The 660 Ti was literally at 99°C, and still chugging, despite a ridiculous 40% fan value. I think I got lucky that it survived." Ouch! Mine have spiked up above 80°C on occasion but never that high. My gpu-d script monitors GPU temps and controls the fan to maintain a target temperature. I just added a feature that suspends crunching if the temp somehow gets out of control and goes above 80°C. It runs only on Linux at the moment, but yesterday I found Python bindings to NVML (NVIDIA Management Library), which will allow the script to work on Windows too. It could solve some of the problems with the SANTI tasks if I added a feature to downclock the GPU when it sees a SANTI task come in and restore the clock to normal for tasks that don't need downclocking. It could tweak voltage too. BOINC <<--- credit whores, pedants, alien hunters |
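The gpu-d script itself is not shown in the thread, so the following is only a sketch of the idea it describes: nudge the fan toward a target temperature and suspend crunching above a hard limit. Everything here is an assumption made for illustration: the function names, the simple incremental control rule, the gain, and the 70°C target. The hardware access is passed in as plain callables, so no particular NVML binding or BOINC command is implied.

```python
def fan_step(temp_c, fan_pct, target_c=70.0, suspend_c=80.0,
             gain=5.0, min_pct=30.0, max_pct=100.0):
    """One control step: return (new_fan_pct, should_suspend).

    Incremental rule: raise the fan by `gain` percent for every degree above
    the target and lower it by the same amount for every degree below,
    clamped to [min_pct, max_pct].
    """
    new_pct = fan_pct + gain * (temp_c - target_c)
    new_pct = max(min_pct, min(max_pct, new_pct))
    return new_pct, temp_c > suspend_c


def control_loop(read_temp, set_fan, suspend, resume, steps, fan_pct=40.0):
    """Run `steps` control steps using caller-supplied hardware callables.

    read_temp() -> float, set_fan(pct), suspend() and resume() are hypothetical
    hooks; a real script would wrap its own GPU and BOINC calls and sleep
    between steps.
    """
    suspended = False
    for _ in range(steps):
        fan_pct, too_hot = fan_step(read_temp(), fan_pct)
        set_fan(fan_pct)
        if too_hot and not suspended:
            suspend()
            suspended = True
        elif not too_hot and suspended:
            resume()
            suspended = False
    return fan_pct


# Dry run with canned readings instead of real hardware.
readings = iter([72.0, 85.0, 75.0])
log = []
control_loop(lambda: next(readings), lambda p: log.append(("fan", p)),
             lambda: log.append("suspend"), lambda: log.append("resume"), steps=3)
print(log)
```

This sketch resumes as soon as the temperature falls back under the limit; a practical version would probably want a lower resume threshold so that crunching does not toggle on and off around 80°C.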
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"If some of your fans don't seem to be affected by settings in the software or don't have RPM readings then they might be fans that don't have a tachometer in them and don't allow speed control (they run at one speed). You can tell by counting the wires leading to the fan. Every fan has at least 2 wires for power, black is usually -, red is usually +. If it has only 2 wires then it has no tachometer and no speed control. If it has a third wire (usually white or yellow but could be other colors too) then that wire is usually the PWM speed control wire. If it has a 4th wire (usually blue) that wire is usually the tach wire. If it doesn't have a tach or PWM wire then any numbers you see in the software for that fan are bogus, ignore them." Actually, the 3rd wire is the signaling wire for the tachometer; on a 3-wire fan, RPM control is done by applying PWM to the + wire or by lowering the voltage on the + wire. The 4th wire is the PWM control wire. You can connect more than one (4-pin) fan to such a fan connector, provided that only one of them is connected to the 3rd (signaling) wire. The system can then monitor only that fan's RPM, but it can control the RPM of all fans connected to that single connector through the 4th wire. |
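The corrected wire assignment can be summarised as a small lookup. This is only an illustrative sketch of the rule stated in the reply above (3rd wire is the tachometer signal, 4th wire is PWM control); the function and key names are made up for the example:

```python
def fan_capabilities(wire_count):
    """Map a fan's wire count to what the motherboard can read and control."""
    if wire_count < 2:
        raise ValueError("a fan needs at least 2 wires (power + and -)")
    return {
        # 3rd wire carries the tachometer signal, so RPM can be read.
        "rpm_readout": wire_count >= 3,
        # 4th wire is the dedicated PWM control wire.
        "pwm_control_wire": wire_count >= 4,
    }


for wires in (2, 3, 4):
    print(wires, fan_capabilities(wires))
```

A 3-wire fan reports RPM but has no dedicated control wire, which matches the reply: its speed can only be changed through the + wire.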
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
I got in there this morning. Moved the rear fan cable to CHA_FAN4 (it was on #2), and disconnected the top and side fans. I've been running all eight CPU cores at 100% for two hours and the fan noise is fine! Here are the numbers: Regarding the CPU temperature, instead of reporting actual temperature, the latest AMD Overdrive reports "Thermal Margin". Thermal Margin indicates how far the current operating temperature is below the maximum operating temperature of the processor. My numbers are below. Right now (I think) I'm a happy bunny! |
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
About 30 minutes ago I put the tower back in its hole. The thermal radar CPU temp has dropped from 61C to 59C and the AMD thermal margins are up to 20C. Note in my last post that the two thermal radar PCIe temps are 11C apart. They still are. The two GPUs are identical. Any thoughts? |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Tomba, your "ASUS warning message: CPU temperature 65C" may be a BIOS setting. If so, you can probably set it to a higher level or disable the warning in the BIOS. You can often control the CPU fan from the BIOS. Settings are typically Active, Passive/Silent, or Targeted (you set a temperature target and the fan adapts towards that). In the past I occasionally saw some OS software conflict with the BIOS settings; these were resolved using software or BIOS updates. |
|
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
|
"Note in my last post that the two thermal radar PCIe temps are 11C apart. They still are. The two GPUs are identical. Any thoughts?" What exactly are "PCIe temps"? Are those the temps reported by sensors situated on the mobo close to the PCIe slots? Or are they temps reported by sensors on the GPUs in the slots? If the latter, then I would say: if the target temperature is the same on both cards (and I would not just assume that it is), then the hotter one is either starving for cool air or its fan is defective and isn't spinning as fast as it should. By "starving for cool air" I mean either the air it's receiving is too hot to allow the card to maintain the target temp or the airflow is somehow restricted. BOINC <<--- credit whores, pedants, alien hunters |
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
"Note in my last post that the two thermal radar PCIe temps are 11C apart. They still are. The two GPUs are identical. Any thoughts?" I think they are mobo sensors, thus: Tomorrow I'll get in there and check the airflow and perhaps switch the GPUs round to see if there's any difference. |
|
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
|
"About 30 minutes ago I put the tower back in its hole. The thermal radar CPU temp has dropped from 61C to 59C and the AMD thermal margins are up to 20C." One GPU is always hotter; I have seen this in all my rigs with two GPUs. It's the primary one that is hotter, whether the cards are the same type/brand or different. I have two top fans blowing air out and 1 side fan pulling air in, and I have a CPU temp of 55°C and PCIe temps of 48 and 43°C. Same MOBO and CPU, and the case is closed. Greetings from TJ |
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
"Use a different motherboard. I've checked the ASUS website, and I didn't find any AMD motherboards which could accommodate four double-slot GPUs. But I've found a Gigabyte MB which does: GA-990FXA-UD7" Now there's a thought! Well spotted!!! That would let me add the GTX 660 that's upstairs, running in my old rig, to my new rig, and would give me the expansion capability I was looking for originally. I'm still within the Amazon 30-day return window. I'm tempted... :) |
|
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
|
"Tomorrow I'll get in there and check the airflow and perhaps switch the GPUs round to see if there's any difference." You might notice a difference but my bet is that you won't. I think the difference is more likely due to some warm-running chip/component on the mobo close to the sensor near the hotter slot. Switch the cards if you want to verify, but I wouldn't worry about the one temp being higher than the other. As for TJ's report that one GPU is always hotter than the other... I have noticed the same thing and, like he says, it's always the one in the primary slot. That's most likely due to heat from the card in the slot below it being sucked into the primary card's fan. Rear exhaust fans reduce that problem significantly but don't eliminate it completely because there is still radiant heat emanating from the secondary card. I have the same problem with 2 of my cards, and when I have spare time I am going to build a custom intake duct that fits between the 2 cards and pushes lots of cool air between them as well as into the intake fans. But first I have to replace the mobo those cards are on, because my cat murdered it yesterday by dragging its little catnip-scented toy into the custom cabinet I am building and dropping it on the mobo. I checked it with my ohm meter and the ribbon on the toy is very conductive! He went in through a temporary opening I had not blocked because I thought he would never be able to reach it. I think he must have leapt off the TV, which is amazing because the TV is about 1 meter away and the hole is pretty small. I found him there crying because he couldn't figure a way to get down, lol. BOINC <<--- credit whores, pedants, alien hunters |
|
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
|
"Use a different motherboard. I've checked the ASUS website, and I didn't find any AMD motherboards which could accommodate four double-slot GPUs. But I've found a Gigabyte MB which does: GA-990FXA-UD7" I wouldn't plan on putting 4 GPUs on that mobo. They'll fit, but I've been reading reports and info indicating that those mobos can't supply the current required by some video cards through the PCIe slot. The fact that the online stores I use here in Canada seem not to be re-stocking those boards kind of corroborates the notion that they can't handle a big load through the PCIe slots. It might not even be able to handle 3 cards. Retvari and I were discussing it in my "What to build in 2014?" thread. BOINC <<--- credit whores, pedants, alien hunters |
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
"I wouldn't plan on putting 4 GPUs on that mobo. They'll fit but I've been reading reports and info that indicate those mobos can't supply the current required by some video cards through the PCIe slot." Is it not the case that today's GPUs are powered from the PSU, not the mobo? |
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
I have a problem. Normally I do a Noelia in 10-11 hours. This morning I found this Noelia had been running for 22+ hours and was only 55% complete. I aborted it and got another. This one is forecast to run for 50 hours. Below, EVGA Precision X shows a marked difference between my two 660s. I'd welcome some advice and guidance on what to do about this. Thanks. |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"I have a problem." It can be clearly seen from the GPU clock reading on the lower EVGA Precision display that your GPU is downclocked to 324MHz. This is a safety feature of the GPU (driver), and it can be reset back to normal only by a system restart. BTW, in that case you don't have to abort the workunit; a system restart will restore its original processing speed. To avoid such a situation in the future, you need to lower your GPU clock slightly (or increase the GPU voltage slightly). |
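The check described above (a card that should be crunching but reports a clock far below normal) can be written as a small test. This is a sketch under stated assumptions: the function name, the 50% cut-off, and the 1000 MHz nominal clock are hypothetical placeholders, and the clock readings would have to come from whatever monitoring tool you use.

```python
def looks_downclocked(current_mhz, nominal_mhz, busy, min_ratio=0.5):
    """Return True if a GPU that has work assigned runs far below its usual clock.

    `busy` should be True only when a task is assigned to the GPU; an idle
    card dropping to a low clock is normal power saving, not a fault.
    """
    if nominal_mhz <= 0:
        raise ValueError("nominal_mhz must be positive")
    return busy and (current_mhz / nominal_mhz) < min_ratio


# 324 MHz is the reading quoted in the post; 1000 MHz is a placeholder for
# the clock the card normally reports while crunching.
if looks_downclocked(324, 1000, busy=True):
    print("GPU appears stuck at a low clock; a restart may be needed")
```

The `busy` flag matters because a low clock on a card that has no task assigned would otherwise be flagged as a fault.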
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
"I have a problem." Thanks for responding, Retvari. I rebooted. No change. I reinstalled the Nvidia driver. No change. "you need to lower your GPU clock slightly (or increase the GPU voltage slightly)." Help me out on how to do that please... |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Are you sure the 41*C GPU is actually loaded? It looks like it isn't. You can double-click one of the graphs in the "Performance Log" at the bottom, to open the full list of graphs, so that you can view GPU Usage for both GPU1 and GPU2. |
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
"Are you sure the 41*C GPU is actually loaded? It looks like it isn't." Thanks for responding, Jacob. After working out how EVGA Precision X logging works, I constructed this snapshot: So GPU 1 is loaded, but not very heavily!! It would be nice if there were a way to clone GPU2's settings to GPU1... |
©2025 Universitat Pompeu Fabra