The hardware enthusiast's corner

Message boards : Number crunching : The hardware enthusiast's corner
Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 588
Credit: 11,424,836,510
RAC: 7,540,279
Message 53757 - Posted: 24 Feb 2020, 15:10:34 UTC

The problem:
A computer processing 24/7 was intermittently losing its network connection, so it could neither report processed tasks nor request new work.
The cause:
Its wireless network card was not fully inserted into the PCIe x1 socket, resulting in intermittent bad electrical contact.
The solution:
After checking, it turned out to be a purely mechanical problem.
It was corrected by removing the card from its mounting frame and bending the fixing tabs in the proper (CW) direction.
As a result, the whole card tilted toward full insertion into the PCIe x1 socket.
Profile ServicEnginIC
Message 53771 - Posted: 25 Feb 2020, 20:51:44 UTC

In line with my last post:

A graphics card without extra power connector(s) receives all its power from the PCIe socket.
For example, this GTX 1650 has a rated TDP of 75 watts and no power connectors.
That corresponds to a current of up to 6.25 amperes if all of it is drawn from the +12 V supply (12 V x 6.25 A = 75 W).
For this reason, the best possible electrical contact with the PCIe socket is particularly important for this kind of card.
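As a quick check of that arithmetic, here is a minimal Python sketch. It assumes, as the post does, that the whole 75 W is drawn from the +12 V rail, and the function name is made up for illustration.

```python
def supply_current_amps(power_watts, rail_volts=12.0):
    """Current needed to deliver power_watts from a rail at rail_volts (I = P / V)."""
    if rail_volts <= 0:
        raise ValueError("rail voltage must be positive")
    return power_watts / rail_volts


# Assumption: all 75 W of the GTX 1650's TDP comes from the +12 V rail.
print(f"{supply_current_amps(75.0):.2f} A")
```

The same function can be reused for any card that has no auxiliary power connector by passing its rated TDP.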
Usually there is enough mechanical play in a graphics card's mounting frame to reseat its PCB so that it sits deeper in the PCIe socket.
In my experience, this mechanical play can vary from about 0.5 to 1.5 millimeters (0.02 to 0.06 inches).
Reseating the PCB this way is usually very easy and takes only a few minutes.
Taking the graphics card mentioned above as an example:
This is what its mounting frame looks like.
I'll loosen all of the frame's fixing nuts and screws.
I start with the two hexagonal female-threaded nuts, marked 1 and 2 in the previous image, and finish with all the screws, here marked 3 and 4.
Depending on the kind of card, there may be fewer or more fixings, but they are usually easy to locate.
Once all fixings are loose, the mounting frame will show its mechanical play.
Holding the PCB at its deepest position relative to the mounting frame, all fixings are now retightened, starting again with the threaded nuts (here 1 - 2) and finishing with all the screws (here 3 - 4).
The final result: the graphics card's PCB has come down nearly 1 millimeter.
This can be seen by comparing the Before and After images.

In a computer whose graphics card intermittently goes unrecognized, this is one possible cause to rule out.
Profile ServicEnginIC
Message 53803 - Posted: 1 Mar 2020, 0:00:41 UTC

If you are keen on both DIY and computing, how about mixing the two?
Here is a good example.

The only fan on this GTX 1650 graphics card started to fail, and I temporarily took the card out of service to keep it from being damaged by overheating.
I asked myself: Should I make a warranty claim and wait perhaps a couple of weeks for the new card to arrive... and miss the fun of solving it myself?
I hesitated for about 10 seconds. This post is the answer.

I looked among my retired cards for something that could help, and I found this Gigabyte GT640 GV-N640OC-2GI that I probably would not use any more.
I like Gigabyte cards because of their usually good design, build quality, and well-dimensioned heatsinks and fans.

Comparing the heatsink mounting spacings of both cards, I found them to be nearly identical. And the Gigabyte heatsink's surface and fan were bigger than the original PNY ones (OK!).
But comparing the component layouts below the heatsinks, some problems arose.
The Gigabyte heatsink was hitting several components on the PNY card: one quartz crystal (Y1), one solid capacitor (C204), and one ferrite core choke (L15).
Here is where the DIY part comes into play...
- A fretsaw with a metal-cutting blade, to remove some problematic fins.
- A mini drill with a ceramic milling bit, to make room in the aluminum where needed.
And the mechanical problems were solved.

Now it's time to apply [url=http://www.servicenginic.com/Boinc/GPUGrid/Forum/HE/GpuCoolerReplacement/06_Thermal paste.JPG]thermal paste[/url] and assemble the heatsink.

One more adaptation was needed, because the fan connectors were not compatible. A bit of soldering and heat-shrink sleeving solved that too.

After this, we can compare Before and After.

Now this peculiar hybrid PNY-Gigabyte graphics card is working again!

A final question for users who may have experienced a similar situation: Is the fan usually covered by the card's warranty?
If so, is the whole card replaced by the distributor, or only the fan?
Your experiences in this respect would be much appreciated.
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 87,795
Message 53806 - Posted: 1 Mar 2020, 9:32:19 UTC - in response to Message 53803.  

Does this card run at a lower temperature than before?

[quote]One more adaptation was needed, because the fan connectors were not compatible.[/quote]
The original card doesn't have a 3rd pin (tachometer), so the card can't sense if the fan is not rotating. This is not a good setup for crunching.

[quote]A final question for users who may have experienced a similar situation: Is the fan usually covered by the card's warranty?[/quote]
These cards are made for light gaming, not hardcore (24/7) crunching, so crunching (mining) isn't covered by warranty. But GPUs don't have an operating hours counter, so if you don't explicitly state on the RMA form that you used it for crunching, they will replace it. But the replacement will be of the same quality, so I usually replace the fans (or the complete heatsink assembly) with better ones.

[quote]If so, is the whole card replaced by the distributor, or only the fan?[/quote]
It depends, but usually the whole card is replaced, then the broken card is sent to the manufacturer for refurbishing (replacing the fan in this case).
Profile ServicEnginIC
Message 53807 - Posted: 1 Mar 2020, 11:35:32 UTC - in response to Message 53806.  

[quote]Does this card run at a lower temperature than before?[/quote]

Yes and no. Peak temperatures are about two degrees lower now, as the new heatsink and fan are bigger than the originals.
The explanation continues below.

[quote]The original card doesn't have a 3rd pin (tachometer), so the card can't sense if the fan is not rotating.[/quote]

Right. That is how this card is designed.
However, Fan % is temperature controlled.
Also by design, at full load the card seems to "feel comfortable" at 78 °C. If the temperature tends to drop below this, Fan % is lowered too and the temperature settles at 78 °C again. But Fan % at equilibrium is now about 10 points lower than with the original heatsink/fan (60 % instead of the previously observed 70 %).
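The behaviour described here resembles a simple closed-loop controller that trims fan duty to hold a target temperature. The following Python sketch is only a toy model of that idea, not the card's actual firmware logic: the proportional rule, the gain, and the duty limits are all assumptions chosen for illustration, and only the 78 °C target comes from the observation above.

```python
TARGET_C = 78.0  # temperature the card appeared to settle at (from the post)


def next_fan_percent(fan_percent, temp_c, target_c=TARGET_C,
                     gain=2.0, lo=30.0, hi=100.0):
    """Nudge fan duty up when hotter than target, down when cooler.

    gain (percent per degree) and the lo/hi duty limits are hypothetical.
    """
    adjusted = fan_percent + gain * (temp_c - target_c)
    return max(lo, min(hi, adjusted))


# At the target temperature the duty stays where it is.
print(next_fan_percent(60.0, 78.0))
```

With a better heatsink the same target is reached at a lower duty, which matches the drop from 70 % to 60 % reported above.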

[quote]...they will replace it. But the replacement will be of the same quality, so I usually replace the fans (or the complete heatsink assembly) with better ones.[/quote]

I thought the same when weighing up the solution.

This card is not installed in an easy environment: it sits directly above a GTX 1660 Ti in this dual-graphics-card computer.
Profile ServicEnginIC
Message 53954 - Posted: 20 Mar 2020, 14:49:23 UTC

Given the current restrictions in many countries due to the impact of COVID-19, it becomes important to solve our hardware problems by ourselves.
Please feel free to share your problems here in a Symptom - Cause - Solution scheme, or your favorite self-taught tricks.
It may be of great help to other colleagues.
Thank you in advance!
Profile ServicEnginIC
Message 53961 - Posted: 21 Mar 2020, 17:20:02 UTC
Last modified: 21 Mar 2020, 17:21:39 UTC

- Symptom: A computer controlling an important process suddenly switched off by itself. Repeated attempts to switch it on again ended with it switching off after a few seconds.

- Cause: Two of the processor heatsink's fixings had broken, causing it to tilt and lose tight contact with the processor surface. As a self-protective measure, the system was switching off to prevent processor damage from overheating.

- Solution:

* Plan A:
The first attempt consisted of repairing the broken fixings with fast-curing cyanoacrylate glue.
After two hours of curing, it was time to renew the processor's thermal paste and reassemble the heatsink.
Result: After about three minutes, the fixing springs overcame the glued parts and they broke again.

* Plan B:
Studying the heatsink mounting hardware carefully, I found a pass-through hole at every corner, very suitably placed for solving the problem with strategically arranged cable ties.
Result: The cable ties are strong enough to maintain the necessary tension. Problem solved, and everything is working again!

Particular conditions for this case:
This case comes from a real intervention on the PC controlling a laboratory diagnostic instrument for celiac and autoimmune diseases.
I had to carry out this intervention wearing all the necessary PPE (personal protective equipment), so I was not free to come and go for spare parts.
Solving the problem meant that diagnostic results for many patients, otherwise lost, were successfully retrieved.
I took it as a NOW or NEVER situation, and happily it was NOW.

Finally: Let this be my modest tribute to all the medical staff and field service colleagues worldwide who are currently working in hard conditions due to the coronavirus crisis.
Profile ServicEnginIC
Message 54175 - Posted: 31 Mar 2020, 17:37:01 UTC

Finally, my adventure with Conductonaut thermal compound ended in an unexpected way.

For background, please refer to my previous post dated January 26th 2020.
On March 29th, during a regular round of temperature checks, I found that the GTX 1660 Ti card in question was at 83 °C (!).
Yes, it was running an ACEMD3 WU, but when I first tested Conductonaut this temperature was 60 °C...
I removed the GPU's heatsink and found that the Conductonaut had changed from its original liquid-metal state to a soft-solid metal state.
In this new state, I observed some cracks and irregularities, which explain the poor thermal coupling and the resulting abnormal temperature rise.
It was hard to remove the altered compound, first using a plastic spatula, and then a fine polishing cotton.
I can recommend this kind of silver cleaner, made of cotton impregnated with a fine polishing compound.
In the end, the heatsink's copper surface recovered its original appearance.

I decided to replace the Conductonaut with my regular non-conductive thermal paste, Arctic MX-2.
The manufacturer promises eight years of durability for it.
In my own experience it lasts at least 4 years, as I usually prefer to replace it preventively after about that period.
It is easy to apply, due to its self-spreading ability.

After this, the GTX 1660 Ti returned to work, with its temperature reduced from the previous 83 °C to 77 °C.

In a rig working 24/7, it is advisable to check temperatures regularly to prevent the various components from overheating.
It will surely increase the life expectancy of the whole system.
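Regular checks like this can be partly automated. Below is a minimal Python sketch, assuming an NVIDIA card with the `nvidia-smi` tool on the PATH; the 80 °C warning threshold and all function names are hypothetical choices, not values from this thread.

```python
import subprocess

WARN_C = 80  # hypothetical alert threshold; pick one that suits your card


def parse_temps(smi_output):
    """Turn nvidia-smi's one-temperature-per-line output into integers."""
    return [int(line) for line in smi_output.splitlines() if line.strip()]


def read_gpu_temps():
    """Query every NVIDIA GPU's core temperature in degrees Celsius."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temps(result.stdout)


def hot_gpus(temps, warn_c=WARN_C):
    """Return (gpu_index, temperature) pairs at or above the threshold."""
    return [(i, t) for i, t in enumerate(temps) if t >= warn_c]
```

Calling `hot_gpus(read_gpu_temps())` from a scheduled job and printing or mailing the result would have flagged the 83 °C reading described above.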
Aurum
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 348,486
Message 54176 - Posted: 31 Mar 2020, 17:50:32 UTC

Thermalright TF8 Thermal Compound Paste is the best I've used. It has the highest thermal conductivity at 13.8 W/mK. The best thing about it is that when you remove the CPU cooler after months of use it's still gooey and hasn't solidified like most others. It's the most expensive, until competition comes along. One wants the thinnest continuous layer you can get so use as little as possible and use the spatula to spread it out. I expect it can last for years.

https://www.amazon.com/gp/product/B07K442WXV/ref=ppx_yo_dt_b_asin_title_o08_s00?ie=UTF8&psc=1
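To put the "thinnest continuous layer" advice in numbers, here is a rough Python sketch of the bulk conduction resistance of a paste layer, R = t / (k * A). Only the 13.8 W/mK figure comes from the post; the contact area, layer thicknesses, and heat load are hypothetical, and the model ignores contact resistance at the two surfaces.

```python
def paste_layer_resistance(thickness_m, conductivity_w_mk, area_m2):
    """Conduction resistance of a uniform layer in K/W: R = t / (k * A)."""
    return thickness_m / (conductivity_w_mk * area_m2)


K_TF8 = 13.8              # W/(m*K), the figure quoted in the post
DIE_AREA = 0.015 * 0.015  # m^2, hypothetical 15 mm x 15 mm contact patch
POWER_W = 100.0           # hypothetical heat flow through the paste

for thickness_um in (25, 50, 100):  # hypothetical layer thicknesses
    r = paste_layer_resistance(thickness_um * 1e-6, K_TF8, DIE_AREA)
    print(f"{thickness_um:>3} um layer: {r:.4f} K/W, "
          f"about {r * POWER_W:.1f} K drop at {POWER_W:.0f} W")
```

Under these assumptions the temperature drop across the paste scales linearly with layer thickness, so halving the layer halves the drop.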
Profile Retvari Zoltan
Message 54184 - Posted: 1 Apr 2020, 0:48:13 UTC - in response to Message 54175.  

[quote]Finally, my adventure with Conductonaut thermal compound ended in an unexpected way.

For background, please refer to my previous post dated January 26th 2020.
On March 29th, during a regular round of temperature checks, I found that the GTX 1660 Ti card in question was at 83 °C (!).
Yes, it was running an ACEMD3 WU, but when I first tested Conductonaut this temperature was 60 °C...
I removed the GPU's heatsink and found that the Conductonaut had changed from its original liquid-metal state to a soft-solid metal state.[/quote]
This is very strange. I haven't experienced such a change in the liquidity of the Conductonaut, or in the temperatures of the CPUs / GPUs on which I've changed the thermal grease.
Profile ServicEnginIC
Message 54204 - Posted: 2 Apr 2020, 14:30:55 UTC - in response to Message 54184.  

[quote]This is very strange. I haven't experienced such a change in the liquidity of the Conductonaut...[/quote]

I guess that the tested heatsink's core is not made of pure copper, but of some kind of alloy that is not compatible with Conductonaut.
Profile ServicEnginIC
Message 54315 - Posted: 12 Apr 2020, 14:45:56 UTC

The current COVID-19 regulations in Spain, which require home confinement, gave rise to a challenge:
Will I be able to build a new crunching rig from my stored spare/scrapped parts?

I started by rescuing an ancient Pentium 4 system "stored" on top of a wardrobe.
I removed the motherboard, PSU, and peripherals, leaving that old minitower ATX chassis as my starting point.

PSU: The old PSU did not have the proper connectors for current mainboards.
I rescued two PSUs from my scrap drawer, one with failed electronics, and the other with a failed fan...
I replaced the defective fan with the working one, and the PSU problem was solved.

Motherboard, CPU, RAM: I had the ones left over from my last hardware upgrade stored in my spares drawer.
There was a new problem: the available chassis is an old model, with the PSU hanging directly above the CPU location. But I found an original Intel low-profile CPU heatsink, and that problem was solved too.

From spares, I rescued my last remaining 120 GB SSD and a GIGABYTE GTX750 factory-overclocked graphics card...
With all this and a bit of (free ;-) labor of my own, the new rig is a reality without leaving home: test passed!

New system Host ID: 540272

New system look:

One more detail: Due to the low-power (38 W TDP) graphics card and the reduced CPU heatsink, this is the only one of my rigs whose CPU runs hotter (59 °C) than its GPU (53 °C) at full load.
Profile ServicEnginIC
Message 54367 - Posted: 18 Apr 2020, 18:47:40 UTC

Let's call a problem severe if it prevents a computer from starting up.
Let's call it ridiculous if a trivial circumstance causes a severe problem.

This is one of the most severe-ridiculous problems I've ever come across, and more than once.
It happened today in one of my rigs.
I'm documenting it this afternoon, and I'll publish the solution tomorrow afternoon.

- Symptom: On starting, the system runs for a few seconds, then stops, and nothing happens on subsequent attempts to restart.
I opened the system, made a quick check of the contacts, and started it again; this time the start attempt succeeded (fans turning, beep heard...), but for a few seconds only.

- Cause: I started to think: PSU failure, CPU heatsink disengaged... and what if it was...? And that was it!

- Solution: ???

You have 24 hours to guess your favorite cause-solution.
Profile Retvari Zoltan
Message 54369 - Posted: 18 Apr 2020, 19:16:08 UTC - in response to Message 54367.  

A stuck power button can cause this: first it turns on the system, but if it stays in the "pressed" state it will turn off the system after 4-5 seconds (hard power off).
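The mechanism can be sketched in a few lines of Python. This is a simplified model for a machine that starts in the off state; the 4-second threshold is taken from the low end of the 4-5 seconds mentioned above, and the function name is hypothetical.

```python
HARD_OFF_SECONDS = 4.0  # hold time for a forced power-off; the post says 4-5 s


def power_button_events(held_seconds, hard_off_after=HARD_OFF_SECONDS):
    """List what an initially-off machine does while its button stays closed."""
    events = ["power on"]               # the press itself starts the machine
    if held_seconds >= hard_off_after:  # the contact never opened in time
        events.append("hard power off")
    return events


print(power_button_events(0.2))   # a normal short press
print(power_button_events(10.0))  # a button that stays stuck
```

A button that stays closed therefore produces exactly the observed symptom: the system starts, runs for a few seconds, and then switches off.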
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1109
Credit: 40,496,283,595
RAC: 3,436,646
Message 54370 - Posted: 18 Apr 2020, 22:28:50 UTC

Bad PSU
Bad motherboard
Bad memory
Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 54373 - Posted: 19 Apr 2020, 1:48:55 UTC - in response to Message 54370.  
Last modified: 19 Apr 2020, 2:15:43 UTC

[quote]Bad PSU
Bad motherboard
Bad memory[/quote]

From my experience, the PSU is most likely to be problematic. Just sayin'.
Profile ServicEnginIC
Message 54379 - Posted: 19 Apr 2020, 18:48:43 UTC - in response to Message 54367.  

- Symptom: On starting, the system runs for a few seconds, then stops, and nothing happens on subsequent attempts to restart.
I opened it, made a quick check of the contacts, and started it again; this time the start attempt succeeded (fans turning, beep heard...), but for a few seconds only.

- Cause: The power-on button got temporarily stuck, causing the system's hard power-off feature to cut the supply after a few seconds.
While the machine was being tilted and handled for the contacts check, the button came free, and then it stuck again the next time it was pressed.

- Solution: It is usually possible to get at the power-on button switch, most often by removing the chassis front panel.
Here is an image of the affected switch in its mounting position, and once removed.
Nowadays, it is a normally-open push-button. A click should be heard when pushing it, and another when releasing it.
The problem was solved by applying a few drops of ethanol and pushing the button repeatedly until it came unstuck and moved freely.
Pretty trivial and ridiculous, but I'm sure that maaany computers have gone to the workshop for a problem like this...

On Apr 18th 2020 | 19:16:08 UTC Retvari Zoltan wrote:
A stuck power button can cause this: first it turns on the system, but if it stays in the "pressed" state it will turn off the system after 4-5 seconds (hard power off).

Congratulations!
You have won an image of my special Gold Medal for Outstanding Analyst.
(Well... Excuse me, it is not exactly gold, it is really high quality bronze ;-)

And my special thanks to Ian&Steve C. and Pop Piasa for participating.
Ian&Steve C.

Message 54388 - Posted: 20 Apr 2020, 19:48:27 UTC

finally I was able to finish up my newest GPUGRID system. It's one of my old SETI systems, but I needed to convert it from USB risers to ribbon risers (and motherboard swap) for the increased PCIe bandwidth requirements here.

CPU: Intel Xeon E5-2630Lv2 (6c/12t,2.6GHz)
MB: ASUS P9X79 E-WS
RAM: 32GB (4x8) DDR3L-1600MHz ECC UDIMM
GPUs: [7] EVGA RTX 2070
PSUs: 1200W PCP&C + 1200W HP server PSU

went with a 2U supermicro active CPU cooler so I had enough room for the ribbon risers on the 2 GPUs above it. replaced the 60mm fan on it with a Noctua one since even at 20% speed the stock fan was very noisy. the Noctua fan doesn't cool as well as the stock server fan that came with it, but it's enough for this 60W chip (temps in the 50's @65% load) and it's a lot quieter.
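For a sense of scale on the bandwidth point, here is a rough Python sketch of theoretical one-direction PCIe 3.0 throughput per link width. It assumes the links run at PCIe 3.0 speed, that USB-style risers carry a single lane, and that ribbon risers pass through whatever width the slot provides; none of those figures are stated in the post, and real-world throughput is lower.

```python
GT_PER_S = 8.0        # PCIe 3.0 raw rate per lane, gigatransfers per second
ENCODING = 128 / 130  # 128b/130b line coding efficiency


def pcie3_bandwidth_gbytes(lanes):
    """Approximate one-direction PCIe 3.0 bandwidth in gigabytes per second."""
    return lanes * GT_PER_S * ENCODING / 8  # 8 bits per byte


for lanes in (1, 8, 16):
    print(f"x{lanes}: {pcie3_bandwidth_gbytes(lanes):.2f} GB/s")
```

Under these assumptions an x16 ribbon riser offers sixteen times the theoretical bandwidth of a single-lane USB-style riser.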
Profile ServicEnginIC
Message 54391 - Posted: 20 Apr 2020, 22:16:42 UTC - in response to Message 54388.  
Last modified: 20 Apr 2020, 22:30:04 UTC

🙌
Profile ServicEnginIC
Message 54392 - Posted: 20 Apr 2020, 22:20:21 UTC - in response to Message 54388.  

I'm really impressed looking at your systems.
Thank you very much for your masterclass.
That's what I would describe as high-level computer hardware engineering.
And your newborn system is returning processed tasks like a charm...🙌
Congratulations!

©2025 Universitat Pompeu Fabra