Shot through the heart by GPUGrid on ATI
Joined: 8 Feb 12 · Posts: 60 · Credit: 17,816,440 · RAC: 0
I was planning out a new computer, a Maingear F131, with the express purpose of crunching GPUGrid. I already have a machine, a Maingear Shift Super Stock with Nvidia GTX 670s. For whatever reason, GPUGrid did not do well on this machine, and I had to detach from the project. This machine kept failing. It was totally rebuilt three times, and what finally seems to have settled things down has been the removal of GPUGrid. Why this was a problem I have no idea; GPUGrid was the impetus for this machine. The machine is currently crunching GPU work for EINSTEIN and SETI, so far with no difficulties. Maybe various GPU projects just do not play well together. So, I wanted to plan the new machine with ATI cards. Apparently, GPUGrid does not run on any ATI cards. Am I correct about that, and is there any hope for the future? Please check out my blog http://sciencesprings.wordpress.com http://facebook.com/sciencesprings
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
hi, no big hopes for ATI. The old code, which ran OpenCL, has been deprecated in favour of a new one which is CUDA-only. It is still technically possible to do OpenCL but it does require a lot of work. Only justified if AMD really brings a top card in. What was the problem with the old machine? WUs failing? Temperature? We got so tired of poor cooling that we ended up designing and building our own GPU chassis. gdf
Beyond · Joined: 23 Nov 08 · Posts: 1112 · Credit: 6,162,416,256 · RAC: 0
"It is still technically possible to do OpenCL but it does require a lot of work. Only justified if AMD really brings a top card in."

Hmm, judging from the performance at most other projects, I would say AMD does have very fast cards. What's a 7970? It wouldn't be too hard to make a list of OpenCL projects where the 7970 is top dog.
Joined: 8 Feb 12 · Posts: 60 · Credit: 17,816,440 · RAC: 0
O.K., now I know the score on ATI. So, I could go the other way: I could run GPUGrid as the only GPU project on the current machine, and run the others (EINSTEIN and SETI, plus MILKY WAY as a new addition) on a machine with ATI cards. But there remains the question, part of which I posed above: is there any reason that GPUGrid would create difficulties on any machine with decent Nvidia cards when it is the lone GPU project? After all, before I finally quit the project, this machine amassed 17 million credits. As I said above, it was rebuilt three times because we suspected problems other than any caused by one project. It only calmed down once GPUGrid was no longer running.
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
The GPU-Grid code is quite efficient and "compute dense", i.e. it taxes GPUs quite hard. Its power consumption is significantly higher than that of SETI or Einstein. That's why GDF asked you about temperatures. The prime candidates for your problems are overheating GPUs or an insufficient PSU. MrS Scanning for our furry friends since Jan 2002
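Since overheating is one of the two suspects, it may help to log GPU temperatures while GPUGrid is running rather than checking them occasionally. A minimal Python sketch, assuming the `nvidia-smi` tool that ships with the NVIDIA driver is on the PATH. On some GeForce cards and driver versions, fields such as `power.draw` come back as `[Not Supported]` or `N/A`, which the parser turns into `None`. The 80 °C threshold is an illustrative assumption, not an NVIDIA limit, and the function names are made up for this example.

```python
import csv
import io
import subprocess

# Standard nvidia-smi --query-gpu field names.
QUERY_FIELDS = ["index", "name", "temperature.gpu", "power.draw"]
NUMERIC_FIELDS = ("temperature.gpu", "power.draw")


def parse_gpu_csv(text):
    """Parse output of nvidia-smi --query-gpu=... --format=csv,noheader,nounits.

    Returns one dict per GPU. Numeric fields the driver cannot report
    (e.g. "[Not Supported]" or "N/A") become None.
    """
    gpus = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        record = dict(zip(QUERY_FIELDS, (value.strip() for value in row)))
        for key in NUMERIC_FIELDS:
            try:
                record[key] = float(record[key])
            except (KeyError, ValueError):
                record[key] = None
        gpus.append(record)
    return gpus


def hot_gpus(gpus, limit_c=80.0):
    """Indices of GPUs whose reported temperature is above limit_c (illustrative)."""
    return [g["index"] for g in gpus
            if g["temperature.gpu"] is not None and g["temperature.gpu"] > limit_c]


def query_gpus():
    """Run nvidia-smi once and parse the result (needs nvidia-smi on PATH)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` every minute or so from a scheduled task and appending the results to a file would give a temperature history that can be compared against the times the machine failed.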
Joined: 8 Feb 12 · Posts: 60 · Credit: 17,816,440 · RAC: 0
MrS, thanks. I did run HW Monitor for a while, and it never showed anything untoward regarding GPU temps. My power supply never failed; I was always able to get something to happen after a failure, even if the machine refused to boot. And, you know, I did manage 17 million credits on GPUGrid before I gave it up. My GPUs are GTX 670s, air cooled. We tried sealed liquid coolers, but they repeatedly failed. Even if the GPUs did overheat, I would think that would have triggered a shutdown of the crunching before any actual damage to the cards. I believe that is what happens when CPUs overheat: the machine shuts down before damage occurs.
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
If by "the machine kept failing" you mean that it was hanging, then it's probably an insufficient power supply. If the job was crashing after a while but the machine stayed alive, it's probably temperature. gdf
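The two rules of thumb in this post, together with MrS's later point that failures with only one of two GPUs loaded make the PSU an unlikely culprit, can be written down as a small decision function. This is a hypothetical sketch for illustration only: the names `Symptom` and `likely_cause` are made up, and the result only says where to look first, not what is actually wrong.

```python
from enum import Enum


class Symptom(Enum):
    MACHINE_HANGS = "whole machine hangs"
    TASK_CRASHES = "task crashes, machine stays up"


def likely_cause(symptom, still_fails_with_one_gpu=False):
    """Where to look first, per the rules of thumb in this thread (not a diagnosis)."""
    if symptom is Symptom.TASK_CRASHES:
        # GDF: a job that dies while the machine keeps running points at temperature.
        return "temperature"
    if symptom is Symptom.MACHINE_HANGS:
        if still_fails_with_one_gpu:
            # MrS: with only one of two GPUs loaded, the PSU becomes unlikely.
            return "unclear: PSU unlikely, need a precise error description"
        # GDF: a hanging machine points at an insufficient power supply.
        return "power supply"
    raise ValueError(f"unknown symptom: {symptom!r}")
```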
Joined: 8 Feb 12 · Posts: 60 · Credit: 17,816,440 · RAC: 0
So, I guess what I would ask is: what would be considered good Nvidia cards to put in a new machine, and what power supply specifications? Here is what the Maingear F131 Super Stock would have:

Intel® Core™ i7 3930K Six-core 3.2GHz/3.8GHz Turbo 12MB L3 Cache w/ HyperThreading
MAINGEAR EPIC 120 Supercooler (CPU cooling). That looks like a water-cooled system, and I have had bad luck with those, so I would ask for air cooling similar to what I have on the Shift Super Stock.
Intel® Turbo Boost Advanced Automatic Overclocking
16GB Corsair® Dominator™ Platinum DDR3-1600 Extremely Low Latency 1.5V (4x4GB)

The choices of Nvidia cards are way too plentiful:

2x EVGA® GeForce™ GTX 680 SuperClocked 4GB Total GDDR5 in SLI w/PhysX [ENTHUSIAST - OC]
2x EVGA® GeForce™ GTX 680 FTW+ 8GB Total GDDR5 in SLI w/PhysX [ENTHUSIAST]
2x NVIDIA® GeForce™ GTX 680 4GB Total GDDR5 in SLI w/PhysX [ENTHUSIAST]
2x EVGA® GeForce™ GTX 670 8GB Total GDDR5 in SLI w/PhysX [ENTHUSIAST]
2x NVIDIA® GeForce™ GTX 670 4GB Total GDDR5 in SLI w/PhysX [ENTHUSIAST]
2x MSI® GeForce™ GTX 660 Ti Power Edition 4GB Total GDDR5 in SLI w/PhysX [ENTHUSIAST]
2x NVIDIA® GeForce™ GTX 660 Ti 4GB Total GDDR5 in SLI w/PhysX [ENTHUSIAST]
2x NVIDIA® GeForce™ GTX 660 4GB Total GDDR5 in SLI w/PhysX [ENTHUSIAST]

The SLI can be disabled, but since BOINC wants device 0 and Maingear wants device 0, I have SLI enabled on the Shift Super Stock. If either were O.K. with device 1, I would not be doing this.

1TB Western Digital VelociRaptor SATA 6G 10,000rpm 64MB Cache. This was actually the last change made on the Shift Super Stock. It is alleged to be an "enterprise grade" (whatever that means) hard drive normally used in servers.

Power supplies available are:

850 Watt Corsair® AX850 80+ Gold Certified Modular Power Supply ROHS
660 Watt Seasonic® X-660 80+ Gold Certified Modular Power Supply ROHS

So, this is a lot to ask, but I find people here really know their stuff. Any opinions, and I know they could only be seen as opinions, will be gladly received. I am not in the business of building any machine myself; I am only in the business of paying for what works. Thanks in advance for any and all replies.
Joined: 18 Jun 12 · Posts: 297 · Credit: 3,572,627,986 · RAC: 0
Did you have SLI enabled on your last rig with all the problems? If so, that may have caused all your problems; it must be disabled to crunch WUs on GPU Grid. I'm thinking you already knew that, though.
Joined: 8 Feb 12 · Posts: 60 · Credit: 17,816,440 · RAC: 0
flashawk, we actually ran it both ways. In fact, with GPUGrid in the mix, we ran SLI, we ran two cards, we ran BOINC on Dev 0 only and on Dev 1 only; we really tried everything, through two builds actually, and had a lot of success, which in both cases was only temporary. The only thing that was not tried was connecting the monitor to Dev 1; the builder was adamant that the monitor be on Dev 0. The first build ran successfully for about ninety days and the second for about 60 days, both running GPUGrid, each with SLI both enabled and disabled. The worst was SLI disabled and BOINC ignoring Dev 1, so that both BOINC and the rest of the computer were using Dev 0. As I said earlier, I was told that BOINC wants to be on Dev 0.

The way it looks now, I am not going to change anything on the Shift Super Stock. Of course, as soon as I post this, I may eat my words. But up until now, with SLI enabled, and thus running BOINC and the rest of the machine on what looks like Dev 0, running SETI and Einstein GPU WUs plus a bunch of CPU-only projects, everything is stable. And this machine does nothing else. It was purchased with the express intent of running BOINC projects, with a focus on GPU work. My colleague has done the same thing: a Maingear F131 that does nothing but crunch (an older-style F131 whose case is more standard, so GPU work can be questionable; also, my colleague only runs WCG, on the WCG build of BOINC 6.10.58, still the standard at WCG).

You know, we are talking about a lot of money devoted to this work. If I can get some good advice on what to put in the F131, then I will run GPUGrid on that machine. If not, I will run Einstein, SETI, and a bunch of CPU-only projects. One way or the other, the machine will be purchased. It will replace my oldest desktop, an i7-920 that is about 3-1/2 years old. It is apparently now accepted as a fact that i7s run hot. The topic is not new, but it has only recently been dealt with in any meaningful way, with TThrottle to control temperature. This i7-920 ran BOINC at 100%/100% for a long time with temperatures in the 90s °C. Now we know that is abusive. All of my machines now have their CPUs controlled with TThrottle. So this CPU is old before its time. There is work which is failing to finish, and run times that are ridiculous. I could have the CPU changed out, but it's like a car: what might go next? I want to get this settled very soon. I can still get Win 7, but I do not know how much longer that will be possible.
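BOINC's computing preferences include a "use at most N% of CPU time" setting, and tools such as TThrottle adjust throttling in response to measured temperature. The sketch below only illustrates the general idea with a hypothetical linear ramp from CPU temperature to a CPU-time percentage; the set points (full speed up to 60 °C, a 10% floor from 70 °C) are assumptions chosen for the example, and this is not TThrottle's actual algorithm.

```python
def cpu_time_percent(temp_c, full_speed_below_c=60.0, target_c=70.0,
                     min_percent=10.0):
    """Map a CPU temperature (deg C) to a 'use at most N% of CPU time' value.

    Hypothetical linear ramp, not TThrottle's real algorithm:
      - at or below full_speed_below_c: 100%
      - at or above target_c: min_percent
      - in between: linear interpolation
    """
    if target_c <= full_speed_below_c:
        raise ValueError("target_c must be above full_speed_below_c")
    if temp_c <= full_speed_below_c:
        return 100.0
    if temp_c >= target_c:
        return min_percent
    fraction = (temp_c - full_speed_below_c) / (target_c - full_speed_below_c)
    return 100.0 - fraction * (100.0 - min_percent)


if __name__ == "__main__":
    for t in (55, 62, 65, 68, 75, 90):
        print(f"{t} C -> use at most {cpu_time_percent(t):.0f}% of CPU time")
```

Throttling of this kind trades throughput for temperature, so it treats the symptom; better cooling addresses the cause and lets the CPU crunch at full speed.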
Joined: 18 Jun 12 · Posts: 297 · Credit: 3,572,627,986 · RAC: 0
You got GPU Grid to run on only one card at a time? Did you set up the .xml file (cc_config.xml) with <use_all_gpus>1</use_all_gpus>? That's what I had to do to get it to run on both GPUs at the same time; see the first post here: http://www.gpugrid.net/forum_thread.php?id=2123&nowrap=true#16463 Your hardware looks good. I prefer Seasonic PSUs, but 660 W seems weak; go for the Corsair 850, they're very good PSUs too.
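For reference, `<use_all_gpus>` is a BOINC client option that goes inside the `<options>` section of `cc_config.xml` in the BOINC data directory; without it, BOINC uses only the most capable GPU(s) it detects. A minimal Python sketch that produces such a file, assuming you have no other `cc_config.xml` options to keep (otherwise, merge the line in by hand). Restarting the BOINC client afterwards is the safe way to make sure it takes effect. Per MrS's explanation in this thread, this option cannot make BOINC use a GPU that Windows has put to sleep.

```python
import xml.etree.ElementTree as ET


def make_cc_config(use_all_gpus=True):
    """Build a minimal BOINC cc_config.xml that sets <use_all_gpus>."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    flag = ET.SubElement(options, "use_all_gpus")
    flag.text = "1" if use_all_gpus else "0"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Review the output, then save it as cc_config.xml in the BOINC data
    # directory. Writing it blindly would replace any existing options.
    print(make_cc_config())
```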
Joined: 16 Mar 11 · Posts: 509 · Credit: 179,005,236 · RAC: 0
"So, this CPU is old before its time. There is work which is failing to finish, run times that are ridiculous. I could have the CPU changed out, but it's like a car: what might be next."

Not at all like a car. Cars have moving parts that suffer wear from friction; CPUs do not. There is likely absolutely nothing wrong with the rest of that computer, yet you'll scrap it. You abuse it, refuse to follow any advice given to you because "it's not my style", then scrap good kit. Sad beyond words. In fact, words fail me. My i7 runs at 60C with the stock Intel fan and heat sink, no throttling. Why? Because I rub brains on my problems instead of money. Try it sometime.

"I can still get Win 7; but I do not know how much longer that will be possible."

If the new GPUgrid apps about to come out are like the old ones, then you'll get a 15% performance boost just by putting Linux on that rig. But...
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
Whoa, there's really a lot here: debugging the old machine, choosing components for the new one, and that "old" i7 920.

Debugging GPU-Grid

It's common advice to disable SLI for any GPU-BOINC work. Sure, it's safer this way, but SLI is no black magic. Since a driver update (maybe 2 years ago), SLI applies only to Direct3D and OpenGL, as it should, and doesn't interfere with CUDA or OpenCL at all. That's why both settings showed similar behaviour for you.

BOINC doesn't insist on running on GPU 0; rather, it can only use the GPUs which are not put to sleep by Windows. Any non-SLI-ed GPU without a monitor attached and without the desktop extended to it is considered useless by Windows and put to sleep to save power. It's a shame it can't be woken up for GP-GPU work then. That's why you only crunched on 1 GPU with SLI disabled and the monitor disconnected.

However, if you still had problems running GPU-Grid with only 1 of 2 GPUs running, then we can probably rule out the PSU. Anything beyond this is speculation at this point; we'd need a more precise error description and would have to try different things. Not sure it's worth it if you're fine running other projects as well.

Choosing a new rig

If you're going for air cooling, make sure it's high-end: that CPU will put out a lot of heat. Good coolers can easily handle it, weaker ones won't. That VelociRaptor is a fine drive for sure (though still much slower than a modern SSD), but totally useless for a dedicated cruncher. In fact, any decade-old 30 GB HDD would do, even 10 GB if you use Linux or XP.

Regarding the GPUs: the GTX660Ti currently has the best price-performance ratio, as it's practically as fast as the GTX670, yet considerably cheaper. Getting the 4 GB version over the regular 2 GB ones does nothing for crunching speed (and will probably not matter in the future either), but they don't offer these at all. The GTX660 is also out of the question, as it's considerably slower than the GTX660Ti. The GTX680 is a high-end option. I don't think it's worth the money just for BOINC, and I'd get an i7 3770K quad instead of the six-core as well: less CPU throughput, but a lot less power consumption, more energy efficient and about half the price.

PSU: with 2 GTX660Ti I'd go for the Seasonic; with GTX680 SuperClocked you might want to go for the bigger unit.

The worn-down i7 920

Did you ever clean the heat sink and fan? If not, we've probably found out why it's running so hot now. Is it a wimpy Intel stock cooler? While that CPU does put out some heat (almost as much as your new 6-core), mid- to high-end air coolers would handle this easily, at full throttle. The fan could also be set to "very silent", a setting fine for regular work but not suitable for 24/7 number crunching. Or, asked another way: is it loud? If not, the fan probably isn't even trying to cool the CPU down. Or... did the fan fail?

And you're right that a few years at ~90°C cause more stress to the CPU than normal work. However, these things can take a serious beating. I suppose it's just a matter of fixing the cooling and it will be good to go again. Still, the CPU is not very power efficient by today's standards, so it might as well be retired from 24/7 crunching and handed down to someone who just needs a regular desktop or a gaming machine (yes, it's still pretty good at that).

@Dagorath: don't be too harsh and give him a chance. MrS
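MrS's PSU advice can be sanity-checked with a rough power budget. A minimal Python sketch, using the reference TDP / board power figures Intel and NVIDIA list for these parts (GTX 660 Ti 150 W, GTX 670 170 W, GTX 680 195 W, i7-3770K 77 W, i7-3930K 130 W) plus an assumed 60 W for everything else. Factory-overclocked cards such as the SuperClocked or FTW models can draw more, and real crunching loads differ from TDP, so treat the result as an estimate, not a measurement.

```python
# Reference power figures in watts (TDP / board power); treat as rough.
GPU_BOARD_POWER_W = {"GTX 660 Ti": 150, "GTX 670": 170, "GTX 680": 195}
CPU_TDP_W = {"i7-3770K": 77, "i7-3930K": 130}
# Assumption: motherboard, RAM, drives and fans together.
REST_OF_SYSTEM_W = 60


def dc_load_w(cpu, gpus):
    """Estimated DC load with the CPU and all GPUs fully busy."""
    return CPU_TDP_W[cpu] + sum(GPU_BOARD_POWER_W[g] for g in gpus) + REST_OF_SYSTEM_W


def psu_load_fraction(cpu, gpus, psu_rating_w):
    """Share of the PSU's rated output that the estimate would use."""
    return dc_load_w(cpu, gpus) / psu_rating_w


if __name__ == "__main__":
    builds = [
        ("i7-3770K", ["GTX 660 Ti"] * 2),
        ("i7-3930K", ["GTX 680"] * 2),
    ]
    for cpu, gpus in builds:
        for psu in (660, 850):
            share = psu_load_fraction(cpu, gpus, psu)
            print(f"{cpu} + 2x {gpus[0]}: {dc_load_w(cpu, gpus)} W, {share:.0%} of {psu} W")
```

With these numbers, two GTX 660 Tis and an i7-3770K come to about 437 W, roughly two thirds of a 660 W unit, while two GTX 680s and an i7-3930K come to about 580 W, about 88% of 660 W but about 68% of 850 W, which is in line with going for the bigger unit with GTX 680s.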
Joined: 16 Mar 11 · Posts: 509 · Credit: 179,005,236 · RAC: 0
@Richard (mitrichr) You've been a good friend and you always will be. I apologize for being so harsh, but my *best* friends on my list of friends are the ones who kick me in the butt when I'm a dummy. Obviously I get kicked a lot! So if I'm harsh and kicking your butt, it's not because I don't like you; it's because I know you can do better and I don't want to see you fail.

"BOINC doesn't insist on running on GPU 0, it's rather that it can only use the GPUs which are not set to sleep by windows. And any non-SLI-ed GPU without a monitor attached and without the desktop extended to it is considered useless by Win and sent to sleep to save power. It's a shame it can't be woken up for GP-GPU work then. That's why you only crunched on 1 GPU with SLI disabled and the monitor disconnected."

I've heard of crunchers putting a dongle on their video card and I've never really understood why. Maybe now I do? Is the purpose of the dongle to imitate a monitor, making Windows think the card is attached to a monitor, which then prevents Windows from putting the card to sleep? Would that help mitrichr (Richard)? Does Linux put a card to sleep if it doesn't have a monitor attached?
Joined: 26 Jun 09 · Posts: 815 · Credit: 1,470,385,294 · RAC: 0
A 660 W power supply is too little for two high-end nVidia cards; I found that out myself. Greetings from TJ
skgiven · Joined: 23 Apr 09 · Posts: 3968 · Credit: 1,995,359,260 · RAC: 0
I agree with MrS: an i7-3770K with two GTX660Tis would be a better option. The only reasons for getting an LGA 2011 CPU would be to have 12 threads or to support 4 GPUs. I have an i7-3770K @4.2GHz, a GTX660Ti and a GTX470 FOC in the same system. When running two GPUGrid tasks and a few CPU tasks, the system draws around 450W. A similar system with two GTX660Tis would draw ~50W less, taking the system's total to ~400W. A quality 80+ 550W PSU or better is capable of supporting this, and in the past I ran two GTX470s on a 550W Corsair PSU without issue. An average PSU is a no-no for GPU crunching. Note also that better PSUs are less wasteful and save you money on electricity. My Corsair HX750 provides optimal power efficiency (91 to 92%) at around 400W to 550W @240V. PSU fan speed, and therefore noise, is also controlled according to the power draw: the higher the power draw, the noisier they get. So it's usually better to 'air on the side of caution' and get a PSU that can deliver more power than you need. FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help
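skgiven's figures also show why PSU efficiency matters for 24/7 crunching. A minimal Python sketch, assuming the ~400 W figure is measured at the wall (the post does not say) and that the PSU runs at 91% efficiency there, as stated for the HX750 in that range, compared with an assumed 85% for an average unit. The 85% figure is illustrative, not a measured value, and the helper names are made up for this example.

```python
HOURS_PER_YEAR = 24 * 365


def dc_from_wall(wall_w, efficiency):
    """Power actually delivered to the components for a given wall reading."""
    return wall_w * efficiency


def wall_from_dc(dc_w, efficiency):
    """Power drawn from the outlet to deliver dc_w to the components."""
    return dc_w / efficiency


def yearly_kwh(watts):
    """Energy for one year of 24/7 operation at a constant draw."""
    return watts * HOURS_PER_YEAR / 1000.0


if __name__ == "__main__":
    dc = dc_from_wall(400, 0.91)  # ~400 W at the wall, assumed 91% efficient PSU
    print(f"DC load: {dc:.0f} W, {dc / 550:.0%} of a 550 W PSU")
    extra = wall_from_dc(dc, 0.85) - wall_from_dc(dc, 0.91)  # assumed 85% average PSU
    print(f"Extra wall draw at 85%: {extra:.1f} W, ~{yearly_kwh(extra):.0f} kWh/year")
```

Under those assumptions, ~400 W at the wall means roughly 364 W of actual load, about two thirds of a 550 W unit, and the less efficient PSU would pull about 28 W more from the outlet for the same work, on the order of 250 kWh per year of round-the-clock crunching.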
Joined: 8 Feb 12 · Posts: 60 · Credit: 17,816,440 · RAC: 0
MrS, thanks for taking so much time to look at my issues. The i7-920 machine will go into the shop to have the CPU looked at. Either new thermal paste will be applied or, if necessary and possible, the CPU will be replaced, unless the cost is too high. I will also ask about better cooling. Regarding Dagorath, he is just a nasty harsh mean-spirited Canuck who should move to Florida. He does not even like Hockey. TJ, very interesting about the power supply; the Shift Super Stock has a 1000 watt unit. skgiven, thanks also for your comments. I have to say, I went into GPU crunching with very little knowledge, looking especially for the GPUGrid project. I am surprised I have gotten this far with GPU work. GPUGrid is very demanding; Milky Way requires DP cards, yet ATI is not all that popular. This field is still in its infancy.
Joined: 16 Mar 11 · Posts: 509 · Credit: 179,005,236 · RAC: 0
"Regarding Dagorath, he is just a nasty harsh mean-spirited Canuck who should move to Florida. He does not even like Hockey."

You just want me to come down there and teach y'all how to play hockey so you can eventually get a team together ;-) The pics you mentioned at Orbit@home... I'm working on them and will send you copies. As I mentioned, the whole works is built mostly from scrap, so it doesn't look pretty ATM; it needs paint, which is sitting in the corner exactly where I put it 2 months ago. I like painting even less than hockey, but I'm making headway: for example, I bought sandpaper last week and put it right beside the paint. Looking at buying a brush soon.
Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
"Regarding Dagorath, he is just a nasty harsh mean-spirited Canuck who should move to Florida."

Nice to see you two are getting along :D And you're right, the field of GPU crunching is still developing in a pretty dynamic way. Lots of opportunities and rewards, but also some homework to do (to do it right). @SK: I think you mean "err on the safe side", as in "to error"? MrS
Joined: 8 Feb 12 · Posts: 60 · Credit: 17,816,440 · RAC: 0
After reading through all of this, it seems to me that there should be a statement somewhere on the website of the minimum requirements to safely and successfully run WUs on this project: specifically cards, power supply, CPU, DRAM, and maybe cooling. As I revealed, I have a very expensive and powerful computer which managed to work up 17 million credits, but which had to be rebuilt three times. That is a very costly situation, definitely to be avoided if possible.
©2025 Universitat Pompeu Fabra