Simulation has become unstable

Message boards : Number crunching : Simulation has become unstable
TJ

Message 35278 - Posted: 23 Feb 2014, 15:15:57 UTC - in response to Message 35271.  

TThrottle is a great program; however, when it is used with GPU crunching it will result in errors. The program does not "slow down" CPU and GPU usage to a steady rate; it lets them run at 100%, then at only a few percent, and so on, so that the CPU and GPU stay cool(er). At least with GPUGRID tasks, this stopping and starting of the WU will make it fail after a while. That is the reason I don't use TThrottle anymore.
With MSI Afterburner you can set the GPU fan speed nicely, and it works on any card of any brand, NVIDIA as well as AMD.
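
To illustrate the on/off behaviour described above, here is a minimal sketch in Python. The cycle timings and load fractions are made-up assumptions, not TThrottle's actual values, and `average_load` and `interruptions_per_hour` are hypothetical helper names used only for this illustration.

```python
def average_load(on_seconds: float, off_seconds: float,
                 on_load: float = 1.0, off_load: float = 0.05) -> float:
    """Time-weighted average load over one run/pause cycle."""
    period = on_seconds + off_seconds
    if period <= 0:
        raise ValueError("cycle length must be positive")
    return (on_load * on_seconds + off_load * off_seconds) / period


def interruptions_per_hour(on_seconds: float, off_seconds: float) -> float:
    """How many times per hour the workload is paused and resumed."""
    period = on_seconds + off_seconds
    if period <= 0:
        raise ValueError("cycle length must be positive")
    return 3600.0 / period


if __name__ == "__main__":
    # Hypothetical cycle: 3 s at full load, then 1 s nearly idle.
    print(f"average load: {average_load(3, 1):.1%}")
    print(f"pauses per hour: {interruptions_per_hour(3, 1):.0f}")
```

With these assumed numbers the average load drops, but the task is paused and resumed hundreds of times per hour, which is the stop/start pattern blamed for the failures above.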

If your cards run nice and smooth at GPUGRID, then they will on other projects as well; you don't have to worry about that.

Lastly, I will not defend anyone, but you can trust skgiven's advice. He is not always lengthy in his explanations, and you have to search and try a bit for yourself. But he knows what he is talking about, and I have used a lot of his knowledge myself.
Greetings from TJ
Dagorath

Message 35279 - Posted: 23 Feb 2014, 15:19:02 UTC - in response to Message 35271.  

skgiven.

First my setup.
Intel Core i5 Quad 2500K (Sandy Bridge) OC to 4.33GHz.
GTX760 OC to 1097MHz and 1.2V

I use a small program, TThrottle, to prevent BOINC tasks driving the CPU temp above 75C and the GPU temp above 70C. Quite often the CPU reaches 75C but it is rare for the GPU to reach 70C.


WTF!!!! You've OCd your CPU to 4.33GHz and now you're throttling it back with TThrottle?????!!!!! That just doesn't make any sense. It's like building a super fast engine then slipping the clutch so the vehicle doesn't go too fast.

Of all the programs and BOINC projects I run only GPUGRID fails in any way so as far as I am concerned my system runs nicely.


Those other GPU projects are wussy projects. Any old FUBAR'd system, even yours, can crunch them. GPUgrid is the cream of the crop, the top dog amongst all GPU using BOINC projects. The admins and the dev here are what other GPU using projects only wish they could be. The only reason your system can't run GPUgrid is because you've FUBAR'd your system.

From what I have read it seems that the only option I have to get GPUGRID to run is to downclock the GPU. If I did that then it will impact the other BOINC projects adversely. So I will not downclock the GPU.


You have plenty of other options but it seems you've made up your mind that your system is the model of perfection and should run GPUgrid exactly the way it is.

It is, as you say, simple to change the various GPU parameters. However, it takes experience to do it properly.


Well then get the experience. Plenty of other people have done exactly that.

I suspect most people running GPUGRID are not experts nor wish to become one.


You don't need to become an expert but if that's the word you prefer to use then fine continue to delude yourself. But that isn't going to get you crunching GPUgrid.

Also, what you left unspoken, is that it takes many hours of checking, over several GPUGRID tasks, to ensure that the changes have the desired effect and also that other projects and programs still run as well as before.


That's debatable but let's say it's true. If you don't want to do the work then you don't get to run with the big dogs; you just sit on the porch with the pups and watch.

One final point. It is quite possible to write programs, either deliberately or by accident, that drive a component beyond its safe limits or seriously affect other programs. It looks to me that GPUGRID, in a perfectly reasonable desire for efficiency, is reaching or has reached that point. It is now up to the programmer to seriously consider what and how they are coding.


You obviously don't know spit about coding and the proof is that you're OCing your CPU then throttling it back. Your "advice" to the programmer here is a sad joke at best and the meanderings of a noob at worst.


In summary, getting GPUGRID tasks to run without error is not quite as simple as you make out.


Yes it is and if you would spend more time reading about it and thinking about it and stop wasting time arguing about it you would be half way there by now. I've seen several other non-experts do it, why can't you?

I have no doubt others will, perhaps vehemently, disagree with me. But this is my opinion just as you have yours.


You and skgiven both have opinions, that much is true. To say your opinion on this topic is as informed and valid as skgiven's opinion is, well, the thought just makes me ROFLMAO.

BOINC <<--- credit whores, pedants, alien hunters
BobMALCS

Message 35367 - Posted: 26 Feb 2014, 20:47:00 UTC - in response to Message 35279.  

<sigh>

1 - You know nothing about my skills, ability, or achievements any more than I know about yours.

2 - I will NOT turn this thread into a flame-fest. You can, of course, do whatever makes you feel good.

3 - I will let the readers make up their own minds.

I'm out of here.

Bye.
Dagorath

Message 35371 - Posted: 26 Feb 2014, 21:34:47 UTC - in response to Message 35367.  

<sigh>

1 - You know nothing about my skills, ability, or achievements any more than I know about yours.


Wrong. From what you have told us about yourself and what you're doing, it's easy to see you're advising on topics you know little about. Your monkey see monkey do (it works at project A therefore it has to work at project B too) solution is no solution at all. Call that a flame if you want, I call it the truth and I believe the more experienced crunchers here will agree with that opinion.

2 - I will NOT turn this thread into a flame-fest. You can, of course, do whatever makes you feel good.


If you think I am posting in this thread to make me feel good you're wrong again. I post to try to help you and others feel good by telling you what isn't likely to happen so that you can pursue a more realistic strategy for achieving success crunching here.

3 - I will let the readers make up their own minds.


How magnanimous of you.

I'm out of here.

Bye.


<yawn>

BOINC <<--- credit whores, pedants, alien hunters
Jim1348

Message 35387 - Posted: 27 Feb 2014, 21:00:21 UTC - in response to Message 35271.  

In summary, getting GPUGRID tasks to run without error is not quite as simple as you make out.

I have no doubt others will, perhaps vehemently, disagree with me. But this is my opinion just as you have yours.

I think you are right; it is not simple. Even after reducing the temperature you may have problems; you will very likely have to reduce the clock. But since your card is over-clocked anyway, you are really just reducing the clock to the value that Nvidia specified, which they did for a reason. If you (or the factory) overclock the card, you take your chances. GPUGrid is not a gaming community, and maybe they should make more allowances for that when they design their programs. But I am not a gamer anyway, and only use these cards for GPUGrid, so how they perform on other projects is of no concern to me. Each person has his own tolerance for tweaking up the cards, so you must act accordingly.

But it is quite possible to get them stable; here are my results for a recently completed run on two GTX 660s on a PC running Windows 7 64-bit:
http://www.gpugrid.net/results.php?hostid=165674&offset=0&show_names=1&state=0&appid=

I have since moved the cards to a WinXP machine, where they run faster. But as a consequence, they are now a little unstable and I am having to play the tweaking game again. They will be stable again shortly, but whether you consider that fun or not is up to you.

TJ

Message 35390 - Posted: 28 Feb 2014, 9:16:16 UTC - in response to Message 35387.  

Impressive, Jim, to get those two 660s so stable.
I can get mine stable for only a few days, and then an error occurs. I tweak the cards, one by one, almost every week. Clocks are way down, so the cards run slower than they should. Mine are EVGAs and the fan can run at a maximum of 75%, which is the reason the master GPU is almost always at 75°C.
Greetings from TJ
Retvari Zoltan

Message 35391 - Posted: 28 Feb 2014, 10:28:45 UTC - in response to Message 35390.  

TJ,

Have you tried to reduce the RAM clock of your card as well?
Jim1348

Message 35394 - Posted: 28 Feb 2014, 11:58:49 UTC - in response to Message 35391.  

Also, I have had to increase the power limit (to 110%) of the cards using Nvidia Inspector. In fact, my most recent problem required that I increase the limit more than that. That required downloading the original BIOS using GPU-Z, modifying it using Kepler Bios Tweaker (I set the power limit to 137.5 watts), and then flashing the BIOS of my Zotac GTX 660 using nvflash. It is not for the faint of heart, and I don't think everyone should attempt it. But that is an extreme case because that card didn't have the greatest heatsink to begin with, and most cards don't require going into the BIOS. You can just use Nvidia Inspector (or MSI Afterburner or whatever else you want) for most cases.

Other cards require down-clocking the GPU, and maybe the memory too, as RZ points out. So that is why it can get complicated; each card is an individual investigation if you want to get down to zero errors. I don't think that is really necessary, but I mention it to show that it is possible. A lot of the errors blamed on the work units are really due to instabilities of the card because it is being pushed too hard.
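
The card-by-card investigation described above can be pictured as a simple step-down search. This is only a sketch: `find_stable_clock` and the `is_stable` callback are hypothetical names, and in practice "stable" means completing several GPUGRID tasks over days without error, which no short function can check. The 13 MHz step, the 900 MHz floor and the 1050 MHz threshold are arbitrary illustration values; 1097 MHz is the overclock quoted earlier in the thread.

```python
from typing import Callable, Optional


def find_stable_clock(start_mhz: int, floor_mhz: int, step_mhz: int,
                      is_stable: Callable[[int], bool]) -> Optional[int]:
    """Lower the clock in fixed steps until is_stable() accepts a value.

    Returns the first (highest) clock that passes, or None if nothing
    at or above floor_mhz passes.
    """
    if step_mhz <= 0:
        raise ValueError("step must be positive")
    clock = start_mhz
    while clock >= floor_mhz:
        if is_stable(clock):
            return clock
        clock -= step_mhz
    return None


if __name__ == "__main__":
    # Stand-in for days of real testing: pretend this card is only
    # stable at or below 1050 MHz (an assumption for the example).
    pretend_stable = lambda mhz: mhz <= 1050
    print(find_stable_clock(1097, 900, 13, pretend_stable))
```

The same loop could be run a second time for the memory clock once the GPU clock has settled, since the two may need separate adjustment.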
TJ

Message 35417 - Posted: 1 Mar 2014, 11:29:05 UTC - in response to Message 35391.  

TJ,

Have you tried to reduce the RAM clock of your card as well?

No, I haven't yet. Good advice Zoltan, I will fiddle with that a bit too.
Greetings from TJ
TJ

Message 35418 - Posted: 1 Mar 2014, 11:31:00 UTC - in response to Message 35394.  

Thanks Jim, I know the procedure for updating the GPU BIOS and have all the tools, but I still have not done it yet. If I get a second 780Ti I will put the 660 in an older system and then experiment with the BIOS too.
Greetings from TJ

©2026 Universitat Pompeu Fabra