SANTI Errors

Message boards : Number crunching : SANTI Errors
tomba

Message 34378 - Posted: 18 Dec 2013, 18:15:24 UTC - in response to Message 34365.  

I lowered the GPU clocks by 10 MHz and so far all WUs have completed successfully..... fingers crossed...

Please try the same, Tomba. Lower the clock speed of the offending GPU by 13 or 26 MHz and see if it helps too.

MrS


Dear MrS,

You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs.

In the past month my "offending" GPU has processed 28 Nathans without one single error, vs. 31 Santis, seven of which stopped with errors that cost me 45 hours of wasted, expensive electricity.

Why do you want me to penalize Nathans, which give 10% more credit than Santis? Why don't you fix the Santi problem??

[I am not ashamed that I'm in the credit-chasing game. I just spent over €1000 for a rig that will dramatically boost my contribution to this most worthy cause]

TJ
Message 34381 - Posted: 18 Dec 2013, 20:19:04 UTC

If you check my posts you can see that I have been nagging about the Santi WUs, LR and SR, since summer. All the problems were on the GTX660.
I got a 770 in August and it worked error-free for 33 consecutive days. I built a new system to accommodate a GTX780Ti and put two 660s in the other system. That one gave errors after 2 days of running both Santis. On advice given here, I replaced the 600 W PSU with a 750 W PSU, and since then it has been crunching error-free for 4 days in a row.
It could be that Santis and 660s don't work well together on a weak system (older MOBO, older BIOS) or with a PSU that has less headroom. I have no proof of this, but it is what I am seeing now.

@Tomba: if you have your new system ready with two 660s in it and they run the Santis smoothly, that would be a little proof, as you would then have the same CPU and MOBO as the system my two 660s are running in. Exciting.
Greetings from TJ

Dagorath
Message 34384 - Posted: 19 Dec 2013, 0:27:53 UTC - in response to Message 34378.  

I lowered the GPU clocks by 10 MHz and so far all WUs have completed successfully..... fingers crossed...

Please try the same, Tomba. Lower the clock speed of the offending GPU by 13 or 26 MHz and see if it helps too.

MrS


Dear MrS,

You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs.

In the past month my "offending" GPU has processed 28 Nathans without one single error, vs. 31 Santis, seven of which stopped with errors that cost me 45 hours of wasted, expensive electricity.

Why do you want me to penalize Nathans, which give 10% more credit than Santis?


Hmmm. Take a wee hit on NATHANs for a big gain on SANTI? You're right, that is a preposterous proposal. <roll-eyes>

Why don't you fix the Santi problem??


Why don't you fix it yourself? Why don't you install Linux and get an 11-12% boost on all your tasks, if you're genuinely in the credit-chasing game and don't want to waste electricity? Sorry, I don't wish to offend, but to me it just doesn't make sense to cry about inefficiency when you're running an antiquated POS opsys like Win7/8.

BOINC <<--- credit whores, pedants, alien hunters

Jim1348
Message 34385 - Posted: 19 Dec 2013, 1:28:01 UTC

Some work units are easy, some are hard. You have to live with it; I actually like the harder ones better, since they exercise my card more and may do more challenging science(?). At any rate, I have just "downgraded" a GTX 660 to a base clock of 967 MHz (with corresponding reductions in the boost and maximum clocks), but also boosted the voltage on the core up from 1.162 volts to 1.175 volts to get it stable, and increased the upper power limit to 115% max. If that is what needs to be done, OK with me; I don't expect the scientists to design their experiments for the weakest cards out there.

Damaraland
Message 34388 - Posted: 19 Dec 2013, 6:34:43 UTC - in response to Message 34385.  

I don't expect the scientists to design their experiments for the weakest cards out there.

I think you are misunderstanding me. I don't care if there are errors, and I really don't care how much credit I get. I don't expect the project to adapt to my old card.
My point is that if there are too many errors, it is worth investigating whether there is a way to correct or avoid them. Nobody (neither users nor scientists) wants to waste electricity.
If there are some kinds of units that don't fit older cards, we should know about it.

Jim1348
Message 34389 - Posted: 19 Dec 2013, 9:26:45 UTC - in response to Message 34388.  
Last modified: 19 Dec 2013, 10:06:52 UTC

Yes, it would be convenient if they had a "worst case" work unit we could run, to see if our cards are stable on it. Then we could adjust the cards as necessary, or just accept the error rate for whatever it is. But I doubt that even the scientists know what the worst case really is, or what they will need in the future.

Many of the cards are way overclocked for the gamers, and just inherently have a higher error rate. But the ones they do complete successfully are valuable for the science. Only the scientists know the statistics for how many errors they are getting, and they have to decide whether it is good enough. You are right that they don't get any work done if everyone leaves the project, so they have to set a happy medium that everyone can live with (or at least enough people to get the work done).

I am not a gamer myself, but perhaps they could set some warnings out that would alert new users to the possibility of problems. Then at least it wouldn't come as so much of a surprise when they inevitably happen.

Retvari Zoltan
Message 34391 - Posted: 19 Dec 2013, 10:15:09 UTC - in response to Message 34378.  
Last modified: 19 Dec 2013, 10:19:10 UTC

You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs.

You can change it back when the problematic Santi WUs are cleared from the queue.

In the past month my "offending" GPU has processed 28 Nathans without one single error, vs. 31 Santis, seven of which stopped with errors that cost me 45 hours of wasted, expensive electricity.

It has been said before that this problem is *not* general (the overall error rate is low for these workunits), so these errors are caused by a specific problem in your system, not by the project. Therefore:
- the staff won't do anything about it (a change could cause more errors than it fixes)
- it is up to you whether you accept our advice and try to fix *your* problem, or put up with the frustration caused by the wasted electricity.

Why do you want me to penalize Nathans, which give 10% more credit than Santis?

Lowering the GPU clock is a safe way to try to fix this error. You can increase the GPU voltage instead (no performance penalty), but that is riskier because it increases the power used by the GPU, and therefore its temperature.

Why don't you fix the Santi problem??

Because there isn't a Santi problem from the project's point of view.

[I am not ashamed that I'm in the credit-chasing game. I just spent over €1000 for a rig that will dramatically boost my contribution to this most worthy cause]

Me too. We appreciate that. That's why it is also important for us to fix your problem.
GeForce cards are made for gaming, not for crunching. Their factory settings let the gamer get the maximum performance from the GPU, sacrificing some stability (a glitch during an 8-hour gaming session is no problem, but the same glitch will ruin an 8-hour workunit).
If you lower your GPU's clock and that makes your host capable of crunching each and every workunit error-free, your RAC (your daily contribution) will be higher than when it crunches a little faster but some workunits fail in exchange (your frustration will be lower too).
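
The trade-off can be put into rough numbers. The sketch below is only an illustration under assumptions that are not from this thread: throughput is taken to scale linearly with core clock, a failed workunit is assumed to cost as much run time as a successful one and earn nothing, and the 1000 MHz stock clock is a made-up figure. Only the 7 failures out of 31 Santi tasks come from tomba's post.

```python
def relative_output(clock_mhz, stock_clock_mhz, failed, attempted):
    """Useful work per day relative to an error-free card at stock clock.

    Assumptions (not from the thread): throughput scales linearly with
    core clock, and a failed task wastes a full task's run time.
    """
    speed = clock_mhz / stock_clock_mhz
    success_rate = (attempted - failed) / attempted
    return speed * success_rate


STOCK_MHZ = 1000  # hypothetical stock clock, for illustration only

# Stock clock with the reported Santi failures (7 of 31).
at_stock = relative_output(STOCK_MHZ, STOCK_MHZ, failed=7, attempted=31)

# 26 MHz lower, assuming the downclock removes the failures entirely.
downclocked = relative_output(STOCK_MHZ - 26, STOCK_MHZ, failed=0, attempted=31)

print(f"stock clock, 7 of 31 failing: {at_stock:.3f}")
print(f"-26 MHz, no failures:         {downclocked:.3f}")
```

Under these assumptions the downclocked card comes out ahead on Santi tasks, while task types that already ran error-free (Nathan, in tomba's case) would lose roughly 2.6% of their throughput. Whether a 26 MHz reduction really removes the failures on a given card can only be found out by trying it.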

Coleslaw
Message 34395 - Posted: 19 Dec 2013, 14:22:54 UTC - in response to Message 34391.  

You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs.

You can change it back when the problematic Santi WUs are cleared from the queue.


I don't care about the points so much as the stability of the systems involved. Until now, I didn't have to worry about running GPUGrid work units. I now have three boxes I have had to pull due to these work units. It isn't just 6xx series cards, as I have pointed out. It affects the GT430s as well. I run all of my cards at stock speeds and don't overclock my CPUs either. I have more than enough overhead on my power supplies. I don't believe someone's workaround of modifying each machine is a justifiable argument for everyone else to have to go tweak their systems just to run SANTI. This is something that should be able to be fixed within the application. Or GPUGrid should allow users to decide via preferences whether to run SANTI or NATHAN work units when available. Since this is a known error and it is happening to multiple people with different cards, it certainly should be looked at further.

Dagorath
Message 34397 - Posted: 19 Dec 2013, 15:54:24 UTC - in response to Message 34395.  

You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs.

You can change it back when the problematic Santi WUs are cleared from the queue.


I don't care about the points so much as the stability of the systems involved. Until now, I didn't have to worry about running GPUGrid work units. I now have three boxes I have had to pull due to these work units. It isn't just 6xx series cards, as I have pointed out. It affects the GT430s as well. I run all of my cards at stock speeds and don't overclock my CPUs either.


You miss the point. We know you don't OC your cards, but perhaps the manufacturer did. Anyway, all that is irrelevant when a slight downclock or voltage boost will likely fix your problem. I say your problem because most of us aren't experiencing any problem with SANTI.

I have more than enough overhead on my power supplies. I don't believe someone's workaround of modifying each machine is a justifiable argument for everyone else to have to go tweak their systems just to run SANTI.


That is a blatant exaggeration. Nobody expects "everyone else" to tweak their systems. They expect only the very few who have problems to tweak their system. Why do you ignore the many hundreds of systems on which SANTIs run with no problem?

Why should the admins tweak SANTI tasks or the app just to spare 1% of systems grief when doing so could mean SANTI starts crashing on the 99% that have no problem with current SANTI?

They could provide separate queues for SANTI but that creates more problems than it fixes because the next time your improperly configured system runs into what you think are bad tasks you'll want them to spend more time creating yet another queue. That makes no sense at all when you could solve the problem easily on your end.

This is something that should be able to be fixed within the application.


Yep! And pigs should have wings so they can fly themselves to the slaughterhouse so I can stay at home and harvest money off my money tree.

Since this is a known error and it is happening to multiple people with different cards, it certainly should be looked at further.


Technically 3 or 4 fits the definition of multiple but that is irrelevant when SANTI isn't a problem for 99% of systems. Go figure.

A better solution might be to run a script that watches your queue and aborts SANTI tasks the minute you receive one and continues aborting SANTI until you receive a NATHAN or whatever tasks work for you. The only potential problem I see with that solution is that if you abort too many tasks the server might make you wait 24 hours before it sends more, maybe but maybe not, I'm not sure how they have that configured.
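
For anyone who wants to try the auto-abort idea, here is a minimal sketch in Python. It assumes `boinccmd` is on the PATH and that `boinccmd --get_tasks` prints an indented `name:` line and a `project URL:` line for each task; check the output of your own client version before relying on it. The function names and the sample task names are invented for this illustration, and `dry_run` defaults to only printing what would be aborted.

```python
import subprocess

# Made-up sample of `boinccmd --get_tasks` output; the task names are invented.
SAMPLE = """\
======== Tasks ========
1) -----------
   name: I12R3-SANTI_bax2-4-32-RND0001_0
   WU name: I12R3-SANTI_bax2-4-32-RND0001
   project URL: http://www.gpugrid.net/
2) -----------
   name: I5R7-NATHAN_example-2-10-RND0002_0
   WU name: I5R7-NATHAN_example-2-10-RND0002
   project URL: http://www.gpugrid.net/
"""


def parse_tasks(get_tasks_output):
    """Return (project_url, task_name) pairs found in the task listing."""
    tasks = []
    name = None
    for raw in get_tasks_output.splitlines():
        line = raw.strip()
        if line.startswith("name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("project URL:") and name is not None:
            url = line.split(":", 1)[1].strip()
            tasks.append((url, name))
            name = None
    return tasks


def select_tasks(tasks, pattern="SANTI"):
    """Keep only the tasks whose name contains the given pattern."""
    return [(url, name) for url, name in tasks if pattern in name]


def abort_matching(pattern="SANTI", dry_run=True):
    """Ask the local BOINC client for its tasks and abort the matching ones."""
    listing = subprocess.run(
        ["boinccmd", "--get_tasks"],
        capture_output=True, text=True, check=True,
    ).stdout
    for url, name in select_tasks(parse_tasks(listing), pattern):
        if dry_run:
            print("would abort:", name)
        else:
            subprocess.run(["boinccmd", "--task", url, name, "abort"], check=True)


# Demonstration on the sample text only; nothing is aborted here.
for url, name in select_tasks(parse_tasks(SAMPLE)):
    print("would abort:", name)
```

Calling `abort_matching(dry_run=False)` from cron or a scheduled task every few minutes would do the actual aborting. The caveat above still applies: if the server limits how many tasks a host may abort, this could leave the host without work for a while.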

Another possible option is a script that watches your queue and automatically tweaks your card one way just before it starts a SANTI and then tweaks it a different way when it receives a NATHAN.

BOINC <<--- credit whores, pedants, alien hunters

Damaraland
Message 34400 - Posted: 19 Dec 2013, 19:26:50 UTC - in response to Message 34397.  
Last modified: 19 Dec 2013, 19:27:11 UTC

This is something that should be able to be fixed within the application.


Yep! And pigs should have wings so they can fly themselves to the slaughterhouse so I can stay at home and harvest money off my money tree.

Having read this whole thread, I'm not sure whether the problem is with OC cards or with complex units. I will stop my computer and read it carefully over the weekend to understand what's going on.
But I would like to point something out.
What's wrong with wanting to be a "passive" cruncher who doesn't want to worry about Linux, scripts or whatever? What's wrong with this?
I think you can't expect every user on this project to be a hardware geek.
Of course people with the best machines are very familiar with this project, but there's another profile of BOINC cruncher. People running multiple projects don't want to go deep into technical specs.
I think an easy solution should be proposed for these people. Maybe the "geeks" should help by providing a script or whatever it was some people suggested. IMO
HOW TO - Full installation Ubuntu 11.10

Dagorath
Message 34401 - Posted: 19 Dec 2013, 21:33:12 UTC - in response to Message 34400.  
Last modified: 19 Dec 2013, 21:35:53 UTC

This is something that should be able to be fixed within the application.


Yep! And pigs should have wings so they can fly themselves to the slaughterhouse so I can stay at home and harvest money off my money tree.

Having read this whole thread, I'm not sure whether the problem is with OC cards or with complex units. I will stop my computer and read it carefully over the weekend to understand what's going on.
But I would like to point something out.
What's wrong with wanting to be a "passive" cruncher who doesn't want to worry about Linux, scripts or whatever? What's wrong with this?


There is nothing wrong with that. Nobody here has said there is something wrong with that.

I think you can't expect every user on this project to be a hardware geek.
Of course people with the best machines are very familiar with this project, but there's another profile of BOINC cruncher. People running multiple projects don't want to go deep into technical specs. I think an easy solution should be proposed for these people.


I agree. For this problem with SANTI tasks crashing there is an easy solution. That solution is the solution proposed by Retvari. If that solution is too difficult for some people then they can ask for help implementing it. Asking the project devs to fix their problem is not, IMHO, a reasonable solution unless their problem also afflicts many other users. This problem with SANTI is limited to just a few users. How do I know that? I know because if it were a widespread problem a lot more people would be complaining and the admins would be able to see it in the stats they collect.

Maybe the "geeks" should help by providing a script or whatever it was some people suggested. IMO


If you want a script then ask and I will try to provide one unless the project admins think it's harmful to the project. A script to auto-abort SANTI tasks as soon as they download is easy but maybe not the wisest approach. A script to adjust the clock down or the voltage up when you receive a problem task (SANTI for example) and return clocks/voltage to normal for other tasks would be harder to implement but I am sure there is a way.
BOINC <<--- credit whores, pedants, alien hunters

Coleslaw
Message 34402 - Posted: 20 Dec 2013, 4:39:39 UTC - in response to Message 34401.  

Dagorath, glad you are still trying to make a few suggestions and stick to that very narrow mindset. My reasoning for the app change rather than making users make tweaks locally for one sub project is more focused towards those machines that don't have easy access and can't be tweaked remotely on a day to day basis. As far as giving people the option to choose between SANTI and NATHAN adding more problems, that is yet to be seen. Until then, it is only opinion which really isn't worth arguing. In this case if the option was a choice, GPUGrid would still have 4 more GPU's from me crunching away. I'm sure others who are having difficulty would do the same. I have no idea the true numbers of people experiencing problems because not everyone posts in the forums. I don't even know if the techs here look at the work units I have aborted that were causing the BSOD's because they didn't get to "error out" and therefore would not show up that way. Instead it would show up as a user abort and someone else who didn't have BSOD issues could finish it. I'm not saying my cards might not be overclocked by the manufacturer. So, I can assure you that "point" was not missed. I just didn't address it in my above statement. I have made my choice in regards to tweaking my cards and have expressed my opinions (which is what they are regardless if you like them) on how I feel about the issue at hand. Please choose to ignore them if you don't like my approach.

Dagorath
Message 34403 - Posted: 20 Dec 2013, 7:31:49 UTC - in response to Message 34402.  

Coleslaw,

Take heart ol' chap, I love your ribald "ad hominem followed by vicious attack on a straw man" humor. I'm just glad you can still crack a joke even though Bruce kept the computers after he changed the locks. Maybe if you tell him you didn't realize the strap-on chaffs his hips and he doesn't have to wear it anymore he'll let you back in the house so you can tweak your rigs.

BOINC <<--- credit whores, pedants, alien hunters

skgiven
Volunteer moderator
Volunteer tester
Message 34406 - Posted: 20 Dec 2013, 13:44:28 UTC - in response to Message 34403.  
Last modified: 20 Dec 2013, 22:41:27 UTC

To quote Statler and Waldorf, "You're not old, but you're ugly!"

Cat claws are retractable. If you really must, paw at each other's non-dangly bits.

Jozef J
Message 34407 - Posted: 20 Dec 2013, 18:42:50 UTC - in response to Message 34378.  

As I wrote some time ago, this project is no longer under quality control...
the scientists are having a siesta...
Tomba, I would recommend another project; GPUGRID is frozen in time. Tomba, you lost a few days on this project; I lost 6 weeks... My computer went down because of faulty tasks only four days after I went on vacation, and I could not restart it; a physical reset was needed, unfortunately.
Now I am crunching Collatz Conjecture and everything runs like butter, with absolutely no errors, and the RAC increase in BOINC is the highest.

TheSkyNet POGS: its trophies have a unique entertainment factor that the other projects lack. The absolute best in the world in BOINC. Still, they could add something interactive, such as a screen saver like some other BOINC projects have, and then it would be the best BOINC project.

The TheSkyNet POGS website shows how all BOINC project pages should be developed in the future...

Damaraland
Message 34408 - Posted: 20 Dec 2013, 19:04:05 UTC - in response to Message 34406.  
Last modified: 20 Dec 2013, 19:05:41 UTC

@skgiven
To quote Statler and Waldorf, "You're not old, but you're ugly!"
Cat claws are retractable. If you really must, paw at each other's non-dangly bits.

I think the problem is not this. I think you are seeing this from a very narrow point of view.
I think there are many different kinds of:
- user profiles (motivations)
- ways of seeing problems with units (tolerance of errors)
- levels of technical knowledge (hardware and software)
- appetite for problems (wanting to push hardware or find solutions as a hobby, time).

Whatever profile or motivation one might have, everyone adds something. I wish "the project" could be more understanding of all of them.
I'm sure the TOP 10 users with huge rigs probably don't see any problem, and they probably contribute 80% of the computing power. But I believe everyone adds something.
I feel very stupid posting an error. I don't expect everything to be smooth, but if I post, it is because I am giving my time to help.
If you give the impression that problems are not pursued and investigated, many people will quit and you will lose users little by little. Of course others will come back. I left and came back.
To finish, and so as not to make this too long: just the feeling that something is being done, or an explanation of why nothing can be done, would be fine, at least for me.
To be clear, I'm not complaining, and I understand that the project has limited resources. Maybe organizing all the talented people here could help.
As I wrote some time ago, this project is no longer under quality control...
the scientists are having a siesta...

This is a huge xxxxxx, I bite my tongue. Jozef, you have no idea what's going on or what the problem is. In Spain about 1% of people take a siesta, and it has been proven to be very good for the body and for mental sharpness.

Dagorath
Message 34419 - Posted: 21 Dec 2013, 16:08:19 UTC - in response to Message 34408.  

To finish, and so as not to make this too long: just the feeling that something is being done, or an explanation of why nothing can be done, would be fine, at least for me.


The admins have said they will look into it and IIUC, they have also indicated that it's not likely anything can be done and I believe the reasons have been covered. Therefore, tomba, I think you got exactly what you say you want.

In addition to the above, other solutions have been offered.

Narrow minded is as narrow minded does. A few of us have tried to broaden the options. That is what we have done.

Others have ignored all alternative options and focused upon the 1 option the admins have politely indicated they're not gonna get. Is that broad thinking or narrow thinking?

I am sorry if some volunteers installed hosts in remote locations and failed to do the smart thing and configure them to allow remote access and administration. Hopefully they can fix that and do better the next time they set up a remote host.

BOINC <<--- credit whores, pedants, alien hunters

Beyond
Message 34433 - Posted: 22 Dec 2013, 13:46:46 UTC - in response to Message 34341.  

I had to set my third computer so far to No New Work because of the SANTI work units causing BSODs. So far the systems I have had them on ran Windows 7 Premium and Professional x64. The cards were GT430, 650Ti, and 660Ti. From what I could see, it happens after the drivers crash many times in a very short period of time.

Just had the same thing happen with a SANTI_bax2 WU on one machine. It BSODed every time BOINC started the SANTI WU. Even caused disk corruption once. 650Ti GPU. If I suspended the WU the machine ran fine. Finally aborted the WU and DLed another SANTI_bax2, which is so far running OK. Before that WU the box in question had run for a VERY long time without a single crash. I've run 18 other SANTI_bax2 WUs without an issue. Wonder if perhaps some WUs got released with bad parameters? Maybe a corrupted DL?

As for my WU detailed above: it finished fine for the next guy to get it, which is interesting since his machine has TONS of errors. I cut the clocks by 25 MHz and 5 SANTI_bax2 WUs have since completed fine on that GPU. I suspect that perhaps this WU type stresses the GPU slightly more than most, so that GPUs "on the edge" are more likely to error. Anyway, I've had 167 valid and 1 error lately (on 8 machines). I'd say the project is running pretty smoothly (at least here). Haven't had that strange bluescreening on any machine before or since.

History quiz: Does anyone remember when MS ballyhooed long and loudly that they had solved the "black screen of death"? Remember the solution?

candido
Message 34469 - Posted: 24 Dec 2013, 19:19:48 UTC
Last modified: 24 Dec 2013, 19:23:57 UTC

I have been crunching GPUGrid WUs again for about a week with two machines, three since yesterday, and had no problems with SANTI, NATHAN, NOELIA or SDOERR. I had a BSOD today while crunching a Nathan WU. I thought it was a software problem and installed the new drivers (331.82), which didn't solve the problem. After reading these posts I decided to lower the core clock, and so far so good.
Thanks for the suggestion

Jim1348
Message 34472 - Posted: 24 Dec 2013, 21:51:14 UTC - in response to Message 34469.  

I had a BSOD today while crunching a Nathan WU. I thought it was a software problem and installed the new drivers (331.82), which didn't solve the problem. After reading these posts I decided to lower the core clock, and so far so good.

That's the spirit. Unfortunately you are never quite sure that you have done enough until you eventually don't get any more errors. But my GTX 660s are now working fine for me, and I hope they stay that way. With the variability we see in the work units, you never know though.

©2025 Universitat Pompeu Fabra