Message boards :
Number crunching :
SANTI Errors
| Author | Message |
|---|---|
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
> I lowered the GPU clocks by 10 MHz and so far all WUs have completed successfully... fingers crossed...

Dear MrS, you are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs. In the past month my "offending" GPU has processed 28 Nathans without a single error, vs. 31 Santis, seven of which stopped with errors that cost me 45 hours of wasted, expensive electricity. Why do you want me to penalize Nathans, which give 10% more credit than Santis? Why don't you fix the Santi problem?? [I am not ashamed that I'm in the credit-chasing game. I just spent over €1000 on a rig that will dramatically boost my contribution to this most worthy cause] |
|
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
|
If you check my posts you can see that I have been nagging about the Santi LR and SR WUs since summer. All the problems were on the GTX 660. I got a 770 in August and it ran error-free for 33 consecutive days. I built a new system to accommodate a GTX 780 Ti and put two 660s in the other system. That gave an error after 2 days of running both Santi types. I replaced (on advice here) the 600 W PSU with a 750 W PSU and since then it has been crunching error-free for 4 days in a row. It could be that Santis and 660s don't work well together on a weak system (older MOBO, older BIOS) or with a PSU that has less headroom. I have no proof of this, but it is what I am seeing now. @Tomba: if you have your new system ready with two 660s in it and they run the Santis smoothly, that would be a little proof, as you would then have the same CPU and MOBO as the system my two 660s are running in. Exciting. Greetings from TJ |
|
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
|
> I lowered the GPU clocks by 10 MHz and so far all WUs have completed successfully... fingers crossed...

Hmmm. Take a wee hit on NATHANs for a big gain on SANTI? You're right, that is a preposterous proposal. <roll-eyes>

> Why don't you fix the Santi problem??

Why don't you fix it yourself? Why don't you install Linux and get an 11-12% boost on all your tasks, if you're genuinely in the credit-chasing game and don't want to waste electricity? Sorry, I don't wish to offend, but to me it just doesn't make sense to cry about inefficiency when you're running an antiquated POS opsys like Win7/8.

BOINC <<--- credit whores, pedants, alien hunters |
|
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0
|
Some work units are easy, some are hard. You have to live with it; I actually like the harder ones better, since they exercise my card more and may do more challenging science(?). At any rate, I have just "downgraded" a GTX 660 to a base clock of 967 MHz (with corresponding reductions in the boost and maximum clocks), but also boosted the voltage on the core up from 1.162 volts to 1.175 volts to get it stable, and increased the upper power limit to 115% max. If that is what needs to be done, OK with me; I don't expect the scientists to design their experiments for the weakest cards out there. |
Damaraland Joined: 7 Nov 09 Posts: 152 Credit: 16,181,924 RAC: 0
|
> I don't expect the scientists to design their experiments for the weakest cards out there.

I think you are misunderstanding me. I don't care if there are errors, and I really don't care how much credit I get. I don't expect the project to adapt to my old card. My point is that if there are too many errors, it is worth investigating whether there is a way to correct or avoid them. Nobody (neither users nor scientists) wants to waste electricity. If there are some kinds of units that don't fit older cards, we should know about it. |
|
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0
|
Yes, it would be convenient if they had a "worst case" work unit we could run, to see if our cards are stable on it. Then we could adjust the cards as necessary, or just accept the error rate for whatever it is. But I doubt that even the scientists know what the worst case really is, or what they will need in the future. Many of the cards are way overclocked for the gamers, and just inherently have a higher error rate. But the ones they do complete successfully are valuable for the science. Only the scientists know the statistics for how many errors they are getting, and they have to decide whether it is good enough. You are right that they don't get any work done if everyone leaves the project, so they have to set a happy medium that everyone can live with (or at least enough people to get the work done). I am not a gamer myself, but perhaps they could set some warnings out that would alert new users to the possibility of problems. Then at least it wouldn't come as so much of a surprise when they inevitably happen. |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
> You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs.

You can change it back when the problematic Santi WUs are cleared from the queue.

> In the past month my "offending" GPU has processed 28 Nathans without a single error, vs. 31 Santis, seven of which stopped with errors that cost me 45 hours of wasted, expensive electricity.

It has been said before that this problem is *not* general (the overall error rate is low for these workunits), so these errors are caused by a specific problem in your system, not by the project. Therefore:
- the staff won't do anything about it (it may cause more errors than it fixes)
- it is up to you whether you accept our advice and try to fix *your* problem, or put up with the frustration caused by the wasted electricity.

> Why do you want me to penalize Nathans, which give 10% more credit than Santis?

Lowering the GPU clock is a safe way to try to fix this error. You can increase the GPU voltage instead (no penalty), but it's risky because it will increase the power used by the GPU, and therefore its temperature.

> Why don't you fix the Santi problem??

Because there isn't a Santi problem from the project's point of view.

> [I am not ashamed that I'm in the credit-chasing game. I just spent over €1000 on a rig that will dramatically boost my contribution to this most worthy cause]

Me too. We appreciate that. That's why it is also important for us to fix your problem. GeForce cards are made for gaming, not for crunching. Their factory settings let the gamer get the maximum performance from the GPU, sacrificing some stability (a glitch during 8 hours of gaming is no problem, but it will ruin an 8-hour-long workunit).
If you lower your GPU's clock and that makes your host capable of crunching each and every workunit error-free, your RAC (your daily contribution) will be higher than when it crunches a little bit faster with some workunits failing in exchange (and your frustration will be lower too). |
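The throughput argument above can be checked with rough arithmetic. The following is only an illustrative sketch, and it rests on assumptions that are not stated in the thread: that useful work scales linearly with the GPU clock, and that a failed work unit contributes nothing. The 7-of-31 failure count is the one reported for the Santi work units; the 2% downclock is a hypothetical figure chosen for the example.

```python
def effective_throughput(clock_factor: float, failed: int, attempted: int) -> float:
    """Useful work per unit time, relative to a stock-clocked card with no failures.

    Assumptions (illustrative only): work rate scales linearly with clock,
    and a failed work unit contributes nothing.
    """
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    success_rate = (attempted - failed) / attempted
    return clock_factor * success_rate


# Stock clocks with the reported 7 failures out of 31 Santi work units.
stock = effective_throughput(1.00, failed=7, attempted=31)

# Hypothetical 2% downclock that removes the failures entirely.
downclocked = effective_throughput(0.98, failed=0, attempted=31)

print(f"stock: {stock:.3f}, downclocked: {downclocked:.3f}")
```

Under these assumptions a small downclock only has to prevent a few failures to come out ahead. The real break-even point depends on how the host's work is split between task types and on how late in a run the failures occur.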
Coleslaw Joined: 24 Jul 08 Posts: 36 Credit: 363,857,679 RAC: 0
|
> You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs.

I don't care about the points but rather about the stability of the systems involved. Until now, I didn't have to worry about running GPUGrid work units. I now have three boxes I have had to pull because of these work units. It isn't just 6xx series cards, as I have pointed out. It affects the GT430s as well. I run all of my cards at stock speeds and don't overclock my CPUs either. I have more than enough overhead on my power supplies. I don't believe someone's workaround of modifying each machine is a justifiable argument for everyone else having to go tweak their systems just to run SANTI. This is something that should be able to be fixed within the application. Or GPUGrid should allow users to decide via preferences whether to run SANTI or NATHAN work units when available. Since this is a known error and it is happening to multiple people with different cards, it certainly should be looked at further.
|
|
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
|
> You are recommending I downgrade the performance of my "offending" GPU to accommodate Santi WUs.

You miss the point. We know you don't OC your cards but perhaps the manufacturer did. Anyway, all that is irrelevant when a slight downclock or voltage boost will likely fix your problem. I say your problem because most of us aren't experiencing any problem with SANTI.

> I have more than enough overhead on my power supplies. I don't believe someone's workaround of modifying each machine is a justifiable argument for everyone else having to go tweak their systems just to run SANTI.

That is a blatant exaggeration. Nobody expects "everyone else" to tweak their systems. They expect only the very few who have problems to tweak their systems. Why do you ignore the many hundreds of systems on which SANTIs run with no problem? Why should the admins tweak SANTI tasks or the app just to spare 1% of systems grief when doing so could mean SANTI starts crashing on the 99% that have no problem with the current SANTI? They could provide separate queues for SANTI, but that creates more problems than it fixes, because the next time your improperly configured system runs into what you think are bad tasks you'll want them to spend more time creating yet another queue. That makes no sense at all when you could solve the problem easily on your end.

> This is something that should be able to be fixed within the application.

Yep! And pigs should have wings so they can fly themselves to the slaughterhouse so I can stay at home and harvest money off my money tree.

> Since this is a known error and it is happening to multiple people with different cards, it certainly should be looked at further.

Technically 3 or 4 fits the definition of multiple, but that is irrelevant when SANTI isn't a problem for 99% of systems. Go figure.

A better solution might be to run a script that watches your queue, aborts SANTI tasks the minute you receive one, and continues aborting SANTI until you receive a NATHAN or whatever tasks work for you. The only potential problem I see with that solution is that if you abort too many tasks the server might make you wait 24 hours before it sends more. Maybe, maybe not; I'm not sure how they have that configured. Another possible option is a script that watches your queue and automatically tweaks your card one way just before it starts a SANTI and then tweaks it a different way when it receives a NATHAN. |
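The queue-watching idea described above could look roughly like the sketch below. It is not a tested tool. It assumes the BOINC command-line client `boinccmd` is on the PATH, that `boinccmd --get_tasks` prints each task as a block containing `name:` and `project URL:` lines (check the output of your own client version), and that unwanted tasks can be recognised by the substring `SANTI` in the task name. The names `find_unwanted_tasks`, `abort_commands`, `run_once` and the `gpugrid` URL hint are hypothetical choices made for this example.

```python
import subprocess

PROJECT_URL_HINT = "gpugrid"  # assumption: substring found in the project URL
UNWANTED = "SANTI"            # assumption: substring marking tasks to abort


def find_unwanted_tasks(get_tasks_output, unwanted=UNWANTED):
    """Return (project_url, task_name) pairs whose task name contains `unwanted`.

    Expects text laid out like `boinccmd --get_tasks` output, where each task
    block has a `name:` line followed later by a `project URL:` line.
    """
    matches = []
    name = None
    for raw in get_tasks_output.splitlines():
        line = raw.strip()
        if line.startswith("name:"):  # "WU name:" lines do not match this prefix
            name = line[len("name:"):].strip()
        elif line.startswith("project URL:") and name is not None:
            url = line[len("project URL:"):].strip()
            if unwanted in name and PROJECT_URL_HINT in url.lower():
                matches.append((url, name))
            name = None
    return matches


def abort_commands(matches):
    """Build one `boinccmd --task <url> <name> abort` command per match."""
    return [["boinccmd", "--task", url, name, "abort"] for url, name in matches]


def run_once(dry_run=True):
    """Query the local client once and abort (or just report) unwanted tasks."""
    result = subprocess.run(
        ["boinccmd", "--get_tasks"], capture_output=True, text=True, check=True
    )
    for cmd in abort_commands(find_unwanted_tasks(result.stdout)):
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

# Example (not executed here): run_once(dry_run=True)
```

Running it with `dry_run=True` first only lists what would be aborted. As the post itself warns, aborting many tasks may cause the server to hold back new work, so a dry run is the safer starting point.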
Damaraland Joined: 7 Nov 09 Posts: 152 Credit: 16,181,924 RAC: 0
|
> This is something that should be able to be fixed within the application.

Reading this whole thread, I'm not sure whether the problem is with OC cards or with complex units. I will stop my computer and read carefully over the weekend to understand what's going on. But I would like to point something out. What's wrong with wanting to be a "passive" cruncher who doesn't want to worry about Linux, scripts or whatever? I think you can't expect every user on this project to be a hardware geek. Of course people with the best machines are very familiar with this project, but there's another profile of BOINC cruncher. People with multiple projects don't want to go deep into technical specs. I think an easy solution should be proposed for these people. Maybe "geeks" should help by providing a script or whatever some people suggested. IMO

HOW TO - Full installation Ubuntu 11.10 |
|
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
|
> This is something that should be able to be fixed within the application.

There is nothing wrong with that. Nobody here has said there is something wrong with that.

> I think you can't expect every user on this project to be a hardware geek.

I agree. For this problem with SANTI tasks crashing there is an easy solution. That solution is the one proposed by Retvari. If that solution is too difficult for some people then they can ask for help implementing it. Asking the project devs to fix their problem is not, IMHO, a reasonable solution unless their problem also afflicts many other users. This problem with SANTI is limited to just a few users. How do I know that? I know because if it were a widespread problem a lot more people would be complaining and the admins would be able to see it in the stats they collect.

> Maybe "geeks" should help by providing a script or whatever some people suggested. IMO

If you want a script then ask and I will try to provide one, unless the project admins think it's harmful to the project. A script to auto-abort SANTI tasks as soon as they download is easy but maybe not the wisest approach. A script to adjust the clock down or the voltage up when you receive a problem task (SANTI for example) and return clocks/voltage to normal for other tasks would be harder to implement, but I am sure there is a way. |
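The second script idea, switching clocks depending on the task type, can be sketched without committing to any particular overclocking tool. In the sketch below everything is hypothetical: the `OFFSETS_MHZ` table, the `-25` MHz value, the task names, and the `apply_offset` callback, which stands in for whatever vendor utility would actually change the clock on a given system. It only shows the decision logic for a single GPU.

```python
from typing import Callable, Dict, Optional

# Hypothetical per-batch core clock offsets in MHz; the value is a placeholder.
OFFSETS_MHZ: Dict[str, int] = {"SANTI": -25}
DEFAULT_OFFSET_MHZ = 0


def offset_for_task(task_name: str, offsets: Dict[str, int] = OFFSETS_MHZ) -> int:
    """Pick the clock offset for a task based on a substring of its name."""
    for marker, offset in offsets.items():
        if marker in task_name:
            return offset
    return DEFAULT_OFFSET_MHZ


class ClockSwitcher:
    """Calls `apply_offset` only when the wanted offset actually changes."""

    def __init__(self, apply_offset: Callable[[int], None]) -> None:
        self._apply = apply_offset            # stand-in for a real clock tool
        self._current: Optional[int] = None   # nothing applied yet

    def on_task_start(self, task_name: str) -> int:
        wanted = offset_for_task(task_name)
        if wanted != self._current:
            self._apply(wanted)
            self._current = wanted
        return wanted


# Usage with a stub that just records what would be applied.
applied = []
switcher = ClockSwitcher(applied.append)
switcher.on_task_start("e1s2-SANTI_bax2-3-32_0")    # hypothetical task name
switcher.on_task_start("e4s9-SANTI_bax2-7-32_0")    # same offset, no second call
switcher.on_task_start("I5R7-NATHAN_KIDc22-1-10_0") # back to the default
print(applied)
```

Keeping the clock-changing step behind a callback means the policy can be tested on any machine, and the risky part (actually touching clocks or voltage) stays a separate, deliberate choice.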
Coleslaw Joined: 24 Jul 08 Posts: 36 Credit: 363,857,679 RAC: 0
|
Dagorath, glad you are still trying to make a few suggestions and sticking to that very narrow mindset. My reasoning for an app change, rather than making users tweak things locally for one sub-project, is focused on machines that don't have easy access and can't be tweaked remotely on a day-to-day basis. As for giving people the option to choose between SANTI and NATHAN adding more problems, that is yet to be seen. Until then, it is only opinion, which really isn't worth arguing about. In this case, if the option were available, GPUGrid would still have 4 more of my GPUs crunching away. I'm sure others who are having difficulty would do the same. I have no idea of the true number of people experiencing problems because not everyone posts in the forums. I don't even know if the techs here look at the work units I have aborted that were causing the BSODs, because they didn't get to "error out" and therefore would not show up that way. Instead each would show up as a user abort, and someone else who didn't have BSOD issues could finish it. I'm not saying my cards might not be overclocked by the manufacturer. So, I can assure you that "point" was not missed. I just didn't address it in my statement above. I have made my choice with regard to tweaking my cards and have expressed my opinions (which is what they are, regardless of whether you like them) on how I feel about the issue at hand. Please choose to ignore them if you don't like my approach.
|
|
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
|
Coleslaw, take heart ol' chap, I love your ribald "ad hominem followed by vicious attack on a straw man" humor. I'm just glad you can still crack a joke even though Bruce kept the computers after he changed the locks. Maybe if you tell him you didn't realize the strap-on chafes his hips and he doesn't have to wear it anymore he'll let you back in the house so you can tweak your rigs. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
To quote Statler and Waldorf, "You're not old, but you're ugly!" Cat claws are retractable. If you really must, paw at each other's non-dangly bits. |
|
Joined: 7 Jun 12 Posts: 112 Credit: 1,140,895,172 RAC: 0
|
As I wrote some time ago, this project is no longer under quality control... the scientists are having a siesta... Tomba, I recommend another project; GPUGRID is stuck in time. Tomba, you have lost a few days on this project; I have lost 6 weeks... My computer went down because of faulty tasks only four days after I went on vacation, and I could not restart it; a physical reset was needed, unfortunately. Now I am crunching Collatz Conjecture and everything runs like butter with absolutely no errors, and the RAC gain from its tasks is the highest in BOINC. TheSkyNet POGS: its trophies have a unique entertainment factor that the other projects don't have. It is the absolute best in the world in BOINC. Still, they could add something interactive, such as a screen saver like some other BOINC projects have, and then it would be the best BOINC project. The TheSkyNet POGS web site shows how BOINC project pages should develop in the future... |
Damaraland Joined: 7 Nov 09 Posts: 152 Credit: 16,181,924 RAC: 0
|
@skgiven

> To quote Statler and Waldorf, "You're not old, but you're ugly!"

I think the problem is not this. I think you are seeing this from a very narrow point of view. I think there are many different kinds of:
- user profiles (motivations)
- ways of seeing problems with units (tolerance of errors)
- technical knowledge (hardware and software)
- appetite for problems (wishing to push hardware or find solutions as a hobby, time).

Whatever profile or motivations one might have, everyone adds something, and I wish "the project" could be more understanding of all of them. It may well be that the TOP 10 with huge rigs don't see any problem, and they probably contribute 80% of the computing power. But I believe everyone adds something. I feel very stupid posting an error. I don't expect everything to be smooth, but if I post it is because I am giving my time to help. If you give the impression that problems are not pursued and investigated, many people will quit and you will lose users little by little. Of course others will come back. I left and came back. To finish, and to keep this from getting too long: just the feeling that something is being done, or an explanation of why nothing can be done, would be fine, at least for me. To be clear, I'm not complaining, and I understand that the project has limited resources. Maybe organizing all the talented people here could help.

> As I wrote some time ago, this project is no longer under quality control...

This is a huge xxxxxx, I bite my tongue. Jozef, you have no idea what's going on or what the problem is. In Spain only 1% of people have a siesta, and it has been proved that it is very good for the body and for mental sharpness. |
|
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
|
> To finish, and to keep this from getting too long: just the feeling that something is being done, or an explanation of why nothing can be done, would be fine, at least for me.

The admins have said they will look into it and, IIUC, they have also indicated that it's not likely anything can be done, and I believe the reasons have been covered. Therefore, Tomba, I think you got exactly what you say you want. In addition to the above, other solutions have been offered. Narrow minded is as narrow minded does. A few of us have tried to broaden the options. That is what we have done. Others have ignored all alternative options and focused upon the one option the admins have politely indicated they're not gonna get. Is that broad thinking or narrow thinking? I am sorry if some volunteers installed hosts in remote locations and failed to do the smart thing and configure them to allow remote access and administration. Hopefully they can fix that and do better the next time they set up a remote host. |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
I had to set my third computer so far to No New Work because of the SANTI work units causing BSODs. So far the systems I have had them on run Windows 7 Premium and Professional x64. The cards were GT430, 650Ti, and 660Ti. From what I could see, it was caused by the drivers crashing many times in a very short period of time. As for my WU detailed above: it finished fine for the next guy to get it, which is interesting since his machine has TONS of errors. I cut the clocks by 25 MHz and 5 SANTI_bax2 WUs have since completed fine on that GPU. I suspect that perhaps this WU type stresses the GPU slightly more than most, so that GPUs "on the edge" are more likely to error. Anyway, I've had 167 valid and 1 error lately (on 8 machines). I'd say the project is running pretty smoothly (at least here). I haven't had that strange bluescreening on any machine before or since. History quiz: Does anyone remember when MS ballyhooed long and loudly that they had solved the "black screen of death"? Remember the solution? |
|
Joined: 12 Jun 11 Posts: 12 Credit: 150,069,999 RAC: 0
|
I have been crunching GPUGrid WUs again for about a week with two machines, three since yesterday, and had no problems with SANTI, NATHAN, NOELIA or SDOERR. I had a BSOD today while crunching a Nathan WU. I thought it was a software problem and installed the new drivers (331.82), which didn't solve the problem. After reading these posts I decided to lower the core clock, and so far so good. Thanks for the suggestion |
|
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0
|
> I had a BSOD today while crunching a Nathan WU. I thought it was a software problem and installed the new drivers (331.82), which didn't solve the problem. After reading these posts I decided to lower the core clock, and so far so good.

That's the spirit. Unfortunately you are never quite sure that you have done enough until you eventually don't get any more errors. But my GTX 660s are now working fine for me, and I hope they stay that way. With the variability we see in the work units, you never know though. |
©2025 Universitat Pompeu Fabra