Message boards :
Server and website :
Server needs topping up
| Author | Message |
|---|---|
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0
|
Part of this might have been a result of a new user push. Until March 29, the user count was growing about 10 a day. This shows very clearly at the statistics sites - for example, http://boincstats.com/en/stats/45/project/detail/user. It will be interesting to see how many of the new users remain as 'active' users, or 'users with recent credit', over the weeks and months to come. A recent encounter with something close to the "new user experience" makes me rather pessimistic. I recently upgraded the video driver on two older hosts (43362, 43404). The GTX 750 Ti GPUs are a little marginal for returning long tasks within 24 hours - especially Gerard's latest offerings! - but do contribute successfully to this project. The significant result of upgrading the driver was to increase my capability from cuda60 to cuda65 - and I started being allocated cuda65 tasks on those machines for the first time. So I was braced for the BOINC server's generic handling of runtime estimates for a new app_version, and it was as bad as I expected. This isn't a criticism of the GPUGrid project - they have to use the server software provided by BOINC - but it makes for a very bad user experience. Both these hosts show a long-term APR of over 100 GFlops for the cuda60 long tasks, both averaged over more than 250 tasks. But on a version change, BOINC throws all that accumulated knowledge away, and starts at rock bottom all over again. And I mean rock bottom. I monitored 43404 most closely: BOINC started the new version off with an estimated speed of 2.1 GFlops, and gradually dropped it to 1.77 GFlops (probably those long Gerards again!). Long tasks at those speeds translate to estimated runtimes of 789 hours and 887 hours respectively - around 5 weeks. With a 5 day deadline, the BOINC client locally is clearly in deadline trouble, and reacts by preempting running GPU tasks from other projects to give priority to the GPUGrid task. 
Those new users will see the same behaviour: multi-week estimates, 5-day deadlines, and GPUGrid 'monopolising' (as they will see it) their GPUs. That sort of thing gives projects a bad name, but - I stress - it's not GPUGrid's fault. If the users persevere, and complete their initial 11 tasks, estimates will normalise and become realistic, and the 'high priority' running will go away - but how many users will have that patience? It took me over a week to nurse 43362 back to normality, and 43404 still isn't there yet. I have written - yet again - to David Anderson urging him to address this problem of initial speed estimates for GPUs, but I'm not optimistic. His algorithm is designed to cope with the steady-state estimates for hosts which have completed hundreds or thousands of short tasks, and he sees that it is adequate for that purpose. But it's incomplete. I would urge the project administrators of GPUGrid (and any other project administrators who read this) to monitor the drop-out rate for those newly-recruited users, and if it causes them any concern, to raise the subject with David Anderson directly. |
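The arithmetic behind those multi-week estimates can be sketched roughly as follows. This is only an illustration, assuming the estimate is simply the task's estimated floating-point operations divided by the projected speed of the app version. The task size is back-calculated from the 2.1 GFlops / 789 hour figures quoted above, not a published value, and the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600
DEADLINE_HOURS = 5 * 24  # the 5-day deadline mentioned in the post


def estimated_hours(task_flops: float, speed_gflops: float) -> float:
    """Estimated runtime in hours for a task of task_flops operations."""
    return task_flops / (speed_gflops * 1e9) / SECONDS_PER_HOUR


def misses_deadline(task_flops: float, speed_gflops: float) -> bool:
    """True if the estimate alone already exceeds the deadline."""
    return estimated_hours(task_flops, speed_gflops) > DEADLINE_HOURS


# Assumption: task size back-calculated from 2.1 GFlops and 789 hours.
TASK_FLOPS = 2.1e9 * 789 * SECONDS_PER_HOUR

if __name__ == "__main__":
    for speed in (2.1, 100.0):
        hours = estimated_hours(TASK_FLOPS, speed)
        late = misses_deadline(TASK_FLOPS, speed)
        print(f"{speed:6.1f} GFlops -> {hours:7.1f} h, misses deadline: {late}")
```

Under these assumptions, the same task estimated at the long-term speed of about 100 GFlops comes out at roughly 16 to 17 hours, comfortably inside the deadline, which is why discarding the accumulated speed figure matters so much.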
caffeineyellow5 Joined: 30 Jul 14 Posts: 225 Credit: 2,658,976,345 RAC: 0
|
Dear Richard, As I am not sure if this thread about empty feeding servers is directly related to speed estimates, I would like to add that I have never seen multi-week estimates on any of my systems. From the start of a new system and through upgrades of software and of GPU hardware, I have not ever seen any estimate over 40 hours actually on any of my machines. I think I have actually noticed the exact opposite. I have noticed that sometimes (and it may be around updates and upgrades) it will show something like 7 hours to complete and then end up with the countdown time not moving a second for a few seconds or more, lengthening the actual time to longer than the initial estimated time, sometimes by 2-3 days. I have also noticed estimates of like 38-40 hours and then the countdown time counting 2-4 and even 5-8 seconds per actual second, and when that happens, it always ends up being an actual time of exactly the estimate divided by that number of seconds (meaning if it says there is 30 hours and it is counting 3 seconds per actual second, it will definitely finish in right around 10 hours.) Those are all long run only computers. Now I do have one short run only computer because it is a laptop with a Quadro K2100M in it that cannot make even the shorter of the long runs in the deadline, which unfortunately I have had to prove to myself by waiting them out, but even then, the countdown (estimated) time counted one second for every actual 3-10 seconds counted on the count up timer.
So maybe I am not seeing what you are seeing, maybe I have newer cards that all estimate better than older ones and maybe you are speaking of older ones (that is probably told in the Cuda versioning that you mentioned, but I don't know the versions and the cards as well as most probably), and maybe I just missed it completely whenever it happened, but I don't think even if I saw it and then it actually finished much much quicker than the estimate, it would turn me off much, because I would know that the work is being done and the bug of estimating the time was something that was just a bug or something that needed time to estimate my actual time. I mean, when I have to run something else (outside of BOINC) to take up time on the GPU because GPUGrid is out of work, when it comes back and I don't catch it for a day or two, it will take an actual 40 some hours to finish a task that normally would have finished in 7 hours and then when I turn that other program off again, it still estimates these actuals into my tasks on that computer for a while and counts 4-10 seconds on the countdown for each actual second on the count up. Then after a few tasks, it gets back to normal again. My point is, I have never seen what you are describing, if I am understanding what you are describing correctly. Now, more to the point of this thread... 
The queues went dry on and off on April fools day, but went completely dry on the night of April 3rd (Eastern Time zone, maybe early morning on the 4th in Europe and Asia) and since then my laptop has not gotten any work at all long enough for me to notice it in the short queue, and of the 4 long queues workers, I have opened 2 up with the 780s in them to short and long, left the one with the 980s in it only open to accept longs, and it just so happened that the 4th died on the 4th (the heatsink popped loose from the motherboard and was blowing smoke from the processor, so I am not sure how long it ran and I hope the processor isn't dead or the motherboard and some new thermal paste can be applied and it will be alive, though I think by the time there is smoke, it is worse off than paste and popping the heatsink back on can fix.) So one computer down and out, one computer down from not getting anything, and 3 computers running somewhere between 3 a day and none for hours, everyone's RAC is about 3/4 what we started at on the 4th and dropping every hour. So I really hope more work comes out, these suspected new-comers that gobbled up new work will soon show that they are here to stay or give us back the work reassigned by the servers, and or that GPUGrid would even, to keep us happy and running, allow us to crunch on already run work units as a second and third validation of work to account for possible jitter of hardware, which is always present and possible, and most likely probable, at least during times when the queues are empty and there is no work to give out. I mean, would it be too difficult to feed us more than one time for each task when around 2500 tasks are fed to around 7,000 computers with recent credit with probably more than one GPU on several of those computers? 
If an adaptive task feeding server could be done to give 2500 tasks when 2500 computers are asking for work on 4000 (or more) GPUs/CPUs/Androids, then give us 7500-10,000 task units by giving us those same 2500 tasks 3 and 4 times. I mean I think not getting work is a much bigger turn off for any project than getting bad estimated run times that end up being wrong and the work gets done in much shorter times. I mean if a 5 day deadline is set and a work unit is fed to me and the estimated time of finishing it says 336 hours, but then it finishes in 33.6 hours but my estimated time is dropping at the rate 10 times faster than real time, I think I get it and hang in there and learn to ignore it. But if I sign up for a project and never see work from it or only see work once a day or two and finish that work in 9 hours and the well runs dry every time I come for water, I find another well and leave this one behind. And if I learn that I am getting repeat work, it may sit a little wrong with me, but it sits better than not getting any. Being hungry and not getting food is worse than getting too much to eat and not finishing and it is much worse than being told you have too much to eat and then only getting enough to fill you. That is for sure! Now, the solution to that, I suppose, is to sign up for other projects, set those projects to lower priorities than this one, and let it come back here when there is food..... but that leaves people like me out.
People who 1) believe that one thing is best and more is too much (minimalists)(well minimalists willing to go all out maximum on one or two exclusive things) and 2) want to see BOINC as it is kind of meant to be, which is a set it and forget it program, who now are being told, "Your house is cold because your computers are not running hard because GPUGrid is out of work and who knows when there will be more, but certainly, they could give you more work, because old work units can be reissued, but they won't do that and who knows why, so turn on a heater or run something else, even though you donated about $1,000 to them, spent money enough to buy 8 GPUs solely as Cuda based in order to run this project (which is no inexpensive thing to get 780s and 980s or even the Quadro in the laptop), and you only run their projects exclusively on purpose because you really believe in your work and none of the other projects, even though they are scientific or medical related, you don't believe in the people or the project itself." <---Well maybe nobody would ever ever say that, but you get the point. (BTW, I think a while back, I started talking to the project and not to Richard, so sorry for stating this was for you and then going off talking to you, Richard.)(Also, I am king of the run-on sentence, so please forgive me for running on and on.) But again, you see my point. Thanks for reading if any of you got through this rant/tired mumbling run-on. And please, if someone who runs or helps run this project reads this, reissuing tasks would solve a TON of user related frustrations when the well runs dry and may help eliminate some jitter in the task results themselves. |
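The two bits of arithmetic in the post above (a countdown running faster than real time, and the same tasks being issued several times) can be written out as a small sketch. The function names are hypothetical and the figures are the ones quoted in the post; nothing here reflects how the BOINC client or the GPUGrid server actually computes these values.

```python
def hours_to_finish(estimated_hours: float, countdown_rate: float) -> float:
    """Wall-clock hours left when the countdown drops by
    countdown_rate seconds for every real second."""
    if countdown_rate <= 0:
        raise ValueError("countdown_rate must be positive")
    return estimated_hours / countdown_rate


def copies_issued(unique_tasks: int, copies_per_task: int) -> int:
    """Total task copies sent out if each unique task is issued several times."""
    return unique_tasks * copies_per_task


if __name__ == "__main__":
    print(hours_to_finish(30, 3))    # the 30-hour estimate counting 3 s per second
    print(hours_to_finish(336, 10))  # the 336-hour estimate counting 10 s per second
    print(copies_issued(2500, 3), copies_issued(2500, 4))
```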
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
"If an adaptive task feeding server could be done to give 2500 tasks when 2500 computers are asking for work on 4000 (or more) GPUs/CPUs/Androids, then give us 7500-10,000 task units by giving us those same 2500 tasks 3 and 4 times." If the work units do not need validation, then your proposal is a horrible idea. I'm not here to waste my energy. I am here to complete the work that is available. To complete scientific work that can better humanity, and in as optimal a way as we can. Why are you here? You make it sound like all you care about is that your GPUs are busy, the heat is kept up in your home, and the stats keep coming in. You may need to rethink your priorities, and join some additional projects. The moment a project decides to just reissue tasks for the sole purpose of "keeping devices fed", is the moment I leave the project. |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
"GPUGrid would even, to keep us happy and running, allow us to crunch on already run work units as a second and third validation of work to account for possible jitter of hardware, which is always present and possible, and most likely probable, at least during times when the queues are empty and there is no work to give out." Couldn't disagree with you more. I don't want (busy) work. Who wants to rack up electricity costs to run WUs that have already completed successfully without some solid justification? NO THANKS! |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
Absolutely agree with you, Jacob. There are other projects that need resources such as GPUs and CPUs without wasting them. |
|
Joined: 5 Jan 09 Posts: 670 Credit: 2,498,095,550 RAC: 0
|
On that note, why don't you ease the overclock on your cards to prevent "jitter of hardware" since a lot of your results on the 980's at least contain "simulation has become unstable"??? |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
caffeineyellow5: Betting Slip is right, both of your GTX 980 computers are showing that message "Simulation has become unstable", which means (so far as I know) that the GPUs are clocked too high for their current voltage. You should be able to complete work units, without receiving that message at all. Try lowering your clocks. You'll even have more valid results! - Jacob |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0
|
Dear Richard, Sorry about the hijack. After I posted, I did report myself to the moderators for being 'off topic', and suggested that they split off your post about new users, and my response, as the starting point for another discussion in a new thread. They didn't choose to do that, as you can see. |
caffeineyellow5 Joined: 30 Jul 14 Posts: 225 Credit: 2,658,976,345 RAC: 0
|
OK I concede because that all makes sense. And yes, I do make it sound that way, but I am speaking more out of a tongue in cheek off humor than reality when I speak of heating the house and that sort of thing. But I think my experience with distributed projects may, and I stress MAY, be out-dated, but it may be based in reality. My experience with medical distributed projects goes back to the days of United Devices and the original cancer research project that they ran for Oxford and the National Foundation for Cancer Research. Back then it was all CPU and no GPU. It was also Pentium 3 and original Pentium 4 CPUs that were doing the work. So maybe the technology back then and the technology now has completely eliminated jitter between the hardware, but they needed several different computers to finish every work unit so that the jitter between them could be figured and only then could a single work unit be validated. So maybe now, a distributed project only needs to issue and return one copy of any work unit and either the technology on the user end is always completely trusted to always give back a proper result or the error detection in the process itself will detect every error possible OR the back-end "validator" servers can detect all the jitter of every unit based on one single result returned, but I doubt either of those are completely true. And yes, it may sound like I am asking for "busy work", but I think there are 2 sides to that. One, the practical one that you made that it takes electricity to run them and that does cost money AND that there are other projects available for people who are not as selective as I am for what I actually do want my computers doing all day and night. 
But the second side is the one I have stated, which is that I believe in this project and want to do all the work I can for it. I spent the money not for stats, but the stats are there as the indication of how much work you are actually doing for the medical science, and if not for the stats, you would not know how much work you or anyone is doing, but would have to take the word of the project that "you are doing a good job guys". And because the problem of varied results would still exist from hardware and software differences and jitter of usages, the solution would be to compare multiple results and not take one result per work unit as the end result of validation. I don't believe technology has solved the problems of varied technologies. I actually think that the different cards, different software, different versions of software, different CPU/GPU combinations, different I/O rates and configurations, different everything in general has become more diverse and what people are doing with their computers since 2003 is much more diverse to the point that jitter in results is most likely a bigger problem now than back then! I am not doing this and spending all this time and money to heat the house, keep the GPUs full, and keep up my stats. I am doing all of this and care so deeply because the work is so valuable and important. All my stupid humor aside with heating the house and feeding the GPUs, the work, I honestly believe, needs validation of more than one run. At least 3 runs on different hardware/software/usage configurations, I believe, are needed, even if it takes us longer to complete papers and whole tasks. Now the other side of this is the end users, and I was specifically talking to Richard's claims that we will lose users based on bad estimation times. The users will/would be much more turned off by a project that continues to run out of work than by an estimated time thing that works itself out anyway.
When you go for a job interview and the company says they have no work for you, you go find work somewhere else and rarely if ever do you come back to ask again if they have any work now, if you find it somewhere else. When the well is dry, you find water elsewhere. When mom smacks your hand when you reach for the cookies, you learn not to reach for the cookies. Get it? To retain the users that Richard is saying will leave because of his problem leave because they can't get work, Richard is proved right over an issue that never even happened. I was not "yelling him down", I was only stating a different point that was more relevant. So when my most productive computer goes from 9-12 units a day to 2-3 and some of my computers don't see a task for 2 days, I know that people coming onto the project will be leaving as fast as they are coming. So let's say they do leave at the rate Richard expects and let's say it is a combination of both his and my pointed reasons and some others... and then a month down the road, we need a lot of users in a short period of time to complete a lot of work units (which is known to happen), and those users are gone... the project is in danger of not completing the task and we are stuck maybe not doing the medical science needed in time OR it was given as a test to see what our computing power could do in order to gain a new scientist who needs our GPUs for their work, just because we could not prove we could do our work and his/her additional work in time? Again, coming from United Devices cancer research, the whole almost 2 year non-profit research that did yield great medical scientific results in folding, cancer, anthrax, and a few other areas, all also had a double reason, and that was to prove the United Devices GRID platform could be effectively used by large companies to complete large scale tasks by their proprietary, for profit, software. 
And after they proved it with us in the non-profit realm, they agreed with the partners to stop the project so they could move on and sell the software to companies. They are still in business today as Univa and they are still selling that software. And in addition to selling the software after proving it, they also spawned the idea of medical research through computational GRID computing which primarily WCG and Folding@Home picked up and rolled with and which eventually led to GPUGrid itself. Without the pioneers in the for profit company doing non-profit work in medical sciences, we would not be here having this discussion. So the idea that you retain large amounts of people for the time when they are needed for something bigger by simply continually giving them work when their computers ask for it is a proven reason to make sure more people stay around FOR THE SCIENCE TO GET DONE when it is needed by more people in shorter time periods. Proving that you are the best in the group of options brings more scientists to you to get their work done. And that proof only comes by a consistent large computer base to feed work to. So to all that, maybe there are other solutions that would keep you happy on your concerns and meet the concerns I bring up (which you may not even see as valid, but I can't not see as valid, since they are valid to me and valid based on my own experience and observation) and that might be that maybe all original work is set to use the current amount of GPU, which is like 65-75% of the GPU and then all validation work is set to use like 10-15% and last longer per unit and have longer expiration dates. That would allow for validation, allow for the validation work to not cost a ton more than (but yes, more than) not using the GPU at all, and would keep the work units flowing to GPUs so that the "well" (as I have been referring to the feed server) doesn't run "dry".
And, of course, an option in your "GPUGRID preferences" that you could reject all "Validation work units" by simply unchecking or checking a check box. And the stats would be less than original work or maybe not, but also based on time taken to complete and whatever else other distribution of points rules there are now with the added variable that it is not original unit work. I mean, if validation is needed, and I obviously think it is, then I would definitely leave that checkbox always checked, regardless of less points and longer run times, simply because 1) I know people will opt out of it for points and 2) I know it is needed for the science to produce more accurate results. And I would only hope that my computer never got the same validation or original unit twice, because then it might not add the amount of jitter needed to properly validate the unit such as multiple computers would for the same work unit. I really think this either at the very least needs discussion and not just 2 sides OR it needs for the scientists themselves (or someone who represents them and knows how the work is done and validated at the computational back-end level) to explain why the work units do not need multiple runs on differing computer configurations in order to validate that the current single run/validation server configuration really produces 100% accurate results 100% of the time. 99% correct in medical science is just about 100% wrong when it reaches the publication level or when it reaches the application on human beings level. You need to know you got the right result or you have to assume you got the wrong one and need validation. All experiments need to be proven by duplication before they can be validated as fact. So even if not for the validation of the work units, the validation of the scientific process itself. Hypothesis leads to proving it once, then it is theory. Theory if duplicatable becomes fact.
What we are currently doing, unless someone says otherwise, is making theory out of hypothesis, but not duplicating it to prove it is fact. Has the scientific process changed since I was in high school? If not, we need validation and the idea of "If the work units do not need validation, then your proposal is a horrible idea." is not valid. Please consider. TY |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
All my stupid humor aside with heating the house and feeding the GPUs, the work, I honestly believe, needs validation of more than one run. I really think this either at the very least needs discussion and not just 2 sides OR it needs for the scientists themselves (or someone who represents them and knows how the work is done and validated at the computational back-end level) to explain why the work units do not need multiple runs on differing computer configurations in order to validate that the current single run/validation server configuration really produces 100% accurate results 100% of the time. If I recall correctly, the scientists/admins here have previously explained, in a thread, why they set the minimum quorum at 1. They are intentionally not verifying the work, and they have their reasons. I'll try to dig up the thread for you, but I'd encourage you to do some digging. ... the idea of "If the work units do not need validation, then your proposal is a horrible idea." is not valid. Please consider. TY What I said is absolutely correct, because if they don't need validation, then your proposal would only serve to waste. By the way, there certainly is more than "electricity costs" involved here. In fact, I don't even care about the electricity costs. I care more about the harm that computing does to the environment! And so, my philosophy is that we should compute as efficiently as possible, and not waste, in order to preserve the environment. I appreciate your concerns, I really do, and I understand where they're coming from. Validation makes sense, and people who love this project love to keep their devices busy doing work for this project. However, I think you need to reconsider your priorities, realize that they will sometimes run out of work, and have backup projects (who also desperately request your devices' usage) ready with 0-resource-shares. I swear to you I'm trying to be helpful. |
caffeineyellow5 Joined: 30 Jul 14 Posts: 225 Credit: 2,658,976,345 RAC: 0
|
TY on the 980s. I didn't realize that was happening over the past few days on that one GPU. I don't know why other than they happen on Windows Update and other software update days. I will see what I can do about that GPU. I am not sure that I am overclocking it, but I do use the MSI Afterburner to keep the temp down, so maybe that is also affecting the clock in some way. All 3 are reading the same settings and only the one is having that error and only since the 2nd. I set the "Core Clock" into negative numbers now by a few MHz, so we shall see if that does something for the errors. But that is not the "jitter" I was referring to, although overclocking is another issue not many computers back in the day had either except the "enthusiast". Now everything comes out of the box overclocked, it seems. I was referring to the fact that if I do something on my computer and you do the same thing on yours, the results may have a slightly different byte-for-byte result if it takes both computers hours to perform. So not "jitter" of one task on one computer, but the ever so slightly differing results that differing computers would create. Getting at least 3 of each, and the more the better, would help "triangulate" where and when that "jitter" of differing results did occur between the results. Trusting one result without learning the jitter factor of the result can't lead to anything but a result that needs to be compared to another for verification, thus validating it. The "jitter" I was referring to is the jitter between rigs, not the jitter inside one rig. (Thanks again for bringing those errors and solution to my attention.) |
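The idea of comparing at least 3 results from different rigs could look something like the sketch below. It is purely illustrative: it assumes each host's result can be boiled down to a single number, which is a big simplification, and the function names, the tolerance, and the use of the median as a reference are all assumptions. It is not BOINC's validator or anything GPUGrid is known to run.

```python
from statistics import median
from typing import List, Mapping


def hosts_in_agreement(results: Mapping[str, float], tolerance: float) -> List[str]:
    """Hosts whose result lies within tolerance of the median of all results."""
    if not results:
        return []
    reference = median(results.values())
    return sorted(host for host, value in results.items()
                  if abs(value - reference) <= tolerance)


def reaches_quorum(results: Mapping[str, float], tolerance: float,
                   min_quorum: int = 3) -> bool:
    """True if at least min_quorum hosts agree with each other via the median."""
    return len(hosts_in_agreement(results, tolerance)) >= min_quorum


if __name__ == "__main__":
    # Hypothetical results for one work unit from four different rigs.
    sample = {"rig_a": 1.000, "rig_b": 1.001, "rig_c": 0.999, "rig_d": 1.500}
    print(hosts_in_agreement(sample, tolerance=0.01))
    print(reaches_quorum(sample, tolerance=0.01))
```

With a minimum quorum of 1, as discussed elsewhere in this thread, no such comparison takes place at all; the sketch only shows what a quorum of 3 would involve.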
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
If you provide full make and model information for the GPUs, we could probably find out more about their clocks. I usually compare against the wiki pages, ie: http://en.wikipedia.org/wiki/GeForce_900_series Anyway, they were likely factory-overclocked, too high, for intense applications like GPUGrid. Recommend taking Core clock down, in -20 MHz intervals, until it runs without problem. If you have further questions about that issue, you can put it in a new thread. By the way, hopefully you saw my prior post. |
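The "-20 MHz at a time" advice amounts to a simple search, sketched below. The `is_stable` callback is hypothetical: in practice it stands for running GPUGrid tasks at that offset for a while and checking that none report "Simulation has become unstable". The 200 MHz cap is likewise an assumption, not a recommendation.

```python
from typing import Callable, Optional


def find_stable_offset(is_stable: Callable[[int], bool],
                       step_mhz: int = 20,
                       max_reduction_mhz: int = 200) -> Optional[int]:
    """Lower the core clock offset in step_mhz steps until is_stable accepts it.

    Returns the first stable offset in MHz (0 or negative), or None if
    nothing down to -max_reduction_mhz is stable.
    """
    offset = 0
    while offset >= -max_reduction_mhz:
        if is_stable(offset):
            return offset
        offset -= step_mhz
    return None


if __name__ == "__main__":
    # Hypothetical card that only runs cleanly at -60 MHz or lower.
    print(find_stable_offset(lambda offset: offset <= -60))
```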
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
This may have been the post I was thinking of. https://www.gpugrid.net/forum_thread.php?id=3918&nowrap=true#38847 Re-reading it, it doesn't explicitly say that the GPUGrid scientists would not benefit from changing the minimum quorum value to a value higher than 1, but I'm sure they have it set at 1 for a reason. I'll PM Stefan - maybe he'll chime in here :) |
caffeineyellow5 Joined: 30 Jul 14 Posts: 225 Credit: 2,658,976,345 RAC: 0
|
I saw after I wrote mine and am searching now. So far I found https://www.gpugrid.net/forum_thread.php?id=2892#23767 and https://www.gpugrid.net/forum_thread.php?id=3248#27987 and https://www.gpugrid.net/forum_thread.php?id=699#6300 Two say they ran old work units during an empty queue in order to validate the work. Kind of my idea exactly. The other says the error reporting allows 4 errors and then on the 5th it drops the work unit and fails it out. This seems to say that 4 errors is the "jitter" threshold and would therefore mean 1-4 errors needs the validation. Then I think down in the depths of this one for 2009, if nothing has changed since then, which it probably has and even improved over what he promises and describes, it is at least AN answer, if not the one I was looking for from GDF: https://www.gpugrid.net/forum_thread.php?id=901#8041 But I still KINDA like ExtraTerrestrial Apes' "intersting opportunity" in Message 8144 down in there before GDF's answer too. lol Yes, I see your point and I will concede the discussion to you if in fact validation is not needed. Thank you for this also, as I didn't find it in my searches. https://www.gpugrid.net/forum_thread.php?id=3918&nowrap=true#38847 I am happy with this discussion, that at least I got my thoughts "on paper" as it were and got great feedback. I do apologize again for using my stupid idiomatic humor to state my thoughts clearly originally. I would appreciate Stefan's additional feedback to the discussed points, but again, I concede the points I made to be less important than the ones made to get the project to where it is today. This has obviously been not only brought up, but discussed in different ways and the current path was chosen for specific and better purposes and reasons. I will take Nate's reissues as oddities and the general single issue/error checking/validation server path to be the best for the science we are all participating in today. TY again and again. 
1 Corinthians 9:16 "For though I preach the gospel, I have nothing to glory of: for necessity is laid upon me; yea, woe is unto me, if I preach not the gospel!" Ephesians 6:18-20, please ;-) http://tbc-pa.org |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
And many thanks for sharing what you found. It's useful and informative. :) I'm not looking to win. I just challenge your assumption that validation is needed here on GPUGrid. It's actually a great question, and I hope a scientist/admin chimes in with an explanation of their current approach.
Edit: GDF's 2009 explanation is sufficient, in my mind, meaning that they are actively choosing not to do BOINC-based quorum validation. However, I hope Stefan does reply to my PM, and maybe makes a FAQ post regarding GPUGrid's decisions on validation and minimum quorum.
So... Have you got those GPUs prepared to work on backup projects yet? I think it might be time, as the well will indeed run dry here occasionally :) I am attached to 35+ projects, even though my GPUs are set up to work for GPUGrid exclusively (by me setting my other GPU projects either to "don't use NVIDIA" or "0-resource-share backup project").
... and once you have it set up, you don't need to manually fiddle with it. BOINC really is meant to be set up once and left to do its thing! |
caffeineyellow5 Joined: 30 Jul 14 Posts: 225 Credit: 2,658,976,345 RAC: 0
|
I think I will continue to manually, as needed, turn on and off the other non-BOINC project that I do when GPUGrid has no work for them to do. That is, instead of other BOINC projects set to zero priorities inside the client. Oh, if only all projects were wrapped in BOINC, right? lol |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
...it's the dry season.
Application: Unsent / In progress
Short runs: 0 / 482
Long runs: 0 / 2,201
FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
A new week starts but still no work... :( |
|
Joined: 21 Feb 09 Posts: 497 Credit: 700,690,702 RAC: 0
|
Couple of things... I am sure there are many, like me, who would welcome a status report from the project team. The silence is deafening. I am now running Einstein@Home, but that is not a project aimed at health issues. Is there such a project that can take advantage of my GPU investment? |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Poem@Home might suit your purposes. I run many (~30) projects, across my NVIDIA GPUs and my CPUs. For my NVIDIA GPUs, they typically work on GPUGrid and Poem@Home, with equal resource share. For backup projects (0 resource share), I use SETI@Home and Einstein@Home. |
©2026 Universitat Pompeu Fabra