Message boards :
Graphics cards (GPUs) :
Redundant results
| Author | Message |
|---|---|
|
Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0
|
You started none of these workunits, so you lost no time. ignasi |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
@Talknuser Your post sounds as if you took the 0.05 CPU (5%) figure from the BOINC manager. That is just a number whose meaning I cannot figure out (i.e. how it's generated; in earlier versions it was set by the project, but now it seems to differ on a per-host basis). To get the actual CPU usage you'd have to look at your task manager. Under Linux I'd open a console and type "top"; the GPU-Grid task should be among those with relatively high CPU usage. You get a list of running tasks, and I think the CPU usage is displayed per CPU core, i.e. with a quad core you can have 4 tasks at 100% each. Look for the GPU-Grid task (aecmd-something, I think). I suppose you'll see between 30 and 50% usage of one CPU core. If you can't find the task but know part of its name, you could use grep to search the output.. forgot the syntax, though. Oh, and modern Linux distributions may also have some kind of task manager, which could be more convenient. MrS Scanning for our furry friends since Jan 2002 |
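For the grep step whose syntax the post leaves open, one commonly used form is `top -b -n 1 | grep -i <part-of-name>` (batch mode, one iteration). The Python sketch below does a similar lookup through `ps` instead. The function names are made up for illustration, and `aecmd` is only the name as remembered in the post, so treat the search string as a placeholder. Note that `ps` reports CPU usage averaged over the lifetime of each process, not the live figure `top` shows.

```python
import subprocess


def find_tasks(ps_output, name_part):
    """Return (cpu_percent, command) pairs whose command contains name_part."""
    matches = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue
        try:
            cpu = float(fields[0])
        except ValueError:
            continue  # header row or a line that does not start with a number
        command = fields[1].strip()
        if name_part.lower() in command.lower():
            matches.append((cpu, command))
    return sorted(matches, reverse=True)  # highest CPU usage first


def running_tasks(name_part):
    """Ask ps for every process and keep those matching name_part."""
    result = subprocess.run(
        ["ps", "-eo", "pcpu,args"], capture_output=True, text=True, check=True
    )
    return find_tasks(result.stdout, name_part)
```

Calling `running_tasks("aecmd")` on a Linux host would list the matching processes, if any, with the busiest one first.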
|
Joined: 19 Feb 09 Posts: 37 Credit: 30,657,566 RAC: 0
|
If it's a KDE-based distro, try ksysguard; it's fairly good. I use it on Kubuntu. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
@Zydor & Michael There is no such thing as "instruction set intelligence".. that would devalue the miracle that is our brain a little too much ;) However, there is such a thing as instruction set complexity. And the ability to execute complex instructions. And the ability to execute complex flow control instructions. The latter is what the CPU is made for: dealing with all those branches and conditions (if, while etc.) quickly. Current GPUs also support such instructions (to what extent doesn't matter), but they are much slower at executing them than at executing "regular" code. If one wanted to make them more efficient for such code, one would end up with an i7 with a wider vector unit attached. Or at least a Pentium 1 with a wide vector unit. Uh, sting me a Larrabee if we ever actually get a chip like that.. MrS Scanning for our furry friends since Jan 2002 |
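One way to picture why branch-heavy code suits a wide vector unit poorly is a toy cost model, sketched below in Python. It assumes lanes run in lockstep groups, and that a group whose lanes disagree on a branch must step through both sides. The function name and the cycle costs are invented for illustration, and the model ignores everything else a real chip does.

```python
def lockstep_cycles(takes_branch, cost_if, cost_else, width):
    """Cycles for a toy machine whose lanes execute in groups of `width`."""
    cycles = 0
    for start in range(0, len(takes_branch), width):
        group = takes_branch[start:start + width]
        if any(group):
            cycles += cost_if    # at least one lane needs the "if" side
        if not all(group):
            cycles += cost_else  # at least one lane needs the "else" side
    return cycles
```

With eight lanes in one group, a uniform branch costs one side only, while an alternating pattern costs both sides: the whole group pays for the divergence.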
X1900AIW Joined: 12 Sep 08 Posts: 74 Credit: 23,566,124 RAC: 0
|
589184: marked redundant while my host had been crunching on it for hours? OK, I had a good run with these settings for a few days, but I'll switch this host to Folding@home now. CPU time 2374.787 ... Outcome Redundant result Client state Cancelled by server Exit status -221 (0xffffffffffffff23) ... - Unhandled Exception Record - Reason: Breakpoint Encountered (0x80000003) at address 0x77E6000C |
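The two forms of the exit status in that log are the same number: 0xffffffffffffff23 is -221 written as a 64-bit two's complement value. A small Python sketch of the conversion follows; the helper names are made up for illustration.

```python
def as_unsigned_hex(value, bits=64):
    """Show a signed integer the way a fixed-width register would hold it."""
    return hex(value & ((1 << bits) - 1))


def as_signed(hex_text, bits=64):
    """Interpret a fixed-width hex value as a two's complement integer."""
    raw = int(hex_text, 16)
    if raw >= 1 << (bits - 1):
        return raw - (1 << bits)  # top bit set, so the value is negative
    return raw
```

Here `as_unsigned_hex(-221)` gives `0xffffffffffffff23`, and `as_signed("0xffffffffffffff23")` gives `-221`.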
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Someone from the project team cancelled too many WUs while trying to fix the download problems (which were due to outdated WUs). MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 10 Jan 09 Posts: 3 Credit: 114,473,253 RAC: 0
|
Here are my observations regarding the Redundant result issue: It seems the "IBUCH" workunits are sent out twice, and whoever finishes the WU first gets credit while the other participant gets a "cancelled by server / Redundant result" error and no credit, even if that participant reports his/her result minutes later and well before the deadline. Example: http://www.gpugrid.net/workunit.php?wuid=466213 Note the "initial replication" parameter = 2. All the other WUs have an initial replication of 1. This is not fair to participants with older cards, who will always lose out against the GTX 295s and never get any credit for this type of WU. I have had several of these cases, so now I manually abort this type of WU when I happen to notice one in my queue. It's OK to send the same WU to several participants (the SETI@home project does that by default), but everybody who completes the WU in time should get the credit he/she deserves. |
Zydor Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
This was cleared up a while back. There was a suspicion that this was happening, but it was shown at the time that the WU in question had not started on the machine in question. If it has returned as an issue I suspect they will jump on it, as that is not "as designed". The server cancel facility is only designed to act on WUs that have not started on a machine. Those that have started still get the credit if successfully completed. If they get cancelled in mid-crunch then a bug has surfaced; "complete if started" was the intended principle when the facility was first implemented. Regards Zy |
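The intended rule described in that post can be written down as a short Python sketch. This is only a model of the stated principle, not the project's server code; the `Replica` class, its fields and `report_success` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    host: str
    started: bool = False
    state: str = "in progress"  # "in progress", "success" or "cancelled"


def report_success(replicas, finished_host):
    """Mark one replica successful and cancel only the unstarted copies."""
    for replica in replicas:
        if replica.host == finished_host:
            replica.state = "success"
        elif replica.state == "in progress" and not replica.started:
            replica.state = "cancelled"  # shown to the user as "Redundant result"
    return replicas
```

Under this rule a copy that has already started stays "in progress" after the first result arrives and can still earn credit; only a copy that never started is cancelled.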
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
100% agreed with Zydor. Jurgen, how do you know that "that participant reports his/her result minutes later and well before the deadline"? Sure, the logged completion time is shortly after the first result was returned. But nowhere does it say that work had already started. Note the exactly 0s of CPU time.. even if WUs error out instantaneously they mostly register 1 - 3s of CPU time. So it looks like this result was aborted before it had started. MrS Scanning for our furry friends since Jan 2002 |
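The check used in that post (a redundant result with exactly 0 s of CPU time was most likely cancelled before it started) can be sketched in Python as below. The function name and the wording of the verdicts are made up for illustration; the outcome string is the one shown in the task log quoted earlier in the thread.

```python
def assess_redundant(outcome, cpu_time_s):
    """Rough verdict on a result, based on its outcome and logged CPU time."""
    if outcome != "Redundant result":
        return "not a redundant result"
    if cpu_time_s == 0.0:
        return "cancelled before it started, no crunching time lost"
    return "cancelled after work had started, worth reporting"
```

By this rule of thumb, the result posted earlier with a CPU time of 2374.787 would count as worth reporting, whereas a result with 0 s would not.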
|
Joined: 10 Jan 09 Posts: 3 Credit: 114,473,253 RAC: 0
|
I've seen this happen: a WU was 90+% complete, but when I checked an hour later, somebody else had reported results 10 minutes before my WU completed and I got the old "Redundant Result" stuff. I just received another of these WUs, # 475454. http://www.gpugrid.net/workunit.php?wuid=475454 So for the record: processing has started; I'm at 1%... I made a screenshot, but I'm not sure how to upload pictures. The WU was also sent to another participant with a GTX 295 - I'll be creamed for sure. ;-) Will babysit to see what happens and post a follow-up. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
"I'll be creamed for sure. ;-)" Doesn't look that spectacular for now ;) MrS Scanning for our furry friends since Jan 2002 |
Maurice Goulois Joined: 22 Feb 09 Posts: 10 Credit: 103,904,673 RAC: 0
|
Hi there, I'm seeing the same suspect behaviour myself. I have had one machine attached to GPUGRID for months, and I recently removed the SETI project because of the sluggishness it caused on my system. In that respect GPUGRID is much better at not disturbing the other things I do on this PC. So for about 10 days this PC has been dedicated to GPUGRID, and since then more than half of my WUs have been cancelled as "redundant results" with no credit. The problem is that this machine runs GPUGRID 24/7 (with an 8800GT, which takes about a day to complete most WUs). As a test of the cancellation of started WUs, I've just suspended the currently running WU to force the second one to start, and then reverted so that it continues with the first one, which has the earlier deadline. I'll see how these two behave after upload and possible cancellation. I'll let you know. Regards
|
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Could you make sure that your 2nd WU checkpointed at least once? I.e. when you shut down the BOINC client and restart it, the WU should not start at 0.000% again. Is it this WU? You can watch your wingman: after he returns his WU, your WU would be finished the next time you contact the scheduler - if it has not started yet. MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 10 Jan 09 Posts: 3 Credit: 114,473,253 RAC: 0
|
"I've seen this happen: a WU was 90+% complete, but when I checked an hour later, somebody else had reported results 10 minutes before my WU completed and I got the old "Redundant Result" stuff. I just received another of these WUs, # 475454." Update: the test was inconclusive, as I was the first user to successfully finish crunching the WU. The other participant still shows as "In Progress"... only if that status changes to "success" and credits are also awarded can we conclude that there aren't any issues. I did notice that for some WUs that were distributed to more than one user, credit was awarded to more than one user, so at this time I concur with Zydor that all works fine. |
Maurice Goulois Joined: 22 Feb 09 Posts: 10 Credit: 103,904,673 RAC: 0
|
Hi there again, my problem is related to the task 25-KASHIF_HIVPR_n1_for_ba3-9-100-RND6818 (http://www.gpugrid.net/workunit.php?wuid=467482), which was stuck at 24.820% and preventing the other tasks from starting. I've cancelled it and I'll keep an eye on things over the next few days.
|
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
"... only if the status changes to "success" and credits are also awarded we can conclude that there aren't any issues. I did notice that for some WUs that were distributed to more than one user, credit got awarded to more than one user, so at this time I now concur with Zydor that all works fine." Well.. no. We already know that the system works as expected most of the time. For example, take a look at my results.. there are a few redundant results, but the WU return times are so regular that I don't think the machine wasted any time on them. And there are quite a few successful returns from 2 hosts where both got credit. The point was that people reported "I've been watching it and I know something went wrong". So we need to confirm an error, otherwise we still know it's fine :) (in one case I analyzed the tasks in detail and we found out that everything had actually been all right.. but now there are 1 or 2 new reports) Edit: Maurice, did you try restarting BOINC? If not, you may want to try that first if another task hangs. Sometimes that's enough to get it going until it finishes. MrS Scanning for our furry friends since Jan 2002 |
Maurice Goulois Joined: 22 Feb 09 Posts: 10 Credit: 103,904,673 RAC: 0
|
Hi, I confirm that my problem was related to the blocked WU; everything has been OK since I cancelled it.
|
©2025 Universitat Pompeu Fabra