Redundant results

ignasi

Message 8866 - Posted: 24 Apr 2009, 19:57:21 UTC - in response to Message 8861.  

You started none of these workunits.
You lost no time.

ignasi
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 8896 - Posted: 25 Apr 2009, 12:52:49 UTC

@Talknuser

Your post sounds as if you took the 0.05 CPU (5%) figure from the BOINC manager. That is just a number whose meaning I can't figure out (i.e. how it's generated; in earlier versions it was set by the project, but now it seems to differ from host to host).

To get the actual CPU usage you'd have to take a look at your task manager. Under Linux I'd open a console and type "top", since you are looking for a task with relatively high CPU usage (which should be the case here). You should then see a list of running tasks, and I think the CPU usage is displayed per core, i.e. with a quad core you can have 4 tasks at 100% each. Look for the GPU-Grid task (acemd-something, I think). I suppose you'll see between 30 and 50% usage of one CPU core.

If you can't find the task but know part of its name, you could use grep to search the output.. I forgot the syntax, though. Oh, and modern Linux distributions may also have some kind of task manager, which could be more convenient.
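For the "search by part of the name" step, here is a minimal Python sketch. It assumes a Linux host with the standard `ps` tool; the function names are made up for illustration, and the name fragment is whatever part of the task name you remember. Note that `ps` reports CPU use averaged over the lifetime of the process, not the momentary figure that top shows.

```python
import subprocess

def filter_processes(ps_output, fragment):
    """Return (cpu_percent, command) pairs whose command contains fragment."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        cpu, command = parts
        if fragment.lower() in command.lower():
            matches.append((float(cpu), command.strip()))
    return matches

def find_tasks(fragment):
    # "ps -eo pcpu,comm" prints CPU usage and command name for every process.
    out = subprocess.run(["ps", "-eo", "pcpu,comm"],
                         capture_output=True, text=True, check=True).stdout
    return filter_processes(out, fragment)
```

Calling `find_tasks("acemd")` would list any matching processes, if the task name really contains that fragment. In a console, something like `top -b -n 1 | grep -i <fragment>` should give a similar one-off snapshot.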

MrS
Scanning for our furry friends since Jan 2002
Clownius

Message 8900 - Posted: 25 Apr 2009, 13:04:55 UTC

If it's a KDE-based distro, try ksysguard. It's fairly good; I use it on Kubuntu.
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 8901 - Posted: 25 Apr 2009, 13:06:59 UTC

@Zydor & Michael

There is no such thing as "instruction set intelligence".. that would devalue the miracle that is our brain a little too much ;)

However, there is such a thing as instruction set complexity. And the ability to execute complex instructions. And the ability to execute complex flow control instructions. The latter is what the CPU is made for: dealing with all those branches and conditions (if, while etc.) quickly. Current GPUs also support such instructions (it doesn't matter to what extent), but they are much slower at executing these than at executing "regular" code.
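One common explanation for this slowdown, which the post does not spell out, is that a GPU runs groups of threads in lockstep: when threads in a group take different sides of a branch, the group has to step through both sides one after the other. The Python toy model below only illustrates that idea. The group width of 32 and the cycle counts are illustrative assumptions, not measurements of any real chip.

```python
def lockstep_cost(lane_takes_if, cost_if, cost_else):
    """Cycles for one group of lanes executing an if/else in lockstep.

    Toy model: if any lane takes a branch, the whole group spends that
    branch's cycles while the other lanes sit idle.
    """
    cost = 0
    if any(lane_takes_if):
        cost += cost_if
    if not all(lane_takes_if):
        cost += cost_else
    return cost

uniform = [True] * 32                        # every lane takes the same path
divergent = [i % 2 == 0 for i in range(32)]  # lanes split between both paths

print(lockstep_cost(uniform, 10, 10))    # 10
print(lockstep_cost(divergent, 10, 10))  # 20
```

In this model a group whose lanes disagree pays for both paths, which is one way to picture why branch-heavy code suits a CPU better.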

If one wanted to make them more efficient for such code one would end up with an i7 with a wider vector unit attached. Or at least a Pentium 1 with a wide vector unit. Uh, sting me a Larrabee if we ever actually get a chip like that..

MrS
Scanning for our furry friends since Jan 2002
X1900AIW
Message 9000 - Posted: 27 Apr 2009, 17:22:00 UTC - in response to Message 8678.  

589184
Redundant after crunching on it for hours? OK. I had a good run with these settings for a few days, but I'll switch this host to Folding@home now.

CPU time	2374.787
...
Outcome	Redundant result
Client state	Cancelled by server
Exit status	-221 (0xffffffffffffff23)
...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x77E6000C
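
A side note on the log above: the two forms of the exit status are the same number. The hexadecimal form is simply -221 read as an unsigned 64-bit value (two's complement). A quick Python check, with variable names chosen only for illustration:

```python
exit_status = -221

# Reinterpret the signed value as an unsigned 64-bit integer (two's complement).
as_unsigned_64 = exit_status & 0xFFFFFFFFFFFFFFFF

print(hex(as_unsigned_64))  # 0xffffffffffffff23
```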
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 9029 - Posted: 27 Apr 2009, 21:44:09 UTC

Someone from the project team cancelled too many WUs while trying to fix the download problems (which were due to outdated WUs).

MrS
Scanning for our furry friends since Jan 2002
Jurgen

Message 9960 - Posted: 19 May 2009, 3:13:34 UTC

Here are my observations regarding the Redundant result issue:

It seems the "IBUCH" workunits are sent out twice. Whoever finishes the WU first gets the credit; the other participant gets a "cancelled by server" / "Redundant result" error and no credit, even if that participant reports his/her result minutes later and well before the deadline.

Example: http://www.gpugrid.net/workunit.php?wuid=466213

Note the "initial replication" parameter = 2. All the other WUs have an initial replication of 1.

This is not fair to participants with older cards, who will always lose out against the GTX 295s and will never get any credit for this type of WU. I have had several such cases, so now I manually abort this type of WU whenever I notice one in my queue.

It's OK to send the same WU to several participants (the SETI@home project does that by default), but everybody who completes the WU in time should get the credit he/she deserves.

Zydor
Message 9962 - Posted: 19 May 2009, 9:32:23 UTC - in response to Message 9960.  
Last modified: 19 May 2009, 9:39:36 UTC

This was cleared up a while back. There was a suspicion that this was happening, but it was shown at the time that the WU in question had not started on the machine in question. If it has returned as an issue I suspect they will jump on it, as that is not "as designed".

The server cancel facility is only designed to act on WUs that have not started on a machine. Those that have started still get the credit if successfully completed. If they get cancelled in mid-crunch then a bug has surfaced; "complete if started" was the intended principle when the facility was first implemented.
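
The intended rule can be sketched in a few lines of Python. This only illustrates the policy as described in this post, not the project's actual server code; the `Replica` class, the host names and `on_result_returned` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    host: str
    started: bool = False
    finished: bool = False
    outcome: str = "in progress"

def on_result_returned(replicas, finished_host):
    """Apply the intended cancel rule when one host returns a valid result."""
    for r in replicas:
        if r.host == finished_host:
            r.finished = True
            r.outcome = "success"
        elif not r.started and not r.finished:
            # Only copies that never started are cancelled as redundant.
            r.outcome = "redundant (cancelled by server)"
        # Copies that are already running are left alone and can still earn credit.
    return replicas

wu = [Replica("fast-host", started=True),
      Replica("slow-host", started=True),
      Replica("idle-host")]
on_result_returned(wu, "fast-host")
print([(r.host, r.outcome) for r in wu])
```

Under this rule the slow host keeps crunching and the idle host's copy is the only one cancelled. A started copy that ends up as "redundant" would point to the bug described above.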

Regards
Zy
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 9972 - Posted: 19 May 2009, 19:40:34 UTC

100% agreed with Zydor.

Jurgen, how do you know that "that participant reports his/her result minutes later and well before the deadline"? Sure, the logged completion time is shortly after the first result was returned. But nowhere does it say that work had already started. Note the exactly 0 s of CPU time.. even if WUs error out instantaneously they mostly register 1 - 3 s of CPU time. So it looks like this result was aborted before it had started.
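
The check used here can be written down as a small Python heuristic. It is a rule of thumb taken from this thread, not an official definition; the function name is made up, and the outcome string follows the result page quoted in Message 9000.

```python
def time_was_lost(outcome, cpu_time_s):
    """Heuristic from the thread: a redundant result with exactly 0 s of
    CPU time was most likely cancelled before it ever started."""
    if outcome != "Redundant result":
        return False
    return cpu_time_s > 0.0

print(time_was_lost("Redundant result", 0.0))       # False
print(time_was_lost("Redundant result", 2374.787))  # True
```

The second call uses the CPU time from the result quoted in Message 9000, which had clearly started before it was cancelled.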

MrS
Scanning for our furry friends since Jan 2002
Jurgen

Message 9984 - Posted: 20 May 2009, 1:23:07 UTC - in response to Message 9972.  

I've seen this happen: a WU was 90+% complete, but when I checked an hour later, somebody else had reported results 10 minutes before my WU completed and I got the old "Redundant Result" stuff. I just received another of these WUs, # 475454.

http://www.gpugrid.net/workunit.php?wuid=475454

So for the record: processing has started; I'm at 1%... I made a screenshot, but I'm not sure how to upload pictures. The WU was also sent to another participant with a GTX 295 - I'll be creamed for sure. ;-)

Will babysit to see what happens and post a follow-up.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 10018 - Posted: 21 May 2009, 9:46:57 UTC - in response to Message 9984.  

I'll be creamed for sure. ;-)


Doesn't look that spectacular for now ;)

MrS
Scanning for our furry friends since Jan 2002
Maurice Goulois
Message 10060 - Posted: 22 May 2009, 2:57:11 UTC

Hi there,

I'm seeing this kind of suspicious behaviour myself. I have had one machine attached to GPUGRID for months, and I recently removed the SETI project because of the sluggishness it caused on my system. In that respect GPUGRID is much better at not disturbing my other activities on this PC.

So for about 10 days this PC has been dedicated to GPUGRID, and since then more than half of my WUs have been cancelled as "redundant results" with no credit. The problem is that this machine runs GPUGRID 24/7 (with an 8800GT, which takes about a day to complete most WUs).

As a test of the cancellation of started WUs, I've just suspended the currently running WU to force the second one to start, then reverted so that the first one, which has the earlier deadline, continues. I'll see how these two behave after upload and possible cancellation.

I'll let you know.

Regards
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 10069 - Posted: 22 May 2009, 21:04:09 UTC - in response to Message 10060.  

Could you make sure that your 2nd WU checkpointed at least once? I.e. when you shut down the BOINC client and restart it, the WU should not start at 0.000% again.

Is it this WU? You can watch your wingman: after he returns his WU your WU would be finished the next time you contact the scheduler - if it has not started yet.

MrS
Scanning for our furry friends since Jan 2002
Jurgen

Message 10078 - Posted: 23 May 2009, 1:04:58 UTC - in response to Message 9984.  

Update: the test was inconclusive, as I was the first user to successfully finish crunching the WU. The other participant still shows as "In Progress"... only if that status changes to "success" and credit is also awarded can we conclude that there aren't any issues. I did notice that for some WUs that were distributed to more than one user, credit was awarded to more than one user, so at this time I concur with Zydor that all works fine.
Maurice Goulois
Message 10080 - Posted: 23 May 2009, 2:54:04 UTC

Hi there again,

My problem is related to the task 25-KASHIF_HIVPR_n1_for_ba3-9-100-RND6818 (http://www.gpugrid.net/workunit.php?wuid=467482), which was stuck at 24.820% and preventing the other tasks from starting. I've cancelled it and will keep an eye on things over the next few days.


ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 10089 - Posted: 23 May 2009, 14:45:18 UTC - in response to Message 10078.  
Last modified: 23 May 2009, 14:46:45 UTC

... only if the status changes to "success" and credits are also awarded we can conclude that there aren't any issues. I did notice that for some WUs that were distributed to more than one user, credit got awarded to more than one user, so at this time I now concur with Zydor that all works fine.


Well.. no. We already know that the system works as expected most of the time. For example take a look at my results.. there are a few redundant results, but the WU return times are so regular that I don't think the machine wasted any time on them. And there are quite a few successful returns from 2 hosts where both got credit.

The point was that people reported "I've been watching it and I know something went wrong". So we need to confirm an error, otherwise we still know it's fine :)
(in one case I analyzed the tasks in detail and we found out that actually everything had been alright.. but now there were 1 or 2 new reports)

Edit: Maurice, did you try restarting BOINC? If not you may want to try that first if another task hangs. Sometimes that's enough to get it going until it finishes.

MrS
Scanning for our furry friends since Jan 2002
Maurice Goulois
Message 10183 - Posted: 26 May 2009, 10:42:39 UTC - in response to Message 10080.  

Hi,

I can confirm that my problem was related to the blocked WU; everything has been OK since I cancelled it.



©2025 Universitat Pompeu Fabra