Message boards : Number crunching : Workunit error - check skipped

Author Message
Marty
Joined: 8 Nov 08
Posts: 3
Credit: 241,804,865
RAC: 0
Message 7639 - Posted: 19 Mar 2009 | 15:58:11 UTC

I completed the following WU without error (at least according to http://www.gpugrid.net/workunit.php?wuid=305073):
http://www.gpugrid.net/result.php?resultid=413618

But I didn't get credit for it. OK, somebody else was faster, mainly because he was sent the WU earlier.

Why was the second WU sent out if the first one hadn't errored out or run past its deadline?

Clownius
Joined: 19 Feb 09
Posts: 37
Credit: 30,657,566
RAC: 0
Message 7640 - Posted: 19 Mar 2009 | 16:08:31 UTC
Last modified: 19 Mar 2009 | 16:09:46 UTC

Looks like the server is set up to take only one good result and grant credit for it. It looks like the first cruncher timed out and the WU was resent to you, but they still returned first and got the credit.
The workunit is now listed as an error due to too many results. If you're lucky, someone will credit it manually. But who knows.

Edit: Clickable link to WU
http://www.gpugrid.net/workunit.php?wuid=305073

ignasi
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Message 7644 - Posted: 19 Mar 2009 | 19:33:01 UTC - in response to Message 7640.

This certainly shouldn't be.
For the first time, we set a BOINC parameter that creates n results (2 in our case) for a single WU. This improves our turnover by 50%. However, we checked the documentation and understood that every result would get credit regardless of the order of arrival.

If that's not the case, we'll cancel these WUs.

Our intention is to grant credit for 100% of the successful results returned within the 4 days.

ignasi

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 7645 - Posted: 19 Mar 2009 | 19:36:13 UTC - in response to Message 7644.

I think that this is due to a bug in BOINC.
I will check with D. Anderson.

gdf

Marty
Joined: 8 Nov 08
Posts: 3
Credit: 241,804,865
RAC: 0
Message 7649 - Posted: 19 Mar 2009 | 21:38:02 UTC - in response to Message 7644.
Last modified: 19 Mar 2009 | 21:41:04 UTC

ignasi wrote:
This certainly shouldn't be.
For the first time, we set a BOINC parameter that creates n results (2 in our case) for a single WU. This improves our turnover by 50%. However, we checked the documentation and understood that every result would get credit regardless of the order of arrival.

If that's not the case, we'll cancel these WUs.

Our intention is to grant credit for 100% of the successful results returned within the 4 days.

ignasi

I don't think this has to do with the new parameter, since the initial replication of this WU shows up as 1. As far as I can tell, the WUs with the new parameter show up with an initial replication of 2. I got a couple of those today.
I think the explanation given by Clownius is more likely what happened.
Unfortunate, but hard to avoid if the gap between deadline overrun and resend is very small for WUs whose runtimes range from several hours to several days (depending on the GPU).

Clownius
Joined: 19 Feb 09
Posts: 37
Credit: 30,657,566
RAC: 0
Message 7657 - Posted: 20 Mar 2009 | 2:08:36 UTC

minimum quorum: 1
initial replication: 1
max # of error/total/success tasks: 3, 1, 6
errors: Too many total results

I think you need to raise the max number of total tasks. Basically, with this WU, once the second task went out someone was bound to lose out. That said, I don't know server-side BOINC very well. I'm just going by the error message on the workunit.
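
To make the reasoning above concrete, here is a minimal sketch of how those limits could produce the error. It is a simplified model and not BOINC's actual server code: the names `WorkunitLimits` and `check_workunit` are hypothetical, and the exact comparisons the server uses are an assumption. The numbers are the ones listed on the workunit page (max error/total/success tasks of 3, 1, 6).

```python
from dataclasses import dataclass


@dataclass
class WorkunitLimits:
    """Hypothetical stand-in for the limits shown on the workunit page."""
    min_quorum: int
    initial_replication: int
    max_error: int
    max_total: int
    max_success: int


def check_workunit(limits, n_total, n_error, n_success):
    """Return error labels for the given result counts.

    Simplified model: a limit is treated as violated once the count
    exceeds it. The real server logic may differ in detail.
    """
    errors = []
    if n_error > limits.max_error:
        errors.append("Too many error results")
    if n_total > limits.max_total:
        errors.append("Too many total results")
    if n_success > limits.max_success:
        errors.append("Too many success results")
    return errors


# Values taken from the workunit page quoted above.
limits = WorkunitLimits(min_quorum=1, initial_replication=1,
                        max_error=3, max_total=1, max_success=6)

# One task sent out and returned: within all limits.
print(check_workunit(limits, n_total=1, n_error=0, n_success=1))

# The resend creates a second task: the total limit of 1 is exceeded.
print(check_workunit(limits, n_total=2, n_error=0, n_success=2))
```

Under this model, a max total of 1 leaves no room for any resend, so the WU gets flagged as soon as the second task exists, regardless of which result comes back first.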

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 7663 - Posted: 20 Mar 2009 | 10:00:43 UTC - in response to Message 7657.

It's a problem with the max_target_result configuration.
All the new ones will be fine.

Thanks for reporting it.

gdf
