Message boards : Server and website : Daily result quote

gianni
Joined: 11 Jul 08
Posts: 18
Credit: 105,098
RAC: 0
Message 43753 - Posted: 9 Jun 2016 | 9:02:02 UTC

<daily_result_quota> N </daily_result_quota>
Each host has a field MRD in the interval [1 .. daily_result_quota]; it's initially daily_result_quota, and is adjusted as the host sends good or bad results. The maximum number of jobs sent to a given host in a 24-hour period is MRD*(NCPUS + GM*NGPUS). You can use this to limit the impact of faulty hosts.
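To make the arithmetic in the quoted formula concrete, here is a minimal sketch. It assumes GM is a per-GPU multiplier chosen by the project; the function name and the example host values are made up for illustration and are not taken from the BOINC source.

```python
def max_jobs_per_day(mrd: int, ncpus: int, ngpus: int, gm: int) -> int:
    """Daily job cap from the quoted formula: MRD * (NCPUS + GM * NGPUS)."""
    return mrd * (ncpus + gm * ngpus)


# Hypothetical host: MRD = 10, 4 CPUs, 2 GPUs, GPU multiplier GM = 8
limit = max_jobs_per_day(mrd=10, ncpus=4, ngpus=2, gm=8)
print(limit)  # 10 * (4 + 8 * 2) = 200
```

With MRD reduced to 1 after repeated failures, the same host would be capped at 20 jobs per day under this reading of the formula.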

My understanding is that this option sets the limit on good + bad results per day.
One user claims that there is no limit as long as the results are good.
Do you know?

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,603,661,851
RAC: 8,790,049
Message 43754 - Posted: 9 Jun 2016 | 10:56:51 UTC - in response to Message 43753.
Last modified: 9 Jun 2016 | 11:16:30 UTC

I think that's effectively the case. The purpose of the quota is to limit the damage caused by faulty hosts - quota is reduced every time a task fails, and reaches a minimum of 1 per day. That allows a user to fix a faulty computer and re-start processing valid tasks.

Since quota is incremented every time a returned task validates, users at this project can always get a new task shortly (30 seconds) after reporting a successful run. At other projects, where validation relies on comparison with a second result, the quota may not increase immediately.

There is a second quota mechanism, associated with the runtime estimation component of CreditNew (mentioned briefly in CreditNew under database changes, but without further documentation). It's active here, and the values can be seen in the application details for each host. There's a table of 'host_app_version' with a field for 'max_jobs_per_day'. Those limits tend to be more generous than the daily_result_quota, and I don't know whether there's a defined order of precedence. The general view on project message boards is that the 'max_jobs_per_day' tool is unsuccessful at limiting faulty hosts, but I don't know if any project administrator has ever tested that theory directly with David Anderson.

Hope that helps.

Edit - there's a note in Trouble-shooting the job pipeline about using <debug_quota/> to show details of quota enforcement.

I don't know if there's a similar tool for max_jobs_per_day (or if max_jobs_per_day enforcements are logged by <debug_quota/>) - I suspect the answer is 'no' in both cases, but I can ask David Anderson if you like.

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 43755 - Posted: 9 Jun 2016 | 11:14:51 UTC - in response to Message 43753.
Last modified: 9 Jun 2016 | 11:19:38 UTC

From observation of my hosts' application details, and those of others.

The first thing you need to know is that it is "application specific". If a host runs both long and short WUs, it has a baseline of 10 on both applications; indeed, if you change the app, everybody's value goes back to the baseline of 10.


Baseline = 10
If 3 consecutive errored WUs are returned, it goes down to 7
1 valid result returned sends it immediately back to 10
Every consecutive valid result adds +1
Returning 20 consecutive valid results sends it to 30
1 errored result returns it immediately to 10, and then -1 for each consecutive errored result



When you send 10 errored results, this reduces your MAX WU to 1 per day, but the server will actually send you 2 a day. (Don't ask me why.)

The thing to notice is that no matter how many valid results you return, one error will send you back to the baseline of 10.

In addition, if you are on 1 a day because of sending errors, 1 valid unit will return you to the baseline of 10.

Another thing to note is that an "error" includes:

Aborting a WU
Server cancellation of a WU
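
The rules observed above can be replayed as a small simulation. This is only a sketch of one reading of those observations, not the BOINC server code: the names BASELINE, FLOOR, update_quota and replay are invented here, and it assumes an error above the baseline resets to 10 without a further decrement.

```python
BASELINE = 10  # observed starting value per application
FLOOR = 1      # observed minimum after repeated errors


def update_quota(quota: int, valid: bool) -> int:
    """Apply one returned result to the per-application daily quota."""
    if valid:
        # Below baseline, one valid result restores the baseline;
        # at or above baseline, each valid result adds 1.
        return BASELINE if quota < BASELINE else quota + 1
    # Above baseline, one error drops straight back to the baseline;
    # at or below baseline, each error subtracts 1, never below FLOOR.
    return BASELINE if quota > BASELINE else max(FLOOR, quota - 1)


def replay(results, quota: int = BASELINE) -> int:
    """Replay a sequence of results (True = valid, False = error)."""
    for valid in results:
        quota = update_quota(quota, valid)
    return quota


print(replay([False] * 3))            # 7
print(replay([True] * 20))            # 30
print(replay([True] * 20 + [False]))  # 10
print(replay([False] * 10))           # 1
print(replay([False] * 10 + [True]))  # 10
```

The sketch does not model the observation that a host at 1 per day is actually sent 2; that off-by-one is left unexplained here as well.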

Richard Haselgrove
Message 43756 - Posted: 9 Jun 2016 | 11:20:19 UTC - in response to Message 43755.

From observation of my hosts application details ...

Thanks. That's a very clear description of the 'max_jobs_per_day' version of the mechanism - but if you search the server Wiki for the word 'quota', you don't find it.

Betting Slip
Message 43757 - Posted: 9 Jun 2016 | 11:30:32 UTC - in response to Message 43756.

From observation of my hosts application details ...

Thanks. That's a very clear description of the 'max_jobs_per_day' version of the mechanism - but if you search the server Wiki for the word 'quota', you don't find it.


Hi Richard,

Unfortunately I am not a BOINC expert, in fact not anywhere near.

I believe you have worked on BOINC in the past, and maybe you still do, so you may be able to help with this problem.

I think reducing MAX GPU to 4 would not hurt anyone who sends mostly valid WUs, but would restrict WUs going to bad hosts.

It doesn't cure the problem but at least mitigates it.

Richard Haselgrove
Message 43758 - Posted: 9 Jun 2016 | 11:50:44 UTC

After a quick peek at the code, it does look as if max_jobs_per_day movements are logged by <debug_quota/>.

Richard Haselgrove
Message 43763 - Posted: 10 Jun 2016 | 10:56:11 UTC - in response to Message 43753.
Last modified: 10 Jun 2016 | 11:16:13 UTC

Each host has a field MRD ...

We've found a note in the comments starting

https://github.com/BOINC/boinc/blob/master/db/boinc_db_types.h#L335:

"// DEPRECATED: only use is -1 means host is blacklisted", but that hasn't been documented in http://boinc.berkeley.edu/trac/wiki/ProjectOptions#Joblimits. Continuing to search for the replacement.

Edit - it does look as if <daily_result_quota> can be used globally to limit the maximum number of tasks sent to each host attached to the project - but the stuff about an MRD for a single host is a bit misleading - you can't restrict individual hosts that way, unless you go all the way down to -1 and blacklist the host. Gianni, what were you actually trying to do with this field?

Betting Slip
Message 43794 - Posted: 19 Jun 2016 | 16:50:26 UTC - in response to Message 43755.

From observation of my hosts application details and others ...


The error rate can hardly get out of an "orange" figure. Don't you think you could adopt a more "proactive" approach and set MAX WU to 4 and MIN to 0?

I say MIN 0 because setting it to 1 gives 2 and setting it to -1 gives 0, so setting it to 0 must give 1 (pure logic).

Betting Slip
Message 44075 - Posted: 30 Jul 2016 | 23:28:26 UTC - in response to Message 43794.

After all this time you are not listening, are you? rhetorical.
