No Bonus for finishing within 24 hours
| Author | Message |
|---|---|
|
Joined: 26 Aug 11 Posts: 100 Credit: 2,889,109,686 RAC: 424,927
|
My "Long runs" cases: The two WUs that have lower credit occurred for the following reasons: Task 6295312: it was a re-issued WU and somebody else returned the same WU before you, so you didn't qualify for the 24-hour bonus. Task 6281411: you missed the 24-hour bonus deadline by 4 minutes. |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,739,145,728 RAC: 95,752
|
This has happened again, see link: http://www.gpugrid.net/workunit.php?wuid=4432240 The 8th time for me, though it's been almost a year since the last time. |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,739,145,728 RAC: 95,752
|
This time, I didn't get the bonus for 2 units, on 2 computers, on the same day. See links below: http://www.gpugrid.net/workunit.php?wuid=4479055 http://www.gpugrid.net/workunit.php?wuid=4477522 |
|
Joined: 4 Oct 12 Posts: 53 Credit: 333,467,496 RAC: 0
|
Same here, there must be a bug: http://www.gpugrid.net/result.php?resultid=6911246 The WU completed within 24 hrs yet got only 111k vs. the usual 167k. Given BOINC uses MySQL, I would have thought it would be pretty straightforward to rectify? |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
In all 3 cases you fell victim to this old bug: TheFiend, a few posts above yours wrote: It was a re-issued WU and somebody else returned the same WU before you- you didn't qualify for the 24Hr bonus. Sorry! MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 4 Oct 12 Posts: 53 Credit: 333,467,496 RAC: 0
|
hmm, not very reassuring, a year and a half since the first post and no solution. |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,739,145,728 RAC: 95,752
|
|
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
I think you could minimize the chance of this happening to your hosts by setting your workunit queue as low as possible (0.1 or 0.01 days). Maybe you have done it already, because I see your hosts have only as many workunits in progress as they have GPUs. It used to happen to my hosts as well, but I don't mind. This is not a big deal. |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,739,145,728 RAC: 95,752
|
I already have it set at 0.05 days, which means that I have a little more than 1 hour between the time the current unit finishes and the new unit starts crunching, and some of these units take up to 12 hours to complete on a Windows 7 computer. I don't think cutting the margin down further is going to help the situation that much. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
If one sees the bonus credit as an actual "bonus", things are not that bad. Sure, it's never nice not to get the bonus despite deserving it, but the actual credit calculation was done for the base credits without bonus (actually, there's another bonus for choosing the long runs over the short runs). MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,739,145,728 RAC: 95,752
|
I thought I'd resurrect a thread from the past about a problem that still annoys me. People get credit for late-finishing tasks, while other people who were issued the same task don't get the full bonus for finishing it within 24 hours, just because the first person finished before the second person, though after the 5-day deadline. Yes, this is a minor issue, but it is woefully unfair and annoying. It makes the rules meaningless, and it makes the project look bad. Workunit details: name e26s102_e17s57p0f25-GIANNI_D3C36bCHL1-0-1-RND0879; application Long runs (8-12 hours on fastest card); created 28 Aug 2016 5:46:39 UTC; canonical result 15258535; granted credit 351,400.00; minimum quorum 1; initial replication 1; max # of error/total/success tasks 7, 10, 6. Task 15258535 (computer 231723): sent 28 Aug 2016 10:16:46 UTC, reported 3 Sep 2016 6:53:42 UTC, completed and validated, run time 479,614.86 s, CPU time 32,224.01 s, credit 351,400.00, Long runs (8-12 hours on fastest card) v8.48 (cuda65). Task 15266311 (computer 263612): sent 2 Sep 2016 11:24:37 UTC, reported 3 Sep 2016 10:32:59 UTC, completed and validated, run time 64,702.78 s, CPU time 64,575.41 s, credit 351,400.00, Long runs (8-12 hours on fastest card) v8.48 (cuda65). http://www.gpugrid.net/workunit.php?wuid=11709767 |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
It happened on one of my hosts lately, and I've checked my other hosts to find such tasks. While doing this I came to the conclusion that my previous advice (to set a very short queue) was wrong, and if you want to minimize the number of such tasks on your host you should set the cache size to the time it takes to process a workunit (or longer, but it doesn't matter). In this case there's a chance that the first host returns the result while the task is still in your queue, and the server will cancel it on your host. This way you give the original host another 8-12 hours to process the workunit. This can be done only on the fastest hosts: if you have a GPU which finishes a task in just a little under 24h, you can't have an extra task in the queue, because you'll miss the deadline for the full bonus. This is an annoying bug, but it affects everyone to the same extent, so it does not cause imbalance in credit earnings. |
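A rough sketch of the rule of thumb above, in Python. It assumes one task runs per GPU at a time, that a queued task takes about as long as the running one, and that the full bonus window is 24 hours. The function name and parameters are made up for illustration and are not part of BOINC.

```python
def fits_full_bonus(runtime_hours: float, queued_tasks: int = 1,
                    bonus_window_hours: float = 24.0) -> bool:
    """Return True if a task that waits behind `queued_tasks` others can
    still be returned inside the full-bonus window.

    Assumption: in the worst case the task waits for `queued_tasks` full
    runs before it starts, then needs one more run of its own.
    """
    worst_case_turnaround = runtime_hours * (queued_tasks + 1)
    return worst_case_turnaround < bonus_window_hours


if __name__ == "__main__":
    # Hypothetical per-task runtimes in hours.
    for runtime in (8.0, 11.0, 13.0, 23.0):
        print(runtime, "h per task, one task queued:", fits_full_bonus(runtime))
```

With these assumptions, only hosts that finish a task in under 12 hours can hold one extra task and still expect the full bonus.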
caffeineyellow5 Joined: 30 Jul 14 Posts: 225 Credit: 2,658,976,345 RAC: 0
|
I noticed one task about a week back that I knew went over the 5-day period, and saw it got credit. I was confused, but since it only happened once, I thought there might just be a reporting issue that might even get corrected later to remove the credit. Now I see the project gives the credit if returned late, and I probably stole some points from someone who got the task after mine reached the 5 days and before I reported. I will keep this in mind on my weaker systems and make sure that if a task will not complete, I dump it early and let someone else get full credit. I just dumped one today after a day of running because it was going to take about 6 days. Had I known in the past that the project would still give credit, I would have dumped less, but not now that I know about this credit averaging oddity as well. 1 Corinthians 9:16 "For though I preach the gospel, I have nothing to glory of: for necessity is laid upon me; yea, woe is unto me, if I preach not the gospel!" Ephesians 6:18-20, please ;-) http://tbc-pa.org |
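The "dump it early" decision can be sketched as a simple projection. This is only an illustration in Python: it assumes progress is roughly linear in wall-clock time, and the function names are hypothetical, not anything the BOINC client or the GPUGrid app provides.

```python
DEADLINE_HOURS = 5 * 24  # the 5-day deadline discussed in this thread


def projected_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Estimate total wall-clock hours from elapsed time and progress.

    Assumption: progress is linear, so total = elapsed / fraction done.
    `elapsed_hours` should count from when the task was sent, idle time included.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


def should_abort_early(elapsed_hours: float, fraction_done: float,
                       deadline_hours: float = DEADLINE_HOURS) -> bool:
    """True if the task is projected to finish after the deadline."""
    return projected_total_hours(elapsed_hours, fraction_done) > deadline_hours


if __name__ == "__main__":
    # Hypothetical numbers: one day in, 16% done -> roughly 6 days in total.
    print(should_abort_early(24.0, 0.16))
    # One day in, 25% done -> about 4 days in total, keep crunching.
    print(should_abort_early(24.0, 0.25))
```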
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,739,145,728 RAC: 95,752
|
Here is another example: http://www.gpugrid.net/workunit.php?wuid=11707537 Task 15254583 (computer 358064): sent 25 Aug 2016 15:03:32 UTC, deadline 30 Aug 2016 15:03:32 UTC, timed out - no response, run time 0.00 s, CPU time 0.00 s, credit ---, Long runs (8-12 hours on fastest card) v8.48 (cuda65). Task 15262169 (computer 158961): sent 30 Aug 2016 15:34:38 UTC, reported 4 Sep 2016 17:01:02 UTC, completed and validated, run time 148,016.14 s, CPU time 32,485.84 s, credit 351,400.00, Long runs (8-12 hours on fastest card) v8.48 (cuda65). Task 15269533 (computer 263612): sent 4 Sep 2016 16:30:31 UTC, reported 4 Sep 2016 20:11:04 UTC, aborted by user, run time 10,646.69 s, CPU time 10,617.28 s, credit ---, Long runs (8-12 hours on fastest card) v8.48 (cuda65). If a cruncher misses the 5-day deadline, that task should be canceled before the workunit is sent out to the next host. Since there is no point in redundancy, I aborted the task so I can crunch the next one. |
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Bedrich, When cruncher A misses the deadline, and cruncher B gets their copy, cruncher A's task is intentionally not cancelled. This is because it is possible that A will be Completed and Validated, even before B starts the task, and there is support for the server cancelling B's task in that situation. If anything, I'd like to see one of the tasks get cancelled, when the other has been Completed and Validated, regardless of whether it has been started. If GPUGrid isn't doing that, they should! |
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0
|
Bedrich, We get really really whiny when running tasks are cancelled. Few projects do it anymore. ;-) I'd say cancel the WU that's past 5 days. The deadline is clear. Violators get madame guillotine! |
caffeineyellow5 Joined: 30 Jul 14 Posts: 225 Credit: 2,658,976,345 RAC: 0
|
Bedrich, I agree with Jacob on this, or else give both full credit. I mean, what you are saying is that letting someone else's computer do 5 days and 3 hours' worth of work is less important than letting yours do 22 hours' worth of work. And on top of that, it is more important that you not waste 3 hours' worth of work, assuming the task starts as soon as it is downloaded and doesn't sit in the queue, than the other host's work, which is still the more important of the two. Bottom line, if I have a WU that the server takes back forcibly after a few hours, when it wasn't near finished or even half done, and the reason was that someone who had been working on it for almost a week returned it first, then give it to them. Given that the 5-day deadline exists for the students doing the work after we give them results, the faster the result can be returned, the better. Why make me keep it for another day when someone just gave it back to you complete? Just get it into the server and start working on the results the project wants instead of waiting for mine and rejecting the one already there. The only issue here is the credit. And if you can live with the credit glitch, then the way it's done now is fine, and cancelling the second one after the first one comes in is more desirable for the project. I can see the argument for cancelling the first one as the second one is being sent out at exactly the 5-day deadline, since there is in fact a stated deadline, new action is being taken because the deadline has passed, and the project does depend on getting the WUs back in a reasonable amount of time, and 5 days has been set as that time. Students need to make progress on their school work, and a week is a good time to wait for results. That makes sense. But then waiting a "potential" second week for a second volunteer to finish some work that someone already gave you after a week and a few hours makes less sense. |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
The way the BOINC manager and the task scheduler work is incompatible with the way you want tasks to be canceled (1), but the project could still modify its app to achieve the desired behavior (2).

(1) Cancelling a task from the server side needs communication between the server and the BOINC manager, but it's the BOINC manager's job to initiate this communication, and it does so only when needed. The reasons for calling the server (requesting new work to fill the queue, sending in the files of a result, reporting a result) do not include a task being overdue, unless the task has not started by the deadline (in that case the BOINC manager will cancel it on its own, then report it to the server, as it now fits the reasons for calling the server).

(2) The app should be modified to assess its own progress (compared to real-world time, not processing time), and cancel itself if the progress is too slow (either because the GPU is too slow, or because the task is not allowed to run frequently enough to meet the 5-day deadline).

Some of your suggestions would require time travel under the given circumstances, as the workflow of this annoying behavior is the following:
1. The slow host requests work from the scheduler.
2. The scheduler assigns the work to the host, and sends the files.
3. The host begins processing.
4. After 1 day the bonus is reduced to 25%.
5. After 2 days the bonus is reduced to 0%.
6. After 5 days the workunit is sent out to another host.
7. The slow host finishes the task and sends in the result, so the credit given for the task has 0% bonus. It is accepted by the server, because the task is still active, as the second host has not finished it.
8. The fast host finishes the task and sends in the result, but as the credit is already assigned to this result, the fast host will receive the same amount as the slow host (0% bonus), as there can be only one credit assigned to a given task. |
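The workflow above can be replayed as a small Python sketch, using the timestamps and the 351,400 credit from the GIANNI workunit posted earlier in this thread. This is not GPUGrid's server code. The 25% and 0% tiers come from the steps above; the 50% full bonus is an assumption (it matches the 111k vs. 167k figures reported earlier), as is treating 351,400 as the base credit, and all function names are made up.

```python
from datetime import datetime, timedelta

FULL_BONUS = 0.50     # assumption: bonus for returning within 1 day
REDUCED_BONUS = 0.25  # "after 1 day the bonus is reduced to 25%"


def bonus_fraction(turnaround: timedelta) -> float:
    """Bonus tier for a given sent-to-reported time (boundaries assumed inclusive)."""
    if turnaround <= timedelta(days=1):
        return FULL_BONUS
    if turnaround <= timedelta(days=2):
        return REDUCED_BONUS
    return 0.0


def deserved_credit(base: float, sent: datetime, reported: datetime) -> float:
    """Credit a result would earn if it were judged on its own turnaround."""
    return base * (1.0 + bonus_fraction(reported - sent))


def granted_credits(base: float, results: list) -> list:
    """Credit actually granted: the first reported result fixes the credit
    for the whole workunit, and every later result gets the same amount."""
    sent, reported = min(results, key=lambda r: r[1])
    workunit_credit = deserved_credit(base, sent, reported)
    return [workunit_credit for _ in results]


BASE = 351_400.0  # assumed base credit (the amount granted with 0% bonus)
slow = (datetime(2016, 8, 28, 10, 16, 46), datetime(2016, 9, 3, 6, 53, 42))
fast = (datetime(2016, 9, 2, 11, 24, 37), datetime(2016, 9, 3, 10, 32, 59))

if __name__ == "__main__":
    print("granted:", granted_credits(BASE, [slow, fast]))
    print("fast host judged on its own:", deserved_credit(BASE, *fast))
```

Under these assumptions the fast host returned its copy in a little over 23 hours and would have earned the full bonus on its own, but both results receive the credit fixed by the late result.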
|
Joined: 28 Mar 09 Posts: 490 Credit: 11,739,145,728 RAC: 95,752
|
Let’s just extend the deadlines and be honest about this. It’s a slippery slope. If you reward tardiness, you are encouraging it. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
IMO GPUGrid should keep looking for a fix/workaround; perhaps enable partial job progress reporting and use the trickle-up system to delay a resend when steady progress (within reason) is observed, or alter the minimum quorum when there is no reported progress after 48h to ensure full credit. If a task is sitting doing nothing on someone's computer for 2 days without starting, just abort it and block the host with a Notice saying why. FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
©2026 Universitat Pompeu Fabra