No Bonus for finishing within 24 hours

Author	Message
Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0	Message 44435 - Posted: 6 Sep 2016, 3:00:35 UTC - in response to Message 44430. Last modified: 6 Sep 2016, 3:01:30 UTC
Let's just extend the deadlines and be honest about this. It's a slippery slope. If you reward tardiness, you are encouraging it. At some projects, if a WU is being run but is going to be late, a post on the forum will get an extension. I'm all for that, but it takes admin action and that's not going to happen here. Heck, we can't even get a fault-tolerant app or a 3rd queue. As you say, extending the deadline would work fine and I'm all for it. If the admins don't want to extend it, it should be enforced IMO. All of us know (or should know) the deadline time. If your host can't make the deadline, abort the WU ASAP (hopefully we can all do simple math) and let a faster host run it.

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 44436 - Posted: 6 Sep 2016, 3:06:40 UTC - in response to Message 44435. Last modified: 6 Sep 2016, 3:15:44 UTC
Beyond: Not everybody micromanages like you assume. In my case, I am attached to 60 projects, do work for about 15 of them, and have 9 long-running (~80 days average per task) RNA World VM tasks, all on the same machine, working about 20 tasks at the same time (9 RNA World, 6 CPU, 4 GPU, 1 non-CPU-intensive). I also game a lot, and I keep the GPU suspended a lot when I want the room to not be noisy (since it's also my home office, where I do my day job). So, sometimes my PC may approach or even slightly miss a GPUGrid deadline. So be it.
What any project should aim for (and what BOINC's scheduler tries to satisfy for all projects) is to not waste resources. So, when a task goes beyond deadline:
- If it hasn't been started yet on Client A, the client will abort it.
- If it HAS been started on Client A, the client will work on it, unless the server says abort. The server will also assign it to Client B.
- If Client A or Client B reports "completed" and it has passed validation (including any wingmen or quorum validation), then the server should tell any remaining clients to cancel. (Not sure if it's coded this way, but it should be, in regards to not wasting resources.)
PS: If you look at the "sched_reply_www.gpugrid.net.xml" scheduler reply file, you'll see: <next_rpc_delay>3600.000000</next_rpc_delay> ... which means, according to https://boinc.berkeley.edu/trac/wiki/ProjectOptions ... that the project is being pinged every hour, to support cancelling when necessary.
I know you're saying "But Client A had its chance, it should just give up", but what if it was at 99.5% completion? Try to think about it from a resources perspective. So, try not to micromanage :) And try not to be whiny about credits.
Just get the job done as well as you can, meeting your own personal needs. Mostly I make the deadlines, but I miss deadlines sometimes - that's my situation. I don't worry about it.
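
The three rules in the post above can be sketched as a small decision function. This is only an illustration of the behaviour as Jacob describes it, not BOINC's actual client code; the `TaskState` fields and the `client_action` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    """Hypothetical snapshot of one task on one client."""
    started: bool            # has the client begun computing this task?
    past_deadline: bool      # has the report deadline already passed?
    server_says_abort: bool  # did a scheduler reply ask the client to cancel?


def client_action(task: TaskState) -> str:
    """Return "abort" or "run", following the rules described in the post."""
    # A cancel request from the server always wins.
    if task.server_says_abort:
        return "abort"
    # A late task that was never started is dropped by the client.
    if task.past_deadline and not task.started:
        return "abort"
    # A started task keeps running, even past the deadline.
    return "run"
```

With the `<next_rpc_delay>` of 3600 seconds quoted above, a cancel request would reach a client within about an hour of the server deciding on it.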

Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0	Message 44440 - Posted: 6 Sep 2016, 14:13:04 UTC
Jacob, if you read the thread you could see that I'm just agreeing with Bedrich, and still do. I also offered some possible solutions. The project supposedly needs fast result turnaround, unlike most BOINC projects, so comparisons to them are not very useful. As far as maximizing resources, the largest gain there would be fixing the app to be fault tolerant. Yesterday I lost 15 WUs to a power outage, most of which were far along towards completion. That's in the neighborhood of 250 hours of GPU time wasted. The insult about credit whining is just lame, and I expect more of you than that. As for the insults about micromanaging: it is really about making sure that WUs finish by deadline. Apparently the BOINC people haven't done their job well enough to ensure that, since it's not happening. It doesn't take long to glance at BoincTasks to make sure that things are going correctly: perhaps 15 seconds a few times a day. Jacob, perhaps disagreeing with someone else doesn't have to include personal attacks.

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 44443 - Posted: 6 Sep 2016, 15:20:04 UTC - in response to Message 44440.
I'm sorry. It wasn't meant to be a personal attack, honestly. I was just responding to the "we get whiny" post that I believe you posted earlier. It is unfortunate that GPUGrid tasks aren't more fault-tolerant to power outages. Worse yet, GPUGrid's deficiency here can cause other tasks to fail too - and I have some tasks that are approaching 400 days RunTime! :)
It's BOINC's job to try to make tasks meet deadline, and it does a great job of it, when given the correct data for estimation. GPUGrid still uses a project-wide "Duration correction factor" (cringe), <rsc_fpops_est> values that are often generic and sometimes inaccurate (cringe), and only 2 real applications (buckets) to lump their tasks (cringe). BOINC can only do so much to guess how long a task will take, based on this info, and I believe it is doing the best it can.
I agree with you that GPUGrid should make their tasks more fault-tolerant to power outages. And I think I agree with you that, if they're going to keep this "bonus credit" system (which I care nothing for), then they should correct it to work properly for the crunchers that receive resends. (I.e., if cruncher B completes before cruncher A, give B bonus credit. If cruncher A completes before cruncher B, at least give cruncher B some credit for their time spent before the server cancelled their task.)
Are we having fun yet? :) My wife says I lack tact. I apologize for lacking tact in my prior post.

Beyond Joined: 23 Nov 08 Posts: 1112 Credit: 6,162,416,256 RAC: 0	Message 44445 - Posted: 6 Sep 2016, 17:21:36 UTC - in response to Message 44443.
Thanks Jacob. I think it was Joe Biden who said his greatest lesson in working effectively in the Senate was to stick to the issues and not to attribute imagined motivations to other people. I try to keep bringing that to mind, with limited success.
I know that BOINC is pretty much doing what it can; I was just feeling grouchy because I felt that I wasn't being understood. As you say, when a GPUGrid task fails it sometimes brings down other tasks with it. I think that Climate Prediction was also having that problem, if I remember correctly. An improperly formed WU can also occasionally cause a machine to reset; that is not the usual result, but a number of us have seen it happen. Most recently I had it happen with one of the bad ADRIAs (also causing the WU on the 2nd GPU to fail). For me the biggest improvement that GPUGrid could make is to increase the fault tolerance of their application. The waste over all the affected users has to be astronomical. I know my case is unusual, but upwards of 250 hours in a single power glitch? How does that help anyone (rhetorical question)? The good news is that Zoltan posted something that I hope will help. It still does not in any way excuse the bad app behavior, but you do what you can. If I may repost his helpful tip for others that may not have seen it:
"Have you tried to turn off write caching for your disks?
Windows key + R -> Devmgmt.msc <ENTER> -> Disk drives -> select your BOINC disk (double click) -> Policies tab -> Un-check (both) write caching option(s) -> OK -> Close device manager"
Another way to get to the disk policies is through Control Panel/Administrative Tools/Computer Management/Device Manager/Disk Drives.

Erich56 Joined: 1 Jan 15 Posts: 1171 Credit: 12,662,148,501 RAC: 253,643	Message 44463 - Posted: 9 Sep 2016, 10:21:27 UTC
After a quick review of all the above comments, my question is: which time span is being taken to determine the 24-hours limit or the 48-hours limit with regard to the extra bonus: from the time when the WU was downloaded vis-a-vis the time the finished WU is being uploaded? If so, in the BOINC computing preferences, under "store at least ... days of work", the smallest value possible should be entered, I guess. I am asking this because one of my hosts (GTX750Ti), for example, crunches the current Gerard tasks in about 23,4 hours. However, I did NOT get the 24-hours bonus, probably because, as a result of my settings (0,2 days), this WU was downloaded a few hours before the previous WU got finished and uploaded. Is this right thinking, or am I missing something?

Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0	Message 44465 - Posted: 9 Sep 2016, 11:07:56 UTC - in response to Message 44463. Last modified: 9 Sep 2016, 11:08:41 UTC
"after a quick review of all the above comments, my question is: which time span is being taken to determine the 24-hours limit or the 48-hours limit with regard to the extra bonus: from the time when the WU was downloaded vis-a-vis the time the finished WU is being uploaded?"
Yes. Uploaded & reported, so the <report_results_immediately> option should be turned on in the cc_config.xml file.
"If so, in the BOINC computing preferences, under "store at least ... days of work", the smallest value possible should be entered, I guess."
Yes, so you can set it to 0. In this case the manager will ask for a new task only when the current one is finished, so there will be a little idle time between the finished task and the new task. It is advisable to have only one other GPU project on such hosts, with 0 resource share, to avoid the host running completely idle while there is no work available from GPUGrid.
"I am asking this because one of my hosts (GTX750Ti), for example, crunches the current Gerard tasks in about 23,4 hours. However, I did NOT get the 24hours bonus, probably because, as a result of my settings (0,2 days), this WU was downloaded a few hours before the previous WU got finished and uploaded. Is this right thinking, or am I missing something?"
You're right.
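
The arithmetic behind this exchange can be sketched as follows. It assumes, as the posts above suggest, that the bonus clock runs from download to report, so time spent waiting in the local queue counts against the limit. The function name, the tier labels, the inclusive boundaries, and the 3-hour queue wait are all hypothetical; upload time is ignored.

```python
def bonus_tier(queued_hours: float, run_hours: float) -> str:
    """Classify a task by turnaround (queue wait plus run time), in hours.

    Assumption: the limits are inclusive; the real server-side comparison
    may differ.
    """
    turnaround = queued_hours + run_hours
    if turnaround <= 24:
        return "within 24 hours"
    if turnaround <= 48:
        return "within 48 hours"
    return "no bonus"


# A 23.4-hour task that waited an assumed 3 hours in the cache misses the
# 24-hour limit; the same task started right after download would make it.
print(bonus_tier(3.0, 23.4))
print(bonus_tier(0.0, 23.4))
```

This is why a work cache of 0 days helps a host whose run time is already close to 24 hours.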

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 44466 - Posted: 9 Sep 2016, 12:04:17 UTC - in response to Message 44465. Last modified: 9 Sep 2016, 12:32:19 UTC
I wanted to clarify a couple of things, based on what I know.
Setting <report_results_immediately> (a setting that affects all attached projects) to 1 in cc_config.xml is NOT necessary if you have a new enough client, because GPUGrid tasks are already configured (by the project) to be reported immediately, as can be seen by inspecting a COPY of client_state.xml and seeing <report_immediately/> set for each GPUGrid <result> (task). It's actually best to NOT set <report_results_immediately>, so results from other projects can be sent in a single scheduler request, thus easing network traffic to those other projects.
Regarding running a cache setting of "Store at least: 0 days", the BOINC client scheduler is actually set up to handle it more gracefully. It doesn't let you run completely out of work before asking for more work. Instead, it should start asking for work when you have about 3 minutes of work left. I had David make this change a while back, because I told him it sometimes took about 3 minutes to ask enough projects for work, when attached to several projects that don't have any. So, with that 0 setting, you can expect it to start asking when a resource would go idle in 3 minutes. :)
And yes, it's best to be attached to other GPU projects, in case GPUGrid doesn't have any work available; that way your GPU doesn't sit idle/wasted.
Regards, Jacob
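
The inspection Jacob describes could be scripted roughly as below. This is a minimal sketch that assumes the copy of client_state.xml parses as well-formed XML and that each <result> element carries a <name> child; the function name and the tiny sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def results_report_flags(xml_text: str) -> Dict[str, bool]:
    """Map each <result> name to whether it contains <report_immediately/>."""
    root = ET.fromstring(xml_text)
    flags: Dict[str, bool] = {}
    for result in root.iter("result"):
        name = result.findtext("name", default="(unnamed)")
        flags[name] = result.find("report_immediately") is not None
    return flags


# Hypothetical, heavily trimmed stand-in for a COPY of client_state.xml.
SAMPLE = """
<client_state>
  <result><name>task_a</name><report_immediately/></result>
  <result><name>task_b</name></result>
</client_state>
"""

print(results_report_flags(SAMPLE))
```

To try it on a real host, read a copy of the file into a string and pass it in; working on a copy avoids touching the file the running client uses.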

Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0	Message 44467 - Posted: 9 Sep 2016, 14:22:27 UTC - in response to Message 44466. Last modified: 9 Sep 2016, 14:23:15 UTC
"... GPUGrid tasks are already configured (by the project) to be reported immediately, as can be seen by inspecting a COPY of client_state.xml and seeing <report_immediately/> set for each GPUGrid <result> (task)."
Thanks Jacob, I didn't know that it'd been set by the project.

Bedrich Hajek Joined: 28 Mar 09 Posts: 490 Credit: 11,850,145,728 RAC: 75,320	Message 44484 - Posted: 12 Sep 2016, 0:33:28 UTC
This is ridiculous, getting a task 4 seconds after it was completed by the previous host: http://www.gpugrid.net/workunit.php?wuid=11716407
Task 15272295; Computer 369296; Sent 7 Sep 2016 | 0:22:56 UTC; Reported 12 Sep 2016 | 0:23:07 UTC; Status: Completed and validated; Run time (sec) 305,201.46; CPU time (sec) 69,056.07; Credit 351,400.00; Application: Long runs (8-12 hours on fastest card) v8.48 (cuda65)
Task 15276588; Computer 263612; Sent 12 Sep 2016 | 0:23:11 UTC; Reported 12 Sep 2016 | 0:24:13 UTC; Status: Aborted by user; Run time (sec) 0.00; CPU time (sec) 0.00; Credit ---; Application: Long runs (8-12 hours on fastest card) v8.48 (cuda65)
There was no point crunching it.

Bedrich Hajek Joined: 28 Mar 09 Posts: 490 Credit: 11,850,145,728 RAC: 75,320	Message 44578 - Posted: 25 Sep 2016, 0:44:17 UTC Last modified: 25 Sep 2016, 0:50:31 UTC
Here is an example of the opposite, finishing after the 24 hour deadline and getting the 24 hours bonus:
name: 1tfj-SDOERR_OPMcharmm1-0-1-RND6735
application: Long runs (8-12 hours on fastest card)
created: 21 Sep 2016 | 14:15:30 UTC
canonical result: 15295403
granted credit: 368,676.00
minimum quorum: 1
initial replication: 2
max # of error/total/success tasks: 7, 10, 6
Task 15295403; Computer 30790; Sent 21 Sep 2016 | 18:00:15 UTC; Reported 22 Sep 2016 | 13:50:55 UTC; Status: Completed and validated; Run time (sec) 62,910.69; CPU time (sec) 61,672.50; Credit 368,676.00; Application: Long runs (8-12 hours on fastest card) v8.48 (cuda65)
Task 15295404; Computer 172813; Sent 21 Sep 2016 | 16:16:28 UTC; Reported 24 Sep 2016 | 10:49:59 UTC; Status: Completed and validated; Run time (sec) 198,940.38; CPU time (sec) 51,013.57; Credit 368,676.00; Application: Long runs (8-12 hours on fastest card) v8.48 (cuda65)
http://www.gpugrid.net/workunit.php?wuid=11731674
This, I find to be humorous! For the record, my computer finished within the 24 hour deadline, the other person's didn't, in case you didn't notice that.
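
The turnaround of the two tasks listed above can be checked from their "Sent" and "Reported" timestamps. This is a small sketch; the helper name is hypothetical, the timestamps are copied from the post with the "|" separator and "UTC" suffix dropped, and the `%b` month parsing assumes an English locale.

```python
from datetime import datetime

FMT = "%d %b %Y %H:%M:%S"  # e.g. "21 Sep 2016 18:00:15"


def turnaround_hours(sent: str, reported: str) -> float:
    """Hours elapsed between the sent time and the reported time."""
    delta = datetime.strptime(reported, FMT) - datetime.strptime(sent, FMT)
    return delta.total_seconds() / 3600


# Task 15295403 (computer 30790)
first = turnaround_hours("21 Sep 2016 18:00:15", "22 Sep 2016 13:50:55")
# Task 15295404 (computer 172813)
second = turnaround_hours("21 Sep 2016 16:16:28", "24 Sep 2016 10:49:59")

print(f"{first:.2f} h, {second:.2f} h")
```

The first task comes in a little under 20 hours and the second at roughly 66.5 hours, yet both rows show the same credit of 368,676.00, which is the inconsistency the post points out.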