Unsent tasks decreasing much more slowly

WPrion Joined: 30 Apr 13 Posts: 109 Credit: 3,977,737,860 RAC: 6,051
Message 54312 - Posted: 12 Apr 2020, 13:04:35 UTC
I've noticed that the number of Unsent Tasks is decreasing at a much slower rate, even though the number of tasks in progress is growing and the Current GigaFLOPS is approaching record levels. Unsent tasks had decreased from 300,000 to 250,000 in a few weeks, but now it takes several days for them to decrease by only 1,000. What changed? Are new tasks being added, or are the tasks being crunched now more difficult?

Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
Message 54314 - Posted: 12 Apr 2020, 14:14:37 UTC - in response to Message 54312.
Toni had prioritized some batches; those have now run out. That is what made the number of unsent tasks decrease more rapidly. Now the rate is back to "normal" (almost 0). It means that when these batches run out, the decrease will be 100 times faster than the previous, faster rate.

WPrion Joined: 30 Apr 13 Posts: 109 Credit: 3,977,737,860 RAC: 6,051
Message 54317 - Posted: 13 Apr 2020, 11:39:18 UTC - in response to Message 54314.
Thanks!

ServicEnginIC Joined: 24 Sep 10 Posts: 595 Credit: 13,083,686,510 RAC: 31,373
Message 54579 - Posted: 4 May 2020, 19:27:18 UTC
On March 10th 2020 | 17:39:16 UTC Retvari Zoltan wrote at message #53884:
Quote: I'm receiving many tasks which are the last one of their batch: 1nkvA00_450_0-TONI_MDADpr4sn-9-10-RND4090_0 or near the end of their batch: 1gaxA04_348_0-TONI_MDADpr4sg-8-10-RND1850_0. In these names, 9 (or 8) is the sequential number of the given task within the batch (starting number is 0), and 10 is the total number of tasks in the batch. I expect the number of unsent tasks in the queue to drop significantly during the next days. There are 305,826 unsent tasks as I write this.
At this time, the number of unsent tasks is 243,556, as can be seen on the Server status page. The latest tasks I'm receiving are similar to: 3tekA00_320_3-TONI_MDADpr4st-8-10-RND9554_0. As soon as the series reach the 9-10 ones, it is predictable that unsent tasks will decrease again at a higher rate... (?)
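The task names quoted above pack several fields into one string. A minimal Python sketch for pulling out the sequence number and batch length, assuming the layout `<target>-<batch tag>-<sequence>-<total>-RND<random id>_<index>`. That layout is inferred only from the example names in this thread; the field names, and reading the trailing `_0` as a BOINC-style result index, are assumptions rather than a documented GPUGRID format.

```python
import re
from typing import NamedTuple

# Assumed layout, inferred from the example names in this thread only:
#   <target>-<batch tag>-<sequence>-<total>-RND<random id>_<index>
TASK_NAME = re.compile(
    r"^(?P<target>[^-]+)-(?P<batch>[^-]+)-(?P<seq>\d+)-(?P<total>\d+)"
    r"-RND(?P<rnd>\d+)_(?P<result_idx>\d+)$"
)


class TaskName(NamedTuple):
    target: str      # e.g. "1nkvA00_450_0"
    batch: str       # e.g. "TONI_MDADpr4sn"
    seq: int         # sequential number within the batch, starting at 0
    total: int       # total number of tasks in the batch
    rnd: int         # the RND#### field
    result_idx: int  # trailing _N (assumed: BOINC-style result index)

    @property
    def is_last(self) -> bool:
        """True for the final task of its chain (e.g. a 9-10 task)."""
        return self.seq == self.total - 1

    @property
    def remaining(self) -> int:
        """How many tasks of this chain still come after this one."""
        return self.total - 1 - self.seq


def parse_task_name(name: str) -> TaskName:
    m = TASK_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognised task name: {name!r}")
    return TaskName(m["target"], m["batch"], int(m["seq"]), int(m["total"]),
                    int(m["rnd"]), int(m["result_idx"]))


for name in ("1nkvA00_450_0-TONI_MDADpr4sn-9-10-RND4090_0",
             "1gaxA04_348_0-TONI_MDADpr4sg-8-10-RND1850_0"):
    t = parse_task_name(name)
    print(f"{t.batch}: seq {t.seq} (0-based) of {t.total}, "
          f"last={t.is_last}, remaining={t.remaining}")
```

Under this reading a task is the last of its chain when its sequence number equals total - 1, which is why a stream of 9-10 tasks suggests that chains are finishing rather than spawning new work.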

ServicEnginIC Joined: 24 Sep 10 Posts: 595 Credit: 13,083,686,510 RAC: 31,373
Message 54604 - Posted: 7 May 2020, 6:01:35 UTC
Quote: As soon as the series reach the 9-10 ones, it is predictable that unsent tasks will decrease again at a higher rate... (?)
All the WUs I received today are of this kind. The current reading is 242,563 unsent tasks. We will soon confirm or discard this assumption.

Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
Message 54607 - Posted: 7 May 2020, 9:45:18 UTC - in response to Message 54604. Last modified: 7 May 2020, 9:56:02 UTC
Quote: As soon as the series reach the 9-10 ones, it is predictable that unsent tasks will decrease again at a higher rate... (?)
Quote: All the WUs I received today are of this kind. The current reading is 242,563 unsent tasks. We will soon confirm or discard this assumption.
I'm sure that the number of unsent tasks will drop drastically in the next few days. The only question is where that drop bottoms out. It depends on the priority of the tasks in the queue. If the priority is uniform, the number of unsent tasks will drop to near 0, and only the tasks stuck on slow or inactive hosts will remain in the queue (~1,000 in this case). If there are tasks with lower priority than the ones we receive now, we will start receiving those soon. We will know if that's the case because they will have a low sequence number (for example 3-10), and the number of unsent tasks will then remain high. I guess there are no lower-priority tasks, so the number of unsent tasks will drop to near 0. The number of unsent tasks is 237,790 at the moment (-4,773, a ~2% drop in 3h 45m).

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0
Message 54608 - Posted: 7 May 2020, 11:39:02 UTC - in response to Message 54607.
I prioritised tasks ending with _0 (e.g. 1gaxA04_348_0) over the others (_1 to _4).
T
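Toni's remark can be illustrated with a toy ordering. This is not GPUGRID's scheduler, which the thread does not describe; it only assumes that the trailing `_N` of the target name (the `0` in `1gaxA04_348_0`) is the index being prioritised, with `_0` going first. The helper names and the target `2abcA01_100_1` are hypothetical, made up for the example.

```python
from typing import List


def variant_index(target: str) -> int:
    """Trailing _N of a target name, e.g. 0 for "1gaxA04_348_0" (assumed meaning)."""
    return int(target.rsplit("_", 1)[1])


def dispatch_order(queue: List[str]) -> List[str]:
    """Toy model: lower variant index is sent first. sorted() is stable,
    so targets with the same index keep their original queue order."""
    return sorted(queue, key=variant_index)


# "2abcA01_100_1" is hypothetical; the other targets appear in this thread.
queue = ["3tekA00_320_3", "1gaxA04_348_0", "2abcA01_100_1", "1nkvA00_450_0"]
print(dispatch_order(queue))
```

In this toy model, tasks from `_1` to `_4` targets are only sent when no `_0` task is waiting ahead of them in the sorted queue.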

Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
Message 54612 - Posted: 7 May 2020, 18:21:56 UTC - in response to Message 54604. Last modified: 7 May 2020, 18:36:17 UTC
Quote: The current reading is 242,563 unsent tasks. We will soon confirm or discard this assumption.
The current reading is 222,460, that is a drop of 20,103 (8.28%) in 12h 20m = 27.17 / minute. If this rate stays constant, the present supply will last for 5 days 16 hours 28 minutes and 50.8 seconds. :)

Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
Message 54613 - Posted: 8 May 2020, 6:19:07 UTC - in response to Message 54612. Last modified: 8 May 2020, 6:24:10 UTC
Quote: The current reading is 242,563 unsent tasks. We will soon confirm or discard this assumption.
Quote: The current reading is 222,460, that is a drop of 20,103 (8.28%) in 12h 20m = 27.17 / minute. If this rate stays constant, the present supply will last for 5 days 16 hours 28 minutes and 50.8 seconds. :)
The current reading is 200,361, that is a decrease of 42,202 (17.4%) in 24h 10m = 29.10 / minute. The rate has increased slightly. At this new rate, the present supply will last 4 days 18 hours 44 minutes 6.94 seconds from now. :)
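The arithmetic in these two posts can be reproduced with a short Python sketch. It assumes, as the posts do, that the drain rate measured so far stays constant; `drain_estimate` is a hypothetical helper name.

```python
from datetime import timedelta


def drain_estimate(start: int, now: int, elapsed: timedelta):
    """Return (drop, percent drop, tasks per minute, time until the queue is empty),
    assuming the rate observed over `elapsed` stays constant."""
    drop = start - now
    minutes = elapsed.total_seconds() / 60
    rate = drop / minutes                  # tasks leaving the queue per minute
    eta = timedelta(minutes=now / rate)    # remaining tasks / rate
    return drop, 100 * drop / start, rate, eta


drop, pct, rate, eta = drain_estimate(242_563, 200_361, timedelta(hours=24, minutes=10))
print(f"-{drop:,} ({pct:.1f}%), {rate:.2f}/min, queue empty in {eta}")
```

Calling `drain_estimate(242_563, 222_460, timedelta(hours=12, minutes=20))` gives the earlier figures: 27.17 tasks per minute and about 5 days 16 hours 28 minutes. The constant-rate assumption is the weak point of any such estimate; a scheduler outage, for instance, pauses the drain entirely.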

ServicEnginIC Joined: 24 Sep 10 Posts: 595 Credit: 13,083,686,510 RAC: 31,373
Message 54618 - Posted: 8 May 2020, 20:08:45 UTC - in response to Message 54613.
Quote: The current reading is 200,361, that is a decrease of 42,202 (17.4%) in 24h 10m = 29.10 / minute. The rate has increased slightly. At this new rate, the present supply will last 4 days 18 hours 44 minutes 6.94 seconds from now. :)
1) Mr. Zoltan: Thank you very much for making this fun. I took screenshots that confirm your data. Reduction in unsent tasks: 41,926 in this roughly 24-hour span.
2) Mr. Toni/GPUGrid's Team: Thank you very much for your continuous support. This high rate of decrease has been greatly helped by exceptionally good communications since yesterday morning. Whatever you did in the transition from May 6th to 7th, it brought a drastic change from extremely sluggish to very agile communications. Please take note of the recipe.
At the moment of writing this, the scheduler is stopped. I guess that this high rate of returning results has caused a new momentary disk buffer overflow...

Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
Message 54621 - Posted: 8 May 2020, 20:26:58 UTC - in response to Message 54618. Last modified: 8 May 2020, 20:30:03 UTC
Quote: The current reading is 200,361, that is a decrease of 42,202 (17.4%) in 24h 10m = 29.10 / minute. The rate has increased slightly. At this new rate, the present supply will last 4 days 18 hours 44 minutes 6.94 seconds from now. :)
Quote: At the moment of writing this, the scheduler is stopped. I guess that this high rate of returning results has caused a new momentary disk buffer overflow...
Note that the return rate has been this high all along, hence the frequent disk buffer overflows. Because a new task is created from each returned task, the number of unsent workunits stays constant, so the return rate remains hidden from us until the batches reach their final sequence number.
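The point that chaining hides the return rate can be checked with a toy simulation. Everything here is an assumption for illustration, not a model of GPUGRID's server: each chain holds exactly one live task, hosts return every task one tick after receiving it, and the scheduler sends tasks first-in, first-out at a fixed `per_tick` rate. `simulate` and its parameters are hypothetical.

```python
from collections import deque


def simulate(n_chains, chain_len, per_tick, ticks):
    """Toy model of chained work: returning task k of a chain creates task k+1.

    Returns the number of unsent tasks at the end of each tick.
    """
    unsent = deque((chain, 0) for chain in range(n_chains))  # (chain id, sequence number)
    in_progress = deque()
    history = []
    for _tick in range(ticks):
        # Hosts return everything that was sent on the previous tick.
        for _ in range(len(in_progress)):
            chain, seq = in_progress.popleft()
            if seq + 1 < chain_len:
                unsent.append((chain, seq + 1))  # next task of the chain joins the queue
        # The scheduler sends up to per_tick tasks, oldest first.
        for _ in range(min(per_tick, len(unsent))):
            in_progress.append(unsent.popleft())
        history.append(len(unsent))
    return history


print(simulate(100, 10, 10, 100))
```

With 100 chains of 10 tasks and 10 tasks sent per tick, the unsent count sits at 90 for the first 91 ticks and then falls by 10 per tick to 0: the queue looks frozen, whatever the return rate, until final-sequence tasks start coming back.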

ServicEnginIC Joined: 24 Sep 10 Posts: 595 Credit: 13,083,686,510 RAC: 31,373
Message 54622 - Posted: 8 May 2020, 20:54:34 UTC - in response to Message 54621.
Quote: Note that the return rate has been this high all along, hence the frequent disk buffer overflows. Because a new task is created from each returned task, the number of unsent workunits stays constant, so the return rate remains hidden from us until the batches reach their final sequence number.
Yes, you're right, and I'm aware of it. The frequent scheduler stops lately are most probably related to this Optimized bandwidth announcement and the significantly increased number of crunchers... This combination has likely caused a bottleneck in the project's resources.

robertmiles Joined: 16 Apr 09 Posts: 503 Credit: 769,991,668 RAC: 0
Message 54624 - Posted: 9 May 2020, 3:59:19 UTC
It looks like the server status page needs something added: free disk space, at least for the disk areas that receive uploads. That seems to be the current bottleneck in the project's resources.

ServicEnginIC Joined: 24 Sep 10 Posts: 595 Credit: 13,083,686,510 RAC: 31,373
Message 54631 - Posted: 9 May 2020, 10:53:25 UTC - in response to Message 54624.
One more conclusion that could be drawn:
- Taking Retvari Zoltan's current calculation: 29.1 returned WUs per minute on average
- Taking some calculations from this previous outage: 6.367 MB per returned WU on average
This results in 185.28 MB of finished-WU data returned to the server per minute. That is 260.55 GB of data to manage per day, counting only returned WU data (about 1 TB every 4 days).
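The back-of-the-envelope figures above can be reproduced in a few lines of Python. The two inputs are the ones quoted in the post; the post's GB figure works out only if 1 GB is taken as 1,024 MB (with 1 GB = 1,000 MB the same data is about 266.8 GB per day).

```python
# Inputs quoted in the post.
WUS_PER_MINUTE = 29.1   # average returned WUs per minute
MB_PER_WU = 6.367       # average upload size per returned WU, in MB

mb_per_minute = WUS_PER_MINUTE * MB_PER_WU        # ~185.28 MB per minute
gb_per_day = mb_per_minute * 60 * 24 / 1024       # ~260.55 GB per day (1 GB = 1,024 MB)
days_per_tb = 1024 / gb_per_day                   # ~3.9 days per TB (1 TB = 1,024 GB)

print(f"{mb_per_minute:.2f} MB/min, {gb_per_day:.2f} GB/day, "
      f"1 TB every {days_per_tb:.1f} days")
```

So "about 1 TB every 4 days" is a slight understatement: at this rate a terabyte accumulates in a little under 4 days.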

Richard Haselgrove Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 0
Message 54632 - Posted: 9 May 2020, 12:31:04 UTC - in response to Message 54631.
What we don't know (at least, I certainly don't know, and I've never seen it described here) is the exact processing path of that data after our raw results are returned to the server.
We do know that each of our tasks forms part of a chain of (currently) 10 sequential tasks making up the entire job, and that at least some of our returned data is used to assemble the starting data for the next task in the chain. Is all of it used that way? Once it's been used, does it need to be kept? If so, for how long? Can any of it be discarded once the next task in the sequence has been created? Once that task has been completed? Once the whole 10-task job has been completed?
People in other threads have mentioned SETI as a comparison. There, the scientific data returned by each task is assimilated into a gigantic, 20-year scientific database, and once assimilation has taken place, our raw returned data is erased (usually within 24 hours).
If we knew for certain that our returned data needed to be retained in quick-access online storage, say until the final paper had been accepted for publication following peer review, then I'd be prepared to contribute to a fundraising drive for additional disk spindles and a chassis to mount them in. But if the daily data is simply transferred over a slow link to an offsite backing store, then spindles aren't the answer: more drives would simply stretch the interval between outages from 5 days to 10 days, and then lengthen the outage when it eventually arrived.

ServicEnginIC Joined: 24 Sep 10 Posts: 595 Credit: 13,083,686,510 RAC: 31,373
Message 54640 - Posted: 10 May 2020, 9:54:11 UTC Last modified: 10 May 2020, 9:54:52 UTC
The project's scheduler is up again, with 174,874 tasks left ready to send!

ServicEnginIC Joined: 24 Sep 10 Posts: 595 Credit: 13,083,686,510 RAC: 31,373
Message 54641 - Posted: 10 May 2020, 10:24:22 UTC
All my stacked WUs have been reported as finished, and all the new WUs I've received (except one 8-10) are of the 9-10 kind. So this topic is still on fire 🔥🔥🔥

Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
Message 54642 - Posted: 10 May 2020, 11:44:52 UTC
I have a couple of ghost tasks, so I suppose that many other ghost tasks are waiting to pass their deadline, and some 8-10 tasks will be resent to other hosts. However, the present supply (171,016) will last for about 4 days from now.

Keith Myers Joined: 13 Dec 17 Posts: 1424 Credit: 9,189,946,190 RAC: 0
Message 54650 - Posted: 11 May 2020, 6:53:32 UTC
What is the ghost recovery procedure on this project?

ServicEnginIC Joined: 24 Sep 10 Posts: 595 Credit: 13,083,686,510 RAC: 31,373
Message 54651 - Posted: 11 May 2020, 8:36:40 UTC - in response to Message 54650.
Ghost tasks are on GPUGRID's server side. Once the 5-day deadline has passed, the server will automatically clear the ghost tasks from the original host and resend them to another one.
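The rule described here amounts to a simple timeout check. A minimal sketch, assuming a fixed 5-day deadline measured from the time a task was sent; this is not GPUGRID's server code, and the helper name, task names, and times below are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

DEADLINE = timedelta(days=5)  # per the post; assumed to be the same for every task


def due_for_resend(sent_at: Dict[str, datetime], now: datetime) -> List[str]:
    """Names of ghost tasks whose deadline has passed, so the server may reissue them."""
    return sorted(name for name, sent in sent_at.items() if now - sent > DEADLINE)


utc = timezone.utc
ghosts = {
    "ghost_task_a": datetime(2020, 5, 5, 12, 0, tzinfo=utc),  # hypothetical send time
    "ghost_task_b": datetime(2020, 5, 8, 12, 0, tzinfo=utc),  # hypothetical send time
}
print(due_for_resend(ghosts, datetime(2020, 5, 11, 8, 36, tzinfo=utc)))
```

Until the deadline passes, a ghost task simply sits "in progress" on the server, which is why it cannot be picked up by another host any sooner.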