JIT (Just In Time) or Queue and Wait?

Retvari Zoltan

Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 42112 - Posted: 6 Nov 2015, 12:08:46 UTC - in response to Message 42111.  
Last modified: 6 Nov 2015, 12:09:32 UTC

The best idea expressed in this thread is to encourage crunchers to lower their work buffer by creating return bonus level(s) shorter than 24h.
I think a 3rd bonus level of 75% for returns under 12h would be sufficient, as a long workunit takes ~10.5h to process on a GTX 970 (Win7).
I don't think there should be a shorter period with a higher bonus, as it would not be fair to create a level which could be achieved only with the fastest cards. But it could be debated, as there are a lot of GTX 980 Tis attached to the project. Even some of my hosts could achieve a higher PPD if there were a shorter bonus level with a higher percentage, but what matters is the throughput increase of the whole project, not an individual cruncher's preference.
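
As an illustration of the proposed tiers only, here is a minimal sketch; the 50% and 25% figures for the existing 24h and 48h bonus levels are assumptions based on this thread, not official project policy.

```python
def credit_multiplier(return_hours: float) -> float:
    """Credit multiplier under the proposed three-tier return bonus.

    Illustrative tiers only: the <12h / +75% level is the suggestion above;
    the <24h / +50% and <48h / +25% levels are assumed current values.
    """
    if return_hours < 12:
        return 1.75  # proposed new fast-return tier
    if return_hours < 24:
        return 1.50  # assumed existing bonus level
    if return_hours < 48:
        return 1.25  # assumed existing bonus level
    return 1.00      # no bonus

# A long WU finished in ~10.5h on a GTX 970 would land in the new 12h tier:
print(credit_multiplier(10.5))  # 1.75
```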
Jim1348

Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 42113 - Posted: 6 Nov 2015, 16:18:49 UTC - in response to Message 42112.  

It looks like a judicious compromise that will encourage the desired behavior (insofar as we know it) without burdening anyone.
Bedrich Hajek

Joined: 28 Mar 09
Posts: 490
Credit: 11,739,145,728
RAC: 86,695
Message 42117 - Posted: 7 Nov 2015, 11:36:13 UTC

After reading through this thread, I think we should leave things the way they are, with one exception not mentioned here: do not allow hosts with old, slow cards to download tasks. I would include the 200 series and earlier, the lower-end 400 and 500 series, early Quadro, early Tesla, and the M series. These cards take days to complete tasks (sometimes finishing after the deadline), and often finish with errors. This really slows the project.


Betting Slip

Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 42118 - Posted: 7 Nov 2015, 12:09:50 UTC - in response to Message 42117.  

The 200 series has been excluded for a while now.
mikey

Joined: 2 Jan 09
Posts: 303
Credit: 7,322,550,090
RAC: 15,192
Message 42119 - Posted: 7 Nov 2015, 12:14:21 UTC - in response to Message 42117.  

After reading through this thread, I think we should leave things the way they are, with one exception not mentioned here: do not allow hosts with old, slow cards to download tasks. I would include the 200 series and earlier, the lower-end 400 and 500 series, early Quadro, early Tesla, and the M series. These cards take days to complete tasks (sometimes finishing after the deadline), and often finish with errors. This really slows the project.


You don't think the reduced rewards for running those cards are deterrent enough, so why block them by design? I think cutting off people willing to TRY is not a good idea, but letting them know up front that they will not get the bonus could be a good idea. As for 'slowing the project': SETI tried, many years ago now, sending resends only to the top-performing users; maybe they could try that here with the 980 cards? I think it could be done fairly easily by having hosts with 980 cards draw from the resend group before the 'new' group when they ask for new workunits. I'm guessing they aren't separated now, but a folder system could fix that.
Bedrich Hajek

Joined: 28 Mar 09
Posts: 490
Credit: 11,739,145,728
RAC: 86,695
Message 42121 - Posted: 7 Nov 2015, 15:27:28 UTC - in response to Message 42118.  

The 200 series has been excluded for a while now.

Then this page needs to be updated:

https://www.gpugrid.net/forum_thread.php?id=2507
Betting Slip

Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 42122 - Posted: 7 Nov 2015, 20:02:27 UTC - in response to Message 42121.  

The 200 series has been excluded for a while now.

Then this page needs to be updated:

https://www.gpugrid.net/forum_thread.php?id=2507


Indeed.
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 42125 - Posted: 9 Nov 2015, 4:38:49 UTC - in response to Message 42069.  

The server already, by default, currently says "run 1-at-a-time" (gpu_usage = 1.0), but imagine an additional user web setting that says "Run up to this many tasks per GPU", where the user can change the default from 1 to another value like 2 (gpu_usage = 0.5). Basically, then, the server can choose gpu_usage 0.5 when there's plenty of work available, but choose gpu_usage 1.0 when in "1-per-GPU" mode. And the user wouldn't need an app_config.xml file.

The whole thing would be dynamically controlled, server side. Complicated, but possible, I think. And it would surely increase throughput, during droughts.

I'd love to hear the admins respond to this proposal. I think it's a great compromise, that could fix multiple problems.


I still would REALLY appreciate this option. That way, I can set the "Run up to this many tasks per GPU" setting to 2, and the server would generally send "gpu_usage 0.5", but in times where the server decides it'd be better for 1-at-a-time, it would ignore my setting and send "gpu_usage 1.0".

From what I gather, this is possible. And, if the admins think it would benefit their throughput enough to be useful, I would appreciate its implementation, as then I could see my GPUs get "dynamically adjusted" as deemed appropriate by GPUGrid, instead of "micro-managed" by me with an app_config.xml file.

Regards,
Jacob
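
A minimal sketch of the server-side choice being proposed, purely illustrative: user_max_tasks_per_gpu and force_one_per_gpu are hypothetical names, not existing BOINC scheduler fields, and the real change would have to live in the project's scheduler code.

```python
def choose_gpu_usage(user_max_tasks_per_gpu: int, force_one_per_gpu: bool) -> float:
    """Pick the gpu_usage value to attach to a task being sent.

    user_max_tasks_per_gpu: the hypothetical web preference
        "Run up to this many tasks per GPU" (default 1).
    force_one_per_gpu: a hypothetical server-side flag set when the
        project wants this task back as fast as possible.
    """
    if force_one_per_gpu or user_max_tasks_per_gpu <= 1:
        return 1.0  # behave exactly as today: one task per GPU
    return 1.0 / user_max_tasks_per_gpu  # e.g. 2 tasks per GPU -> 0.5

# Plenty of work and the user opted in to 2 per GPU:
print(choose_gpu_usage(2, force_one_per_gpu=False))  # 0.5
# Server decides this task should run 1-at-a-time:
print(choose_gpu_usage(2, force_one_per_gpu=True))   # 1.0
```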
Vagelis Giannadakis

Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Message 42127 - Posted: 9 Nov 2015, 9:20:09 UTC - in response to Message 42125.  

The server already, by default, currently says "run 1-at-a-time" (gpu_usage = 1.0), but imagine an additional user web setting that says "Run up to this many tasks per GPU", where the user can change the default from 1 to another value like 2 (gpu_usage = 0.5). Basically, then, the server can choose gpu_usage 0.5 when there's plenty of work available, but choose gpu_usage 1.0 when in "1-per-GPU" mode. And the user wouldn't need an app_config.xml file.

The whole thing would be dynamically controlled, server side. Complicated, but possible, I think. And it would surely increase throughput, during droughts.

I'd love to hear the admins respond to this proposal. I think it's a great compromise, that could fix multiple problems.


I still would REALLY appreciate this option. That way, I can set the "Run up to this many tasks per GPU" setting to 2, and the server would generally send "gpu_usage 0.5", but in times where the server decides it'd be better for 1-at-a-time, it would ignore my setting and send "gpu_usage 1.0".

From what I gather, this is possible. And, if the admins think it would benefit their throughput enough to be useful, I would appreciate its implementation, as then I could see my GPUs get "dynamically adjusted" as deemed appropriate by GPUGrid, instead of "micro-managed" by me with an app_config.xml file.

Regards,
Jacob


The problem I see with this is that it would apply equally to cards of very unequal capability. For a recent mid/high-end GPU with 4 GB or more it may be OK to process 2 tasks at a time, but what about an older GPU with, say, 2 GB? At best it would make processing crawl; at worst it would cause crashes.

In the end, such an approach would also need to consider the type of GPU and the amount of memory. I don't know how much complexity this would add to the scheduling logic, or how much more difficult it would make its maintenance.
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 42129 - Posted: 9 Nov 2015, 12:34:19 UTC - in response to Message 42127.  
Last modified: 9 Nov 2015, 12:35:11 UTC

The server already, by default, currently says "run 1-at-a-time" (gpu_usage = 1.0), but imagine an additional user web setting that says "Run up to this many tasks per GPU", where the user can change the default from 1 to another value like 2 (gpu_usage = 0.5). Basically, then, the server can choose gpu_usage 0.5 when there's plenty of work available, but choose gpu_usage 1.0 when in "1-per-GPU" mode. And the user wouldn't need an app_config.xml file.

The whole thing would be dynamically controlled, server side. Complicated, but possible, I think. And it would surely increase throughput, during droughts.

I'd love to hear the admins respond to this proposal. I think it's a great compromise, that could fix multiple problems.


I still would REALLY appreciate this option. That way, I can set the "Run up to this many tasks per GPU" setting to 2, and the server would generally send "gpu_usage 0.5", but in times where the server decides it'd be better for 1-at-a-time, it would ignore my setting and send "gpu_usage 1.0".

From what I gather, this is possible. And, if the admins think it would benefit their throughput enough to be useful, I would appreciate its implementation, as then I could see my GPUs get "dynamically adjusted" as deemed appropriate by GPUGrid, instead of "micro-managed" by me with an app_config.xml file.

Regards,
Jacob


The problem I see with this is that it would apply equally to cards of very unequal capability. For a recent mid/high-end GPU with 4 GB or more it may be OK to process 2 tasks at a time, but what about an older GPU with, say, 2 GB? At best it would make processing crawl; at worst it would cause crashes.

In the end, such an approach would also need to consider the type of GPU and the amount of memory. I don't know how much complexity this would add to the scheduling logic, or how much more difficult it would make its maintenance.



I don't think you understand my proposal.

I'm proposing a user web setting that, by default, would be set to "run at most 1 task per GPU", which is no different from today. But the user could change it if they wanted to. Yes, they'd be responsible for knowing the types of GPUs attached to that profile venue/location. And the scheduling logic changes shouldn't be too difficult: the server would just need to "trump" the user setting and use a gpu_usage of 1.0 on any task it sends out that it wants back faster than 2-tasks-per-GPU would allow.

By default, the web setting would function no differently from how GPUGrid functions today.

PS: At one time, I had a GTX 660 Ti and a GTX 460 in my machine, and because of how the BOINC server software works, it thought I had two GTX 660 Ti GPUs. Although the 660 Ti had enough memory for 2-per-GPU, the GTX 460 did not, so I had to set gpu_usage to 1.0 in my app_config.xml. Times have changed: I now have three GPUs in this rig (a GTX 970 and two GTX 660 Tis) and can use a gpu_usage of 0.5. But I'd prefer not to have to use an app_config.xml file at all!
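
For reference, a minimal sketch of the kind of app_config.xml being micro-managed here. The application names (acemdlong, acemdshort) are assumptions; check the names the project actually reports before copying this.

```xml
<app_config>
  <!-- Run two GPUGrid tasks per GPU (gpu_usage 0.5), one CPU thread each. -->
  <!-- App names below are assumed; verify them against the project first. -->
  <app>
    <name>acemdlong</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemdshort</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

The file lives in the GPUGrid project folder under the BOINC data directory, and takes effect after telling the BOINC client to re-read its config files (or restarting it).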
Vagelis Giannadakis

Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Message 42130 - Posted: 9 Nov 2015, 14:09:04 UTC - in response to Message 42129.  

Yes, I understand better now. You mean the user sets a value for more than one task per GPU (a setting that would apply in general), and the server overrides it to force 1-per-GPU only when the need arises.
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 42131 - Posted: 9 Nov 2015, 15:49:11 UTC

Yep, you got it! GPUGrid should only implement it if they think the project throughput benefits would outweigh the development costs. I can live with micro-managing the app_config.xml file either way; I just think it sounds like a neat and appropriate feature for this project.
skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 42176 - Posted: 16 Nov 2015, 22:13:03 UTC - in response to Message 42112.  
Last modified: 16 Nov 2015, 22:13:48 UTC

It's a misconception that you can't keep a small cache of work without negatively impacting the project. Crunchers who regularly return work all make a valuable contribution. This should never be misconstrued as slowing down the research; without the crunchers, GPUGrid would not exist.

The turnaround problem stems from the structure of the research. Slow turnaround is mostly down to crunchers who have smaller cards and/or don't crunch regularly. Many of these crunchers don't understand their BOINC settings, or the consequences of slow work return or of keeping a cache.
Other issues, such as crunching for other projects (task switching and priorities), bad workunits, and computer or Internet problems, are significant factors too.

A solution might be to give optimal crunchers the most important work (if possible), delegate perceived lesser work to those who return work more slowly or less reliably, and only send short tasks to slow crunchers. To some extent I believe this is already being done.

If return time is critical to the project, then credit should be based on return time: instead of having 2 or 3 cut-off points, it should be a continuous gradient, starting at 200% for the fastest valid return (say 8h) and dropping by 1% every half hour, down to 1% for a WU returned after ~4.5 days.
That would make the performance tables more relevant, add a bit of healthy competition, and prevent people from being harshly penalised for missing a bonus time by a few minutes (e.g. a GTX 750 Ti on WDDM).
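
A minimal sketch of the continuous gradient being suggested, illustrative only (the 8h starting point and the 1% per half hour step are the example figures from the post above, not project policy):

```python
def gradient_bonus_percent(return_hours: float) -> float:
    """Credit bonus (%) under the suggested continuous gradient.

    Illustrative only: 200% at or below an 8h return, dropping by 1% per
    additional half hour, with a floor of 1%.
    """
    if return_hours <= 8:
        return 200.0
    half_hour_steps = (return_hours - 8) / 0.5
    return max(1.0, 200.0 - half_hour_steps)

# 199 one-percent steps take ~99.5h, so the 1% floor is reached around
# 107.5h (~4.5 days) total return time, matching the figures above.
print(gradient_bonus_percent(8.0))    # 200.0
print(gradient_bonus_percent(24.0))   # 168.0
print(gradient_bonus_percent(107.5))  # 1.0
```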
disturber

Joined: 11 Jan 15
Posts: 11
Credit: 62,705,704
RAC: 0
Message 42229 - Posted: 28 Nov 2015, 3:05:38 UTC

After reading these posts, I decided to set my queue time to 0.25 days. I have mismatched video cards, a 970 and a 660 Ti, so the queue is sized for the slower card. I found that work returned by the 660 Ti was given less credit, since the WU's queue plus compute time exceeded 24 hours. So this gave me the incentive to cut back on the number of waiting WUs.

So this thread was beneficial to me: I have a smaller WU queue (1, to be exact) and end up with more credit. A win-win for all.

Thanks