JIT (Just In Time) or Queue and Wait?

Retvari Zoltan

Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 42112 - Posted: 6 Nov 2015, 12:08:46 UTC - in response to Message 42111.  
Last modified: 6 Nov 2015, 12:09:32 UTC

The best idea expressed in this thread is to encourage crunchers to lower their work buffer by creating return bonus level(s) shorter than 24h.
I think a 3rd bonus level of 75% for returns under 12h would be sufficient, as a long workunit takes ~10.5h to process on a GTX 970 (Win7).
I don't think there should be a shorter period with a higher bonus, as it would not be fair to create a level which could be achieved only with the fastest cards. But it could be debated, as there are a lot of GTX 980 Tis attached to the project. Even some of my hosts could achieve a higher PPD if there were a shorter bonus level with a higher percentage, but what matters is the throughput increase of the whole project, not an individual cruncher's preference.
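
As an illustration of the proposed tiers only, here is a minimal sketch; the 50% and 25% figures for the existing 24h and 48h bonus levels are assumptions based on this thread, not official project policy.

```python
def credit_multiplier(return_hours: float) -> float:
    """Credit multiplier under the proposed three-tier return bonus.

    Illustrative tiers only: the <12h / +75% level is the suggestion above;
    the <24h / +50% and <48h / +25% levels are assumed current values.
    """
    if return_hours < 12:
        return 1.75  # proposed new fast-return tier
    if return_hours < 24:
        return 1.50  # assumed existing bonus level
    if return_hours < 48:
        return 1.25  # assumed existing bonus level
    return 1.00      # no bonus

# A long WU finished in ~10.5h on a GTX 970 would land in the new 12h tier:
print(credit_multiplier(10.5))  # 1.75
```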
Jim1348

Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 42113 - Posted: 6 Nov 2015, 16:18:49 UTC - in response to Message 42112.  

It looks like a judicious compromise that will encourage the desired behavior (insofar as we know it) without burdening anyone.
Bedrich Hajek

Joined: 28 Mar 09
Posts: 490
Credit: 11,739,145,728
RAC: 86,695
Message 42117 - Posted: 7 Nov 2015, 11:36:13 UTC

After reading through this thread, I think we should leave things the way they are, with one exception not mentioned here: do not allow hosts with old, slow cards to download tasks. I would include the 200 series and earlier, the lower-end 400 and 500 series, early Quadro, early Tesla, and the M series. These cards take days to complete tasks (sometimes finishing after the deadline), and often finish with errors. This really slows the project.


Betting Slip

Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 42118 - Posted: 7 Nov 2015, 12:09:50 UTC - in response to Message 42117.  

The 200 series has been excluded for a while now.
mikey

Joined: 2 Jan 09
Posts: 303
Credit: 7,322,550,090
RAC: 15,192
Message 42119 - Posted: 7 Nov 2015, 12:14:21 UTC - in response to Message 42117.  

After reading through this thread, I think we should leave things the way they are, with one exception not mentioned here: do not allow hosts with old, slow cards to download tasks. I would include the 200 series and earlier, the lower-end 400 and 500 series, early Quadro, early Tesla, and the M series. These cards take days to complete tasks (sometimes finishing after the deadline), and often finish with errors. This really slows the project.


You don't think the reduced rewards for running those cards are deterrent enough, so why block them by design? I think cutting off people willing to TRY is not a good idea, but letting them know up front that they will not get the bonus could be a good idea. As for 'slowing the project': SETI tried, many years ago now, sending resends only to the top-performing users; maybe they could try that here with the 980 cards? I think it could be done fairly easily by having hosts with 980 cards draw from the resend group before the 'new' group when they ask for new workunits. I'm guessing they aren't separated now, but a folder system could fix that.
Bedrich Hajek

Joined: 28 Mar 09
Posts: 490
Credit: 11,739,145,728
RAC: 86,695
Message 42121 - Posted: 7 Nov 2015, 15:27:28 UTC - in response to Message 42118.  

The 200 series has been excluded for a while now.

Then this page needs to be updated:

https://www.gpugrid.net/forum_thread.php?id=2507
Betting Slip

Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Message 42122 - Posted: 7 Nov 2015, 20:02:27 UTC - in response to Message 42121.  

The 200 series has been excluded for a while now.

Then this page needs to be updated:

https://www.gpugrid.net/forum_thread.php?id=2507


Indeed.
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 42125 - Posted: 9 Nov 2015, 4:38:49 UTC - in response to Message 42069.  

The server already, by default, currently says "run 1-at-a-time" (gpu_usage = 1.0), but imagine an additional user web setting that says "Run up to this many tasks per GPU", where the user can change the default from 1 to another value like 2 (gpu_usage = 0.5). Basically, then, the server can choose gpu_usage 0.5 when there's plenty of work available, but choose gpu_usage 1.0 when in "1-per-GPU" mode. And the user wouldn't need an app_config.xml file.

The whole thing would be dynamically controlled, server side. Complicated, but possible, I think. And it would surely increase throughput, during droughts.

I'd love to hear the admins respond to this proposal. I think it's a great compromise, that could fix multiple problems.


I still would REALLY appreciate this option. That way, I can set the "Run up to this many tasks per GPU" setting to 2, and the server would generally send "gpu_usage 0.5", but in times where the server decides it'd be better for 1-at-a-time, it would ignore my setting and send "gpu_usage 1.0".

From what I gather, this is possible. And, if the admins think it would benefit their throughput enough to be useful, I would appreciate its implementation, as then I could see my GPUs get "dynamically adjusted" as deemed appropriate by GPUGrid, instead of "micro-managed" by me with an app_config.xml file.

Regards,
Jacob
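
A minimal sketch of the server-side choice being proposed, purely illustrative: user_max_tasks_per_gpu and force_one_per_gpu are hypothetical names, not existing BOINC scheduler fields, and the real change would have to live in the project's scheduler code.

```python
def choose_gpu_usage(user_max_tasks_per_gpu: int, force_one_per_gpu: bool) -> float:
    """Pick the gpu_usage value to attach to a task being sent.

    user_max_tasks_per_gpu: the hypothetical web preference
        "Run up to this many tasks per GPU" (default 1).
    force_one_per_gpu: a hypothetical server-side flag set when the
        project wants this task back as fast as possible.
    """
    if force_one_per_gpu or user_max_tasks_per_gpu <= 1:
        return 1.0  # behave exactly as today: one task per GPU
    return 1.0 / user_max_tasks_per_gpu  # e.g. 2 tasks per GPU -> 0.5

# Plenty of work and the user opted in to 2 per GPU:
print(choose_gpu_usage(2, force_one_per_gpu=False))  # 0.5
# Server decides this task should run 1-at-a-time:
print(choose_gpu_usage(2, force_one_per_gpu=True))   # 1.0
```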
Vagelis Giannadakis

Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Message 42127 - Posted: 9 Nov 2015, 9:20:09 UTC - in response to Message 42125.  

The server already, by default, currently says "run 1-at-a-time" (gpu_usage = 1.0), but imagine an additional user web setting that says "Run up to this many tasks per GPU", where the user can change the default from 1 to another value like 2 (gpu_usage = 0.5). Basically, then, the server can choose gpu_usage 0.5 when there's plenty of work available, but choose gpu_usage 1.0 when in "1-per-GPU" mode. And the user wouldn't need an app_config.xml file.

The whole thing would be dynamically controlled, server side. Complicated, but possible, I think. And it would surely increase throughput, during droughts.

I'd love to hear the admins respond to this proposal. I think it's a great compromise, that could fix multiple problems.


I still would REALLY appreciate this option. That way, I can set the "Run up to this many tasks per GPU" setting to 2, and the server would generally send "gpu_usage 0.5", but in times where the server decides it'd be better for 1-at-a-time, it would ignore my setting and send "gpu_usage 1.0".

From what I gather, this is possible. And, if the admins think it would benefit their throughput enough to be useful, I would appreciate its implementation, as then I could see my GPUs get "dynamically adjusted" as deemed appropriate by GPUGrid, instead of "micro-managed" by me with an app_config.xml file.

Regards,
Jacob


The problem I see with this is that it would apply equally to cards of very unequal capability. For a recent mid/high-end GPU with 4 GB or more it may be OK to process 2 tasks at a time, but what about an older GPU with, say, 2 GB? At best it would make processing crawl; at worst it would cause crashes.

In the end, such an approach would also need to consider the type of GPU and the amount of memory. I don't know how much complexity this would add to the scheduling logic, or how much more difficult it would make its maintenance.
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 42129 - Posted: 9 Nov 2015, 12:34:19 UTC - in response to Message 42127.  
Last modified: 9 Nov 2015, 12:35:11 UTC

The server already, by default, currently says "run 1-at-a-time" (gpu_usage = 1.0), but imagine an additional user web setting that says "Run up to this many tasks per GPU", where the user can change the default from 1 to another value like 2 (gpu_usage = 0.5). Basically, then, the server can choose gpu_usage 0.5 when there's plenty of work available, but choose gpu_usage 1.0 when in "1-per-GPU" mode. And the user wouldn't need an app_config.xml file.

The whole thing would be dynamically controlled, server side. Complicated, but possible, I think. And it would surely increase throughput, during droughts.

I'd love to hear the admins respond to this proposal. I think it's a great compromise, that could fix multiple problems.


I still would REALLY appreciate this option. That way, I can set the "Run up to this many tasks per GPU" setting to 2, and the server would generally send "gpu_usage 0.5", but in times where the server decides it'd be better for 1-at-a-time, it would ignore my setting and send "gpu_usage 1.0".

From what I gather, this is possible. And, if the admins think it would benefit their throughput enough to be useful, I would appreciate its implementation, as then I could see my GPUs get "dynamically adjusted" as deemed appropriate by GPUGrid, instead of "micro-managed" by me with an app_config.xml file.

Regards,
Jacob


The problem I see with this is that it would apply equally to cards of very unequal capability. For a recent mid/high-end GPU with 4 GB or more it may be OK to process 2 tasks at a time, but what about an older GPU with, say, 2 GB? At best it would make processing crawl; at worst it would cause crashes.

In the end, such an approach would also need to consider the type of GPU and the amount of memory. I don't know how much complexity this would add to the scheduling logic, or how much more difficult it would make its maintenance.



I don't think you understand my proposal.

I'm proposing a user web setting that, by default, would be set to "run at most 1 task per GPU", which is no different from today. But the user could change it if they wanted to. Yes, they'd be responsible for knowing the types of GPUs attached to that profile venue/location. And the scheduling logic changes shouldn't be too difficult: the server would just need to "trump" the user setting and use a gpu_usage of 1.0 on any task it sends out that it wants back faster than 2-tasks-per-GPU would allow.

By default, the web setting would function no differently from how GPUGrid functions today.

PS: At one time, I had a GTX 660 Ti and a GTX 460 in my machine, and because of how the BOINC server software works, it thought I had two GTX 660 Ti GPUs. Although the 660 Ti had enough memory for 2-per-GPU, the GTX 460 did not, so I had to set gpu_usage to 1.0 in my app_config.xml. Times have changed: I now have three GPUs in this rig (a GTX 970 and two GTX 660 Tis) and can use a gpu_usage of 0.5. But I'd prefer not to have to use an app_config.xml file at all!
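
For reference, a minimal sketch of the kind of app_config.xml being micro-managed here. The application names (acemdlong, acemdshort) are assumptions; check the names the project actually reports before copying this.

```xml
<app_config>
  <!-- Run two GPUGrid tasks per GPU (gpu_usage 0.5), one CPU thread each. -->
  <!-- App names below are assumed; verify them against the project first. -->
  <app>
    <name>acemdlong</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemdshort</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

The file lives in the GPUGrid project folder under the BOINC data directory, and takes effect after telling the BOINC client to re-read its config files (or restarting it).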
Vagelis Giannadakis

Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Message 42130 - Posted: 9 Nov 2015, 14:09:04 UTC - in response to Message 42129.  

Yes, I understand better now. You mean the user sets a value for more than one task per GPU (a setting that would apply in general), and the server overrides it to force 1-per-GPU only when the need arises.
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 42131 - Posted: 9 Nov 2015, 15:49:11 UTC

Yep, you got it! GPUGrid should only implement it if they think the project throughput benefits would outweigh the development costs. I can live with micro-managing the app_config.xml file either way; I just think it sounds like a neat and appropriate feature for this project.
skgiven
Volunteer moderator
Volunteer tester

Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 42176 - Posted: 16 Nov 2015, 22:13:03 UTC - in response to Message 42112.  
Last modified: 16 Nov 2015, 22:13:48 UTC

It's a misconception that you can't keep a small cache of work without negatively impacting the project. Crunchers who regularly return work all make a valuable contribution. This should never be misconstrued as slowing down the research; without the crunchers, GPUGrid would not exist.

The turnaround problem stems from the structure of the research. Slow turnaround is mostly down to crunchers who have smaller cards and/or don't crunch regularly. Many of these crunchers don't understand their BOINC settings, or the consequences of slow work return or of keeping a cache.
Other issues, such as crunching for other projects (task switching and priorities), bad workunits, and computer or Internet problems, are significant factors too.

A solution might be to give optimal crunchers the most important work (if possible), delegate perceived lesser work to those who return work more slowly or less reliably, and only send short tasks to slow crunchers. To some extent I believe this is already being done.

If return time is critical to the project, then credit should be based on return time: instead of having 2 or 3 cut-off points, it should be a continuous gradient, starting at 200% for the fastest valid return (say 8h) and dropping by 1% every half hour, down to 1% for a WU returned after ~4.5 days.
That would make the performance tables more relevant, add a bit of healthy competition, and prevent people from being harshly penalised for missing a bonus time by a few minutes (e.g. a GTX 750 Ti on WDDM).
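
A minimal sketch of the continuous gradient being suggested, illustrative only (the 8h starting point and the 1% per half hour step are the example figures from the post above, not project policy):

```python
def gradient_bonus_percent(return_hours: float) -> float:
    """Credit bonus (%) under the suggested continuous gradient.

    Illustrative only: 200% at or below an 8h return, dropping by 1% per
    additional half hour, with a floor of 1%.
    """
    if return_hours <= 8:
        return 200.0
    half_hour_steps = (return_hours - 8) / 0.5
    return max(1.0, 200.0 - half_hour_steps)

# 199 one-percent steps take ~99.5h, so the 1% floor is reached around
# 107.5h (~4.5 days) total return time, matching the figures above.
print(gradient_bonus_percent(8.0))    # 200.0
print(gradient_bonus_percent(24.0))   # 168.0
print(gradient_bonus_percent(107.5))  # 1.0
```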
disturber

Joined: 11 Jan 15
Posts: 11
Credit: 62,705,704
RAC: 0
Message 42229 - Posted: 28 Nov 2015, 3:05:38 UTC

After reading these posts, I decided to set my queue time to 0.25 days. I have mismatched video cards, a 970 and a 660 Ti, so the queue is sized for the slower card. I found that work returned by the 660 Ti was given less credit, since the WU's queue plus compute time exceeded 24 hours. So this gave me the incentive to cut back on the number of waiting WUs.

So this thread was beneficial to me: I have a smaller WU queue (1, to be exact) and end up with more credit. A win-win for all.

Thanks