JIT (Just In Time) or Queue and Wait?

Betting Slip

Message 42029 - Posted: 26 Oct 2015, 9:26:58 UTC
Last modified: 26 Oct 2015, 9:27:45 UTC

I would like to propose that a JIT policy be used on this project, and I will attempt to articulate why.

In view of this project's need for a fast turnaround of WUs, how about a policy of one WU per GPU, where a GPU only gets another when it has returned the last one? This simple policy would surely speed up the throughput of WUs, especially since GERARD has expressed on this forum a desire to do exactly that, and I would think he considers it an important goal.

This would also end the inefficiency of having a WU cached on one machine for several hours when it could be running on another machine that can't get any work.

We could get a faster throughput of WUs and, from that, an increasing availability of new WUs, because new WUs are generated from returned results.

Does that make sense to you?
skgiven
Message 42030 - Posted: 26 Oct 2015, 12:42:03 UTC - in response to Message 42029.  
Last modified: 26 Oct 2015, 13:00:20 UTC

It does make sense, but there are issues with that approach which make it impractical.

Tasks take a long time to upload: ~6 to 10 min for me (Europe), and usually longer from the US to Europe and for people on slower broadband connections.
I have 2 GPUs in each of two systems. Returning 2 WUs per GPU per day, I would lose 50 to 80 min of GPU crunching per day. For some people it could be several hours per day. Too much.

If a new task did not download until an existing task was at 90%, that might be the happy medium.

Some people also run 2 WUs at a time on the same GPU. This increases overall throughput, but requires more tasks to be available.
Betting Slip

Message 42031 - Posted: 26 Oct 2015, 13:43:57 UTC - in response to Message 42030.  

I take your point, SK, and without wanting to sound too harsh, my suggestion is aimed at project benefits, not user benefits, although I'm sure some of the difficulties you mention could be mitigated.

My point of view is that the project's benefit is more important, and while a few users may suffer, I wouldn't let them become a bottleneck.

As for running 2 WUs on one GPU, that has no project benefit whatsoever and is only designed to raise RAC. In fact, it actually slows the project down.
klepel

Message 42032 - Posted: 26 Oct 2015, 16:04:52 UTC

Sorry! I completely disagree with this suggestion! I wanted to propose quite the opposite: Three WUs per GPU!

I recently acquired two GTX 970s and put them in the same computer (quite an investment), and as many have suggested, I am running two WUs at a time, just to get a better load on them! It has nothing to do with your suggested RAC optimization, as you would know if you read the forums and the experiences of GTX 980 (Ti) owners.

Now, with GERARD's new policy of small batches, I have a huge problem! I receive four WUs in a short time, they all finish more or less at the same time after 18 hours (within 24 hours), and then they get stuck for several hours in the upload queue; with luck it resolves itself overnight, or else it waits until I come back and upload those units by hand!

This is because the Internet provider from Spain applies outdated policies to its subsidiary branches in the Southern Hemisphere: high fees, slow upload and download speeds, guaranteeing only 10% of the advertised speeds, and penalizing the upload speed even more.

This translates into a minimum upload time of 30-40 minutes for each WU (if they upload one by one, not in parallel). The upload often gets interrupted by other WUs starting to upload from my other computers, or by the receiving upload server at GPUGRID.net, and then BOINC's awful exponential back-off after each interruption kicks in, so in the end they do not upload at all until I intervene manually.

If we had three or four WUs per GPU, I would at least be able to keep crunching until the first WUs have been uploaded (automatically or by hand), but at the moment the GPUs just sit idle!

Don't suggest a secondary or alternative BOINC project; I already have some with resource share set to 0%, and it is not quite a solution:
1) My emphasis is on GPUGRID.
2) There are not a lot of projects with short turnaround times (or they penalize short WUs, as GPUGRID itself does), which is a necessity, since the backup project only serves to occupy the GPUs until new WUs from GPUGRID are available.

There should be more WUs available per GPU, not fewer, for those crunching outside Europe or North America and for those with a slow Internet connection!
Jim1348

Message 42034 - Posted: 26 Oct 2015, 18:42:15 UTC

I like the JIT queue too, but as an option. That is, if people have good upload/download speeds and want to run only 1 work unit at a time, they could do it. Otherwise, they would use the normal queue.

But I doubt that they have enough work for both at the moment. In fact, I get the impression they mentioned the fast turn-around at all mainly because they don't. But if the work permits, I would opt for it myself.
Jacob Klein

Message 42036 - Posted: 27 Oct 2015, 19:36:11 UTC
Last modified: 27 Oct 2015, 19:36:32 UTC

Here's my thought:

Set the server to default to "1 task per GPU"
... but allow a web profile override, so the user could change it to "2 tasks per GPU", or even "3 tasks per GPU" (to better support the case where they actually run 2-per-GPU but are uploading).

Recap: "1 per GPU" by default, but overrideable by user to be 2 or 3.

However, for hosts that are not connected to the internet 24/7, the default setting may cause them to crunch less.

Honestly, I like the setting where it's at right now (server sends 2 per GPU), but wish we could have a web user override setting, for 3 per GPU.
Betting Slip

Message 42062 - Posted: 1 Nov 2015, 0:11:42 UTC - in response to Message 42032.  

I recently acquired two GTX 970s and put them in the same computer (quite an investment), and as many have suggested, I am running two WUs at a time, just to get a better load on them! It has nothing to do with your suggested RAC optimization


Whatever you wish to delude yourself with, by running 2 WUs on one GPU you are not only depriving other machines of work but also slowing down the return of the results needed to create other WUs, so ultimately you are creating a bottleneck.
Jacob Klein

Message 42063 - Posted: 1 Nov 2015, 0:20:46 UTC

It's not that simple.

Sure, in situations where 1) the availability of work units depends on others being completed, AND 2) there are no work units available ... then doing 2-at-a-time can cause a decrease in overall project efficiency.

HOWEVER

In situations where work units are readily available ... then doing 2-at-a-time can cause an increase in overall project efficiency, because we can get them done ~10% quicker.
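As a purely illustrative example (hypothetical numbers, not measurements from this project): if a task takes 6 hours when run alone, a GPU running 1-at-a-time returns 4 results per day. If running 2-at-a-time stretches each task to about 10.9 hours, the GPU returns 2 results every 10.9 hours, roughly 4.4 per day: about 10% more throughput, but each individual result arrives about 5 hours later. That is the throughput-versus-latency trade-off being argued about here.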

SO ... Please don't fault the users doing 2-at-a-time. If the admins believe that 1-at-a-time will help them more, then they could/should publicly request that, and some of us would be persuaded to revert to 1-at-a-time. But, in the meantime, if I can get work units done faster by running 2-at-a-time, increasing my machine's overall throughput, then I'm going to do that.

:)
Betting Slip

Message 42065 - Posted: 1 Nov 2015, 0:56:03 UTC - in response to Message 42063.  

Sure, in situations where 1) the availability of work units depends on others being completed, AND 2) there are no work units available ... then doing 2-at-a-time can cause a decrease in overall project efficiency.


I disagree; it is exactly that simple, as you have pointed out, since both conditions 1 and 2 of your statement apply at this time.

In situations where work units are readily available ... then doing 2-at-a-time can cause an increase in overall project efficiency, because we can get them done ~10% quicker.


Those conditions do not apply at this time, and wow, maybe 10% quicker.
Jacob Klein

Message 42066 - Posted: 1 Nov 2015, 1:21:54 UTC

This project usually has tasks, so the more common scenario is the one where 2-at-a-time would help.

I have temporarily set mine to 1-at-a-time, until plenty of work units are available.

Lighten up, please.
Betting Slip

Message 42067 - Posted: 1 Nov 2015, 1:33:00 UTC - in response to Message 42066.  

This project usually has tasks, so the more common scenario is the one where 2-at-a-time would help.



Since this project benefits from the speed of return of each WU, there will NEVER be a time or situation when a small increase in throughput at the (large) expense of speed will be of any benefit to this project, but only to the user in RAC terms.

BTW, the original post was about the caching of WUs by one user at the expense of another user and the project. It was SK who raised the dubious benefits of running 2 WUs on one GPU.
Jacob Klein

Message 42068 - Posted: 1 Nov 2015, 1:38:33 UTC
Last modified: 1 Nov 2015, 1:39:08 UTC

there will NEVER be a time or situation when a small increase in throughput at the (large) expense of speed will be of any benefit to this project


You are incorrect. If plenty of jobs are available, and a host can increase throughput by 10% on their machine, then that helps the project, especially if that is the normal scenario.

Like I said... I'll change to 1-at-a-time temporarily, during this non-normal work outage, but will eventually change back to 2-at-a-time.

Perhaps the best approach would be for the project server to stop handing out 2-at-a-time, when the queue is very near empty. Just a thought. There are problems with that approach, too, though.
Jacob Klein

Message 42069 - Posted: 1 Nov 2015, 13:27:48 UTC - in response to Message 42068.  
Last modified: 1 Nov 2015, 13:55:48 UTC

Re-reading your initial post, I do believe there is a compromise that can be made server-side, but I don't know exactly what it is.

Perhaps if the server:
1) detects/estimates that it will come close to running out of work
AND
2) the remaining work units are of the type where new ones would be generated upon the completion of the existing work units
THEN ... the server could switch to a "hand out 1-per-GPU" mode.

If those conditions aren't met, it could switch back to "2-per-GPU" mode.

But the client will still crunch at its same x-per-GPU setting, which sort of sucks, since generally 2-at-a-time is better, but in cases like the one we have right now, 1-at-a-time is better. Ideally, it'd be best if that too could be controlled server-side, and I don't think it's outside the realm of possibility.

The server already, by default, currently says "run 1-at-a-time" (gpu_usage = 1.0), but imagine an additional user web setting that says "Run up to this many tasks per GPU", where the user can change the default from 1 to another value like 2 (gpu_usage = 0.5). Basically, then, the server can choose gpu_usage 0.5 when there's plenty of work available, but choose gpu_usage 1.0 when in "1-per-GPU" mode. And the user wouldn't need an app_config.xml file.
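For anyone unfamiliar with the client-side mechanism being replaced here: today, running 2-at-a-time is done with an app_config.xml file in the GPUGrid project folder. A minimal sketch follows; the application name acemdlong is an assumption on my part, so check the actual app names reported for your host before using it.

<app_config>
  <app>
    <name>acemdlong</name>              <!-- assumed app name; use the name your client reports -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>        <!-- 0.5 of a GPU per task, i.e. two tasks per GPU -->
      <cpu_usage>1.0</cpu_usage>        <!-- one CPU thread reserved per running task -->
    </gpu_versions>
  </app>
</app_config>

The server-side proposal above would make this file unnecessary, because the scheduler itself would choose gpu_usage 0.5 or 1.0.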

The whole thing would be dynamically controlled, server side. Complicated, but possible, I think. And it would surely increase throughput, during droughts.

I'd love to hear the admins respond to this proposal. I think it's a great compromise that could fix multiple problems.
Retvari Zoltan
Message 42070 - Posted: 1 Nov 2015, 14:31:06 UTC - in response to Message 42069.  
Last modified: 1 Nov 2015, 14:38:27 UTC

The server already, by default, currently says "run 1-at-a-time" (gpu_usage = 1.0), but imagine an additional user web setting that says "Run up to this many tasks per GPU", where the user can change the default from 1 to another value like 2 (gpu_usage = 0.5). Basically, then, the server can choose gpu_usage 0.5 when there's plenty of work available, but choose gpu_usage 1.0 when in "1-per-GPU" mode. And the user wouldn't need an app_config.xml file.
There is such a user profile setting at the Einstein@home project.
But ideally this setting should be done by the server, as you say:
The whole thing would be dynamically controlled, server side. Complicated, but possible, I think. And it would surely increase throughput, during droughts.
However, there are hosts which don't need to run more than one workunit to achieve maximal GPU usage, so the user should be able to disable this behavior. Besides this, if the project wants to prioritize its throughput, I think the server should make use of the host's profile, especially the "average turnaround time", to decide which hosts are worth sending urgent workunits to. This will handicap the hosts with lesser GPUs, but it will surely decrease the overall processing time. I don't think participants with lesser GPUs will appreciate this much, though.
It's like Formula-1: if you over-complicate the rules, it will hurt competition.
Jim1348

Message 42074 - Posted: 1 Nov 2015, 16:20:12 UTC - in response to Message 42070.  
Last modified: 1 Nov 2015, 16:25:18 UTC

This will handicap the hosts with lesser GPUs, but it will surely decrease the overall processing time. I don't think participants with lesser GPUs will appreciate this much, though.
It's like Formula-1: if you over-complicate the rules, it will hurt competition.

I am perfectly willing to give up my GTX 660 Ti and GTX 750 Tis on this project for the moment, and get faster cards later. I can always use the slower cards elsewhere. The project needs to do what is best for itself, though it will alienate some people, and they need to consider that.

Folding handles it by awarding "Quick Return Bonus" points that reward the faster cards more than they would normally get. Also, they overlap the download of the new work unit with the finishing of the old one when it reaches 99% complete. I think that is why the Folding people never bought into BOINC but developed their own control app. Maybe BOINC could adopt some part of it?
Betting Slip

Message 42076 - Posted: 1 Nov 2015, 16:50:48 UTC

I'm sorry, guys, but I think you are overthinking my original post.

One WU per GPU, and you don't get another one until:

- you begin to upload the last one, or
- the last one is 99% complete.

Nobody needs to be alienated, leave the project, or have their contribution or card questioned.

It really is that simple.
Retvari Zoltan
Message 42078 - Posted: 1 Nov 2015, 17:11:00 UTC - in response to Message 42076.  

Betting Slip wrote:
One WU per GPU, and you don't get another one until:

- you begin to upload the last one, or
- the last one is 99% complete.
This is OK, but the BOINC client-server architecture does not work this way, because there is the concept of "reporting completed tasks": it's not enough for the host to upload the result; the result also has to be reported to the server before further processing can begin (awarding credit, comparing results from different hosts for validation, creating and issuing new tasks, etc.). That's why the "report_results_immediately" option is recommended for GPUGrid users. But there is no preemptive way of sending workunits to hosts in BOINC.
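For those who haven't enabled it, report_results_immediately is a BOINC client option set in cc_config.xml in the BOINC data directory; a minimal sketch, showing only the relevant option:

<cc_config>
  <options>
    <report_results_immediately>1</report_results_immediately>   <!-- report each finished task right away instead of batching reports -->
  </options>
</cc_config>

Re-read the config files from the BOINC Manager, or restart the client, for the change to take effect.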

Jim1348 wrote:
Folding handles it by awarding "Quick Return Bonus" points that reward the faster cards more than they would normally get.
It's the same for GPUGrid.
Jim1348 wrote:
Also, they overlap the download of the new work unit with the finishing of the old one when it reaches 99% complete. I think that is why the Folding people never bought into BOINC but developed their own control app. Maybe BOINC could adopt some part of it?
That's one possibility, but not quite a realistic one.
It is clear that BOINC is not the perfect choice for GPUGrid (or for any project) if there's a shortage of work.
Betting Slip

Message 42079 - Posted: 1 Nov 2015, 17:44:24 UTC - in response to Message 42078.  
Last modified: 1 Nov 2015, 17:44:59 UTC

This is OK, but the BOINC client-server architecture does not work this way, because there is the concept of "reporting completed tasks": it's not enough for the host to upload the result; the result also has to be reported to the server before further processing can begin (awarding credit, comparing results from different hosts for validation, creating and issuing new tasks, etc.). That's why the "report_results_immediately" option is recommended for GPUGrid users. But there is no preemptive way of sending workunits to hosts in BOINC.



OK, so let's get back to basics:

One WU per GPU, and you don't get another one until:

the last WU is uploaded and credit is granted.

The project is then as fast as it can be, given the resources available to it.

Now I will wait for someone to come up with an objection because their internet connection is slow, etc. There is only so much this project can do to keep everyone happy, which, as we know, is impossible.

This project should implement policies that benefit the efficiency of the project and its scientists.
Jim1348

Message 42080 - Posted: 1 Nov 2015, 18:01:37 UTC - in response to Message 42079.  
Last modified: 1 Nov 2015, 18:03:25 UTC

One WU per GPU, and you don't get another one until:

the last WU is uploaded and credit is granted.

OK with me, but I can (and do) accomplish that now with zero resource share. However, they could grease the wheels a little by providing more granularity in their bonus system. They are probably not prepared to do a full "Quick Return Bonus", which calculates the bonus on a continuous exponential curve, but they could provide more steps. For example, rather than just 24 and 48 hour bonus levels, they could start at 6 hours and have increments every six hours until maybe 36 hours. That would keep an incentive for the slower cards, while rewarding the faster cards appropriately. They can work out the numbers to suit themselves, but the infrastructure seems to be more or less in place for that already.
Betting Slip

Message 42081 - Posted: 1 Nov 2015, 18:12:50 UTC - in response to Message 42080.  

One WU per GPU, and you don't get another one until:

the last WU is uploaded and credit is granted.

OK with me, but I can (and do) accomplish that now with zero resource share. However, they could grease the wheels a little by providing more granularity in their bonus system. They are probably not prepared to do a full "Quick Return Bonus", which calculates the bonus on a continuous exponential curve, but they could provide more steps. For example, rather than just 24 and 48 hour bonus levels, they could start at 6 hours and have increments every six hours until maybe 36 hours. That would keep an incentive for the slower cards, while rewarding the faster cards appropriately. They can work out the numbers to suit themselves, but the infrastructure seems to be more or less in place for that already.


I'll certainly +1 that