Message boards : Graphics cards (GPUs) : 4 GPUs, 4 CPUs, running dry of WUs
[AF>HFR>RR] ThierryH · Joined: 18 May 07 · Posts: 22 · Credit: 6,623,223 · RAC: 0
I have a machine with a 4-core CPU (Q6600) and 4 GPUs (2x 9800GX2). Because of the one-WU-per-CPU limit, my machine runs dry for several hours each time a WU finishes. When a WU finishes, the client starts uploading the result and asks for a new WU. From the server's point of view the WU isn't finished yet, so it answers with the limit message. The client then delays its next server contact for several hours, and the machine runs dry.
[BOINC@Poland] AiDec · Joined: 2 Sep 08 · Posts: 53 · Credit: 9,213,937 · RAC: 0
(Sorry for my bad English.) I have written about a similar problem in another thread (I have 3x GTX 280). For sure there will be more people (like you and me) with this specific problem. My suggestion is to raise the daily WU quota (I need a minimum of 12, but I would like more in case of errors) and the number of WUs per GPU (up to 2 or 3 per GPU, to have a bigger stock in reserve).
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
The limit is already 12. Also, we will soon be able to test a new feature that assigns 1 WU per GPU instead of per CPU.

g
[AF>HFR>RR] ThierryH · Joined: 18 May 07 · Posts: 22 · Credit: 6,623,223 · RAC: 0
GDF · Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
Yes, we will schedule more than 1 WU per GPU.

gdf
MrS · Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
Yes, 2 WUs per GPU is definitely needed. The current BOINC client is totally unaware of how long a GPU WU takes, and it does not "remember" the server message "Mate, you've had enough! Finish your meal first and report back to me to get some more." So it keeps requesting new WUs until its back-off reaches several hours. Finishing a WU does not reset this back-off, so there would be a lot of idle time with 1 WU per GPU.

MrS

Scanning for our furry friends since Jan 2002
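(A minimal simulation of the back-off behaviour described above; the doubling interval, its cap, and the WU runtime are assumptions for illustration, not the actual BOINC or GPUGRID values.)

```python
# Illustrative simulation of the back-off problem described above.
# The doubling back-off and its cap are assumptions, not the real BOINC constants.

def idle_time_per_wu(wu_runtime_h=5.5, initial_backoff_h=0.25, max_backoff_h=4.0):
    """Estimate GPU idle time when the server refuses new work until the
    running WU is reported, and the client's back-off is not reset on completion."""
    backoff = initial_backoff_h
    next_request = initial_backoff_h
    # While the WU is still running, every request is refused and the back-off grows.
    while next_request < wu_runtime_h:
        backoff = min(backoff * 2, max_backoff_h)
        next_request += backoff
    # The WU finished at wu_runtime_h, but the next request only happens at next_request.
    return next_request - wu_runtime_h

print(f"GPU idle for ~{idle_time_per_wu():.1f} h after each WU")
```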
Edboard · Joined: 24 Sep 08 · Posts: 72 · Credit: 12,410,275 · RAC: 0
I have a similar problem: I have two GPUs in a PC with a two-core CPU. I have been waiting, client upgrade after client upgrade, for the "2 or more WUs per GPU" option, but it hasn't arrived.
Krunchin-Keith [USA] · Joined: 17 May 07 · Posts: 512 · Credit: 111,288,061 · RAC: 0
> I have a machine with a 4-core CPU (Q6600) and 4 GPUs (2x 9800GX2). Because of the one-WU-per-CPU limit, my machine runs dry for several hours each time a WU finishes.

If you have an always-on connection, set your connect interval to 0. You can also force the client to return results immediately; you could use this in your case until the GPU preferences are in place. To do so, put the command below in the <options> section of your cc_config.xml file. This should force the client to report immediately, and I would think it would recognize at that time that more work is needed, so the dry spell would only last as long as the upload. As soon as the upload finishes, the result will be reported, and that should let the server know you now have room for 1 more.

<report_results_immediately>1</report_results_immediately>
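(For reference, a minimal cc_config.xml using this option would look roughly like the sketch below, placed in the BOINC data directory; only the option above comes from the post, the surrounding structure is the standard cc_config layout.)

```xml
<!-- cc_config.xml in the BOINC data directory (minimal sketch) -->
<cc_config>
  <options>
    <!-- Report each finished result as soon as its upload completes -->
    <report_results_immediately>1</report_results_immediately>
  </options>
</cc_config>
```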
Joined: 16 Aug 08 · Posts: 87 · Credit: 1,248,879,715 · RAC: 0
> Yes, 2 WUs per GPU is definitely needed.

I think n+1 WUs per system is a better compromise. It keeps fewer WUs in circulation while eliminating downtime between units. Of course, this reduces to 2 WUs per GPU when you only have one GPU, but four GPUs would get five WUs: four in progress and one ready to go.
[BOINC@Poland] AiDec · Joined: 2 Sep 08 · Posts: 53 · Credit: 9,213,937 · RAC: 0
> Yes, 2 WUs per GPU is definitely needed.

n*2 would be better (I mean more logical) ;)
Krunchin-Keith [USA] · Joined: 17 May 07 · Posts: 512 · Credit: 111,288,061 · RAC: 0
> Yes, 2 WUs per GPU is definitely needed.

I assume by n you mean the number of GPUs. No, n*2 is not better. That would assume all the GPUs are fast. If a user had four slower GPUs on a quad core, he would get 8 tasks, 4 running and 4 on standby. Quite possibly, by the time the 4 running tasks finish within the deadline, the 4 on standby would have no way to finish by their deadline.

The best would be # of GPUs + 1 extra. As soon as one task finished, the client could start the one on standby, report the finished one, and another would be downloaded for standby, ready for the next GPU that finishes.

Even better would be to let the client or user determine how many extra to have, within a limit, so they could not get 99 extra, only a maximum of, say, 1 extra per GPU; but if they kept running into deadline trouble, the 1 per GPU could be reduced to 1 per host. The server can monitor the average return time of work and could be made to adjust for this: giving extra work to faster hosts that can return 2 tasks per GPU within the deadline, only 1 extra task per host for hosts that cannot, or even none in the case of really slow hosts that need all the time within the deadline to return a single task.
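(A small sketch of the deadline arithmetic behind this argument; the 4-day deadline and the per-WU runtimes are assumptions for illustration only.)

```python
# Sketch of the deadline arithmetic behind the argument above.
# The 4-day deadline and the per-WU runtimes are assumed for illustration only.

def safe_buffer_per_gpu(wu_runtime_h, deadline_h=4 * 24, max_extra=1):
    """How many extra (standby) WUs one GPU can hold and still finish
    everything within the deadline, capped at max_extra."""
    total_within_deadline = int(deadline_h // wu_runtime_h)  # running + standby
    return max(0, min(max_extra, total_within_deadline - 1))

for runtime in (5.5, 24, 60, 90):  # hours per WU on fast vs. slow GPUs
    print(f"{runtime:5.1f} h/WU -> {safe_buffer_per_gpu(runtime)} standby WU(s) per GPU")
```

With these numbers, a fast card gets its one standby task while a card needing most of the deadline for a single WU gets none, which is the adjustment being argued for.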
Edboard · Joined: 24 Sep 08 · Posts: 72 · Credit: 12,410,275 · RAC: 0
> The best would be # of GPUs + 1 extra.

That would be better than what we have now, but IT IS NOT THE BEST. When I had a PC with two GTX 280s (OC) I was doing 1 WU every 5.5 hours on each GPU, and sometimes I got stuck with both GPUs idle for 9 hours. If I had had 2 WUs in cache, it would have been ideal for me. And think about people with 3 or 4 GPUs...
MrS · Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
When I suggested 2*n I had fast current-generation (G92+) and future cards in mind, but I admit Keith has a point about slower hosts wanting less work at once.

So, as a solution for the medium term, once a separate cache setting for the GPU is possible, I suggest creating an account setting which lets the user specify the number of WUs to cache. The limit could be between 0 and 2*n.

A good long-term solution would be, as Keith suggested, having some smart system determine the amount of work a host can handle, possibly supplemented by a user preference (e.g. a larger cache for users who have fast cards and restricted but reliable internet access).

MrS

Scanning for our furry friends since Jan 2002
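(A one-line sketch of the proposed setting, assuming the server simply clamps a user preference to the 0..2*n range; the function and parameter names are hypothetical.)

```python
# Hypothetical clamp for the proposed per-account GPU cache setting:
# the user picks a number of WUs, the server caps it at 2 per GPU.
def effective_gpu_cache(user_pref_wus: int, n_gpus: int) -> int:
    return max(0, min(user_pref_wus, 2 * n_gpus))

print(effective_gpu_cache(user_pref_wus=5, n_gpus=3))   # 5 (within the 0..6 limit)
print(effective_gpu_cache(user_pref_wus=99, n_gpus=1))  # 2
```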
[BOINC@Poland] AiDec · Joined: 2 Sep 08 · Posts: 53 · Credit: 9,213,937 · RAC: 0
Sure :). But I have to agree with Edboard.

I'm in exactly the same situation, just with 3x GTX 280 OC'd by 10%. I'm sorry, but I can't imagine how you can keep enough work for them so they never sit idle without n*2. Let's imagine n+1: one WU is finished. OK. After, e.g., 1 hour the next WU is finished... After, e.g., 30 minutes the next one is ready... Do you know what I mean?

And just now I thought of one solution: n+1 could work if BM used a hard rule/policy to contact the server, e.g. every 15 minutes. The GPUs could still sometimes sit idle, but 1 minute up to a maximum of 15 minutes I can understand :).
MrS · Joined: 17 Aug 08 · Posts: 2705 · Credit: 1,311,122,549 · RAC: 0
> And just now I thought of one solution: n+1 could work if BM used a hard rule/policy to contact the server, e.g. every 15 minutes. The GPUs could still sometimes sit idle, but 1 minute up to a maximum of 15 minutes I can understand :).

That should work, with some modification of the contact-server algorithm. The benefit would be to ensure minimum WU-return latency while keeping idle time reasonably small. The drawback is that people wouldn't have much of a cache, depending on GPU speed and number. So I imagine that in case of the occasional network hiccup, people would complain that their GPUs go idle too quickly. At least I would be pissed if I had 3 GTX 280s sitting idle for hours.. ;)

MrS

Scanning for our furry friends since Jan 2002
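(A rough sketch of the modified contact rule being discussed; the 15-minute interval comes from the post above, the back-off values are assumed, and this is not how the real client is implemented.)

```python
# Rough sketch of the contact logic discussed above: poll on a short fixed
# interval whenever a GPU is idle, instead of letting the back-off keep growing.

def next_contact_delay_h(gpu_idle: bool, current_backoff_h: float) -> float:
    """Return the delay (hours) until the next scheduler request."""
    POLL_INTERVAL_H = 0.25   # 15 minutes, as suggested above
    MAX_BACKOFF_H = 4.0      # assumed cap for the normal back-off

    if gpu_idle:
        # A starving GPU overrides the back-off: retry soon.
        return POLL_INTERVAL_H
    # Otherwise keep the usual growing back-off.
    return min(current_backoff_h * 2, MAX_BACKOFF_H)

# With this rule an idle GPU waits at most 15 minutes for the next request,
# at the cost of more frequent server contacts and essentially no local cache.
print(next_contact_delay_h(gpu_idle=True, current_backoff_h=2.0))   # 0.25
print(next_contact_delay_h(gpu_idle=False, current_backoff_h=2.0))  # 4.0
```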
[BOINC@Poland] AiDec · Joined: 2 Sep 08 · Posts: 53 · Credit: 9,213,937 · RAC: 0
Yep, problems with the internet connection can always occur. Just before your last post I thought about using <report_results_immediately>, but that won't help when there's no connection ;) :P. Well, then I'm back to n*2. But I'm still thinking ;). Maybe a better solution would be a lot of small WUs (I mean something like WUs with a crunching time of about 1 hour on a 9800 GTX), with rules typical of other projects? That would automatically be a great solution for people with slow graphics cards who are asking for a longer deadline (as Krzychu P. did, just to remind you).
Krunchin-Keith [USA] · Joined: 17 May 07 · Posts: 512 · Credit: 111,288,061 · RAC: 0
Well, I've been giving this some thought. Everyone has a "this works best for me", and unfortunately they are all different. Maybe once actual GPU prefs are separated from CPU prefs things will be better.

It is clear that GPUs need their own prefs, separate from the CPUs, especially since the GPU is now being set to run all the time. Or at least the two can share the same settings but be treated as two separate elements within the client's work-fetch policy.

Now I'm thinking that even a limit of ngpu*2 would not be enough for some of the faster (5-hour) GPUs, especially if they have a connection problem or only connect once a day; an always-on connection cannot be assumed. They could easily use 5 tasks per GPU per day and easily complete those within the deadline. Even a three-day connect interval could be used and they could still complete 15 per GPU within the deadline. The limit needs to be based on each GPU's capability to complete work within the deadline and on the cache/connect interval, not on a fixed maximum. Since the need now ranges from 1 (slow GPU) to 20 (fast GPU) tasks within the deadline, we cannot have one fixed limit for all. Even users with multiple GPUs of different speeds need a different number of tasks per day per GPU; each GPU in a host needs to be treated separately from the others.

BOINC needs a separate cache for GPUs just as it has for CPUs. All the prefs need to apply to both the CPU and GPU caches (such as the connect interval), meaning not combined into one as some are now. If a user has specified a 24-hour cache, BOINC needs to hold 24 hours of CPU work per CPU PLUS 24 hours of GPU work per GPU, adjusted for resource shares of course if you have multiple projects (where CPU projects affect only the CPU cache and GPU projects affect only the GPU cache), and adjusted by the DCF to be sure that many tasks can be completed within the deadline. The CPU cache needs to be reduced, of course, by the time the CPU part of the GPU tasks will use, which is currently different for different OSes: only about 2% usage on Linux but about 40% on Windows. That could be handled on a per-host basis and averaged, just as the current DCF is for CPU projects.

Again, adjustments need to be made for slower GPUs that will not finish two tasks within the deadline; they would only get one at a time per GPU, otherwise the second, waiting task would not finish within the deadline once it starts. Any GPU that can finish more within the deadline can have more, up to the cache size, if that many can be finished within the deadline per GPU.

I'm sure BOINC can be made to do this; it will just take some time and effort to get the server scheduler and the client work fetch and scheduler all fine-tuned. I will pass this whole thread on to the BOINC developer; if these changes have not already been started, at least this info will help shape what may come.
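(As a sketch of the work-fetch arithmetic described here: all constants, including the runtimes, deadline, and CPU share of a GPU task, are illustrative assumptions, not actual BOINC or GPUGRID values.)

```python
# Sketch of the per-resource work fetch described above. All constants
# (runtimes, deadline, CPU share of a GPU task) are illustrative assumptions.

def gpu_wus_to_fetch(n_gpus, cache_h, wu_runtime_h, deadline_h, resource_share=1.0, dcf=1.0):
    """WUs to request for the GPUs: fill the cache, but never queue more per GPU
    than can finish within the deadline (estimated runtime scaled by DCF)."""
    est_runtime_h = wu_runtime_h * dcf
    per_gpu_cache = cache_h * resource_share / est_runtime_h
    per_gpu_deadline_cap = deadline_h // est_runtime_h
    return int(n_gpus * min(per_gpu_cache, per_gpu_deadline_cap))

def cpu_cache_hours(n_cpus, cache_h, n_gpu_tasks, gpu_task_cpu_fraction):
    """CPU cache in hours, reduced by the CPU time the GPU tasks will consume."""
    return n_cpus * cache_h - n_gpu_tasks * cache_h * gpu_task_cpu_fraction

# Example: quad core, 4 GPUs, 24 h cache, 5.5 h per GPU WU, 4-day deadline,
# GPU tasks each using ~40% of a core (the Windows figure mentioned above).
print(gpu_wus_to_fetch(n_gpus=4, cache_h=24, wu_runtime_h=5.5, deadline_h=96), "GPU WUs to keep on hand")
print(cpu_cache_hours(n_cpus=4, cache_h=24, n_gpu_tasks=4, gpu_task_cpu_fraction=0.4), "h of CPU work")
```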