Message boards : Number crunching : Python apps for GPU hosts 4.03 (cuda1131) using a LOT of CPU
| Author | Message |
|---|---|
|
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 0
|
It would really be donor-friendly to give us the ability to specify the number of WUs we want downloaded for a given Preferences group, as many other BOINC projects do. It just needs to be turned on. WCG and LHC have it. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
You did not read what I actually wrote before making your snide remark. That will not fix anything. I read it. Maybe you didn't explain the problem well enough or include enough relevant details about your current configuration. GPU assignment happens through BOINC; it's nothing to do with the science app. The application gets assigned a GPU and sticks to it. It cannot just "decide" to use a different GPU if the pre-selected one is already in use by another process. This is the case for all projects, not something specific to GPUGRID, so it's a little strange that you would think GPUGRID could somehow act this way when no other project does. If you want the Python tasks to use separate GPUs, you can employ a strategy like the one I outlined in my previous post, or just reconfigure it to use 1 task per GPU. Not sure what issue you're having with CPU resources. In earlier testing with an 8-core Intel CPU, I had no problem running 3 tasks on a single GPU. If my 8-core CPU had enough resources, surely your 18-core one does as well. So it's most likely down to your BOINC settings and the other projects being run.
|
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
I’ve not seen this functionality on any other BOINC project. They all go by your local cache (number of days) setting in BOINC itself. What project lets you explicitly specify the number of tasks to download? I can't say for LHC, but I do not see any setting like that for WCG; in the device profiles there is only the similar WU cache setting in terms of days. If LHC has this setting, it's the exception to what every other BOINC project does. But of course, BOINC is open source. The code is freely available to change whatever you don't like and compile your own client.
|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
It would really be donor-friendly to give us the ability to specify the number of WUs we want downloaded for a given Preferences group, as many other BOINC projects do. It just needs to be turned on. WCG does that. Only one that I know of. The app defaults to device 0, but you can kick it off that device by using an exclude_gpu statement in cc_config. |
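For reference, a minimal sketch of that cc_config.xml exclude. The project URL and device number here are illustrative; match the URL to the one shown in your BOINC manager and the device number to the GPU you want freed up. The client must be restarted for cc_config changes to take effect.

```xml
<cc_config>
  <options>
    <!-- Keep this project's work off BOINC device 0 -->
    <exclude_gpu>
      <url>https://www.gpugrid.net/</url>
      <device_num>0</device_num>
    </exclude_gpu>
  </options>
</cc_config>
```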
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
It would really be donor-friendly to give us the ability to specify the number of WUs we want downloaded for a given Preferences group, as many other BOINC projects do. It just needs to be turned on. Again, WHERE? I don't see anything like that in the WCG settings, only day cache settings. This app does not default to device 0. It will use whatever device BOINC tells it to. This isn't like the SRBase app, which is hard-coded for device 0. Saying "defaults to 0" to mean "uses the first GPU available" is redundant on a normal system where GPU 0 is allowed to be used: 0 will always be first and always the default, but the app will still use the other GPUs as more tasks spin up or when the first GPU isn't available.
|
|
Joined: 3 Jul 16 Posts: 31 Credit: 2,248,809,169 RAC: 0
|
It would really be donor-friendly to give us the ability to specify the number of WUs we want downloaded for a given Preferences group, as many other BOINC projects do. It just needs to be turned on. I’ve not seen this functionality on any other BOINC project. They all go by your local cache (number of days) setting in BOINC itself. What project lets you explicitly specify the number of tasks to download?
PrimeGrid, too. You can also set the cc_config tag <fetch_minimal_work>0|1</fetch_minimal_work> (fetch one job per device) to receive no new work as long as something's running (at least in theory; I didn't try it myself). - - - - - - - - - - Greetings, Jens |
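A minimal cc_config.xml with that flag set would look like this (as the poster notes, the behavior is untested here):

```xml
<cc_config>
  <options>
    <!-- Fetch only one job per usable device; no new work while devices are busy -->
    <fetch_minimal_work>1</fetch_minimal_work>
  </options>
</cc_config>
```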
|
Joined: 3 Jul 16 Posts: 31 Credit: 2,248,809,169 RAC: 0
|
Again, WHERE? I don’t see anything like that in WCG settings. Only day cache settings. https://www.worldcommunitygrid.org/ms/device/viewBoincProfileConfiguration.do?name=Default Just scroll down. - - - - - - - - - - Greetings, Jens |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
Again, WHERE? I don’t see anything like that in WCG settings. Only day cache settings. Thanks for the precise reply. I finally see it now. (I had landed on this page before but missed the exact section.) Not infinitely configurable, but it provides limits up to 64 tasks; it won't let you set a limit of, say, 100 if you wanted. But better than nothing. This is still the exception rather than the rule for BOINC, though; only projects using customizations of the BOINC server platform are going to be doing this.
|
|
Joined: 3 Jul 16 Posts: 31 Credit: 2,248,809,169 RAC: 0
|
That way it will mix the projects, one from each on one GPU. 0.6+0.4=1. But 0.6+0.6>1, so it won't start two from GPUGRID on the same GPU. It will go to the next GPU with open resources. That will not fix anything. I'm not certain what the exact problem is you are referring to, so I can only give as many pointers as I can imagine. This might or might not help: you might just try to set the GPUGrid tasks to <gpu_usage>0.6</gpu_usage> and not run a second project. I actually don't know what happens then, but I'm interested in it. So, if you try, please tell us. :-) Else experiment with the tags <fetch_minimal_work>, <ngpus>, <max_concurrent>, try to set <gpu_usage> to 1.1 or 2.0, or whatever else comes to mind regarding client configuration. You might also set up two BOINC instances and give one GPU to each of them. *edit* With BOINC client 7.14.x you could also set the resource share and the BOINC caches to 0. That way you should not get any new work as long as a device is occupied. *end edit* - - - - - - - - - - Greetings, Jens |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
This is exactly what my suggestion was, and I've done it. What happens is that BOINC will run one GPUGRID task on the GPU, but not two, and will allow another project to run alongside GPUGRID. In BOINC's resource-accounting logic, it sees that 0.4 of the GPU's resources remain free, so it can only fill that slot with tasks defined to use 0.4 or less. That is why it won't spin up a second GPUGRID job: another GPUGRID task is defined to use 0.6, which is too large to fit in the 0.4 "hole". That is also why I suggested running the other project's tasks at 0.4. Now, if you also want that secondary project not to run 2x on the same GPU, then you're sort of stuck and probably need to employ the multiple-client solution, though personally I don't like to do that. But you're right: if this doesn't solve his problem, then he should be more specific about what the problem actually is.
|
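The 0.6/0.4 pairing described above can be sketched as two app_config.xml files, one in each project's directory under the BOINC data folder. The app names below are assumptions for illustration (PythonGPU matches the task names mentioned in this thread; the second app name is a placeholder); verify the real names in your client_state.xml before using this.

```xml
<!-- projects/www.gpugrid.net/app_config.xml -->
<app_config>
  <app>
    <name>PythonGPU</name> <!-- assumed app name; verify in client_state.xml -->
    <gpu_versions>
      <gpu_usage>0.6</gpu_usage> <!-- two GPUGRID tasks (1.2) won't fit on one GPU -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

<!-- projects/<other_project>/app_config.xml -->
<app_config>
  <app>
    <name>other_app</name> <!-- placeholder; use the other project's real app name -->
    <gpu_versions>
      <gpu_usage>0.4</gpu_usage> <!-- fits exactly into the remaining 0.4 "hole" -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

With these values, 0.6 + 0.4 = 1.0, so BOINC pairs at most one task from each project per GPU, exactly the accounting the post walks through.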
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
On my multi-GPU hosts, the task always runs on BOINC device #0 when only running a single task. BOINC assigns device #0 to the most capable card. But I don't want the task on my best cards because I want them to run other GPU projects. So I exclude GPUs #0 and #1 so that the Python tasks run on my least capable card, since card capability has almost no effect on these tasks. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
It's likely that, with the crazy long estimated run times, the task is going into high-priority mode. In that case it has an equal chance to interrupt any GPU, and BOINC grabs the first one. That's a function of BOINC's high-priority mode, not the app specifically trying to run on GPU 0 for any particular reason.
|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
It's likely that, with the crazy long estimated run times, the task is going into high-priority mode. In that case it has an equal chance to interrupt any GPU, and BOINC grabs the first one. A BOINC function of high-priority mode, not the app specifically trying to run on GPU 0 for any particular reason. My PythonGPU tasks always run in high-priority mode from start to finish. I've never seen a Python task EVER occupy any GPU other than #0 unless I explicitly prevent it with a gpu_exclude statement. Countless tasks have been run through my GPUs, with countless opportunities to run on GPUs #1 and #2. But I have seen brand-new tasks in the "waiting to run" state wait until GPU #0 comes free of running Einstein and MW tasks so they can jump onto GPU #0. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
I know. I'm saying it's the high priority, and the fact that you're only running one task, that's putting it on GPU 0. It's not inherent to the application. High priority will supersede ALL lower-priority tasks and make all devices available to it, so it picks the first one, which is 0. If you allowed two tasks to download and run (at 1x task per GPU, with no excludes), the second task would run on the next GPU. If it were because of the application, they would all stack up on GPU 0; they don't. It's just running where BOINC tells it to under the circumstances. How do you think any tasks run on my second GPU? I'm not doing anything special. I'm just only running these tasks, and everything acts normally as you'd expect, since all tasks are equally "high priority" and it's business as usual.
|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
OK, I understand what you are describing. I, on the other hand, have never seen any other behavior, so I am describing only what I have seen. I've set the tasks up for 0.5 per GPU and still only see tasks on #0. So how do you get your tasks to not run high-priority? I haven't figured out that magic recipe yet. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
I've set the tasks up for 0.5 per GPU and still only see tasks on #0. That's not what I meant. Leave it at 1.0 gpu_usage but allow it to download 2 tasks. Since the first GPU will be occupied, it will spin up a task on the second GPU. Setting 0.5 gpu_usage is the same situation with high priority: they will both go to the first GPU. So how do you get your tasks to not run high-priority? Mine are running high priority. But when all six tasks are labeled as high priority, it doesn't really make any difference. They run the same as if there were no high priority, because there are no low-priority tasks for them to fight with. This system ONLY runs PythonGPU. No other tasks and no other projects.
|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
I already tried the download limit set to 2, but I was still at 0.5 gpu_usage because I'm sharing cards with Einstein, MW and WCG. It just spun up both tasks on GPU #0. I am not arguing that the tasks will always run on GPU #0 in all situations, just that I have so far not been able to get them to move anywhere else in my computing environments. I am going to stop messing with the configuration, as I have finally reduced the large impact on CPU task running times by moving to 1 Python task running with 0.5 gpu_usage. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
There's no problem with you setting your config that way with excludes if that works for you. I just wanted it to be clear that the reason it's going to GPU 0 is high priority and BOINC behavior, NOT anything inherent to or hard-coded in the application itself. The root cause is the incorrect estimated remaining time, but the direct cause is how BOINC manages tasks in high-priority mode. The app is just doing what BOINC tells it to.
|
|
Joined: 9 May 13 Posts: 171 Credit: 4,594,296,466 RAC: 171
|
Keith, I hope I am understanding your question correctly. Have you set the <use_all_gpus> flag in the cc_config.xml file to "1"? It requires a client restart if the flag needs to be turned on. |
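For anyone following along, a minimal sketch of that flag in cc_config.xml (remember the client restart mentioned above):

```xml
<cc_config>
  <options>
    <!-- Let BOINC use every GPU, not just the most capable one -->
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```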
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
Keith, thanks for the reply. No, that is all fine. This has to do with some advanced configuration stuff beyond the standard client; Ian and I were discussing things related to our custom team client. You don't even have to have that options flag in your cc_config file if your GPUs are all the same; they all get used automatically. This daily driver has (3) 2080 cards of the same make, and they all get used without a config parameter. You only need that parameter if your cards are dissimilar enough that BOINC considers them of different compute capability; then BOINC will only use the most capable card if you don't set the parameter. |
©2025 Universitat Pompeu Fabra