Message boards : Number crunching : Python apps for GPU hosts 4.03 (cuda1131) using a LOT of CPU
| Author | Message |
|---|---|
|
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 0
|
It would really be donor-friendly to give us the ability to specify the number of WUs we want downloaded for a given Preferences group, as many other BOINC projects do. It just needs to be turned on. WCG and LHC have it. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
You did not read what I actually wrote before making your snide remark. That will not fix anything. I read it. Maybe you didn't explain the problem well enough or include enough relevant details about your current configuration. GPU assignment happens through BOINC; it's nothing to do with the science app. The application gets assigned a GPU and sticks to it. It cannot just "decide" to use a different GPU if the pre-selected one is already in use by another process. This is the case for all projects, not something specific to GPUGRID, so it's a little strange that you would think GPUGRID could somehow act this way when no other project does. If you want the Python tasks to use separate GPUs, you can employ a strategy like the one I outlined in my previous post, or just reconfigure it to use 1 task per GPU. Not sure what issue you're having with CPU resources. In earlier testing with an 8-core Intel CPU, I had no problem running 3 tasks on a single GPU. If my 8-core CPU had enough resources, surely your 18-core one does as well. So it's most likely down to your BOINC settings and the other projects being run.
|
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
I’ve not seen this functionality on any other BOINC project. They all go by your local cache (number of days) setting in BOINC itself. What project lets you explicitly specify the number of tasks to download? I can't say for LHC, but I do not see any setting like that for WCG; in the device profiles there is only the similar WU cache setting in terms of days. If LHC has this setting, it's the exception to what every other BOINC project does. But of course, BOINC is open source. The code is freely available to change whatever you don't like and compile your own client.
|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
It would really be donor-friendly to give us the ability to specify the number of WUs we want downloaded for a given Preferences group, as many other BOINC projects do. It just needs to be turned on. WCG does that. Only one that I know of. The app defaults to device 0, but you can kick it off that device by using an exclude_gpu statement in cc_config. |
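For reference, a minimal sketch of that cc_config.xml exclude. The project URL and device number here are illustrative; match the URL to the one shown in your BOINC manager and the device number to the GPU you want freed up. The client must be restarted for cc_config changes to take effect.

```xml
<cc_config>
  <options>
    <!-- Keep this project's work off BOINC device 0 -->
    <exclude_gpu>
      <url>https://www.gpugrid.net/</url>
      <device_num>0</device_num>
    </exclude_gpu>
  </options>
</cc_config>
```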
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
It would really be donor-friendly to give us the ability to specify the number of WUs we want downloaded for a given Preferences group, as many other BOINC projects do. It just needs to be turned on. Again, WHERE? I don't see anything like that in the WCG settings, only day cache settings. This app does not default to device 0. It will use whatever device BOINC tells it to. This isn't like the SRBase app, which is hard-coded for device 0. Saying "defaults to 0" to mean "uses the first GPU available" is redundant on a normal system where GPU 0 is allowed to be used: 0 will always be first and always the default, but the app will still use the other GPUs as more tasks spin up or when the first GPU isn't available.
|
|
Joined: 3 Jul 16 Posts: 31 Credit: 2,248,809,169 RAC: 0
|
It would really be donor-friendly to give us the ability to specify the number of WUs we want downloaded for a given Preferences group, as many other BOINC projects do. It just needs to be turned on. I’ve not seen this functionality on any other BOINC project. They all go by your local cache (number of days) setting in BOINC itself. What project lets you explicitly specify the number of tasks to download?
PrimeGrid, too. You can also set the cc_config tag <fetch_minimal_work>0|1</fetch_minimal_work> (fetch one job per device) to receive no new work as long as something's running (at least in theory; I didn't try it myself). - - - - - - - - - - Greetings, Jens |
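A minimal cc_config.xml with that flag set would look like this (as the poster notes, the behavior is untested here):

```xml
<cc_config>
  <options>
    <!-- Fetch only one job per usable device; no new work while devices are busy -->
    <fetch_minimal_work>1</fetch_minimal_work>
  </options>
</cc_config>
```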
|
Joined: 3 Jul 16 Posts: 31 Credit: 2,248,809,169 RAC: 0
|
Again, WHERE? I don’t see anything like that in WCG settings. Only day cache settings. https://www.worldcommunitygrid.org/ms/device/viewBoincProfileConfiguration.do?name=Default Just scroll down. - - - - - - - - - - Greetings, Jens |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
Again, WHERE? I don’t see anything like that in WCG settings. Only day cache settings. Thanks for the precise reply. I finally see it now. (I had landed on this page before but missed the exact section.) Not infinitely configurable, but it provides limits up to 64 tasks; it won't let you set a limit of, say, 100 if you wanted. But better than nothing. This is still the exception rather than the rule for BOINC, though; only projects using customizations of the BOINC server platform are going to be doing this.
|
|
Joined: 3 Jul 16 Posts: 31 Credit: 2,248,809,169 RAC: 0
|
That way it will mix the projects, one from each on one GPU. 0.6+0.4=1. But 0.6+0.6>1, so it won't start two from GPUGRID on the same GPU. It will go to the next GPU with open resources. That will not fix anything. I'm not certain what the exact problem is you are referring to, so I can only give as many pointers as I can imagine. This might or might not help: you might just try to set the GPUGrid tasks to <gpu_usage>0.6</gpu_usage> and not run a second project. I actually don't know what happens then, but I'm interested in it. So, if you try, please tell us. :-) Else experiment with the tags <fetch_minimal_work>, <ngpus>, <max_concurrent>, try to set <gpu_usage> to 1.1 or 2.0, or whatever else comes to mind regarding client configuration. You might also set up two BOINC instances and give one GPU to each of them. *edit* With BOINC client 7.14.x you could also set the resource share and the BOINC caches to 0. That way you should not get any new work as long as a device is occupied. *end edit* - - - - - - - - - - Greetings, Jens |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
This is exactly what my suggestion was, and I've done it. What happens is that BOINC will run one GPUGRID task on the GPU, but not two, and will allow another project to run alongside GPUGRID. In BOINC's resource-accounting logic, it sees that 0.4 of the GPU's resources remain free, so it can only fill that slot with tasks defined to use 0.4 or less. That is why it won't spin up a second GPUGRID job: another GPUGRID task is defined to use 0.6, which is too large to fit in the 0.4 "hole". That is also why I suggested running the other project's tasks at 0.4. Now, if you also want that secondary project not to run 2x on the same GPU, then you're sort of stuck and probably need to employ the multiple-client solution, though personally I don't like to do that. But you're right: if this doesn't solve his problem, then he should be more specific about what the problem actually is.
|
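The 0.6/0.4 pairing described above can be sketched as two app_config.xml files, one in each project's directory under the BOINC data folder. The app names below are assumptions for illustration (PythonGPU matches the task names mentioned in this thread; the second app name is a placeholder); verify the real names in your client_state.xml before using this.

```xml
<!-- projects/www.gpugrid.net/app_config.xml -->
<app_config>
  <app>
    <name>PythonGPU</name> <!-- assumed app name; verify in client_state.xml -->
    <gpu_versions>
      <gpu_usage>0.6</gpu_usage> <!-- two GPUGRID tasks (1.2) won't fit on one GPU -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

<!-- projects/<other_project>/app_config.xml -->
<app_config>
  <app>
    <name>other_app</name> <!-- placeholder; use the other project's real app name -->
    <gpu_versions>
      <gpu_usage>0.4</gpu_usage> <!-- fits exactly into the remaining 0.4 "hole" -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

With these values, 0.6 + 0.4 = 1.0, so BOINC pairs at most one task from each project per GPU, exactly the accounting the post walks through.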
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
On my multi-GPU hosts, the task always runs on BOINC device #0 when only running a single task. BOINC assigns device #0 to the most capable card. But I don't want the task on my best cards because I want them to run other GPU projects. So I exclude GPUs #0 and #1 so that the Python tasks run on my least capable card, since card capability has almost no effect on these tasks. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
It's likely that, with the crazy long estimated run times, the task is going into high-priority mode. In that case it has an equal chance to interrupt any GPU, and BOINC grabs the first one. That's a function of BOINC's high-priority mode, not the app specifically trying to run on GPU 0 for any particular reason.
|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
It's likely that, with the crazy long estimated run times, the task is going into high-priority mode. In that case it has an equal chance to interrupt any GPU, and BOINC grabs the first one. A BOINC function of high-priority mode, not the app specifically trying to run on GPU 0 for any particular reason. My PythonGPU tasks always run in high-priority mode from start to finish. I've never seen a Python task EVER occupy any GPU other than #0 unless I explicitly prevent it with a gpu_exclude statement. Countless tasks have been run through my GPUs, with countless opportunities to run on GPUs #1 and #2. But I have seen brand-new tasks in the "waiting to run" state wait until GPU #0 comes free of running Einstein and MW tasks so they can jump onto GPU #0. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
I know. I'm saying it's the high priority, and the fact that you're only running one task, that's putting it on GPU 0. It's not inherent to the application. High priority will supersede ALL lower-priority tasks and make all devices available to it, so it picks the first one, which is 0. If you allowed two tasks to download and run (at 1x task per GPU, with no excludes), the second task would run on the next GPU. If it were because of the application, they would all stack up on GPU 0; they don't. It's just running where BOINC tells it to under the circumstances. How do you think any tasks run on my second GPU? I'm not doing anything special. I'm just only running these tasks, and everything acts normally as you'd expect, since all tasks are equally "high priority" and it's business as usual.
|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
OK, I understand what you are describing. I, on the other hand, have never seen any other behavior, so I am describing only what I have seen. I've set the tasks up for 0.5 per GPU and still only see tasks on #0. So how do you get your tasks to not run high-priority? I haven't figured out that magic recipe yet. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
I've set the tasks up for 0.5 per GPU and still only see tasks on #0. That's not what I meant. Leave it at 1.0 gpu_usage but allow it to download 2 tasks. Since the first GPU will be occupied, it will spin up a task on the second GPU. Setting 0.5 gpu_usage is the same situation with high priority: they will both go to the first GPU. So how do you get your tasks to not run high-priority? Mine are running high priority. But when all six tasks are labeled as high priority, it doesn't really make any difference. They run the same as if there were no high priority, because there are no low-priority tasks for them to fight with. This system ONLY runs PythonGPU. No other tasks and no other projects.
|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
I already tried the download limit set to 2, but I was still at 0.5 gpu_usage because I'm sharing cards with Einstein, MW and WCG. It just spun up both tasks on GPU #0. I am not arguing that the tasks will always run on GPU #0 in all situations, just that I have so far not been able to get them to move anywhere else in my computing environments. I am going to stop messing with the configuration, as I have finally reduced the large impact on CPU task running times by moving to 1 Python task running with 0.5 gpu_usage. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
There's no problem with you setting your config that way with excludes if that works for you. I just wanted it to be clear that the reason it's going to GPU 0 is high priority and BOINC behavior, NOT anything inherent to or hard-coded in the application itself. The root cause is the incorrect estimated remaining time, but the direct cause is how BOINC manages tasks in high-priority mode. The app is just doing what BOINC tells it to.
|
|
Joined: 9 May 13 Posts: 171 Credit: 4,594,296,466 RAC: 171
|
Keith, I hope I am understanding your question correctly. Have you set the <use_all_gpus> flag in the cc_config.xml file to "1"? It requires a client restart if the flag needs to be turned on. |
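For anyone following along, a minimal sketch of that flag in cc_config.xml (remember the client restart mentioned above):

```xml
<cc_config>
  <options>
    <!-- Let BOINC use every GPU, not just the most capable one -->
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```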
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
Keith, thanks for the reply. No, that is all fine. This has to do with some advanced configuration stuff beyond the standard client; Ian and I were discussing things related to our custom team client. You don't even have to have that options flag in your cc_config file if your GPUs are all the same; they all get used automatically. This daily driver has (3) 2080 cards of the same make, and they all get used without a config parameter. You only need that parameter if your cards are dissimilar enough that BOINC considers them of different compute capability; then BOINC will only use the most capable card if you don't set the parameter. |
©2025 Universitat Pompeu Fabra