Experimental Python tasks (beta) - task description

---
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 1

One of my machines started a Python task yesterday evening and finished it after about 24-1/2 hours. How come a runtime (and CPU time) of 1,354,433.00 secs (= 376 hrs) is shown? https://www.gpugrid.net/result.php?resultid=33030599 As a side effect, I did not get any credit bonus (in this case the one for finishing within 48 hrs).

---
Joined: 21 Feb 20 | Posts: 1116 | Credit: 40,839,470,595 | RAC: 4,772

> One of my machines started a Python task yesterday evening and finished it after about 24-1/2 hours.

The calculated runtime is using the CPU time; this has been mentioned many times. It's because more than one core was being used, so the sum of each core's CPU time is what's shown. You did get the 48-hr bonus of 25%: base credit is 70,000 and you got 87,500 (+25%). Less than 24 hrs gets +50%, for 105,000.
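For reference, a small sketch (my own illustration of the bonus tiers described in the reply above, not project code) showing how the 70,000 base credit maps to the 87,500 and 105,000 figures:

```python
# Illustration of the early-return bonus tiers described above (not project code).
def credit_with_bonus(base_credit, turnaround_hours):
    if turnaround_hours < 24:
        return base_credit * 1.50   # +50% for results returned within 24 hours
    if turnaround_hours < 48:
        return base_credit * 1.25   # +25% for results returned within 48 hours
    return base_credit

print(credit_with_bonus(70_000, 24.5))  # 87500.0  (the case discussed above)
print(credit_with_bonus(70_000, 20.0))  # 105000.0 (under 24 hours)
```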

---
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 1

GPUGRID seems to have problems with figures, at least as far as Python is concerned :-( I just wanted to download a new Python task. On my RAM disk there is about 59 GB of free disk space, but the BOINC event log tells me that Python needs some 532 MB more disk space. How come?

---
Joined: 21 Feb 20 | Posts: 1116 | Credit: 40,839,470,595 | RAC: 4,772

> GPUGRID seems to have problems with figures, at least as far as Python is concerned :-(

Probably due to your allocation of disk usage in BOINC. Go into the compute preferences and allow BOINC to use more disk space; by default I think it is set to 50% of the disk drive, so you might need to increase that. Options -> Computing preferences... -> Disk and memory tab, and set whatever limits you think are appropriate. It will use the most restrictive of the three types of limits. The Python tasks take up a lot of space.
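To make "most restrictive of the three" concrete, here is a rough sketch of my own (not BOINC source code, and the numbers are hypothetical) of how the three settings on that tab bound the space BOINC may occupy:

```python
# Rough sketch (not BOINC source code) of how the three disk limits interact:
# whichever setting is most restrictive determines the space BOINC may occupy.
def boinc_disk_ceiling_gb(disk_total_gb, disk_free_gb, boinc_used_gb,
                          max_use_gb, min_free_gb, max_use_pct):
    limits = [
        max_use_gb,                                  # "use no more than X GB"
        boinc_used_gb + disk_free_gb - min_free_gb,  # "leave at least Y GB free"
        disk_total_gb * max_use_pct / 100.0,         # "use no more than Z% of total"
    ]
    return max(0.0, min(limits))

# Hypothetical numbers: 128 GB drive, 59 GB free, BOINC already using 10 GB,
# limits of 100 GB / leave 1 GB free / 50% of the drive -> ceiling is 64 GB.
print(boinc_disk_ceiling_gb(128, 59, 10, max_use_gb=100, min_free_gb=1, max_use_pct=50))
```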

---
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 1

No, it isn't that. I am aware of these settings. Since nothing other than BOINC is being done on this computer, disk and RAM usage are set to 90% for BOINC. So, when I have some 58 GB free on a 128 GB RAM disk (with some 60 GB of free system RAM), it should normally be no problem for Python to be downloaded and processed. On another machine I have far fewer resources, and it works. So no idea what the problem is in this case ... :-(

---
Joined: 13 Dec 17 | Posts: 1419 | Credit: 9,119,446,190 | RAC: 662

Or BOINC doesn't consider a RAM Disk a "real" drive and ignores the available storage there. Could be BOINC only considers physical storage to be valid. |

---
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 1

> Or BOINC doesn't consider a RAM Disk a "real" drive and ignores the available storage there.

No, I have BOINC running on another PC with a RAM disk - in that case a much smaller one: 32 GB.

---
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 1

Another question - I think I read something about this topic somewhere here, but I cannot find the posting any more (though maybe I am mistaken): is there a way to limit (via app_config.xml) the number of CPU cores Python uses? The reason I am asking is that on the machine onto which Python can be downloaded I also have another (non-GPU) project running, and when Python fills up the available cores the CPU is busy at 100%, which slows things down and also heats up the CPU much more.

---
Joined: 13 Dec 17 | Posts: 1419 | Credit: 9,119,446,190 | RAC: 662

No. You cannot alter the task configuration; it will always create 32 spawned processes for each task during computation. If the task is interfering with your other CPU tasks, then you have a choice: either stop the Python tasks or reduce your other CPU tasks. All you can do to make the Python task run reasonably well is assign 3-5 CPU cores for BOINC scheduling, to keep other CPU work off the host. You can do that through an app_config.xml file in the project directory, like this:

<app_config>
  <app>
    <name>PythonGPU</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>3.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

---
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 1

... thanks, Keith, for your explanation. Well, I actually would not need to put in this app_config.xml, as in my case the other BOINC tasks don't just assign themselves any number of CPU cores; I tell each of these projects via a separate app_config.xml how many cores to use (which is what I was, in fact, also hoping for with Python). So I have no choice but to live with the situation as is :-( What is too bad, though, is that obviously there are no longer any ACEMD tasks being sent out (where it is basically clear: 1 task = 1 CPU core [unless changed by an app_config.xml]).

---
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 1

> Or BOINC doesn't consider a RAM Disk a "real" drive and ignores the available storage there.

Now I tried once more to download a Python task on my system with a 128 GB RAM disk (plus 128 GB system RAM). The BOINC event log says: "Python apps for GPU hosts needs 4590.46 MB more disk space. You currently have 28788.14 MB available and it needs 33378.60 MB." Somehow all this does not fit together: in reality, the RAM disk is filled with 73 GB and has 55 GB available. Further, I question whether Python really needs 33,378 MB of free disk space for downloading. I am really frustrated that this does not work :-(

---
Joined: 13 Dec 17 | Posts: 1419 | Credit: 9,119,446,190 | RAC: 662

You are not understanding the nature of the Python tasks. They are not using all your cores; they are not using 32 cores. They are using 32 spawned processes, and a process is NOT a core. The Python tasks use from 100-300% of a CPU core, depending on the speed of the host and the number of cores in the host. That is why I offered the app_config.xml file to allot 3 CPU cores to each Python task for BOINC scheduling purposes. And you can have many app_config.xml files in play among all your projects, as an app_config file is specific to each project and is placed into that project's folder. You certainly can use one for scheduling help for GPUGrid.

An app_config file does not control the number of cores a task uses. That depends solely on the science application; a task will use as many or as few cores as it needs. The only exception is the special case of the MT plan_class, like the CPU tasks at Milkyway. There BOINC has an actual control parameter, --nthreads, that can specifically set the number of cores allowed for an MT plan_class task.

That cannot be used here, because the Python tasks are not a simple CPU-only MT type task. They are something completely different, and something that BOINC does not know how to handle. They are a dual CPU-GPU combination task where the majority of the computation is done on the CPU with bursts of activity on the GPU, and then the computation repeats that pattern. It would take a major rewrite of core BOINC code to properly handle this type of machine-learning / reinforcement-learning combo task. Unless BOINC attracts new developers willing to tackle this major development hurdle, the best we can do is accommodate these tasks through other host controls.
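To make the "processes, not cores" point concrete, here is an unofficial sketch (assuming the psutil package is installed; the PID is a hypothetical placeholder for a running task's main Python process) that sums the CPU usage of the main process and its spawned children over a one-second window. On a typical host the aggregate lands in the 100-300% range described above rather than 32 full cores:

```python
import time
import psutil

MAIN_PID = 12345  # hypothetical: PID of a running task's main Python process

procs = [psutil.Process(MAIN_PID)]
procs += procs[0].children(recursive=True)

for p in procs:                  # prime the per-process CPU counters
    p.cpu_percent(interval=None)
time.sleep(1.0)                  # one-second sampling window
total = sum(p.cpu_percent(interval=None) for p in procs)

print(f"{len(procs) - 1} spawned processes, ~{total:.0f}% of one core in aggregate")
```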

---
Joined: 13 Dec 17 | Posts: 1419 | Credit: 9,119,446,190 | RAC: 662

Make sure there are NO checkmarks on any selection in the Disk and memory tab of the BOINC Manager Options >> Computing Preferences page. That is what is limiting your Downloads. |

---
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 1

> Make sure there are NO checkmarks on any selection in the Disk and memory tab of the BOINC Manager Options >> Computing Preferences page.

I had already removed those checkmarks before. What I did now was stop new Rosetta tasks (which also need a lot of disk space for their VM files), so the free disk space climbed to about 80 GB - only then did the Python download work. Strange, isn't it?

---
Joined: 21 Feb 20 | Posts: 1116 | Credit: 40,839,470,595 | RAC: 4,772

> The reason Reinforcement Learning agents do not currently use the whole potential of the cards is because the interactions between the AI agent and the simulated environment are performed on CPU, while the agent "learning" process is the one that uses the GPU intermittently.

A suggestion for whenever you're able to move to pure GPU work: PLEASE look into and enable "automatic mixed precision" in your code. https://pytorch.org/docs/stable/notes/amp_examples.html This should greatly benefit those devices which have Tensor cores and speed things up.
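For reference, this is roughly what that suggestion looks like in code: a minimal sketch following the pattern in the linked PyTorch AMP examples. The model, optimizer, and data below are placeholders, not GPUGRID's actual training code.

```python
# Minimal PyTorch automatic mixed precision (AMP) sketch; placeholders throughout.
import torch

model = torch.nn.Linear(512, 10).cuda()              # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()                 # scales the loss for fp16 safety

for _ in range(10):                                   # placeholder training loop
    data = torch.randn(64, 512, device="cuda")
    target = torch.randint(0, 10, (64,), device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                   # forward pass runs in mixed precision
        loss = torch.nn.functional.cross_entropy(model(data), target)
    scaler.scale(loss).backward()                     # scaled backward to avoid underflow
    scaler.step(optimizer)
    scaler.update()
```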

---
Joined: 13 Dec 17 | Posts: 1419 | Credit: 9,119,446,190 | RAC: 662

> Make sure there are NO checkmarks on any selection in the Disk and memory tab of the BOINC Manager Options >> Computing Preferences page.

I think your issue is your use of a fixed RAM disk size instead of a dynamic pagefile that is allowed to grow larger as needed.

---
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 1

> I think your issue is your use of a fixed RAM disk size instead of a dynamic pagefile that is allowed to grow larger as needed.

I just noticed the same problem with Rosetta Python tasks, so this may be related in some way to the Python architecture. In the Rosetta case, too, the disk space actually available was significantly higher than what Rosetta said it would need. So I don't believe this has anything to do with the fixed RAM disk size. What is the logic behind your assumption?

---
Joined: 13 Dec 17 | Posts: 1419 | Credit: 9,119,446,190 | RAC: 662

If you read through the various posts, including mine, or investigate the issues with PyTorch on Windows, it is because of how Windows handles reservation of memory addresses compared to how Linux handles it. The PyTorch libraries, when downloaded and expanded, ask for many gigabytes of memory. Windows has to set aside every bit of memory space that the application asks for, whether it will be needed or not; Linux does not, since it handles memory allocation dynamically. And since every Python task is likely different, there is likely no reuse of the previous PyTorch libraries, so every task needs to get all of its configured resources every time a new task is executed.

So the best method to satisfy this on Windows is to start with a 35 GB minimum / 50 GB maximum pagefile and allow the pagefile to size dynamically within that range. Your fixed RAM disk size just isn't flexible enough or large enough, apparently. That pagefile size seems to be sufficient for the other Windows users I have assisted with these tasks.

Please read this explanation for the actual particulars of the problem with Windows: https://www.gpugrid.net/forum_thread.php?id=5322&nowrap=true#58908
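If you want to sanity-check the resulting headroom after resizing the pagefile, a small sketch (assuming the psutil package; on Windows the swap figure roughly reflects the pagefile) that prints RAM plus pagefile, which together roughly bound how much memory applications can ask Windows to commit:

```python
import psutil

GiB = 1024 ** 3
ram_gib = psutil.virtual_memory().total / GiB
swap_gib = psutil.swap_memory().total / GiB   # on Windows, roughly the pagefile size

# RAM + pagefile is the rough ceiling on memory applications can ask Windows to commit.
print(f"RAM: {ram_gib:.1f} GiB, pagefile/swap: {swap_gib:.1f} GiB, "
      f"commit ceiling ~ {ram_gib + swap_gib:.1f} GiB")
```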

---
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 1

> So the best method to satisfy this on Windows is to start with a 35 GB minimum / 50 GB maximum pagefile and allow the pagefile to size dynamically within that range. Your fixed RAM disk size just isn't flexible enough or large enough, apparently.

Thanks for the hint, I will adapt the pagefile size accordingly and see what happens.

---
Joined: 31 May 21 | Posts: 200 | Credit: 0 | RAC: 0

Not sure if it would have made a difference, but I would have placed your code before line 433, only after importing os and sys:

    if __name__ == "__main__":
        import sys
        sys.stderr.write("Starting!!\n")
        import os
        os.environ["MKL_DEBUG_CPU_TYPE"] = "5"
        import platform