Experimental Python tasks (beta) - task description

Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 59215 - Posted: 10 Sep 2022, 17:57:37 UTC

One of my machines started a Python task yesterday evening and finished it after about 24 1/2 hours.
How come a runtime (and CPU time) of 1,354,433.00 secs (= 376 hrs) is shown here:

https://www.gpugrid.net/result.php?resultid=33030599

As a side effect, I did not get any credit bonus (in this case the one for finishing within 48 hrs).
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 4,772
Message 59216 - Posted: 10 Sep 2022, 18:11:25 UTC - in response to Message 59215.  

One of my machines started a Python task yesterday evening and finished it after about 24 1/2 hours.
How come a runtime (and CPU time) of 1,354,433.00 secs (= 376 hrs) is shown here:

https://www.gpugrid.net/result.php?resultid=33030599

As a side effect, I did not get any credit bonus (in this case the one for finishing within 48 hrs).


The displayed runtime is actually the CPU time, as has been mentioned many times. Because more than one core was in use, what is shown is the sum of each core's CPU time.

You did get the 48-hour bonus of 25%. Base credit is 70,000 and you got 87,500 (+25%). Finishing in less than 24 hours earns +50%, for 105,000.
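
The figures in this exchange can be checked with a few lines of Python. This is only a sketch: the function name and the tier layout are illustrative, the values are the ones quoted above (70,000 base, +25% under 48 hours, +50% under 24 hours), and it is an assumption that a result returned after 48 hours earns just the base credit.

```python
BASE_CREDIT = 70_000  # base credit quoted in this thread


def credit_for(turnaround_hours: float, base: int = BASE_CREDIT) -> float:
    """Credit after the turnaround bonus (tiers as quoted in this thread)."""
    if turnaround_hours < 24:
        return base * 1.50   # +50% bonus
    if turnaround_hours < 48:
        return base * 1.25   # +25% bonus
    return float(base)       # assumption: no bonus after 48 hours


reported_seconds = 1_354_433.00           # "runtime" shown on the result page
reported_hours = reported_seconds / 3600  # about 376 hours
wall_clock_hours = 24.5                   # what the machine actually took

print(f"reported: {reported_hours:.1f} h")
print(f"credit:   {credit_for(wall_clock_hours):,.0f}")
# If the displayed figure is CPU time summed over all cores, this ratio is
# the average number of core-equivalents kept busy during the run.
print(f"ratio:    {reported_hours / wall_clock_hours:.1f}")
```

The bonus depends on the real turnaround time, not on the inflated figure shown on the result page, which is why the 25% bonus was still granted here.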

Erich56
Message 59217 - Posted: 10 Sep 2022, 21:14:39 UTC

GPUGRID seems to have problems with numbers, at least as far as Python is concerned :-(
I just wanted to download a new Python task. My RAM disk has about 59 GB of free space, but the BOINC event log tells me that Python needs some 532 MB more disk space. How come?

Ian&Steve C.
Message 59218 - Posted: 10 Sep 2022, 23:34:03 UTC - in response to Message 59217.  
Last modified: 10 Sep 2022, 23:36:01 UTC

GPUGRID seems to have problems with numbers, at least as far as Python is concerned :-(
I just wanted to download a new Python task. My RAM disk has about 59 GB of free space, but the BOINC event log tells me that Python needs some 532 MB more disk space. How come?


Probably due to your allocation of disk usage in BOINC. Go into the computing preferences and allow BOINC to use more disk space. By default I think it is set to 50% of the disk drive; you might need to increase that.

Options -> Computing Preferences...
Disk and Memory tab

and set whatever limits you think are appropriate. BOINC will apply the most restrictive of the 3 types of limits. The Python tasks take up a lot of space.
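
As a rough illustration of "the most restrictive of the 3 types of limits": the Disk and Memory tab offers three disk settings (use no more than N GB, leave at least N GB free, use no more than N% of the total). The sketch below is a simplified model and not BOINC's actual code; the function name, the parameter names and the sample figures are all made up for the example.

```python
def boinc_disk_allowance_gb(total_gb, free_gb, boinc_used_gb,
                            max_used_gb=None, min_free_gb=None, max_used_pct=None):
    """Simplified model: how much disk BOINC may use in total (GB).

    Each preference that is set yields a ceiling; the smallest ceiling wins.
    """
    # With no preference set, BOINC could at most use what it already holds
    # plus everything that is still free.
    ceilings = [boinc_used_gb + free_gb]
    if max_used_gb is not None:
        ceilings.append(max_used_gb)
    if min_free_gb is not None:
        ceilings.append(boinc_used_gb + free_gb - min_free_gb)
    if max_used_pct is not None:
        ceilings.append(total_gb * max_used_pct / 100.0)
    return max(0.0, min(ceilings))


# Hypothetical 128 GB disk: 59 GB free, BOINC already holding 60 GB.
allowance = boinc_disk_allowance_gb(128, 59, 60, max_used_pct=50)
print(f"BOINC may use {allowance:.1f} GB, {allowance - 60:.1f} GB more than now")
```

In this model a percentage limit can leave BOINC with only a few GB of headroom even though the disk itself still shows plenty of free space.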

Erich56
Message 59221 - Posted: 11 Sep 2022, 4:50:08 UTC - in response to Message 59218.  


Probably due to your allocation of disk usage in BOINC. Go into the computing preferences and allow BOINC to use more disk space. By default I think it is set to 50% of the disk drive; you might need to increase that.

Options -> Computing Preferences...
Disk and Memory tab

and set whatever limits you think are appropriate. BOINC will apply the most restrictive of the 3 types of limits. The Python tasks take up a lot of space.

No, it isn't that.
I am aware of these settings. Since nothing other than BOINC runs on this computer, disk and RAM usage are both set to 90% for BOINC.
So with some 58 GB free on a 128 GB RAM disk (and some 60 GB of free system RAM), it should normally be no problem for a Python task to be downloaded and processed.
On another machine I have far fewer resources, and it works there.
So I have no idea what the problem is in this case ... :-(
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 662
Message 59222 - Posted: 11 Sep 2022, 6:12:13 UTC

Or BOINC doesn't consider a RAM Disk a "real" drive and ignores the available storage there.

Could be BOINC only considers physical storage to be valid.

Erich56
Message 59223 - Posted: 11 Sep 2022, 6:42:55 UTC - in response to Message 59222.  

Or BOINC doesn't consider a RAM Disk a "real" drive and ignores the available storage there.

Could be BOINC only considers physical storage to be valid.

no, I have BOINC running on another PC with Ramdisk - in that case a much smaller one: 32GB

Erich56
Message 59224 - Posted: 11 Sep 2022, 6:56:19 UTC

Another question:

I think I read something about this topic somewhere here, but I cannot find the post any more (though maybe I am mistaken):

Is it possible to limit (via app_config.xml) the number of CPU cores Python is using?
The reason I am asking is that on the machine that can download Python tasks I also have another (non-GPU) project running, and when Python fills up the available cores, the CPU runs at 100%, which slows things down and also heats up the CPU much more.

Keith Myers
Message 59225 - Posted: 11 Sep 2022, 8:05:35 UTC
Last modified: 11 Sep 2022, 8:06:55 UTC

No. You cannot alter the task configuration. Each task will always spawn 32 processes during computation.

If the task is interfering with your other CPU tasks, then you have a choice: either stop the Python tasks or reduce your other CPU tasks.

All you can do to make the Python task run reasonably well is assign 3-5 CPU cores to it for BOINC scheduling, to keep other CPU work off the host.

You can do that through an app_config.xml file in the project directory.

Like this:

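<app_config>
  <app>
    <name>PythonGPU</name>
    <gpu_versions>
      <!-- each task claims one whole GPU -->
      <gpu_usage>1.0</gpu_usage>
      <!-- budget 3 CPU cores per task in BOINC's scheduler;
           this does not cap what the task actually uses -->
      <cpu_usage>3.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>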
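
Since a malformed app_config.xml is easy to produce by hand, it can be worth checking the file before BOINC reads it. The following is a minimal sketch that uses only Python's standard library; the helper name is made up for the example, and the embedded XML simply repeats the values from this post.

```python
import xml.etree.ElementTree as ET

APP_CONFIG = """
<app_config>
  <app>
    <name>PythonGPU</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>3.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""


def summarize_app_config(xml_text):
    """Return {app name: (gpu_usage, cpu_usage)}; raises ParseError if malformed."""
    root = ET.fromstring(xml_text.strip())
    summary = {}
    for app in root.findall("app"):
        name = app.findtext("name")
        gpu = float(app.findtext("gpu_versions/gpu_usage"))
        cpu = float(app.findtext("gpu_versions/cpu_usage"))
        summary[name] = (gpu, cpu)
    return summary


print(summarize_app_config(APP_CONFIG))
```

To check a real file, read its text from the project directory and pass it to the same function; the path depends on the installation and is not shown here.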

Erich56
Message 59226 - Posted: 11 Sep 2022, 12:21:35 UTC - in response to Message 59225.  

...
All you can do to make the Python task run reasonably well is assign 3-5 CPU cores to it for BOINC scheduling, to keep other CPU work off the host.

You can do that through an app_config.xml file in the project directory.
Like this: ...

Thanks, Keith, for your explanation.

Well, in my case I actually would not need to put in this app_config.xml; the other BOINC tasks don't just assign themselves any number of CPU cores. I tell each of these projects through a separate app_config.xml how many cores to use (which is, in fact, what I was also hoping to do for Python).
So I have no choice but to live with the situation as it is :-(

What is too bad, though, is that apparently no ACEMD tasks are being sent out any more (where it is basically clear: 1 task = 1 CPU core [unless changed by an app_config.xml]).

Erich56
Message 59228 - Posted: 11 Sep 2022, 15:27:14 UTC - in response to Message 59223.  

Or BOINC doesn't consider a RAM Disk a "real" drive and ignores the available storage there.

Could be BOINC only considers physical storage to be valid.

no, I have BOINC running on another PC with Ramdisk - in that case a much smaller one: 32GB


Now I tried once more to download a Python task on my system with the 128 GB RAM disk (plus 128 GB of system RAM).
The BOINC event log says:

Python apps for GPU hosts needs 4590.46MB more disk space. You currently have 28788.14 MB available and it needs 33378.60 MB.

Somehow, though, all this does not add up: in reality, the RAM disk holds 73 GB and has 55 GB available.
Further, I question whether Python really needs 33,378 MB of free disk space for downloading.

I am really frustrated that this does not work :-(
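
For what it is worth, the three figures in that event log line are at least consistent with each other, which a short script can confirm. This is only a sketch: the helper name is made up, and the regular expression assumes the message format shown above.

```python
import re

LOG_LINE = ("Python apps for GPU hosts needs 4590.46MB more disk space. "
            "You currently have 28788.14 MB available and it needs 33378.60 MB.")


def parse_disk_message(line):
    """Extract (shortfall, available, needed) in MB from the event log message."""
    numbers = re.findall(r"(\d+(?:\.\d+)?)\s*MB", line)
    shortfall, available, needed = (float(n) for n in numbers)
    return shortfall, available, needed


shortfall, available, needed = parse_disk_message(LOG_LINE)
# Internal consistency check: needed - available should equal the shortfall.
consistent = abs((needed - available) - shortfall) < 0.01
print(f"needed {needed:.2f} MB, available {available:.2f} MB, consistent: {consistent}")
```

So the puzzle is not the subtraction but the "available" figure: BOINC counts about 28,788 MB while the RAM disk reports roughly 55 GB free, which suggests that BOINC's own accounting is what limits the download, rather than the physical free space.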

Keith Myers
Message 59229 - Posted: 11 Sep 2022, 15:30:21 UTC - in response to Message 59226.  
Last modified: 11 Sep 2022, 15:33:39 UTC

...
All you can do to make the Python task run reasonably well is assign 3-5 CPU cores to it for BOINC scheduling, to keep other CPU work off the host.

You can do that through an app_config.xml file in the project directory.
Like this: ...

Thanks, Keith, for your explanation.

Well, in my case I actually would not need to put in this app_config.xml; the other BOINC tasks don't just assign themselves any number of CPU cores. I tell each of these projects through a separate app_config.xml how many cores to use (which is, in fact, what I was also hoping to do for Python).
So I have no choice but to live with the situation as it is :-(

What is too bad, though, is that apparently no ACEMD tasks are being sent out any more (where it is basically clear: 1 task = 1 CPU core [unless changed by an app_config.xml]).


You are not understanding the nature of the Python tasks. They are not using all your cores. They are not using 32 cores. They are using 32 spawned processes.

A process is NOT a core.

The Python tasks use from 100% to 300% of a CPU core, depending on the speed of the host and the number of cores in the host.
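
The difference between processes and cores is easy to see for yourself. The sketch below is purely illustrative and has nothing to do with the actual GPUGRID application: it starts 32 child processes (the figure quoted in this thread) that do nothing but sleep, on a machine with however many cores it happens to have.

```python
import os
import subprocess
import sys


def spawn_idle_workers(count):
    """Start `count` child Python processes that just sleep briefly."""
    code = "import time; time.sleep(0.5)"
    return [subprocess.Popen([sys.executable, "-c", code]) for _ in range(count)]


cores = os.cpu_count()
workers = spawn_idle_workers(32)   # 32 mirrors the figure quoted in this thread
print(f"{len(workers)} processes alive on a host with {cores} logical cores")

for worker in workers:
    worker.wait()                  # idle processes use next to no CPU time
```

The operating system schedules all of these processes onto whatever cores are available, so the number of processes says nothing by itself about how many cores are kept busy.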

That is why I offered the app_config.xml file to allot 3 CPU cores to each Python task for BOINC scheduling purposes. You can have many app_config.xml files in play among all your projects, since an app_config file is specific to each project and is placed in that project's folder. You certainly can use one to help with scheduling for GPUGrid.

An app_config file does not control the number of cores a task uses. That depends solely on the science application. A task will use as many or as few cores as it needs.

The only exception is the special case of plan_class MT, like the CPU tasks at Milkyway. There BOINC has an actual control parameter, --nthreads, that can specifically set the number of cores allowed for the MT plan_class task.

That cannot be used here because the Python tasks are not a simple CPU-only MT type task. They are something completely different, something that BOINC does not know how to handle. They are a dual CPU-GPU combination task in which the majority of the computation is done on the CPU with bursts of activity on the GPU, and then the cycle repeats.

It would take a major rewrite of core BOINC code to properly handle this type of machine-learning (reinforcement learning) combo task. Unless BOINC attracts new developers who are willing to tackle this major development hurdle, the best we can do is accommodate these tasks through other host controls.

Keith Myers
Message 59230 - Posted: 11 Sep 2022, 15:40:07 UTC

Make sure there are NO checkmarks on any selection in the Disk and memory tab of the BOINC Manager Options >> Computing Preferences page.

That is what is limiting your Downloads.

Erich56
Message 59231 - Posted: 11 Sep 2022, 16:47:19 UTC - in response to Message 59230.  

Make sure there are NO checkmarks on any selection in the Disk and memory tab of the BOINC Manager Options >> Computing Preferences page.

That is what is limiting your Downloads.

I had removed these checkmarks already before.
What I did now was to stop new Rosetta tasks (which also need a lot of disk space for their VM files), so the free disk space climbed up to about 80GB - only then the Python download worked. Strange, isn't it?

Ian&Steve C.
Message 59232 - Posted: 11 Sep 2022, 17:19:36 UTC - in response to Message 58980.  

The reason Reinforcement Learning agents do not currently use the whole potential of the cards is because the interactions between the AI agent and the simulated environment are performed on CPU while the agent "learning" process is the one that uses the GPU intermittently.

There are, however, environments that only use GPU. They are becoming more and more common, so I see it as a real possibility that in the future most popular benchmarks of the field use only GPU. Then the jobs will be much more efficient since pretty much only GPU will be used. Unfortunately we are not there yet...


A suggestion for whenever you're able to move to pure GPU work: PLEASE look into and enable "automatic mixed precision" in your code.

https://pytorch.org/docs/stable/notes/amp_examples.html

This should greatly speed things up on devices that have Tensor cores.
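
For readers who have not seen it, this is roughly what automatic mixed precision looks like in a PyTorch training loop. It is a minimal sketch along the lines of the linked examples, not the project's code: the tiny linear model, the random data and the hyperparameters are placeholders, and AMP is switched off automatically when no CUDA device is present so the snippet also runs on a CPU-only machine.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"          # only enable AMP when a GPU is present
amp_dtype = torch.float16 if use_amp else torch.bfloat16

model = nn.Linear(64, 8).to(device)      # stand-in model, not the project's network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.randn(32, 64, device=device)
targets = torch.randn(32, 8, device=device)

for _ in range(3):
    optimizer.zero_grad()
    # Inside autocast, eligible ops run in half precision on the GPU,
    # which is what lets Tensor cores do the work.
    with torch.autocast(device_type=device.type, dtype=amp_dtype, enabled=use_amp):
        loss = nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()        # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)               # unscales the gradients, then steps
    scaler.update()

print(f"final loss: {loss.item():.4f}")
```

The change to an existing loop is small: wrap the forward pass in `torch.autocast` and route the backward pass and optimizer step through a `GradScaler`.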


Keith Myers
Message 59233 - Posted: 11 Sep 2022, 18:48:40 UTC - in response to Message 59231.  

Make sure there are NO checkmarks on any selection in the Disk and memory tab of the BOINC Manager Options >> Computing Preferences page.

That is what is limiting your Downloads.

I had removed these checkmarks already before.
What I did now was to stop new Rosetta tasks (which also need a lot of disk space for their VM files), so the free disk space climbed up to about 80GB - only then the Python download worked. Strange, isn't it?

I think your issue is your use of a fixed ram disk size instead of a dynamic pagefile that is allowed to grow larger as needed.

Erich56
Message 59234 - Posted: 11 Sep 2022, 20:06:29 UTC - in response to Message 59233.  

Make sure there are NO checkmarks on any selection in the Disk and memory tab of the BOINC Manager Options >> Computing Preferences page.

That is what is limiting your Downloads.

I had removed these checkmarks already before.
What I did now was to stop new Rosetta tasks (which also need a lot of disk space for their VM files), so the free disk space climbed up to about 80GB - only then the Python download worked. Strange, isn't it?

I think your issue is your use of a fixed ram disk size instead of a dynamic pagefile that is allowed to grow larger as needed.

I just noticed the same problem with Rosetta Python tasks. So this may be related in some way to the Python architecture.
In the Rosetta case, too, the disk space actually available was significantly higher than what Rosetta said it would need.
So I don't believe that this has anything to do with the fixed RAM disk size. What is the logic behind your assumption?

Keith Myers
Message 59235 - Posted: 12 Sep 2022, 0:58:35 UTC - in response to Message 59234.  

If you read through the various posts, including mine, or investigate the issues with PyTorch on Windows, you will see that it comes down to how Windows handles the reservation of memory addresses compared to how Linux handles it.

The PyTorch libraries, when downloaded and expanded, ask for many gigabytes of memory. Windows has to set aside every bit of memory space that the application asks for, whether it will be needed or not. Linux does not have this constraint, since it handles memory allocation dynamically.
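
The difference between asking for memory and actually using it can be shown with a small anonymous mapping. This is a sketch under stated assumptions, not a measurement of what PyTorch does: the 256 MiB size is arbitrary, and the claim about commit charge reflects my understanding that Windows charges a pagefile-backed mapping in full against RAM plus pagefile when it is created, whereas Linux with its default overcommit settings only backs pages once they are written.

```python
import mmap

RESERVE_BYTES = 256 * 1024 * 1024   # ask for 256 MiB of anonymous memory
TOUCH_BYTES = 4 * mmap.PAGESIZE     # but only ever write to a few pages

# Anonymous mapping (fileno=-1): memory that is not backed by a regular file.
region = mmap.mmap(-1, RESERVE_BYTES)
region[:TOUCH_BYTES] = b"\x01" * TOUCH_BYTES

requested = len(region)             # size that was asked for
first_byte = region[0]              # value read back from a touched page
region.close()

print(f"requested {requested // 2**20} MiB, touched {TOUCH_BYTES // 1024} KiB")
```

Scaled up to the many gigabytes mentioned above, this is why a generously sized pagefile matters on Windows even when most of the requested memory is never touched.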

And since every Python task is likely different, there is probably no reuse of the previous PyTorch libraries, so every task needs to get all of its configured resources each time a new task is executed.

So the best way to deal with this on Windows is to start with a 35 GB minimum pagefile size and a 50 GB maximum, and allow the pagefile to resize dynamically within that range. Your fixed RAM disk size apparently just isn't flexible enough or large enough. That pagefile size seems to be sufficient for the other Windows users I have assisted with these tasks.

Please read this explanation for the actual particulars of the problem with Windows: https://www.gpugrid.net/forum_thread.php?id=5322&nowrap=true#58908


Erich56
Message 59236 - Posted: 12 Sep 2022, 6:54:49 UTC - in response to Message 59235.  

So the best way to deal with this on Windows is to start with a 35 GB minimum pagefile size and a 50 GB maximum, and allow the pagefile to resize dynamically within that range. Your fixed RAM disk size apparently just isn't flexible enough or large enough. That pagefile size seems to be sufficient for the other Windows users I have assisted with these tasks.

Thanks for the hint, I will adjust the pagefile size accordingly and see what happens.
abouh

Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 59237 - Posted: 12 Sep 2022, 14:43:46 UTC - in response to Message 59213.  

Not sure if it would have made a difference, but I would have placed your code before line 433, just after importing os and sys:

"""
if __name__ == "__main__":

    import sys
    sys.stderr.write("Starting!!\n")
    import os

    os.environ["MKL_DEBUG_CPU_TYPE"] = "5"

    import platform
"""


©2025 Universitat Pompeu Fabra