Experimental Python tasks (beta) - task description

Pop Piasa
Message 59902 - Posted: 11 Feb 2023, 16:25:16 UTC

Good to see Zoltan here again, welcome back! 😀
~~~~~~~~~~~~

I need to correct what I reported on the program data folder to KAMasud earlier. The folder is not hidden (as Erich56 noted) but is a system folder, so in Windows I've had to enable access to system files and folders on a new install in order to see it. Just in case you're still having trouble.

Pop Piasa
Message 59903 - Posted: 11 Feb 2023, 21:10:57 UTC
Last modified: 11 Feb 2023, 21:35:20 UTC

they already fixed the overused CPU issue. it's now capped at 4x CPU threads and hard coded in the run.py script. but that is in addition to the 32 threads for the agents. there is no way to reduce that unless abouh wanted to use less agents, but i don't think he does at this time.


I am enjoying watching abouh gain prowess at scripting with each run, using less and less resources as they evolve. Real progress. Godspeed to abouh and crew.

Retvari Zoltan
Message 59904 - Posted: 11 Feb 2023, 23:36:24 UTC - in response to Message 59901.  

Is there a way to control the number of spawned threads?
there is no reason to do this anymore.
My reason to reduce their numbers is to run two tasks at the same time to increase GPU usage, because I need the full heat output of my GPUs to heat our apartment. As I saw in Task Manager, the CPU usage of the spawned tasks drops when I start the second task (my CPU doesn't have that many threads).
Could the GPU usage be increased somehow?

it's now capped at 4x CPU threads and hard coded in the run.py script. but that is in addition to the 32 threads for the agents.
there is no way to reduce that ...
I confirm that. I looked into that script, though I'm not very familiar with Python. I even tried to modify num_env_processes in conf.yaml, but this file gets overwritten every time I restart the task, even though I removed the write permissions of the boinc user and the boinc group on that file. :)

if you want to run python tasks, you need to account for this and just tell BOINC to reserve some extra CPU resources by setting a larger value for the cpu_usage in app_config. i use values between 8-10. but you can experiment with what you are happy with. on my python dedicated system, I stop all other CPU projects as that gives the best performance.
That's clear; I did that.

KAMasud
Message 59905 - Posted: 12 Feb 2023, 1:05:32 UTC

Good to see you Zoltan.

KAMasud
Message 59906 - Posted: 12 Feb 2023, 1:10:21 UTC - in response to Message 59902.  

Good to see Zoltan here again, welcome back! 😀
~~~~~~~~~~~~

I need to correct what I reported on the program data folder to KAMasud earlier. The folder is not hidden (as Erich56 noted) but is a system folder, so in windows, I've had to enable access to system files and folders on a new install in order to see it. Just in case you're still having trouble.



Pop, there used to be two Program folders as I remember. Program and Program 32. Now there is a hidden Program System folder. Three in all.

Ian&Steve C.
Message 59908 - Posted: 12 Feb 2023, 1:18:15 UTC - in response to Message 59904.  
Last modified: 12 Feb 2023, 1:20:40 UTC

Is there a way to control the number of spawned threads?
there is no reason to do this anymore.
My reason to reduce their numbers is to run two tasks at the same time to increase GPU usage, because I need the full heat output of my GPUs to heat our apartment. As I saw it in "Task Manager" the CPU usage of the spawned tasks drops when I start the second task (my CPU doesn't have that many threads).
Could the GPU usage be increased somehow?


If you need the heat output of the GPU, then you need to run a different project. Or only run ACEMD3 tasks when they are available. You will not get it from the Python tasks in their current state.

You can increase the GPU use by adding more tasks concurrently. But not to the extent that you expect or need. I run 4x tasks on my A4000s but they still don't even have full utilization. Usually only like 40% and ~100W avg power draw. Two tasks aren't gonna cut it for increasing utilization by any substantial amount.

Erich56
Message 59909 - Posted: 12 Feb 2023, 6:36:52 UTC - in response to Message 59905.  

Good to see you Zoltan.

+1

Pop Piasa
Message 59910 - Posted: 12 Feb 2023, 22:48:44 UTC - in response to Message 59904.  

...I need the full heat output of my GPUs to heat our apartment...


It's been a bit chilly in my basement "computer lab/mancave" running these this winter, but I'm saving power($) so I'm bearing it. I just hope they last into summer so I can stay cool here in the humid Mississippi river valley of Illinois.

I've had some success running Einstein GPU tasks concurrently with Pythons and saw full GPU usage, although there is of course a longer completion time for both tasks.

Retvari Zoltan
Message 59912 - Posted: 13 Feb 2023, 1:29:20 UTC - in response to Message 59908.  
Last modified: 13 Feb 2023, 1:32:48 UTC

If you need the heat output of the GPU, then you need to run a different project.
I came to that conclusion, again.

Or only run ACEMD3 tasks when they are available.
I caught 2 or 3; that's why I put 3 hosts back on GPUGrid.

You will not get it [the full GPU heat output] from the Python tasks in their current state.
That's regrettable, but it could be ok for me this spring.

My main issue with the Python app is that I see no point in running that many spawned (training) threads: their combined memory accesses cause a massive number of CPU L3 cache misses, hindering each other's performance.
Before I put my i9-12900F host back on GPUGrid, I ran 7 TN-Grid tasks + 1 FAH GPU task simultaneously on that host, and the average processing time was 4080-4200 sec for the TN-Grid tasks.
Now I run 1 GPUGrid task + 1 TN-Grid task simultaneously, and the processing time of the TN-Grid task went up to 4660-4770 sec. Compared to the 6 other TN-Grid tasks plus a FAH task, the GPUGrid Python task causes a 14% performance loss.
You can see the change in processing times for yourself here.
If I run only 1 TN-Grid task (no GPU tasks) on that host, the processing time is 3800 seconds. Compared to that, running a GPUGrid Python task causes a 22% performance loss.
Perhaps this app should run a short benchmark on the CPU it's actually running on to establish the ideal number of training threads, or give advanced users like me :) some control over that number so we can benchmark our own systems.
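
For anyone who wants to redo the arithmetic behind those percentages, here is a minimal Python sketch. The function name slowdown_pct and the variable names are my own, and the sketch only compares the run-time ranges quoted above end to end; the exact figure depends on which ends of the ranges are compared.

```python
def slowdown_pct(baseline_s: float, loaded_s: float) -> float:
    """Percentage increase in run time relative to the baseline."""
    return (loaded_s / baseline_s - 1.0) * 100.0


# Run-time figures (seconds) quoted in the post above.
with_7_tn_grid_and_fah = (4080, 4200)  # 7 TN-Grid + 1 FAH GPU task
with_gpugrid_python = (4660, 4770)     # 1 GPUGrid Python + 1 TN-Grid task
tn_grid_alone = (3800, 3800)           # single TN-Grid task, no GPU work

for label, base in (
    ("vs. 7 TN-Grid + 1 FAH", with_7_tn_grid_and_fah),
    ("vs. 1 TN-Grid alone", tn_grid_alone),
):
    # Compare low end with low end and high end with high end.
    low = slowdown_pct(base[0], with_gpugrid_python[0])
    high = slowdown_pct(base[1], with_gpugrid_python[1])
    print(f"{label}: {min(low, high):.1f}% to {max(low, high):.1f}%")
```

The ranges this prints bracket the roughly 14% figure for the first comparison and start at about 22% for the second, so the quoted numbers sit at or near the low end of what the measurements allow.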

Ian&Steve C.
Message 59913 - Posted: 13 Feb 2023, 1:43:43 UTC - in response to Message 59912.  

I don't think you understand what the intention of the researcher is here. he wants 32 agents and the whole experiment is designed around 32 agents. and agent training happens on the CPU, so each agent needs its own process. you can't just arbitrarily reduce this number without the researcher making the change for everyone. it would fundamentally change the research. you could only reduce the number of agents with a new/different experiment.

or make MASSIVE changes to the code to push it all into the GPU, but likely most GPUs wouldn't have enough VRAM to run it and everyone would be complaining about that instead.

abouh
Message 59914 - Posted: 13 Feb 2023, 7:27:30 UTC - in response to Message 59913.  
Last modified: 13 Feb 2023, 7:27:54 UTC

Hello everyone,

this is exactly correct. Agents collect data from their interaction with the environment (running on CPU), and the data is subsequently used to update the neural network that controls action selection (on GPU).

Having multiple agents allows data to be collected in parallel, speeding up training.
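
The collect-in-parallel, update-centrally pattern described here can be illustrated with a toy Python sketch. This is not the project's code: the environment, the update rule, and every name below are hypothetical, and a thread pool stands in for the separate CPU processes that the real agents use. Only the figure of 32 agents comes from this thread.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean

NUM_AGENTS = 32        # agent count quoted in this thread
STEPS_PER_AGENT = 50   # hypothetical rollout length
HIDDEN_TARGET = 3.0    # toy quantity the "model" tries to learn


def collect_rollout(seed: int) -> list[float]:
    """One agent steps through its own toy environment and records data."""
    rng = random.Random(seed)  # independent random stream per agent
    return [HIDDEN_TARGET + rng.gauss(0.0, 1.0) for _ in range(STEPS_PER_AGENT)]


def update_model(estimate: float, batch: list[float], lr: float = 0.5) -> float:
    """Stand-in for the neural-network update that the real task runs on the GPU."""
    return estimate + lr * (fmean(batch) - estimate)


def train(iterations: int = 10) -> float:
    estimate = 0.0
    with ThreadPoolExecutor(max_workers=NUM_AGENTS) as pool:
        for it in range(iterations):
            seeds = [it * NUM_AGENTS + agent for agent in range(NUM_AGENTS)]
            # All agents gather data concurrently...
            rollouts = list(pool.map(collect_rollout, seeds))
            # ...then a single learner consumes the combined batch.
            batch = [x for rollout in rollouts for x in rollout]
            estimate = update_model(estimate, batch)
    return estimate


print(f"estimate after training: {train():.2f}")
```

Each iteration's batch is 32 times larger than a single agent could gather in the same number of steps, which is the speed-up being described; it is also why the agent count is part of the experiment design rather than a tuning knob.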

Ryan Munro
Message 59915 - Posted: 13 Feb 2023, 15:04:18 UTC

I think I am going a bit mad. I set the app_config file to use 0.33 GPU to try to get more units running at the same time, then remembered that 2 is the max. However, with this config, running 2 seemed to go faster: units completed 25% in about 3 hours, and I think they normally take a lot longer than that.
I will need to take a week or so to double-check this, though.
What's the optimal config at the moment? This is my current one:

<app_config>
  <app>
    <name>PythonGPU</name>
    <gpu_versions>
      <cpu_usage>8</cpu_usage>
      <gpu_usage>0.5</gpu_usage>
    </gpu_versions>
  </app>
</app_config>
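
As a rough way to reason about what such a config asks the client to do, here is a small Python sketch. It is a simplified model, not BOINC's actual scheduler: the function names are my own, the cap of 2 is the per-GPU maximum mentioned in this post, and the 32-thread host is a made-up example.

```python
import math


def tasks_per_gpu(gpu_usage: float, project_cap: int = 2) -> int:
    """Tasks that fit on one GPU for a given gpu_usage, limited by a cap."""
    # Small epsilon guards against floating-point results such as 2.9999999.
    return min(math.floor(1.0 / gpu_usage + 1e-9), project_cap)


def spare_cpu_threads(total_threads: int, cpu_usage: float, running_tasks: int) -> float:
    """Threads left in the client's budget after the per-task reservation."""
    return total_threads - cpu_usage * running_tasks


# gpu_usage 0.33 would allow 3 tasks per GPU, but the cap brings it back to 2.
print(tasks_per_gpu(0.33), tasks_per_gpu(0.5))

# Hypothetical 32-thread host running 2 tasks with cpu_usage 8 each.
print(spare_cpu_threads(32, 8, 2))
```

Note that cpu_usage only changes how many threads the client sets aside when scheduling other work; as discussed upthread, the number of processes a Python task actually spawns is fixed by the task itself.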

Pop Piasa
Message 59916 - Posted: 13 Feb 2023, 19:17:59 UTC - in response to Message 59915.  
Last modified: 13 Feb 2023, 19:40:30 UTC

Ryan, here's what works for me:

<app_config>
  <app>
    <name>PythonGPU</name>
    <max_concurrent>1</max_concurrent>
    <fraction_done_exact/>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemd3</name>
    <max_concurrent>2</max_concurrent>
    <fraction_done_exact/>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
  <project_max_concurrent>2</project_max_concurrent>
  <report_results_immediately/>
</app_config>

You can change the numbers whenever ACEMDs are available and allow them to run concurrently with a Python task.

You will need to adjust the CPU figures to match your present app_config.
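
If you change these figures often, the file can be generated instead of edited by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree module (ET.indent needs Python 3.9 or later). The function name build_app_config is my own, and it only emits the per-app fields shown in the configs above; where the file gets saved is up to you.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, cpu_usage: float, gpu_usage: float) -> str:
    """Return app_config XML with one <app> entry and its GPU/CPU usage values."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(versions, "cpu_usage").text = str(cpu_usage)
    ET.indent(root, space="  ")  # pretty-print in place
    return ET.tostring(root, encoding="unicode")


# Example values taken from the configs posted in this thread.
print(build_app_config("PythonGPU", cpu_usage=8, gpu_usage=0.5))
```

After saving the result as app_config.xml in the project folder, the client has to re-read its config files (or be restarted) before the new values take effect.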

(Many thanks, Richard Haselgrove, for helping me upthread)

Ryan Munro
Message 59917 - Posted: 14 Feb 2023, 16:09:12 UTC

Thanks. Is 1 CPU per Python unit enough? What times are you getting per unit? When I run 8 threads per unit and other tasks on the spare threads, my CPU is always running at 100%.

KAMasud
Message 59918 - Posted: 14 Feb 2023, 17:10:42 UTC

It is not about how many threads your machine has; it is about how many tasks you can run alongside a Python. I have a six-core, twelve-thread CPU but can only run three Einstein WUs, and my CPU peaks at 82%. A fine balancing act is required, and sometimes a GPUGrid WU arrives and I have to suspend other work.
I have also reached the limit of my 16 GB of RAM (sometimes, not always). These AI WUs seem to be outdoing us. Monitoring is also required. Pop will explain.

Keith Myers
Message 59920 - Posted: 14 Feb 2023, 19:35:24 UTC
Last modified: 14 Feb 2023, 19:36:00 UTC

Anybody else getting sent Python tasks for the old 1121 app? I have been using the newer 1131 app and it has worked fine on all tasks.

I don't even have the old 1121 app anymore since I did a project reset to use the new Python job file for reduced CPU usage.

The 1121 app tasks error out instantly.

Erich56
Message 59921 - Posted: 14 Feb 2023, 20:10:49 UTC - in response to Message 59920.  

Anybody else getting sent Python tasks for the old 1121 app?

not so far

Keith Myers
Message 59922 - Posted: 14 Feb 2023, 20:33:08 UTC - in response to Message 59921.  

Based on the number of _x issues of these tasks and everyone else erroring out, it must be a scheduler issue.

Ian&Steve C.
Message 59923 - Posted: 14 Feb 2023, 23:20:18 UTC

I've received some of them so far. they fail within like 10 seconds.

looks like someone at the project put the old v4.01 linux app up. these seem not compatible with the new experiment. I'm guessing someone enabled that application by accident.

abouh, you probably need to pull this app version back down to prevent it from being sent out. and leave the working v4.03 up.

©2025 Universitat Pompeu Fabra