Experimental Python tasks (beta) - task description

Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 59775 - Posted: 19 Jan 2023, 21:38:55 UTC - in response to Message 59762.  

Can http://www.gpugrid.net/apps.php link be put next to Server status link?


I'd like to see this change in the website design also.

It would be much easier to access than having to manually edit the URL or find the one apps link on the main project JoinUs page.
ID: 59775 · Rating: 0
Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 59780 - Posted: 22 Jan 2023, 21:12:44 UTC - in response to Message 59762.  
Last modified: 22 Jan 2023, 21:49:18 UTC

Can http://www.gpugrid.net/apps.php link be put next to Server status link?


You might want to repost that on the wish list thread so it's there when the webmaster gets around to updating the site.

I fear they may be too busy at this time. I went ahead and put a link in my browser until then.

Thanks for posting that page link.
ID: 59780 · Rating: 0
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Message 59782 - Posted: 23 Jan 2023, 19:09:05 UTC - in response to Message 59676.  

Right now: ~ 14.200 "unsent" Python tasks in the queue.

I guess it will take a while until they all are processed.


now down to less than 500. these went much quicker than I anticipated. only about 3 weeks.
ID: 59782 · Rating: 0
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 59783 - Posted: 23 Jan 2023, 20:30:42 UTC
Last modified: 23 Jan 2023, 20:32:19 UTC

So what again is going to be the status of the expected new application?

Beta to start with?

Removal of wandb?

New nthreads value?

New job_xxx.xml file?

New compilation for Ada devices?
ID: 59783 · Rating: 0
Ryan Munro

Joined: 6 Mar 18
Posts: 38
Credit: 1,340,042,080
RAC: 27
Message 59784 - Posted: 23 Jan 2023, 20:43:34 UTC

Will the new app be fine on 1 CPU core, or will it still require many? On my Windows box I currently have to manually allocate 24 cores to the WU so that it does not get starved when other projects are running at the same time.
ID: 59784 · Rating: 0
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 59787 - Posted: 23 Jan 2023, 22:04:11 UTC - in response to Message 59784.  
Last modified: 23 Jan 2023, 22:05:21 UTC

Pretty sure you are confusing cores with processes. The app will still spin up 32 Python processes. Processes are not cores.

But from testing of the modified job.xml file, the new app will probably need as few as 4 cores/threads to run.
ID: 59787 · Rating: 0
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Message 59788 - Posted: 24 Jan 2023, 4:04:42 UTC - in response to Message 59787.  

There are two separate mechanisms with this app spinning up multiple processes/threads. The fix will only reduce one of them. Since each task is training 32x agents at once, those 32 processes still spin up. The fix I helped uncover only addresses the unnecessary extra CPU usage from the n-cores extra processes spinning up. I’ve been running with those capped at 4. And it seems fine.
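
A minimal sketch of the kind of cap described above, assuming the number of extra helper processes is derived from the host's detected core count. The names `MAX_EXTRA_WORKERS` and `extra_worker_count` are hypothetical; they are not taken from the project's script or from the pytorchrl library.

```python
import os

# Hypothetical cap; the post above reports running fine with 4.
MAX_EXTRA_WORKERS = 4


def extra_worker_count(detected_cores=None, cap=MAX_EXTRA_WORKERS):
    """Return how many extra helper processes to start.

    Without a cap the count would equal the number of cores; here it is
    limited to `cap` and never drops below 1.
    """
    if detected_cores is None:
        # os.cpu_count() can return None on unusual platforms.
        detected_cores = os.cpu_count() or 1
    return max(1, min(detected_cores, cap))


if __name__ == "__main__":
    for cores in (2, 16, 64):
        print(cores, "cores ->", extra_worker_count(cores), "extra workers")
```

This only models the second mechanism. The 32 agent processes per task are a separate count and are not affected by a cap like this.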

About Ada support, since this app is not really an “app” as it’s not a compiled binary, but just a script, it works fine with Ada already according to some other users running it on their 40-series cards. It’s the Acemd3 app that needs to be recompiled for Ada.
ID: 59788 · Rating: 0
abouh

Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 59789 - Posted: 24 Jan 2023, 7:51:35 UTC - in response to Message 59783.  
Last modified: 24 Jan 2023, 7:52:16 UTC

The job_xxx.xml will also remain the same, since the instructions are as simple as:

1. Unpack the conda Python environment with all required dependencies.
2. Run the provided Python script.
3. Return the result files.

So I am only changing the provided python script.

As Ian mentioned, it is not a compiled app. The only difference is that the packed conda environment contains cuda10 (10.2.89) or cuda11 (11.3.1) depending on the host GPU.

Is that enough to support ADA GPUs?
ID: 59789 · Rating: 0
abouh

Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 59790 - Posted: 24 Jan 2023, 8:10:17 UTC

Only 75 jobs in the queue! Thank you all for your support :)

I imagine they will all be processed today. So, as I mentioned in an earlier post, the next steps will be the following:

1- I will release a new version of our Reinforcement Learning library (https://github.com/PyTorchRL/pytorchrl), used in the python scripts to instantiate and train the AI agents.

2- I will send a small batch of PythonGPUBeta jobs with the new python script and also using the new version of the library.

3- If everything goes well, start sending PythonGPU tasks again.

I am interested in your feedback on whether or not the new script's configuration helps in terms of efficiency. On my machine it seems to work fine.
ID: 59790 · Rating: 0
Ryan Munro

Joined: 6 Mar 18
Posts: 38
Credit: 1,340,042,080
RAC: 27
Message 59791 - Posted: 24 Jan 2023, 9:25:00 UTC - in response to Message 59787.  
Last modified: 24 Jan 2023, 9:27:24 UTC

Yea it spins up that many processes but if I leave the app at default it will get choked because Boinc will only allocate 1 thread to it and the other projects running will take up the other 31 threads.
I manually allocate it 24 threads as this is about what I observed it running when I only ran that task and nothing else, this stops it from getting choked when running multiple projects.

What I would like to see is the app download and allocate however many threads it needs to complete the task automatically without needing a custom app_config file.
ID: 59791 · Rating: 0
KAMasud

Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Message 59793 - Posted: 25 Jan 2023, 5:13:08 UTC - in response to Message 59791.  

Yea it spins up that many processes but if I leave the app at default it will get choked because Boinc will only allocate 1 thread to it and the other projects running will take up the other 31 threads.
I manually allocate it 24 threads as this is about what I observed it running when I only ran that task and nothing else, this stops it from getting choked when running multiple projects.

What I would like to see is the app download and allocate however many threads it needs to complete the task automatically without needing a custom app_config file.


I second that.
ID: 59793 · Rating: 0
abouh

Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 59794 - Posted: 25 Jan 2023, 11:12:06 UTC
Last modified: 25 Jan 2023, 11:12:28 UTC

I just released the new version of the python library and sent the beta tasks.
ID: 59794 · Rating: 0
abouh

Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 59795 - Posted: 25 Jan 2023, 11:36:43 UTC - in response to Message 59793.  
Last modified: 25 Jan 2023, 11:43:57 UTC

Is there any WU parameter in BOINC that can specify that? I could not find one, but I would also like to avoid hosts having to change their configuration manually, if possible.
ID: 59795 · Rating: 0
kotenok2000

Joined: 18 Jul 13
Posts: 79
Credit: 210,528,292
RAC: 0
Message 59796 - Posted: 25 Jan 2023, 13:43:09 UTC - in response to Message 59795.  

Use this:
<app_config>
  <app>
    <name>PythonGPU</name>
    <plan_class>cuda1131</plan_class>
    <gpu_versions>
      <cpu_usage>8</cpu_usage>
      <gpu_usage>1</gpu_usage>
    </gpu_versions>
    <max_concurrent>1</max_concurrent>
    <fraction_done_exact/>
  </app>
</app_config>
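
For anyone who would rather generate the file than hand-edit it, here is a hedged sketch that builds an equivalent `app_config.xml` with Python's standard library. The function name `build_app_config` is made up for illustration. The sketch leaves out `<plan_class>`, on the understanding that the BOINC client documents that tag for `<app_version>` blocks rather than `<app>` blocks; treat that as an assumption and check the client documentation before relying on it.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, cpu_usage, gpu_usage=1.0, max_concurrent=1):
    """Return app_config.xml text for one GPU application entry."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name

    # cpu_usage / gpu_usage are budgeting hints for the client scheduler.
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "cpu_usage").text = f"{cpu_usage:g}"
    ET.SubElement(gpu_versions, "gpu_usage").text = f"{gpu_usage:g}"

    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.SubElement(app, "fraction_done_exact")  # empty flag element

    # ET.indent exists from Python 3.9; older versions emit one long line.
    if hasattr(ET, "indent"):
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_config("PythonGPU", cpu_usage=8))
```

The output would be saved as `app_config.xml` in the project's folder inside the BOINC data directory, after which the client has to re-read its config files.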
ID: 59796 · Rating: 0
Ryan Munro

Joined: 6 Mar 18
Posts: 38
Credit: 1,340,042,080
RAC: 27
Message 59797 - Posted: 25 Jan 2023, 14:14:12 UTC
Last modified: 25 Jan 2023, 14:15:17 UTC

Just grabbed one of the beta units and it still says "Running (0.999 CPUs and 1 GPU)", but it seems to be fluctuating between 50% and 100% load on my 32-thread CPU.
If the app is spinning up a ton of processes that need their own threads, can the app reflect that and allocate however many threads are needed, please? For example, it should say "Running (32 CPUs and 1 GPU)" or however many it needs.

Would simplify things and I assume cut down on failed units from users who do not know the app spins up more than one process and run it on a single thread with other apps taking up the remainder.

Thanks

Edit: after an initial 100% utilisation spike, it has now settled down at around 30% - 40% CPU utilisation.
ID: 59797 · Rating: 0
abouh

Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 59799 - Posted: 25 Jan 2023, 14:23:04 UTC - in response to Message 59796.  
Last modified: 25 Jan 2023, 14:52:11 UTC

But this is on the client side.

On the server side I see I can adjust these parameters for a given app: https://boinc.berkeley.edu/trac/wiki/JobIn

I am open to implementing both solutions:

1- Require, from the server side, that hosts use more than 1 CPU, 4-8 for example (the tasks spawn 32 Python threads, but 32 CPUs are not required to run them successfully). That is, if it is possible; so far I could not find any option on the server to specify it.

2- State explicitly that 32 processes are being created. I can add it to the logs, but where else can I mention it so that users are aware?
ID: 59799 · Rating: 0
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 59800 - Posted: 25 Jan 2023, 19:13:40 UTC

I don't see any parameter in the jobin page that allocates the number of cpus the task will tie up.

I don't know how the cpu resource is calculated. Must be internal in BOINC.

Richard Haselgrove probably knows the answer.

It varies among projects I've noticed. I think it is calculated internally in BOINC based on client benchmarks rating and the rsc_fpops_est value the work generator assigns tasks.

The user has been able to override the project default values with their own values via the app_config mechanism.

But these values don't actually control how an app runs. Only the science app determines how much resources the task takes.

The cpu_usage value is only a way to help the client determine how many tasks can be run for scheduling purposes and how much work should be downloaded.
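
To make the budgeting role of cpu_usage concrete, here is a deliberately simplified sketch. It assumes the client just subtracts the declared value from the host's thread count when deciding how many other CPU tasks to run; the real BOINC scheduler is more involved than this, and the function name `cpu_slots_left` is made up for illustration.

```python
import math


def cpu_slots_left(total_threads, gpu_task_cpu_usage):
    """Whole CPU slots left for other tasks after budgeting one GPU task.

    Simplified model only: the declared cpu_usage is subtracted from the
    thread count and the remainder is rounded down.
    """
    if total_threads < 0 or gpu_task_cpu_usage < 0:
        raise ValueError("values must be non-negative")
    return max(0, math.floor(total_threads - gpu_task_cpu_usage))


if __name__ == "__main__":
    # 32-thread host, using cpu_usage values mentioned in this thread.
    for declared in (0.998, 3.0, 24):
        print(f"cpu_usage={declared}: {cpu_slots_left(32, declared)} slots left")
```

Under this model, a 32-thread host running the task at the default 0.998 still offers 31 slots to other projects, which is the starvation case described earlier in the thread. Declaring 24 leaves 8. In neither case does the number change what the Python processes actually consume.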

I'm currently running one of the beta tasks and it either runs faster or the workunit is smaller than normal. Probably the latter being beta.

I notice 3 processes running run.py on the task along with the 32 spawned processes. I don't remember the previous app spinning up more than the one run.py process.

I wonder if the 3 run.py processes are tied into my <cpu_usage>3.0</cpu_usage> setting in my app_config.xml file.
ID: 59800 · Rating: 0
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Message 59801 - Posted: 25 Jan 2023, 19:34:05 UTC - in response to Message 59800.  
Last modified: 25 Jan 2023, 19:34:47 UTC



I notice 3 processes running run.py on the task along with the 32 spawned processes. I don't remember the previous app spinning up more than the one run.py process.

I wonder if the 3 run.py processes are tied into my <cpu_usage>3.0</cpu_usage> setting in my app_config.xml file.


as you said earlier in your comment, cpu_usage only tells BOINC how much is being used. it does not exert any kind of "control" over the application directly.

the previous tasks spun up a run.py child process for every core. these would be linked to the parent process. you can see them in htop.
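
If htop is not handy, the same parent/child relationship can be checked from a process listing. This is a rough sketch, assuming a Linux host where `ps -eo pid,ppid,args` is available; the helper names `parse_ps` and `children_of` are hypothetical, and the sample text in the usage note is invented, not real output from a task.

```python
def parse_ps(ps_output):
    """Parse `ps -eo pid,ppid,args` text into (pid, ppid, args) tuples.

    The header row and any malformed lines are skipped because their
    first two fields are not numeric.
    """
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[0]), int(parts[1]), parts[2]))
    return rows


def children_of(rows, parent_pid):
    """Return the rows whose parent PID is `parent_pid`."""
    return [row for row in rows if row[1] == parent_pid]


def count_children_by_parent(rows, needle="run.py"):
    """Map each PID whose command contains `needle` to its child count."""
    return {
        pid: len(children_of(rows, pid))
        for pid, _ppid, args in rows
        if needle in args
    }
```

Feeding it the text captured from `subprocess.run(["ps", "-eo", "pid,ppid,args"], capture_output=True, text=True).stdout` gives one entry per run.py process. A parent with as many children as the host has cores would match the old behaviour described here.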

I have not been able to get any of these beta tasks myself (i got some very early morning before I got up, but they errored because of my custom edits) to see what might be going on. but there might be a problem with them still, some other users that got them seem to have errored.
ID: 59801 · Rating: 0
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 59802 - Posted: 25 Jan 2023, 21:18:12 UTC

I reset the project on all hosts prior to the release of the beta tasks to start with a clean slate.

I have one of the beta tasks running well so far. 6.5 hrs in so far at 75% completion.

GPUGRID | 1.12 Python apps for GPU hosts beta (cuda1131) | e00001a00027-ABOU_rnd_ppod_expand_demos29_betatest-0-1-RND7327_1 | 06:22:55 (15:21:33) | 240.67 | 79.210 | 78d,21:06:03 | 1/30/2023 3:14:52 AM | 0.998C + 1NV (d0) | Running High P. | Darksider

I looked at this task in htop and it is different than before. I am not talking about the 32 spawned python processes. I was referring to 3 separate run.py process PIDs that are using about 20% cpu each besides the main one.



I hadn't configured my app_config.xml for the PythonGPUbeta before I picked up the task so I ended up with the default 0.998C core usage value rather than my normal 3.0 cpu value I have for the regular Python on GPU tasks.
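
A small worked check on the task line above. It assumes that 06:22:55 is the elapsed time and the bracketed 15:21:33 is the accumulated CPU time, and that the figure after them is the ratio of the two as a percentage. That reading is an interpretation; the post does not label the columns.

```python
def hms_to_seconds(text):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_ratio_percent(elapsed, cpu_time):
    """CPU time as a percentage of elapsed (wall-clock) time."""
    return 100.0 * hms_to_seconds(cpu_time) / hms_to_seconds(elapsed)


if __name__ == "__main__":
    ratio = cpu_ratio_percent("06:22:55", "15:21:33")
    print(f"{ratio:.2f}%")  # 55293 s of CPU time over 22975 s elapsed
```

Under that assumption the task has kept roughly 2.4 cores busy on average, well above the 0.998C the client budgeted for it.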
ID: 59802 · Rating: 0
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Message 59803 - Posted: 25 Jan 2023, 22:04:06 UTC - in response to Message 59802.  
Last modified: 25 Jan 2023, 22:07:33 UTC

what you're showing in your screenshot is exactly what I saw before. the "green" processes are representative of the child processes. before, you would have a number of child threads equal to the number of cores. on my 16-core system there would be 16 children, on the 24-core system there were 24 children, on the 64-core system there were 64 children. and so on, for each running task.

if you move the selected line by pushing the down arrow, or select one of the child processes with the cursor, you should see the top line as white text, which is the parent main process. this is all normal.

check my screenshots from this message: https://www.gpugrid.net/forum_thread.php?id=5233&nowrap=true#59239
ID: 59803 · Rating: 0

©2025 Universitat Pompeu Fabra