Message boards :
News :
Experimental Python tasks (beta) - task description
| Author | Message |
|---|---|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
Can the http://www.gpugrid.net/apps.php link be put next to the Server status link? I'd like to see this change in the website design too. It would be much easier to access than having to manually edit the URL or find the one apps link on the main project JoinUs page. |
|
Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0
|
> Can http://www.gpugrid.net/apps.php link be put next to Server status link?

You might want to repost that in the wish-list thread so it's there when the webmaster gets around to updating the site; I fear they may be too busy at this time. I went ahead and put a link in my browser until then. Thanks for posting that page link. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 5,269
|
Right now: ~14,200 "unsent" Python tasks in the queue. Now down to fewer than 500; these went much quicker than I anticipated, only about 3 weeks.
|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
So, again, what will the status of the expected new application be? Beta to start with? Removal of wandb? A new nthreads value? A new job_xxx.xml file? A new compilation for Ada devices? |
|
Joined: 6 Mar 18 Posts: 38 Credit: 1,340,042,080 RAC: 27
|
Will the new app be fine on 1 CPU core, or will it still require many? On my Windows box at the moment I have to manually allocate 24 cores to the WU so it does not get starved when other projects run at the same time. |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
Pretty sure you are confusing cores with processes. The app will still spin up 32 Python processes, but processes are not cores. From testing of the modified job.xml file, the new app will probably need as few as 4 cores/threads to run. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 5,269
|
There are two separate mechanisms by which this app spins up multiple processes/threads, and the fix will only reduce one of them. Since each task trains 32 agents at once, those 32 processes still spin up; the fix I helped uncover only addresses the unnecessary extra CPU usage from the n-cores extra processes. I've been running with those capped at 4, and it seems fine. About Ada support: since this app is not really an "app" (it's not a compiled binary, just a script), it already works fine with Ada, according to other users running it on their 40-series cards. It's the Acemd3 app that needs to be recompiled for Ada.
|
|
Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0
|
The job_xxx.xml will also remain the same, since the instructions are as simple as: 1. unpack the conda Python environment with all required dependencies; 2. run the provided Python script; 3. return the result files. So I am only changing the provided Python script. As Ian mentioned, it is not a compiled app. The only difference is that the packed conda environment contains cuda10 (10.2.89) or cuda11 (11.3.1) depending on the host GPU. Is that enough to support Ada GPUs? |
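The three steps above could be sketched as a minimal wrapper. This is a hypothetical illustration, not the actual GPUGRID job script: the function name, file names, and the use of `sys.executable` (standing in for the unpacked environment's own interpreter) are all assumptions.

```python
import os
import subprocess
import sys
import tarfile

def run_job(env_tarball, script, workdir):
    """Hypothetical sketch of the three job steps; not the real GPUGRID wrapper."""
    # 1. Unpack the conda environment shipped with the task.
    with tarfile.open(env_tarball) as tar:
        tar.extractall(os.path.join(workdir, "env"))
    # 2. Run the provided Python script. The real job would invoke the
    #    unpacked interpreter (e.g. env/bin/python); sys.executable stands in.
    subprocess.run([sys.executable, script], check=True, cwd=workdir)
    # 3. Return result files: BOINC uploads whatever the job spec names,
    #    so report what now exists in the working directory.
    return sorted(os.listdir(workdir))
```

Because the science logic lives entirely in the script run at step 2, changing the science does not require recompiling or redeploying an app binary.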
|
Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0
|
Only 75 jobs in the queue! Thank you all for your support :) I imagine they will all be processed today. As I mentioned in an earlier post, the next steps will be the following: 1. release a new version of our reinforcement learning library (https://github.com/PyTorchRL/pytorchrl), used in the Python scripts to instantiate and train the AI agents; 2. send a small batch of PythonGPUBeta jobs with the new Python script, also using the new version of the library; 3. if everything goes well, start sending PythonGPU tasks again. I am interested in your feedback on whether or not the new script configuration helps efficiency. On my machine it seems to work fine. |
|
Joined: 6 Mar 18 Posts: 38 Credit: 1,340,042,080 RAC: 27
|
Yes, it spins up that many processes, but if I leave the app at defaults it gets choked, because BOINC will only allocate 1 thread to it and the other running projects will take up the other 31 threads. I manually allocate it 24 threads, as that is about what I observed it using when I ran that task and nothing else; this stops it from getting choked when running multiple projects. What I would like to see is the app download and allocate however many threads it needs to complete the task automatically, without needing a custom app_config file. |
|
Joined: 27 Jul 11 Posts: 138 Credit: 539,953,398 RAC: 0
|
> Yea it spins up that many processes but if I leave the app at default it will get choked because Boinc will only allocate 1 thread to it and the other projects running will take up the other 31 threads.

I second that. |
|
Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0
|
I just released the new version of the python library and sent the beta tasks. |
|
Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0
|
Is there any BOINC-specifiable WU parameter for that? I could not find one, but I would also like to avoid hosts having to manually change configuration if possible. |
|
Joined: 18 Jul 13 Posts: 79 Credit: 210,528,292 RAC: 0
|
Use this app_config.xml:
<app_config>
    <app>
        <name>PythonGPU</name>
        <plan_class>cuda1131</plan_class>
        <gpu_versions>
            <cpu_usage>8</cpu_usage>
            <gpu_usage>1</gpu_usage>
        </gpu_versions>
        <max_concurrent>1</max_concurrent>
        <fraction_done_exact/>
    </app>
</app_config> |
|
Joined: 6 Mar 18 Posts: 38 Credit: 1,340,042,080 RAC: 27
|
Just grabbed one of the beta units and it still says "Running (0.999 CPUs and 1 GPU)", but it seems to be fluctuating between 50% and 100% load on my 32-thread CPU. If the app spins up a ton of processes that need their own threads, can the app reflect that and allocate however many threads are needed, please? For example, it should say "Running (32 CPUs and 1 GPU)", or however many it needs. That would simplify things and, I assume, cut down on failed units from users who do not know the app spins up more than one process and run it on a single thread with other apps taking up the remainder. Thanks. Edit: after an initial 100% utilisation spike it has now settled down at around 30-40% CPU utilisation. |
|
Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0
|
But this is on the client side. On the server side, I see I can adjust these parameters for a given app: https://boinc.berkeley.edu/trac/wiki/JobIn I am open to implementing both solutions: 1. force from the server side that hosts have more than 1 CPU, 4-8 for example (the tasks spawn 32 Python processes, but 32 CPUs are not required to run them successfully), in case that is possible; so far I could not find any server option to specify it. 2. make it explicit that 32 processes are being created; I can add it to the logs, but where else can I mention it so users are aware? |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
I don't see any parameter on the JobIn page that allocates the number of CPUs a task will tie up. I don't know how the CPU resource is calculated; it must be internal to BOINC. Richard Haselgrove probably knows the answer. It varies among projects, I've noticed. I think it is calculated internally in BOINC from the client's benchmark rating and the rsc_fpops_est value the work generator assigns to tasks. Users have been able to override the project default values with their own via the app_config mechanism. But these values don't actually control how an app runs: only the science app determines how much resource the task takes. The cpu_usage value only helps the client determine how many tasks can be run, for scheduling purposes, and how much work should be downloaded. I'm currently running one of the beta tasks, and either it runs faster or the workunit is smaller than normal; probably the latter, it being beta. I notice 3 processes running run.py on the task, along with the 32 spawned processes. I don't remember the previous app spinning up more than the one run.py process. I wonder if the 3 run.py processes are tied to my <cpu_usage>3.0</cpu_usage> setting in app_config.xml. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 5,269
|
As you said earlier in your comment, cpu_usage only tells BOINC how much is being used; it does not exert any kind of "control" over the application directly. The previous tasks spun up a run.py child process for every core. These would be linked to the parent process; you can see them in htop. I have not been able to get any of these beta tasks myself to see what might be going on (I got some very early in the morning before I got up, but they errored because of my custom edits). There might still be a problem with them; some other users that got them seem to have errored as well.
|
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
I reset the project on all hosts prior to the release of the beta tasks to start with a clean slate. I have one of the beta tasks running well so far: 6.5 hrs in at 75% completion. GPUGRID 1.12 Python apps for GPU hosts beta (cuda1131) e00001a00027-ABOU_rnd_ppod_expand_demos29_betatest-0-1-RND7327_1 06:22:55 (15:21:33) 240.67 79.210 78d,21:06:03 1/30/2023 3:14:52 AM 0.998C + 1NV (d0) Running High P. Darksider. I looked at this task in htop and it is different than before. I am not talking about the 32 spawned Python processes; I was referring to 3 separate run.py process PIDs that are each using about 20% CPU, besides the main one. I hadn't configured my app_config.xml for PythonGPUbeta before I picked up the task, so I ended up with the default 0.998C core-usage value rather than the 3.0 CPU value I normally use for the regular Python-on-GPU tasks. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 5,269
|
What you're showing in your screenshot is exactly what I saw before: the "green" processes represent the child processes. Before, you would see as many child threads as cores; on my 16-core system there would be 16 children, on the 24-core system 24 children, on the 64-core system 64 children, and so on, for each running task. If you move the selected line by pushing the down arrow, or select one of the child processes with the cursor, you should see the top line as white text, which is the parent main process. This is all normal. Check my screenshots in this message: https://www.gpugrid.net/forum_thread.php?id=5233&nowrap=true#59239
|
©2025 Universitat Pompeu Fabra