Experimental Python tasks (beta) - task description

Message boards : News : Experimental Python tasks (beta) - task description


Previous · 1 . . . 15 · 16 · 17 · 18 · 19 · 20 · 21 . . . 50 · Next

abouh
Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 58640 - Posted: 13 Apr 2022, 15:16:35 UTC - in response to Message 58639.  

Has the allowed disk limit changed to 30,000,000,000 bytes?
ID: 58640 · Rating: 0
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 58641 - Posted: 13 Apr 2022, 16:19:28 UTC

Appears so.

<rsc_disk_bound>30000000000.000000</rsc_disk_bound>
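One way to confirm the new bound on your own host is to read it back from the BOINC client's state file. The sketch below is an illustration, not part of the project: it assumes each `<workunit>` entry in `client_state.xml` carries `<name>` and `<rsc_disk_bound>` elements like the line above, and converts the byte count to decimal gigabytes (30,000,000,000 bytes is 30 GB, roughly 27.9 GiB). The sample workunit name is hypothetical.

```python
import xml.etree.ElementTree as ET


def disk_bounds_gb(client_state_xml: str) -> dict:
    """Map each workunit name to its rsc_disk_bound in decimal GB.

    Assumes <workunit> elements with <name> and <rsc_disk_bound> children,
    as in BOINC's client_state.xml.
    """
    root = ET.fromstring(client_state_xml)
    bounds = {}
    for wu in root.iter("workunit"):
        name = wu.findtext("name")
        bound = wu.findtext("rsc_disk_bound")
        if name and bound:
            bounds[name] = float(bound) / 1e9  # bytes -> GB (10**9 bytes)
    return bounds


# Hypothetical excerpt; on a real host, read client_state.xml from the BOINC data directory.
sample = """<client_state>
  <workunit>
    <name>example_wu</name>
    <rsc_disk_bound>30000000000.000000</rsc_disk_bound>
  </workunit>
</client_state>"""

print(disk_bounds_gb(sample))  # {'example_wu': 30.0}
```

For the real file, `ET.parse(path).getroot()` can replace `ET.fromstring`, with the loop unchanged.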
ID: 58641 · Rating: 0
Greg _BE
Joined: 30 Jun 14
Posts: 153
Credit: 129,654,684
RAC: 0
Message 58642 - Posted: 13 Apr 2022, 19:28:33 UTC - in response to Message 58634.  
Last modified: 13 Apr 2022, 19:30:53 UTC

The sizes of all the app files (including the compressed environment) are:

2.0G for windows with cuda102
2.7G for windows with cuda1131
1.8G for linux with cuda102
2.6G for linux with cuda1131

The additional task-specific data ranges from a few KB to a few MB. I did not expect 7.8G compressed (not even after unpacking the environment). Is that the case for all PythonGPU tasks now?

Regarding CPU/GPU usage, this app actually uses a combination of both due to the nature of the problem we are tackling (training an AI agent to develop intelligent behaviour in a simulated environment with reinforcement learning techniques). Interactions with the agent's environment happen on the CPU; learning happens on the GPU.



Note: I was commenting on the Rosetta@home CPU Python tasks.
What yours do, I don't know. I guess I had better add your project and see what happens.

I re-added your project to my system, so if I am home when a task is sent out, I'll have a look.
ID: 58642 · Rating: 0
abouh
Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 58643 - Posted: 14 Apr 2022, 7:36:34 UTC - in response to Message 58642.  

Thank you!

I have added the subtask weights to the PythonGPUbeta app and am currently testing it with a small batch of tasks.
ID: 58643 · Rating: 0
abouh
Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 58644 - Posted: 14 Apr 2022, 8:42:41 UTC
Last modified: 14 Apr 2022, 9:20:16 UTC

Testing was successful, so we can add the weights to the PythonGPU app's job.xml file.
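For readers wondering what the weights do: in the BOINC wrapper, each `<task>` in job.xml can carry a `<weight>` that sets how much that subtask contributes to the overall fraction done. The function below is a sketch of that bookkeeping under the assumption that progress is combined as a weight-proportional sum; the weights in the example are hypothetical, not the values GPUGRID uses.

```python
def overall_fraction_done(weights, completed, current_fraction):
    """Combine per-subtask progress into one overall fraction done.

    weights: relative weight of each subtask, in job.xml order (hypothetical values)
    completed: number of subtasks already finished
    current_fraction: progress (0.0-1.0) of the subtask currently running
    """
    total = sum(weights)
    done = sum(weights[:completed])
    if completed < len(weights):
        done += weights[completed] * current_fraction
    return done / total


# A short setup step (weight 1) followed by a long training step (weight 3):
print(overall_fraction_done([1.0, 3.0], completed=1, current_fraction=0.5))  # 0.625
```

In this model, equal weights would make a quick setup step count as much as a long training step, which is why assigning weights gives a more even progress bar.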
ID: 58644 · Rating: 0
Greg _BE
Joined: 30 Jun 14
Posts: 153
Credit: 129,654,684
RAC: 0
Message 58655 - Posted: 15 Apr 2022, 21:20:06 UTC

abouh,

Can you have a look at my comments in a thread I created?
The 4.03 task was not increasing its percentage done after I watched it for 10 minutes. The time to completion kept jumping around, one second up, one second down.
A 40-minute gap between run time and CPU time? That's a hell of a lot of setup time!

Here are the local host task details:
Application Python apps for GPU hosts 4.03 (cuda1131)
Workunit name e2a18-ABOU_rnd_ppod_avoid_cnn_4-0-1-RND3898
State Running
Received 4/15/2022 12:06:46 PM
Report deadline 4/20/2022 12:06:46 PM
Estimated app speed 53.74 GFLOPs/sec
Estimated task size 1,000,000,000 GFLOPs
Resources 0.987 CPUs + 1 NVIDIA GPU (GTX 1050)
CPU time at last checkpoint 06:44:35
CPU time 06:47:39
Elapsed time 06:05:04
Estimated time remaining 198d,09:49:25
Fraction done 7.880%
Virtual memory size 7,230.02 MB
Working set size 2,057.87 MB
ID: 58655 · Rating: 0
mikey
Joined: 2 Jan 09
Posts: 303
Credit: 7,321,800,090
RAC: 270
Message 58666 - Posted: 17 Apr 2022, 20:16:19 UTC - in response to Message 58652.  

You can delete the previous post about ACEMD3. I posted that incorrectly here.


Some forums let you put a double space or a double period to delete your own post, but you still have to do it within the editing window.
ID: 58666 · Rating: 0
Greg _BE
Joined: 30 Jun 14
Posts: 153
Credit: 129,654,684
RAC: 0
Message 58669 - Posted: 18 Apr 2022, 12:27:00 UTC - in response to Message 58666.  

Mikey, I know. But the edit time limit on that post had expired. I came back days later, not within the 30-60 minutes allowed.
ID: 58669 · Rating: 0
Werinbert
Joined: 12 May 13
Posts: 5
Credit: 100,032,540
RAC: 0
Message 58672 - Posted: 18 Apr 2022, 19:31:43 UTC

I am now running a Python task. Its GPU usage is very low, most often around 5 to 10% and occasionally up to 20%. Is this normal? Should I wait until I move my GPU from an old 3770K machine to a 12500 computer, for better CPU capability, before doing these tasks?
ID: 58672 · Rating: 0
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 58673 - Posted: 18 Apr 2022, 23:12:34 UTC - in response to Message 58672.  

This is normal for Python on GPU tasks. The tasks use both the CPU and the GPU, each during different parts of the computation (the inferencing and the machine learning segments).

Read the posts by the admin developer explaining what the process involves.

- Cyclical GPU load is expected in Reinforcement Learning algorithms. Whenever GPU load is lower, CPU usage should increase. It is correct.
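A back-of-envelope model shows why a monitor can report such low average GPU load. It assumes, purely for illustration, that the GPU is idle while the agent collects experience on the CPU and busy while it learns; the phase durations below are made-up numbers, not measurements from these tasks.

```python
def average_gpu_load(collect_s, learn_s, learn_load=1.0):
    """Average GPU load over one collect-then-learn cycle.

    collect_s: seconds spent stepping the environment on the CPU (GPU idle)
    learn_s: seconds spent on the learning update on the GPU
    learn_load: GPU load while learning (1.0 = fully busy)
    """
    cycle_s = collect_s + learn_s
    return learn_s * learn_load / cycle_s


# 9 s of CPU-side interaction per 1 s of GPU learning averages out to 10% load.
print(f"{average_gpu_load(9.0, 1.0):.0%}")  # 10%
```

In this model a slower CPU stretches `collect_s` and lowers the average further, which is the intuition behind moving the card to a faster host.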
ID: 58673 · Rating: 0
abouh
Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 58674 - Posted: 19 Apr 2022, 8:21:52 UTC - in response to Message 58655.  
Last modified: 19 Apr 2022, 8:24:36 UTC

Sorry for the late reply, Greg _BE. I hid the ACEMD3 posts.

I checked your job e2a18-ABOU_rnd_ppod_avoid_cnn_4-0-1-RND3898. Did the progress get stuck, or was it just increasing slowly?

The job was finally completed by another Windows 10 host, but its reported CPU time is wrong: it says 668566.9 seconds.

I am not sure, but maybe one problem is that we ask for only 0.987 CPUs, since that was ideal for ACEMD jobs. In reality, Python tasks use more. I will look into it.
ID: 58674 · Rating: 0
Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Message 58675 - Posted: 19 Apr 2022, 8:25:47 UTC

New tasks are being issued this morning, allocated to the old Linux v4.01 'Python app for GPU hosts' issued in October 2021.

All are failing with "ModuleNotFoundError: No module named 'yaml'".
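A quick way to check whether a given Python environment is missing a dependency, before a task trips over it. `yaml` is the import name of the PyYAML package; the other module name in the example is only an illustration. Run it with the interpreter from the environment you want to test.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'yaml' is provided by the PyYAML package; an empty list means everything was found.
print(missing_modules(["yaml", "json"]))
```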
ID: 58675 · Rating: 0
Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Message 58676 - Posted: 19 Apr 2022, 8:38:26 UTC - in response to Message 58674.  

I am not sure, but maybe one problem is that we ask only for 0.987 CPUs, since that was ideal for ACEMD jobs. In reality Python tasks use more. I will look into it.

Asking for 1.00 CPUs (or above) would make a significant difference, because that would prompt the BOINC client to reduce the number of tasks being run for other projects.
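A simplified model of that effect, assuming the client keeps starting CPU-only jobs while the total CPU allocation is below the number of usable CPUs, so only whole CPUs requested by GPU tasks displace other work. The real scheduler weighs more factors; the 12-thread host is just an example.

```python
import math


def cpu_only_jobs(n_cpus, gpu_task_cpu_usage):
    """Number of single-threaded CPU jobs that fit beside the given GPU tasks.

    n_cpus: usable CPU threads on the host
    gpu_task_cpu_usage: the cpu_usage value of each running GPU task
    """
    reserved = math.floor(sum(gpu_task_cpu_usage))  # fractions below 1 reserve nothing
    return max(n_cpus - reserved, 0)


for usage in (0.987, 1.0, 3.0):
    print(usage, cpu_only_jobs(12, [usage]))
# 0.987 12
# 1.0 11
# 3.0 9
```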

It would be problematic to increase the CPU demand above 1.00, because the CPU loading is dynamic - BOINC has no provision for allowing another project to utilise the cycles available during periods when the GPUGrid app is quiescent. Normally, a GPU app is given a higher process priority for CPU usage than a pure CPU app, so the operating system should allocate resources to your advantage, but that can be problematic when the wrapper app is in use. That was changed recently: I'll look into the situation with your server version and our current client versions.
ID: 58676 · Rating: 0
abouh
Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 58677 - Posted: 19 Apr 2022, 9:23:32 UTC - in response to Message 58675.  
Last modified: 19 Apr 2022, 9:24:44 UTC

Definitely, only the latest version (4.03) should be sent. Thanks for letting us know.
ID: 58677 · Rating: 0
Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Message 58678 - Posted: 19 Apr 2022, 12:01:04 UTC

BOINC GPU apps, wrapper apps, and process priority

The basic rule for BOINC applications (originally CPU only) has been to run applications at idle priority, to avoid interfering with foreground use of the computer.

Since the introduction of GPU apps into BOINC around 2008, the CPU portion of a GPU app has been automatically run at a slightly higher process priority (below normal) - an attempt to avoid highly-productive GPU work being throttled by competition for CPU resources.

Normally, the BOINC client manages these two different process priorities directly. But when a wrapper app is interpolated between the client and a worker app, it's the wrapper which sets the priority for the worker app. It was a user on this project who first noticed (Issue 3764 - May 2020) that the process priority of a GPU app wasn't being set correctly when it was executing under the control of a wrapper app.

Many false starts later (PRs 3826, 3948, 3988, 3999), a fully consistent set of process priority tools was developed, effective from about 25 September 2020.

But in order for these tools to be useful, compatible versions of both the BOINC client and the wrapper application have to be used. So far as I can tell, BOINC client for Windows v7.16.20 (current) is compliant; Wrapper version 26203 is compliant; but no full public release versions of the BOINC client for Linux are yet compliant (Gianfranco Costamagna's prototyping PPA client should be).

This project appears to be using wrapper code 26016 for Windows, and wrapper code 26198 for Linux. Unless these have been patched locally, neither wrapper will yet allow full process control management.

It's not urgent, but with the new Python apps running in a mixed CPU/GPU environment, it might be helpful to update the project's wrapper codebase. Fortunately, the basic server platform is unaffected by all this.
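On Linux, one way to see what priority the wrapper actually gave a worker process is to read its nice value. This is a minimal, Unix-only sketch using the standard `os.getpriority`; you would need the worker's PID (for example from `ps`). Which nice value to expect for GPU apps depends on the client and wrapper versions discussed above, so the comparison against 19 (the lowest priority on Linux) is an assumption about idle-priority CPU apps, not a documented BOINC guarantee.

```python
import os


def nice_value(pid):
    """Return the nice value of a process (Unix only; higher means lower priority)."""
    return os.getpriority(os.PRIO_PROCESS, pid)


def runs_at_idle_priority(pid):
    """True if the process sits at nice 19, the lowest priority on Linux."""
    return nice_value(pid) == 19


# Example on this script's own process; substitute the worker's PID.
pid = os.getpid()
print(pid, nice_value(pid), runs_at_idle_priority(pid))
```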
ID: 58678 · Rating: 0
abouh
Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 58696 - Posted: 21 Apr 2022, 15:04:23 UTC - in response to Message 58675.  

We have deprecated v4.01.

Hopefully, if everything went fine, the error

All are failing with "ModuleNotFoundError: No module named 'yaml'".


should not happen any more, and all jobs should use v4.03.
ID: 58696 · Rating: 0
Greg _BE
Joined: 30 Jun 14
Posts: 153
Credit: 129,654,684
RAC: 0
Message 58752 - Posted: 27 Apr 2022, 18:52:49 UTC
Last modified: 27 Apr 2022, 19:41:00 UTC

abouh,

I finally got another Python task.
But here is something interesting: the CPU value according to BOINC Tasks is 221%!
How can you get more than 100% of a single core?
Another observation: elapsed time vs CPU time. The two are off by about 5 hours:
4:01 vs 8:54 currently.
Progress is not moving very fast. In the time it has taken me to write this, it has been stuck at 7.88%.
Now it is 4:16 vs 9:24 and still 7.88%!! 15 minutes and no progress? If this hasn't changed in the next hour, I am also aborting this task.
BTW, 46 checkpoints in the 4 hours of run time.

https://www.gpugrid.net/workunit.php?wuid=27219917

Exit status 195 (0xc3) EXIT_CHILD_FAILED
Computer ID 589200
Exception: The wandb backend process has shutdown
GeForce GTX 1050 (2047MB) driver: 512.15

Exit status 203 (0xcb) EXIT_ABORTED_VIA_GUI
Computer ID 590211
Run time 241,306.00
CPU time 1,471.50
GeForce RTX 3080 Ti (4095MB) driver: 497.

The point of this information is:

1) I have a GTX 1050 and a GTX 1080. My previous Python task failed with the same exit error as the first host on this workunit. What is EXIT_CHILD_FAILED? Something on your end or on our end?

2) Person 2 probably aborted because of the way BOINC reads the data to estimate the remaining time. I killed my first Python task because it showed 160+ days to completion.
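The first task's own numbers show where such figures come from. Assuming the client's starting estimate is simply the estimated task size divided by the estimated app speed (a simplification; the client also blends in elapsed time and fraction done as a task runs, which is not modeled here), 1,000,000,000 GFLOPs at 53.74 GFLOPs/sec works out to roughly 215 days before any progress is counted, in the same range as the 198 days that task displayed.

```python
SECONDS_PER_DAY = 86_400


def static_estimate_days(task_size_gflops, app_speed_gflops_per_s):
    """Runtime estimate in days from task size and app speed alone."""
    return task_size_gflops / app_speed_gflops_per_s / SECONDS_PER_DAY


# Figures from the first task's details (GTX 1050 host).
print(round(static_estimate_days(1_000_000_000, 53.74), 1))  # 215.4
```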




***I give up. No progress in 30 minutes since I started this post***

Computer: DESKTOP-LFM92VN
Project GPUGRID

Name e5a13-ABOU_rnd_ppod_avoid_cnn_4-0-1-RND0256_2

Application Python apps for GPU hosts 4.03 (cuda1131)
Workunit name e5a13-ABOU_rnd_ppod_avoid_cnn_4-0-1-RND0256
State Running
Received 4/27/2022 4:35:18 PM
Report deadline 5/2/2022 4:35:18 PM
Estimated app speed 3,171.20 GFLOPs/sec
Estimated task size 1,000,000,000 GFLOPs
Resources 0.987 CPUs + 1 NVIDIA GPU (device 1)
CPU time at last checkpoint 09:58:18
CPU time 10:08:59
Elapsed time 04:37:57
Estimated time remaining 161d,06:23:41
Fraction done 7.880%
Virtual memory size 6,429.20 MB
Working set size 1,072.13 MB
Directory slots/12
Process ID 16828

Debug State: 2 - Scheduler: 2

That's 4:01 to 4:38 elapsed and still at 7.88%.
Checkpoints count up. CPU is 219%.
This is all messed up.
I join the abort team.

------------

Something about the other task that failed with EXIT_CHILD_FAILED.
A few extracts:

wandb: Network error (ReadTimeout), entering retry loop.

Exception in thread StatsThr:
Traceback (most recent call last):
  File "D:\data\slots\13\lib\site-packages\psutil\_common.py", line 449, in wrapper
    ret = self._cache[fun]
AttributeError: 'Process' object has no attribute '_cache'

During handling of the above exception, another exception occurred:
(followed by more traceback lines, etc.)

And then this:
OSError: [WinError 1455] The paging file is too small for this operation to complete

But the next person who got it has this kind of setup:

CPU type AuthenticAMD
AMD Ryzen 5 5600X 6-Core Processor [Family 25 Model 33 Stepping 0]
Number of processors 12
Coprocessors NVIDIA NVIDIA GeForce RTX 3080 (4095MB) driver: 512.15
Operating System Microsoft Windows 11
x64 Edition, (10.00.22000.00)

I run GTX cards on Win10 with a Ryzen 7 2800 and BOINC 7.16.20.
ID: 58752 · Rating: 0
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 58753 - Posted: 27 Apr 2022, 19:35:21 UTC

But here is something interesting, the CPU value according to BOINC Tasks is 221%!
How can you get more than 100% of a single core?

Because the task was actually using a little more than two cores to process the work.

That's why I have set Python tasks to allocate 3 CPU threads for BOINC scheduling.
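The percentage can be reproduced from the task details: CPU time divided by elapsed time is the average number of cores in use. A small sketch, assuming the times are copied from the client in `hh:mm:ss` form:

```python
def to_seconds(hms):
    """Convert an 'hh:mm:ss' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_percent(cpu_time, elapsed):
    """Average CPU load in percent of one core (above 100% means more than one core)."""
    return 100 * to_seconds(cpu_time) / to_seconds(elapsed)


# CPU time and elapsed time from the task details posted above.
print(f"{cpu_percent('10:08:59', '04:37:57'):.0f}%")  # 219%
```

The same calculation on the earlier 4.03 task (06:47:39 CPU over 06:05:04 elapsed) gives about 112%.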
ID: 58753 · Rating: 0
Greg _BE
Joined: 30 Jun 14
Posts: 153
Credit: 129,654,684
RAC: 0
Message 58754 - Posted: 27 Apr 2022, 19:45:18 UTC - in response to Message 58753.  
Last modified: 27 Apr 2022, 19:46:26 UTC

But here is something interesting, the CPU value according to BOINC Tasks is 221%!
How can you get more than 100% of a single core?

Because the task was actually using a little more than two cores to process the work.

That's why I have set Python tasks to allocate 3 CPU threads for BOINC scheduling.


OK... interesting, but what accounts for the lack of progress in 30 minutes on the task I just killed, and for the EXIT_CHILD_FAILED error and blow-up on the previous Python task?

I mean really... 0.00% progress, stuck at 7.88% for more than 30 minutes?
I don't know of any project that can't manage even 1/100th of a percent in 30 minutes.
I've seen my share of slow tasks in other projects, but this one... wow...

And how do you go about setting just the Python tasks to 3 CPU cores? That's beyond my knowledge level.
ID: 58754 · Rating: 0
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 58755 - Posted: 27 Apr 2022, 22:31:48 UTC - in response to Message 58754.  

You use an app_config.xml file in the project directory, like this:

<app_config>
  <app>
    <name>acemd3</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemd4</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>PythonGPU</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>3.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>PythonGPUbeta</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>3.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
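Before telling BOINC to re-read the file (Options > Read config files in the BOINC Manager), it can help to confirm that it parses and carries the intended values; the file itself sits in the project's folder under `projects` in the BOINC data directory. The sketch below uses Python's standard XML parser on a trimmed copy of the file above; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def cpu_usage_by_app(config_xml):
    """Map each <app> name in an app_config.xml string to its GPU-version cpu_usage."""
    root = ET.fromstring(config_xml)
    usage = {}
    for app in root.findall("app"):
        name = app.findtext("name")
        cpu = app.findtext("gpu_versions/cpu_usage")
        if name is not None and cpu is not None:
            usage[name] = float(cpu)
    return usage


# Trimmed copy of the app_config.xml above.
config = """<app_config>
  <app>
    <name>acemd3</name>
    <gpu_versions><gpu_usage>1.0</gpu_usage><cpu_usage>1.0</cpu_usage></gpu_versions>
  </app>
  <app>
    <name>PythonGPU</name>
    <gpu_versions><gpu_usage>1.0</gpu_usage><cpu_usage>3.0</cpu_usage></gpu_versions>
  </app>
</app_config>"""

print(cpu_usage_by_app(config))  # {'acemd3': 1.0, 'PythonGPU': 3.0}
```

A typo such as an unclosed tag makes `ET.fromstring` raise `ET.ParseError`, which is a cheap way to catch mistakes before BOINC silently ignores the file.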
ID: 58755 · Rating: 0


©2025 Universitat Pompeu Fabra