Experimental Python tasks (beta) - task description
| Author | Message |
|---|---|
|
Joined: 31 May 21 · Posts: 200 · Credit: 0 · RAC: 0
|
Has the allowed limit changed to 30,000,000,000 bytes? |
|
Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 731
|
Appears so. `<rsc_disk_bound>30000000000.000000</rsc_disk_bound>` |
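For reference, the bound quoted above can be parsed and converted to familiar units. This is a minimal sketch using Python's standard-library XML parser, assuming (as the question does) that the value is in bytes. Note that 30,000,000,000 bytes is 30 GB in decimal units but only about 27.94 GiB in the binary units some tools display.

```python
import xml.etree.ElementTree as ET

# The element exactly as quoted in the thread; in a real workunit it sits
# inside a larger description, which is not reproduced here.
snippet = "<rsc_disk_bound>30000000000.000000</rsc_disk_bound>"

disk_bound_bytes = float(ET.fromstring(snippet).text)

print(f"{disk_bound_bytes:,.0f} bytes")
print(f"{disk_bound_bytes / 1e9:.1f} GB (decimal)")
print(f"{disk_bound_bytes / 2**30:.2f} GiB (binary)")
```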
|
Joined: 30 Jun 14 · Posts: 153 · Credit: 129,654,684 · RAC: 0
|
Quote: "The size for all the app files (including the compressed environment) are:" Note: I was commenting on Rosetta@home CPU Python tasks. What yours do, I don't know. I guess I had better add your project and see what happens. I re-added your project to my system, so if I am home when a task is sent out, I'll have a look. |
|
Joined: 31 May 21 · Posts: 200 · Credit: 0 · RAC: 0
|
Thank you! I have added the subtask weights to the PythonGPUbeta app. Currently testing it with a small batch of tasks. |
|
Joined: 31 May 21 · Posts: 200 · Credit: 0 · RAC: 0
|
Testing was successful, so we can add the weights to the PythonGPU app `job.xml` file. |
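For context on what the weights do: in the BOINC wrapper's `job.xml`, each subtask can carry a `<weight>`, and each subtask's contribution to the overall fraction done is proportional to its weight. A minimal Python sketch of that calculation follows; the function name and the three-subtask job with weights 1, 1 and 98 are hypothetical, not the actual PythonGPU job description.

```python
def weighted_fraction_done(weights: list[float], completed: int,
                           current_fraction: float) -> float:
    """Overall progress of a job made of sequential subtasks.

    weights          -- relative weight of each subtask, in run order
    completed        -- how many subtasks have already finished
    current_fraction -- progress (0.0 to 1.0) of the subtask now running
    """
    total = sum(weights)
    done = sum(weights[:completed])
    if completed < len(weights):
        done += weights[completed] * current_fraction
    return done / total


# Hypothetical job: unpack environment, set up, then a long training run.
weighted = [1.0, 1.0, 98.0]
unweighted = [1.0, 1.0, 1.0]

# Both set-up steps finished, training halfway through.
print(weighted_fraction_done(weighted, 2, 0.5))
print(weighted_fraction_done(unweighted, 2, 0.5))
```

With equal weights, finishing the two quick setup steps and half the training already reports over 80% done; with the hypothetical weights the same point reports 51%, which is closer to how much real work remains.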
|
Joined: 30 Jun 14 · Posts: 153 · Credit: 129,654,684 · RAC: 0
|
abouh, can you have a look at my comments in a thread I created? The 4.0 task was not increasing in percentage done after I watched it for 10 minutes. Time to completion kept jumping around, 1 second up, 1 second down. A 40-minute gap between run time and CPU time? That's a hell of a lot of setup time! Here are the local host task details: Application: Python apps for GPU hosts 4.03 (cuda1131); Workunit name: e2a18-ABOU_rnd_ppod_avoid_cnn_4-0-1-RND3898; State: Running; Received: 4/15/2022 12:06:46 PM; Report deadline: 4/20/2022 12:06:46 PM; Estimated app speed: 53.74 GFLOPs/sec; Estimated task size: 1,000,000,000 GFLOPs; Resources: 0.987 CPUs + 1 NVIDIA GPU (GTX 1050); CPU time at last checkpoint: 06:44:35; CPU time: 06:47:39; Elapsed time: 06:05:04; Estimated time remaining: 198d,09:49:25; Fraction done: 7.880%; Virtual memory size: 7,230.02 MB; Working set size: 2,057.87 MB |
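The 198-day estimate can be sanity-checked from the figures in the task details above. The sketch below (plain Python; the helper name is hypothetical) computes two things: the estimated task size divided by the estimated app speed, which comes to roughly 215 days, and a linear extrapolation from elapsed time and fraction done, which would leave only about 3 days. The displayed estimate is far closer to the first figure, which suggests it is driven mainly by the size and speed estimates rather than by observed progress. That is an inference from the numbers, not a description of the BOINC client's exact formula, and the extrapolation is only meaningful if progress is actually advancing, which is exactly what is in doubt here.

```python
def hms_to_seconds(text: str) -> int:
    """Convert 'HH:MM:SS' (as shown in the BOINC task details) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


SECONDS_PER_DAY = 86_400

# Figures copied from the task details in the post.
task_size_gflop = 1_000_000_000      # "Estimated task size", GFLOPs
app_speed_gflops = 53.74             # "Estimated app speed", GFLOPs/sec
elapsed_s = hms_to_seconds("06:05:04")
fraction_done = 0.0788

# Size-based estimate: total work divided by the assumed speed.
static_days = task_size_gflop / app_speed_gflops / SECONDS_PER_DAY

# Progress-based estimate: extrapolate linearly from the fraction done.
remaining_s = elapsed_s / fraction_done - elapsed_s
progress_days = remaining_s / SECONDS_PER_DAY

print(f"size / speed:       {static_days:.1f} days")
print(f"from fraction done: {progress_days:.2f} days remaining")
```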
|
Joined: 2 Jan 09 · Posts: 303 · Credit: 7,321,800,090 · RAC: 270
|
Quote: "You can delete the previous post about ACEMD3. I posted that incorrectly here." Some forums let you delete your own post by editing it down to a double space or a double period, but you still have to do it within the editing time. |
|
Joined: 30 Jun 14 · Posts: 153 · Credit: 129,654,684 · RAC: 0
|
Mikey, I know. But the time limit for editing that post had expired. I came back days later, not within the 30-60 minutes allowed. |
|
Joined: 12 May 13 · Posts: 5 · Credit: 100,032,540 · RAC: 0
|
I am now running a Python task. Its GPU usage is very low, most often around 5 to 10%, occasionally reaching 20%. Is this normal? Should I wait until I move my GPU from my old 3770K computer to a 12500 computer, with better CPU capabilities, before running these tasks? |
|
Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 731
|
This is normal for Python on GPU tasks. The tasks run on both the CPU and the GPU during parts of the computation, for the inferencing and machine-learning segments. Read the posts by the admin developer explaining what the process involves: "Cyclical GPU load is expected in Reinforcement Learning algorithms. Whenever GPU load is lower, CPU usage should increase. It is correct." |
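A rough way to see why the average GPU load can look so low: if each training cycle has a CPU-bound phase (for example, stepping environments to collect experience) followed by a shorter GPU-bound phase (the network update), the GPU is busy only for its share of the cycle. The sketch below assumes that simple two-phase model; the function name and the 45 s / 5 s split are made up purely for illustration.

```python
def average_gpu_utilization(phases: list[tuple[str, float]]) -> float:
    """Share of wall-clock time the GPU is busy over one training cycle.

    phases -- (device, seconds) pairs, where device is "cpu" or "gpu"
    """
    total = sum(seconds for _, seconds in phases)
    gpu_busy = sum(seconds for device, seconds in phases if device == "gpu")
    return gpu_busy / total


# Hypothetical cycle: long CPU-bound environment rollouts, short GPU update.
cycle = [("cpu", 45.0), ("gpu", 5.0)]
print(f"{average_gpu_utilization(cycle):.0%}")
```

Under this model, a faster CPU shortens the CPU phase, so the same GPU work fills a larger share of each cycle: cutting the CPU phase from 45 s to 20 s raises the average from 10% to 20%.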
|
Joined: 31 May 21 · Posts: 200 · Credit: 0 · RAC: 0
|
Sorry for the late reply, Greg _BE. I hid the ACEMD3 posts. I checked your job e2a18-ABOU_rnd_ppod_avoid_cnn_4-0-1-RND3898. Did the progress get stuck, or was it just increasing slowly? The job was finally completed by another Windows 10 host, but the reported CPU time is wrong: it says 668566.9 seconds. I am not sure, but maybe one problem is that we ask for only 0.987 CPUs, since that was ideal for ACEMD jobs. In reality, Python tasks use more. I will look into it. |
|
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 351
|
New tasks are being issued this morning, allocated to the old Linux v4.01 'Python app for GPU hosts' issued in October 2021. All are failing with `ModuleNotFoundError: No module named 'yaml'`. |
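`ModuleNotFoundError: No module named 'yaml'` means the Python environment the task ran in has no `yaml` module (normally installed by the PyYAML package). As an illustration only, a worker script could check its dependencies up front and report them in one clear message instead of crashing mid-run. This sketch uses the standard `importlib.util.find_spec`; the module list and the function name are hypothetical, not part of the project's app.

```python
import importlib.util
import sys

# Top-level modules the worker is assumed to need; "yaml" is the one named
# in the error reported in this thread. Extend the list as required.
REQUIRED_MODULES = ["yaml"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        # A real worker would stop here (e.g. exit with an error code)
        # rather than start the job and fail partway through.
        print(f"environment is missing: {', '.join(missing)}", file=sys.stderr)
    else:
        print("environment OK")
```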
|
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 351
|
Quote: "I am not sure, but maybe one problem is that we ask only for 0.987 CPUs, since that was ideal for ACEMD jobs. In reality Python tasks use more. I will look into it." Asking for 1.00 CPUs (or more) would make a significant difference, because that would prompt the BOINC client to reduce the number of tasks being run for other projects. It would be problematic to increase the CPU demand above 1.00, because the CPU loading is dynamic: BOINC has no provision for allowing another project to use the cycles available during periods when the GPUGrid app is quiescent. Normally, a GPU app is given a higher process priority for CPU usage than a pure CPU app, so the operating system should allocate resources to your advantage, but that can be problematic when the wrapper app is in use. That was changed recently; I'll look into the situation with your server version and our current client versions. |
|
Joined: 31 May 21 · Posts: 200 · Credit: 0 · RAC: 0
|
Definitely, only the latest version (4.03) should be sent. Thanks for letting us know. |
|
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 351
|
**BOINC GPU apps, wrapper apps, and process priority** The basic rule for BOINC applications (originally CPU only) has been to run applications at idle priority, to avoid interfering with foreground use of the computer. Since the introduction of GPU apps into BOINC around 2008, the CPU portion of a GPU app has been automatically run at a slightly higher process priority (below normal), in an attempt to avoid highly productive GPU work being throttled by competition for CPU resources. Normally, the BOINC client manages these two different process priorities directly. But when a wrapper app is interposed between the client and a worker app, it's the wrapper which sets the priority for the worker app. It was a user on this project who first noticed (Issue 3764, May 2020) that the process priority of a GPU app wasn't being set correctly when it was executing under the control of a wrapper app. Many false starts later (PRs 3826, 3948, 3988, 3999), a fully consistent set of process priority tools was developed, effective from about 25 September 2020. But for these tools to be useful, compatible versions of both the BOINC client and the wrapper application have to be used. So far as I can tell, BOINC client for Windows v7.16.20 (current) is compliant; wrapper version 26203 is compliant; but no full public release of the BOINC client for Linux is yet compliant (Gianfranco Costamagna's prototyping PPA client should be). This project appears to be using wrapper code 26016 for Windows and wrapper code 26198 for Linux. Unless these have been patched locally, neither wrapper will yet allow full process priority management. It's not urgent, but with the new Python apps running in a mixed CPU/GPU environment, it might be helpful to update the project's wrapper codebase. Fortunately, the basic server platform is unaffected by all this. |
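To make the mechanism concrete: whichever process launches the worker decides the worker's priority. The following POSIX-only Python sketch plays the part of a wrapper. The niceness values 19 and 10 are hypothetical stand-ins for "idle" and "below normal", not the values BOINC actually uses, and none of the names here are BOINC APIs.

```python
import os
import subprocess
import sys

# Hypothetical niceness values standing in for BOINC's two priority
# classes; the real client/wrapper mapping may differ.
IDLE_NICE = 19          # "idle" priority for CPU-only apps
BELOW_NORMAL_NICE = 10  # "below normal" for the CPU part of GPU apps


def launch_worker(cmd: list[str], uses_gpu: bool) -> subprocess.CompletedProcess:
    """Start a worker the way a wrapper might: the launcher, not the
    worker itself, decides the scheduling priority (POSIX only)."""
    target = BELOW_NORMAL_NICE if uses_gpu else IDLE_NICE

    def set_priority() -> None:
        # Runs in the child just before exec. An unprivileged process can
        # only raise its niceness, so never go below the current value.
        current = os.getpriority(os.PRIO_PROCESS, 0)
        os.setpriority(os.PRIO_PROCESS, 0, max(current, target))

    return subprocess.run(cmd, preexec_fn=set_priority,
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # The "worker" here just reports the niceness it was started with.
    probe = [sys.executable, "-c",
             "import os; print(os.getpriority(os.PRIO_PROCESS, 0))"]
    print("GPU worker niceness:", launch_worker(probe, uses_gpu=True).stdout.strip())
    print("CPU worker niceness:", launch_worker(probe, uses_gpu=False).stdout.strip())
```

If the launcher ignores the GPU/CPU distinction, both workers end up at the same priority, which is the kind of mismatch described in the post.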
|
Joined: 31 May 21 · Posts: 200 · Credit: 0 · RAC: 0
|
We have deprecated v4.01. Quote: "All are failing with `ModuleNotFoundError: No module named 'yaml'`." Hopefully, if everything went fine, that error should not happen any more, and all jobs should use v4.03. |
|
Joined: 30 Jun 14 · Posts: 153 · Credit: 129,654,684 · RAC: 0
|
abouh, I finally got another Python task. But here is something interesting: the CPU value according to BoincTasks is 221%! How can you get more than 100% of a single core? Another observation, elapsed time vs CPU time: the two are off by about 5 hours (4:01 vs 8:54 currently). Progress is not moving very fast. In the time it has taken me to write this, it has been stuck at 7.88%. Now 4:16 vs 9:24 and still 7.88%! 15 minutes and no progress? If this hasn't changed in the next hour, I am also aborting this task. BTW, 46 checkpoints in the 4 hours of run time. https://www.gpugrid.net/workunit.php?wuid=27219917 First host: Exit status 195 (0xc3) EXIT_CHILD_FAILED; Computer ID 589200; Exception: "The wandb backend process has shutdown"; GeForce GTX 1050 (2047MB), driver 512.15. Second host: Exit status 203 (0xcb) EXIT_ABORTED_VIA_GUI; Computer ID 590211; Run time 241,306.00; CPU time 1,471.50; GeForce RTX 3080 Ti (4095MB) driver: 497. The point of this information is: 1) I have a GTX 1050 and a 1080. My previous Python task failed with the same exit error as the first host on this workunit. What is EXIT_CHILD_FAILED? Something on your end or on ours? 2) The second host probably aborted because of the way BOINC reads the data to determine the time. I killed my first Python task because it showed 160+ days to completion. ***I give up. No progress in 30 minutes since I started this post.*** Computer: DESKTOP-LFM92VN; Project: GPUGRID; Name: e5a13-ABOU_rnd_ppod_avoid_cnn_4-0-1-RND0256_2; Application: Python apps for GPU hosts 4.03 (cuda1131); Workunit name: e5a13-ABOU_rnd_ppod_avoid_cnn_4-0-1-RND0256; State: Running; Received: 4/27/2022 4:35:18 PM; Report deadline: 5/2/2022 4:35:18 PM; Estimated app speed: 3,171.20 GFLOPs/sec; Estimated task size: 1,000,000,000 GFLOPs; Resources: 0.987 CPUs + 1 NVIDIA GPU (device 1); CPU time at last checkpoint: 09:58:18; CPU time: 10:08:59; Elapsed time: 04:37:57; Estimated time remaining: 161d,06:23:41; Fraction done: 7.880%; Virtual memory size: 6,429.20 MB; Working set size: 1,072.13 MB; Directory: slots/12; Process ID: 16828; Debug State: 2 - Scheduler: 2. That's 4:01 to 4:38 and still at 7.88%. Checkpoints count up. CPU is 219%. This is all messed up. I join the abort team. ------------ About the other task that failed with EXIT_CHILD_FAILED, a few extracts: `wandb: Network error (ReadTimeout), entering retry loop.` `Exception in thread StatsThr: Traceback (most recent call last): File "D:\data\slots\13\lib\site-packages\psutil\_common.py", line 449, in wrapper ret = self._cache[fun] AttributeError: 'Process' object has no attribute '_cache'` `During handling of the above exception, another exception occurred:` (followed by line this and line that, etc.) And then this: `OSError: [WinError 1455] The paging file is too small for this operation to complete` But the next host that got it has this kind of setup: CPU type: AuthenticAMD AMD Ryzen 5 5600X 6-Core Processor [Family 25 Model 33 Stepping 0]; Number of processors: 12; Coprocessors: NVIDIA NVIDIA GeForce RTX 3080 (4095MB) driver: 512.15; Operating System: Microsoft Windows 11 x64 Edition, (10.00.22000.00. I run GTX cards on Win10 with a Ryzen 7 2800 and BOINC 7.16.20. |
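The 221% figure and the gap between elapsed time and CPU time are two views of the same thing: CPU time accumulates on every core a task's threads run on, so CPU time divided by elapsed time gives the average number of cores in use. A short check with the readings quoted in the post (the helper names are just for illustration):

```python
def to_seconds(text: str) -> int:
    """Convert 'H:MM' or 'H:MM:SS' to seconds."""
    parts = [int(p) for p in text.split(":")]
    if len(parts) == 2:
        parts.append(0)  # no seconds field given
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds


def cpu_percent(cpu_time: str, elapsed: str) -> float:
    """Average CPU load over the run: 100% means one full core."""
    return 100 * to_seconds(cpu_time) / to_seconds(elapsed)


# (CPU time, elapsed time) readings quoted in the post.
readings = [("8:54", "4:01"), ("9:24", "4:16"), ("10:08:59", "04:37:57")]
for cpu, elapsed in readings:
    print(f"CPU {cpu} over elapsed {elapsed}: {cpu_percent(cpu, elapsed):.0f}%")
```

The three readings come out at roughly 222%, 220% and 219%, i.e. a little over two cores throughout, in line with the 219-221% shown in BoincTasks.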
|
Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 731
|
Quote: "But here is something interesting, the CPU value according to BoincTasks is 221%!" Because the task was actually using a little more than two cores to process the work. That's why I have set the Python tasks to allocate 3 CPU threads for BOINC scheduling. |
|
Joined: 30 Jun 14 · Posts: 153 · Credit: 129,654,684 · RAC: 0
|
Quote: "But here is something interesting, the CPU value according to BoincTasks is 221%!" OK... interesting, but what accounts for the lack of progress in 30 minutes on the task I just killed, and for the EXIT_CHILD_FAILED error that blew up the previous Python task? I mean really... not even a change in the second decimal place, 7.88% for more than 30 minutes? I don't know of any project that can't advance even 0.01% in 30 minutes. I've seen my share of slow tasks in other projects, but this one... wow. And how do you go about setting just the Python tasks to use 3 CPU cores? That's beyond my knowledge level. |
|
Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 731
|
You use an `app_config.xml` file in the project folder, like this: `<app_config> <app> <name>acemd3</name> <gpu_versions> <gpu_usage>1.0</gpu_usage> <cpu_usage>1.0</cpu_usage> </gpu_versions> </app> <app> <name>acemd4</name> <gpu_versions> <gpu_usage>1.0</gpu_usage> <cpu_usage>1.0</cpu_usage> </gpu_versions> </app> <app> <name>PythonGPU</name> <gpu_versions> <gpu_usage>1.0</gpu_usage> <cpu_usage>3.0</cpu_usage> </gpu_versions> </app> <app> <name>PythonGPUbeta</name> <gpu_versions> <gpu_usage>1.0</gpu_usage> <cpu_usage>3.0</cpu_usage> </gpu_versions> </app> </app_config>` |
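The same file, laid out one element per line, can be generated (or regenerated after edits) with a few lines of Python. This is just a convenience sketch using the standard library's ElementTree (`ET.indent` needs Python 3.9 or later), with the app names and values copied from the post; `build_app_config` is a hypothetical helper name. Save the output as `app_config.xml` in the GPUGRID project folder under the BOINC data directory, then use Options > Read config files in BOINC Manager (or restart the client) for it to take effect.

```python
import xml.etree.ElementTree as ET

# App names and (gpu_usage, cpu_usage) values as given in the post.
APPS = {
    "acemd3": (1.0, 1.0),
    "acemd4": (1.0, 1.0),
    "PythonGPU": (1.0, 3.0),
    "PythonGPUbeta": (1.0, 3.0),
}


def build_app_config(apps: dict[str, tuple[float, float]]) -> str:
    """Return app_config.xml text; values are (gpu_usage, cpu_usage)."""
    root = ET.Element("app_config")
    for name, (gpu_usage, cpu_usage) in apps.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        versions = ET.SubElement(app, "gpu_versions")
        ET.SubElement(versions, "gpu_usage").text = str(gpu_usage)
        ET.SubElement(versions, "cpu_usage").text = str(cpu_usage)
    ET.indent(root)  # pretty-print with two-space indentation
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_config(APPS))
```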
©2025 Universitat Pompeu Fabra