Experimental Python tasks (beta) - task description

Message boards : News : Experimental Python tasks (beta) - task description
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 0
Level
Trp
Message 59546 - Posted: 25 Oct 2022, 12:33:46 UTC - in response to Message 59545.  

These tasks report CPU time as elapsed time.

Actually, that's not quite right.

The report (made in sched_request_www.gpugrid.net.xml) is accurate - it's only after it lands on the server that it's filed in the wrong pocket.

I've got a couple of tasks finishing in the next hour / 90 minutes - I'll try to catch the report for one of them.
ID: 59546
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,876,970,595
RAC: 9,834
Level
Trp
Message 59547 - Posted: 25 Oct 2022, 12:44:47 UTC - in response to Message 59546.  

It’s correct. You just misinterpreted my perspective.

I was talking about what the website reports to us. Not what we report to the server.
ID: 59547
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 0
Level
Trp
Message 59548 - Posted: 25 Oct 2022, 13:31:48 UTC - in response to Message 59547.  

Anyway, I caught one just to clarify my perspective.

<result>
<name>e00021a01361-ABOU_rnd_ppod_expand_demos25_20-0-1-RND2109_0</name>
<final_cpu_time>151352.900000</final_cpu_time>
<final_elapsed_time>54305.405065</final_elapsed_time>
<exit_status>0</exit_status>
<state>5</state>
<platform>x86_64-pc-linux-gnu</platform>
<version_num>403</version_num>
<plan_class>cuda1131</plan_class>
<final_peak_working_set_size>4950069248</final_peak_working_set_size>
<final_peak_swap_size>17198002176</final_peak_swap_size>
<final_peak_disk_usage>10656485468</final_peak_disk_usage>
<app_version_num>403</app_version_num>

That matches what it says in the job log:

ct 151352.900000 et 54305.405065

But not what it says on the website:

task 33116901

I'm going on about it because, if it were a problem in the client, we could patch the code and fix it. But because it happens on the server, it's not even worth trying. Precision in language matters.
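For anyone who wants to verify this on their own machine, the two fields can be pulled straight out of the scheduler request file. A minimal sketch - the snippet below is a cut-down, well-formed version of the `<result>` fragment quoted above (the post truncates the closing tag):

```python
import xml.etree.ElementTree as ET

# A well-formed subset of the <result> fragment quoted above.
snippet = """
<result>
  <name>e00021a01361-ABOU_rnd_ppod_expand_demos25_20-0-1-RND2109_0</name>
  <final_cpu_time>151352.900000</final_cpu_time>
  <final_elapsed_time>54305.405065</final_elapsed_time>
</result>
"""

result = ET.fromstring(snippet)
cpu = float(result.findtext("final_cpu_time"))
elapsed = float(result.findtext("final_elapsed_time"))

# CPU time exceeds wall-clock time because the task keeps several
# worker processes busy on multiple cores at once.
print(f"cpu time: {cpu:.0f} s, elapsed: {elapsed:.0f} s, ratio: {cpu/elapsed:.2f}")
```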
ID: 59548
GS

Joined: 16 Oct 22
Posts: 12
Credit: 1,382,500
RAC: 0
Level
Ala
Message 59549 - Posted: 25 Oct 2022, 17:26:09 UTC - in response to Message 59529.  


If you want to inflate both values, all that is needed is to allocate more cores to the task in a cpu_usage parameter in an app_config.xml.

The task runs in whatever time it needs on your hardware. If one core is used to compute the task the time for cpu_time and run_time = 1X. If two cores are used then the time is 2X, 5 cores = 5X etc.


I have a question: currently, I'm running a Python task with 1 core and one GPU.
Would the crunching time decrease if I allocate more cores to this task? Do 2 cores equal 50% of the time, 4 cores 25%?
I know how to tweak app_config.xml, but I want to ask before I waste time with tinkering.
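For reference, the cpu_usage setting discussed above lives in app_config.xml in the project directory. A sketch reserving 4 CPUs and one GPU per task - the `<name>` value is an assumption here (it must match the project's actual app name, e.g. PythonGPU), and note that this only changes BOINC's scheduling bookkeeping, not what the app really consumes:

```xml
<app_config>
  <app>
    <name>PythonGPU</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>4.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

After editing the file, "Options → Read config files" in the BOINC Manager applies it without restarting the client.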
ID: 59549
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,876,970,595
RAC: 9,834
Level
Trp
Message 59550 - Posted: 25 Oct 2022, 17:41:45 UTC - in response to Message 59549.  


If you want to inflate both values, all that is needed is to allocate more cores to the task in a cpu_usage parameter in an app_config.xml.

The task runs in whatever time it needs on your hardware. If one core is used to compute the task the time for cpu_time and run_time = 1X. If two cores are used then the time is 2X, 5 cores = 5X etc.


I have a question: currently, I'm running a Python task with 1 core and one GPU.
Would the crunching time decrease if I allocate more cores to this task? Do 2 cores equal 50% of the time, 4 cores 25%?
I know how to tweak app_config.xml, but I want to ask before I waste time with tinkering.


I assume you're talking about the app_config settings when you say "allocate". As a reminder, these settings do not change how much CPU the app actually uses. The app uses whatever it needs no matter what settings you choose (up to physical constraints). The only way you can constrain CPU use is to do something like run a virtual machine with fewer cores allocated to it than the host has. Otherwise the app still has full access to all your cores, and if you monitor CPU use by the various processes you'll observe this.

If you're not running any other tasks (other CPU projects) at the same time, then changing the CPU allocation likely won't have any impact on your completion times, since the app is already using all of your cores.
ID: 59550
GS

Joined: 16 Oct 22
Posts: 12
Credit: 1,382,500
RAC: 0
Level
Ala
Message 59551 - Posted: 25 Oct 2022, 17:53:52 UTC
Last modified: 25 Oct 2022, 17:54:22 UTC

Thanks for the fast reply. I'm running MCM from WCG on my machine in parallel. I will do a short test and suspend all other tasks. The question is: Will Python add more cores to this task if the other cores become available?

My system: Ryzen 9 5950X, NVidia RTX 3060 Ti, 64 GB RAM, WIN 10
ID: 59551
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,876,970,595
RAC: 9,834
Level
Trp
Message 59552 - Posted: 25 Oct 2022, 18:02:57 UTC - in response to Message 59551.  
Last modified: 25 Oct 2022, 18:03:40 UTC

Don't think of it in that sense.

These tasks will spawn 32+ processes no matter how many cores you have or how many you allocate in BOINC. These processes need to be serviced by the CPU. If you have many processes and not enough threads to service them all, they will have to wait in the scheduler's queue against all the other processes.

Increasing the BOINC CPU allocation for the Python tasks will stop other competing BOINC CPU tasks from being processed, leaving more resources free for the Python processes. So they will get the opportunity to use more CPU in a shorter amount of time, but probably not a much different total CPU time - meaning the tasks should run faster, since they aren't competing with the other CPU work.
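As a toy illustration of that queuing behaviour - threads stand in here for the task's spawned processes, and the worker count is made up:

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

def worker(n: int) -> int:
    # Stand-in for one of the task's spawned helper processes:
    # a fixed amount of work, however long the scheduler takes to run it.
    time.sleep(0.05)
    return n

cores = os.cpu_count() or 1
n_workers = cores * 4  # oversubscribed, like the 32+ task processes

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    results = list(pool.map(worker, range(n_workers)))

# Every worker still finishes: with more runnable workers than cores,
# the OS simply time-slices them, stretching wall time rather than failing.
print(f"{len(results)} workers finished on {cores} cores")
```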
ID: 59552
Erich56

Joined: 1 Jan 15
Posts: 1168
Credit: 12,317,898,501
RAC: 91,654
Level
Trp
Message 59553 - Posted: 25 Oct 2022, 18:21:40 UTC - in response to Message 59550.  

...the only way you can constrain CPU use is to do something like run a virtual machine with fewer cores allocated to it than the host has. Otherwise the app still has full access to all your cores, and if you monitor CPU use by the various processes you'll observe this.

If you're not running any other tasks (other CPU projects) at the same time, then changing the CPU allocation likely won't have any impact on your completion times, since the app is already using all of your cores.

However, you guys recently stated that the best way is not to run any other projects while processing Python tasks.
I can confirm. A week ago, I ran one 2-core LHC ATLAS task (in a virtual machine) together with 2 Pythons (1 per GPU), and after a while the system crashed.
Since then, only Pythons are being processed - no crashes so far.
ID: 59553
GS

Joined: 16 Oct 22
Posts: 12
Credit: 1,382,500
RAC: 0
Level
Ala
Message 59554 - Posted: 25 Oct 2022, 19:03:43 UTC

Well,
CPU load was 100% before, with 30 MCM tasks running in parallel. Now only the Python task is running, and the CPU load is between 40 and 75%. GPU load has not changed and is between 18 and 22%, like before.

Looks like it is progressing faster than before ;-)
ID: 59554
GS

Joined: 16 Oct 22
Posts: 12
Credit: 1,382,500
RAC: 0
Level
Ala
Message 59555 - Posted: 25 Oct 2022, 20:03:23 UTC

Found a nice balance between MCM and Python tasks. Now I run 7 MCM tasks and 1 Python task, and the CPU load is about 99%.
ID: 59555
Erich56

Joined: 1 Jan 15
Posts: 1168
Credit: 12,317,898,501
RAC: 91,654
Level
Trp
Message 59556 - Posted: 26 Oct 2022, 7:25:39 UTC

there was a task which ran for about 20 hours and yielded a credit of 45,000

https://www.gpugrid.net/result.php?resultid=33117861

how come ?
ID: 59556
abouh

Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Level

Message 59557 - Posted: 26 Oct 2022, 8:38:39 UTC - in response to Message 59556.  

Currently, credits are not defined by execution time, but by the maximum possible compute effort. In particular, these AI experiments consist of training AI agents, and a maximum number of learning steps is defined as a target. That means the agent interacts with its simulated environment and then learns from these interactions for a certain amount of time.

However, if some condition is met earlier, the task ends. There is a certain amount of randomness in the learning process, but the amount of credit is defined by the upper bound of training steps, independently of whether the task finished earlier or not. That is the number of learning steps the agent would do if the early-stopping condition were never met.

In general, the condition is met more often by earlier RL agents in the population than by later ones. It can also vary from experiment to experiment. Locally, the tasks last 10-14 h on average.
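In pseudo-form, the credit rule described above looks like this - the names, step budget, and per-step rate below are illustrative stand-ins, not the project's actual code:

```python
import random

MAX_STEPS = 1_000_000      # upper bound of training steps (illustrative)
CREDIT_PER_STEP = 0.05     # illustrative rate; credit is fixed up front

def train(seed: int) -> tuple[int, float]:
    """Run until MAX_STEPS, or until an early-stopping condition fires."""
    rng = random.Random(seed)
    step = 0
    for step in range(1, MAX_STEPS + 1):
        solved = rng.random() < 1e-6  # stand-in for "target reward reached"
        if solved:
            break
    # Credit is based on the upper bound, not on the steps actually run.
    return step, MAX_STEPS * CREDIT_PER_STEP

steps_done, credit = train(seed=42)
print(f"ran {steps_done} steps, granted {credit:.0f} credits either way")
```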
ID: 59557
KAMasud

Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Level
Lys
Message 59558 - Posted: 26 Oct 2022, 13:08:43 UTC - in response to Message 59552.  

don't think of it in that sense.

these tasks will spawn 32+ processes no matter how many cores you have or how much you allocate in BOINC. these processes need to be serviced by the CPU. if you have many processes and not enough threads to service them all, they will need to wait in the priority queue against all other processes.

increasing the BOINC CPU allocation for the Python tasks, will stop processing by other competing BOINC CPU tasks, leaving more free available resources to the Python processes. so they will get the opportunity use more CPU in a shorter amount of time, but probably not much different total CPU time. meaning the tasks should run faster since they aren't competing with the other CPU work.


I have a question also. Maybe Richard might understand it better. I also run CPDN tasks, which are very few and far between, so I gave zero resource share to Moo Wrapper and ran it in parallel: whenever there was no CPDN task, Moo would send me WUs.
Now with GPUGrid tasks, this is not the case. These tasks do not register in BOINC as occupying the CPU for some reason. If I am crunching a GPUGrid task, I should not get a Moo task - that is the correct procedure - but when I shifted from CPDN to here, I was running one GPUGrid task (on all cores) as well as twelve Moo tasks. That is thirteen tasks. I am not worried about whether it can be done, but why is this happening?
ID: 59558
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 0
Level
Trp
Message 59559 - Posted: 26 Oct 2022, 13:24:54 UTC - in response to Message 59558.  

Without having full details of how your copy of BOINC is configured, and how the tasks from each project are configured to run (in particular, the resource assignment for each task type) it's impossible to say.

This may help:

[screenshot of the BOINC Manager task list - image not reproduced]
That machine has six CPU cores, but it's only running five tasks. That's because BOINC has committed 3+1+0.5+0.5+1 = 6 cores, and there are none left. If one of the GPU applications had been configured to require 2.99 CPUs, or 0.49 CPUs, the total core allocation would have fallen "below six", and BOINC's rules say that another task can be started.
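That bookkeeping is easy to check by hand; a simplified sketch of the rule (the real client scheduler has more cases than this):

```python
# CPU fractions committed by the five running tasks in the example:
# a 3-core task, two 1-core tasks, and two GPU tasks at 0.5 CPUs each.
running = [3.0, 1.0, 0.5, 0.5, 1.0]
ncpus = 6

committed = sum(running)
# Simplified rule: another task may start only while the committed
# total stays below the number of cores.
can_start_another = committed < ncpus
print(committed, can_start_another)  # 6.0 False

# Drop the 3-core requirement to 2.99 and a sixth task would start:
alt_committed = sum([2.99, 1.0, 0.5, 0.5, 1.0])
print(alt_committed < ncpus)  # True
```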
ID: 59559
Profile [AF] fansyl

Joined: 26 Sep 13
Posts: 20
Credit: 1,714,356,441
RAC: 0
Level
His
Message 59560 - Posted: 26 Oct 2022, 14:44:56 UTC - in response to Message 59533.  

Example: https://www.gpugrid.net/result.php?resultid=33109419

OSError: [WinError 1455] The paging file is too small for this operation to complete ("Le fichier de pagination est insuffisant pour terminer cette opération"). Error loading "D:\BOINC\slots\3\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies.

Your page file still isn't large enough.



I needed to push the page file size up to 32 GB, but now it's OK.

Even if the GPU activity rate is low and the Python task does not respect the number of threads allocated to it... no problem, go ahead, science!
ID: 59560
KAMasud

Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Level
Lys
Message 59563 - Posted: 29 Oct 2022, 8:04:38 UTC - in response to Message 59559.  
Last modified: 29 Oct 2022, 8:08:29 UTC

Without having full details of how your copy of BOINC is configured, and how the tasks from each project are configured to run (in particular, the resource assignment for each task type) it's impossible to say.

This may help:

[screenshot of the BOINC Manager task list - image not reproduced]
That machine has six CPU cores, but it's only running five tasks. That's because BOINC has committed 3+1+0.5+0.5+1 = 6 cores, and there are none left. If one of the GPU applications had been configured to require 2.99 CPUs or 0.49 CPUs, the total core allocation would have fallen "below six", and BOINC's rules say that another task can be started.

BOINC version 7.20.2, stock, out of the box. If there is a thread where I can learn mischief, let me know.
It is stock BOINC, and I have allocated 100% resource share to GPUGrid and 0% to Moo Wrapper, so when there is no task from GPUGrid, I can get Moo tasks.
I am in a hot, arid part of South Asia, so I have to keep an eye on temperatures - I don't want a puddle of plastic. Having too many cores busy is not an advantage in my case.
ID: 59563
STARBASEn
Avatar

Joined: 17 Feb 09
Posts: 91
Credit: 1,603,303,394
RAC: 0
Level
His
Message 59573 - Posted: 9 Nov 2022, 23:32:44 UTC

According to my work-in-progress listing, I received this WU listed as in progress: https://www.gpugrid.net/result.php?resultid=33134063 but it is non-existent on the computer. Since it doesn't exist, I can't abort it or anything, so the project will have to remove it from my queue and reassign it.
ID: 59573
Erich56

Joined: 1 Jan 15
Posts: 1168
Credit: 12,317,898,501
RAC: 91,654
Level
Trp
Message 59576 - Posted: 11 Nov 2022, 15:32:21 UTC
Last modified: 11 Nov 2022, 15:34:27 UTC

On one of my hosts, a Python task has now been running for almost 3 times as long as all the "long" ones before.
There is CPU activity, also GPU activity + VRAM usage in the usual range. Also RAM.
The slot in the project folder is also filled with some 8.25 GB.

Still, I am not sure whether this task has perhaps hung itself up in some way.
Could this still be a valid task, or had I better terminate it?
ID: 59576
Erich56

Joined: 1 Jan 15
Posts: 1168
Credit: 12,317,898,501
RAC: 91,654
Level
Trp
Message 59577 - Posted: 11 Nov 2022, 16:58:30 UTC - in response to Message 59576.  

On one of my hosts, a Python task has now been running for almost 3 times as long as all the "long" ones before.
There is CPU activity, also GPU activity + VRAM usage in the usual range. Also RAM.
The slot in the project folder is also filled with some 8.25 GB.

Still, I am not sure whether this task has perhaps hung itself up in some way.
Could this still be a valid task, or had I better terminate it?

I now looked up the task history - it failed on 7 other hosts.
So I'd better cancel it :-)
ID: 59577
kotenok2000

Joined: 18 Jul 13
Posts: 79
Credit: 218,778,292
RAC: 12,880
Level
Leu
Message 59578 - Posted: 11 Nov 2022, 19:38:42 UTC - in response to Message 59577.  
Last modified: 11 Nov 2022, 19:38:56 UTC

Can you check whether wrapper_run.out changes, and the number of samples collected?
There should be a config file in the slot directory that contains the start sample number and the end sample number. You can subtract one from the other to determine the target number of samples.
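The subtraction suggested above is trivial once you find the file; a sketch with made-up file contents and key names (the real slot files may be named and structured differently):

```python
import json

# Hypothetical config contents as read from the slot directory;
# the actual file name and field names may differ.
config = json.loads('{"start_sample": 120000, "end_sample": 180000}')

# The target number of samples for this task is just the difference.
target = config["end_sample"] - config["start_sample"]
print("samples to collect:", target)  # samples to collect: 60000
```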
ID: 59578


©2026 Universitat Pompeu Fabra