Experimental Python tasks (beta) - task description

Message boards : News : Experimental Python tasks (beta) - task description
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 0
Level
Trp
Message 59546 - Posted: 25 Oct 2022, 12:33:46 UTC - in response to Message 59545.  

These tasks report CPU time as elapsed time.

Actually, that's not quite right.

The report (made in sched_request_www.gpugrid.net.xml) is accurate - it's only after it lands on the server that it's filed in the wrong pocket.

I've got a couple of tasks finishing in the next hour / 90 minutes - I'll try to catch the report for one of them.
ID: 59546
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,876,970,595
RAC: 9,834
Level
Trp
Message 59547 - Posted: 25 Oct 2022, 12:44:47 UTC - in response to Message 59546.  

It’s correct. You just misinterpreted my perspective.

I was talking about what the website reports to us. Not what we report to the server.
ID: 59547
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 0
Level
Trp
Message 59548 - Posted: 25 Oct 2022, 13:31:48 UTC - in response to Message 59547.  

Anyway, I caught one just to clarify my perspective.

<result>
<name>e00021a01361-ABOU_rnd_ppod_expand_demos25_20-0-1-RND2109_0</name>
<final_cpu_time>151352.900000</final_cpu_time>
<final_elapsed_time>54305.405065</final_elapsed_time>
<exit_status>0</exit_status>
<state>5</state>
<platform>x86_64-pc-linux-gnu</platform>
<version_num>403</version_num>
<plan_class>cuda1131</plan_class>
<final_peak_working_set_size>4950069248</final_peak_working_set_size>
<final_peak_swap_size>17198002176</final_peak_swap_size>
<final_peak_disk_usage>10656485468</final_peak_disk_usage>
<app_version_num>403</app_version_num>

That matches what it says in the job log:

ct 151352.900000 et 54305.405065

But not what it says on the website:

task 33116901

I'm going on about it because, if it were a problem in the client, we could patch the code and fix it. But because it happens on the server, it's not even worth trying. Precision in language matters.
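For anyone who wants to verify this on their own machine, the two fields can be pulled straight out of the scheduler request file. A minimal sketch - the snippet below is a cut-down, well-formed version of the `<result>` fragment quoted above (the post truncates the closing tag):

```python
import xml.etree.ElementTree as ET

# A well-formed subset of the <result> fragment quoted above.
snippet = """
<result>
  <name>e00021a01361-ABOU_rnd_ppod_expand_demos25_20-0-1-RND2109_0</name>
  <final_cpu_time>151352.900000</final_cpu_time>
  <final_elapsed_time>54305.405065</final_elapsed_time>
</result>
"""

result = ET.fromstring(snippet)
cpu = float(result.findtext("final_cpu_time"))
elapsed = float(result.findtext("final_elapsed_time"))

# CPU time exceeds wall-clock time because the task keeps several
# worker processes busy on multiple cores at once.
print(f"cpu time: {cpu:.0f} s, elapsed: {elapsed:.0f} s, ratio: {cpu/elapsed:.2f}")
```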
ID: 59548
GS

Joined: 16 Oct 22
Posts: 12
Credit: 1,382,500
RAC: 0
Level
Ala
Message 59549 - Posted: 25 Oct 2022, 17:26:09 UTC - in response to Message 59529.  


If you want to inflate both values, all that is needed is to allocate more cores to the task in a cpu_usage parameter in an app_config.xml.

The task runs in whatever time it needs on your hardware. If one core is used to compute the task the time for cpu_time and run_time = 1X. If two cores are used then the time is 2X, 5 cores = 5X etc.


I have a question: currently, I'm running a Python task with 1 core and one GPU.
Would the crunching time decrease if I allocate more cores to this task? Do 2 cores equal 50% of the time, 4 cores 25%?
I know how to tweak app_config.xml, but I want to ask before I waste time with tinkering.
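For reference, the cpu_usage setting discussed above lives in app_config.xml in the project directory. A sketch reserving 4 CPUs and one GPU per task - the `<name>` value is an assumption here (it must match the project's actual app name, e.g. PythonGPU), and note that this only changes BOINC's scheduling bookkeeping, not what the app really consumes:

```xml
<app_config>
  <app>
    <name>PythonGPU</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>4.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

After editing the file, "Options → Read config files" in the BOINC Manager applies it without restarting the client.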
ID: 59549
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,876,970,595
RAC: 9,834
Level
Trp
Message 59550 - Posted: 25 Oct 2022, 17:41:45 UTC - in response to Message 59549.  


If you want to inflate both values, all that is needed is to allocate more cores to the task in a cpu_usage parameter in an app_config.xml.

The task runs in whatever time it needs on your hardware. If one core is used to compute the task the time for cpu_time and run_time = 1X. If two cores are used then the time is 2X, 5 cores = 5X etc.


I have a question: currently, I'm running a Python task with 1 core and one GPU.
Would the crunching time decrease if I allocate more cores to this task? Do 2 cores equal 50% of the time, 4 cores 25%?
I know how to tweak app_config.xml, but I want to ask before I waste time with tinkering.


I assume you're talking about the app_config settings when you say "allocate". As a reminder, these settings do not change how much CPU the app actually uses. The app uses whatever it needs no matter what settings you choose (up to physical constraints). The only way you can constrain CPU use is to do something like run a virtual machine with fewer cores allocated to it than the host has. Otherwise the app still has full access to all your cores, and if you monitor CPU use by the various processes you'll observe this.

If you're not running any other tasks (other CPU projects) at the same time, then changing the CPU allocation likely won't have any impact on your completion times, since the app is already using all of your cores.
ID: 59550
GS

Joined: 16 Oct 22
Posts: 12
Credit: 1,382,500
RAC: 0
Level
Ala
Message 59551 - Posted: 25 Oct 2022, 17:53:52 UTC
Last modified: 25 Oct 2022, 17:54:22 UTC

Thanks for the fast reply. I'm running MCM from WCG on my machine in parallel. I will do a short test and suspend all other tasks. The question is: Will Python add more cores to this task if the other cores become available?

My system: Ryzen 9 5950X, NVidia RTX 3060 Ti, 64 GB RAM, WIN 10
ID: 59551
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,876,970,595
RAC: 9,834
Level
Trp
Message 59552 - Posted: 25 Oct 2022, 18:02:57 UTC - in response to Message 59551.  
Last modified: 25 Oct 2022, 18:03:40 UTC

Don't think of it in that sense.

These tasks will spawn 32+ processes no matter how many cores you have or how many you allocate in BOINC. These processes need to be serviced by the CPU. If you have many processes and not enough threads to service them all, they will have to wait in the scheduler's queue against all the other processes.

Increasing the BOINC CPU allocation for the Python tasks will stop other competing BOINC CPU tasks from being processed, leaving more resources free for the Python processes. So they will get the opportunity to use more CPU in a shorter amount of time, but probably not a much different total CPU time - meaning the tasks should run faster, since they aren't competing with the other CPU work.
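As a toy illustration of that queuing behaviour - threads stand in here for the task's spawned processes, and the worker count is made up:

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

def worker(n: int) -> int:
    # Stand-in for one of the task's spawned helper processes:
    # a fixed amount of work, however long the scheduler takes to run it.
    time.sleep(0.05)
    return n

cores = os.cpu_count() or 1
n_workers = cores * 4  # oversubscribed, like the 32+ task processes

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    results = list(pool.map(worker, range(n_workers)))

# Every worker still finishes: with more runnable workers than cores,
# the OS simply time-slices them, stretching wall time rather than failing.
print(f"{len(results)} workers finished on {cores} cores")
```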
ID: 59552
Erich56

Joined: 1 Jan 15
Posts: 1168
Credit: 12,317,898,501
RAC: 91,654
Level
Trp
Message 59553 - Posted: 25 Oct 2022, 18:21:40 UTC - in response to Message 59550.  

...the only way you can constrain CPU use is to do something like run a virtual machine with fewer cores allocated to it than the host has. Otherwise the app still has full access to all your cores, and if you monitor CPU use by the various processes you'll observe this.

If you're not running any other tasks (other CPU projects) at the same time, then changing the CPU allocation likely won't have any impact on your completion times, since the app is already using all of your cores.

However, you guys recently stated that the best way is not to run any other projects while processing Python tasks.
I can confirm. A week ago, I ran one 2-core LHC ATLAS task (in a virtual machine) together with 2 Pythons (1 per GPU), and after a while the system crashed.
Since then, only Pythons are being processed - no crashes so far.
ID: 59553
GS

Joined: 16 Oct 22
Posts: 12
Credit: 1,382,500
RAC: 0
Level
Ala
Message 59554 - Posted: 25 Oct 2022, 19:03:43 UTC

Well,
CPU load was 100% before, with 30 MCM tasks running in parallel. Now only the Python task is running, and the CPU load is between 40 and 75%. GPU load has not changed and is between 18 and 22%, like before.

Looks like it is progressing faster than before ;-)
ID: 59554
GS

Joined: 16 Oct 22
Posts: 12
Credit: 1,382,500
RAC: 0
Level
Ala
Message 59555 - Posted: 25 Oct 2022, 20:03:23 UTC

Found a nice balance between MCM and Python tasks. Now I run 7 MCM tasks and 1 Python task, and the CPU load is about 99%.
ID: 59555
Erich56

Joined: 1 Jan 15
Posts: 1168
Credit: 12,317,898,501
RAC: 91,654
Level
Trp
Message 59556 - Posted: 26 Oct 2022, 7:25:39 UTC

there was a task which ran for about 20 hours and yielded a credit of 45,000

https://www.gpugrid.net/result.php?resultid=33117861

how come ?
ID: 59556
abouh

Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Level

Message 59557 - Posted: 26 Oct 2022, 8:38:39 UTC - in response to Message 59556.  

Currently, credits are not defined by execution time, but by the maximum possible compute effort. In particular, these AI experiments consist of training AI agents, and a maximum number of learning steps is defined as a target. That means the agent interacts with its simulated environment and then learns from these interactions for a certain amount of time.

However, if some condition is met earlier, the task ends. There is a certain amount of randomness in the learning process, but the amount of credit is defined by the upper bound of training steps, independently of whether the task finished earlier or not. That is the number of learning steps the agent would do if the early-stopping condition were never met.

In general, the condition is met more often by earlier RL agents in the population than by later ones. It can also vary from experiment to experiment. Locally, the tasks last 10-14 h on average.
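In pseudo-form, the credit rule described above looks like this - the names, step budget, and per-step rate below are illustrative stand-ins, not the project's actual code:

```python
import random

MAX_STEPS = 1_000_000      # upper bound of training steps (illustrative)
CREDIT_PER_STEP = 0.05     # illustrative rate; credit is fixed up front

def train(seed: int) -> tuple[int, float]:
    """Run until MAX_STEPS, or until an early-stopping condition fires."""
    rng = random.Random(seed)
    step = 0
    for step in range(1, MAX_STEPS + 1):
        solved = rng.random() < 1e-6  # stand-in for "target reward reached"
        if solved:
            break
    # Credit is based on the upper bound, not on the steps actually run.
    return step, MAX_STEPS * CREDIT_PER_STEP

steps_done, credit = train(seed=42)
print(f"ran {steps_done} steps, granted {credit:.0f} credits either way")
```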
ID: 59557
KAMasud

Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Level
Lys
Message 59558 - Posted: 26 Oct 2022, 13:08:43 UTC - in response to Message 59552.  

don't think of it in that sense.

these tasks will spawn 32+ processes no matter how many cores you have or how much you allocate in BOINC. these processes need to be serviced by the CPU. if you have many processes and not enough threads to service them all, they will need to wait in the priority queue against all other processes.

increasing the BOINC CPU allocation for the Python tasks, will stop processing by other competing BOINC CPU tasks, leaving more free available resources to the Python processes. so they will get the opportunity use more CPU in a shorter amount of time, but probably not much different total CPU time. meaning the tasks should run faster since they aren't competing with the other CPU work.


I have a question also. Maybe Richard might understand it better. I also run CPDN tasks, which are very few and far between, so I gave zero resource share to Moo Wrapper and ran it in parallel: whenever there was no CPDN task, Moo would send me WUs.
Now with GPUGrid tasks, this is not the case. These tasks do not register in BOINC as occupying the CPU for some reason. If I am crunching a GPUGrid task, I should not get a Moo task - that is the correct procedure - but when I shifted from CPDN to here, I was running one GPUGrid task (on all cores) as well as twelve Moo tasks. That is thirteen tasks. I am not worried about whether it can be done, but why is this happening?
ID: 59558
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 0
Level
Trp
Message 59559 - Posted: 26 Oct 2022, 13:24:54 UTC - in response to Message 59558.  

Without having full details of how your copy of BOINC is configured, and how the tasks from each project are configured to run (in particular, the resource assignment for each task type) it's impossible to say.

This may help:

[screenshot of the BOINC Manager task list - image not reproduced]
That machine has six CPU cores, but it's only running five tasks. That's because BOINC has committed 3+1+0.5+0.5+1 = 6 cores, and there are none left. If one of the GPU applications had been configured to require 2.99 CPUs, or 0.49 CPUs, the total core allocation would have fallen "below six", and BOINC's rules say that another task can be started.
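That bookkeeping is easy to check by hand; a simplified sketch of the rule (the real client scheduler has more cases than this):

```python
# CPU fractions committed by the five running tasks in the example:
# a 3-core task, two 1-core tasks, and two GPU tasks at 0.5 CPUs each.
running = [3.0, 1.0, 0.5, 0.5, 1.0]
ncpus = 6

committed = sum(running)
# Simplified rule: another task may start only while the committed
# total stays below the number of cores.
can_start_another = committed < ncpus
print(committed, can_start_another)  # 6.0 False

# Drop the 3-core requirement to 2.99 and a sixth task would start:
alt_committed = sum([2.99, 1.0, 0.5, 0.5, 1.0])
print(alt_committed < ncpus)  # True
```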
ID: 59559
Profile [AF] fansyl

Joined: 26 Sep 13
Posts: 20
Credit: 1,714,356,441
RAC: 0
Level
His
Message 59560 - Posted: 26 Oct 2022, 14:44:56 UTC - in response to Message 59533.  

Example: https://www.gpugrid.net/result.php?resultid=33109419

OSError: [WinError 1455] The paging file is too small for this operation to complete ("Le fichier de pagination est insuffisant pour terminer cette opération"). Error loading "D:\BOINC\slots\3\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies.

Your page file still isn't large enough.



I needed to push the page file size up to 32 GB, but now it's OK.

Even if the GPU activity rate is low and the Python task does not respect the number of threads allocated to it... no problem, go ahead, science!
ID: 59560
KAMasud

Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Level
Lys
Message 59563 - Posted: 29 Oct 2022, 8:04:38 UTC - in response to Message 59559.  
Last modified: 29 Oct 2022, 8:08:29 UTC

Without having full details of how your copy of BOINC is configured, and how the tasks from each project are configured to run (in particular, the resource assignment for each task type) it's impossible to say.

This may help:

[screenshot of the BOINC Manager task list - image not reproduced]
That machine has six CPU cores, but it's only running five tasks. That's because BOINC has committed 3+1+0.5+0.5+1 = 6 cores, and there are none left. If one of the GPU applications had been configured to require 2.99 CPUs or 0.49 CPUs, the total core allocation would have fallen "below six", and BOINC's rules say that another task can be started.

BOINC version 7.20.2, stock, out of the box. If there is a thread where I can learn mischief, let me know.
It is stock BOINC, and I have allocated 100% resource share to GPUGrid and 0% to Moo Wrapper, so when there is no task from GPUGrid, I can get Moo tasks.
I am in a hot, arid part of South Asia, so I have to keep an eye on temperatures - I don't want a puddle of plastic. Having too many cores busy is not an advantage in my case.
ID: 59563
STARBASEn
Avatar

Joined: 17 Feb 09
Posts: 91
Credit: 1,603,303,394
RAC: 0
Level
His
Message 59573 - Posted: 9 Nov 2022, 23:32:44 UTC

According to my work-in-progress listing, I received this WU listed as in progress: https://www.gpugrid.net/result.php?resultid=33134063 but it is non-existent on the computer. Since it doesn't exist, I can't abort it or anything, so the project will have to remove it from my queue and reassign it.
ID: 59573
Erich56

Joined: 1 Jan 15
Posts: 1168
Credit: 12,317,898,501
RAC: 91,654
Level
Trp
Message 59576 - Posted: 11 Nov 2022, 15:32:21 UTC
Last modified: 11 Nov 2022, 15:34:27 UTC

On one of my hosts, a Python task has now been running for almost 3 times as long as all the "long" ones before.
There is CPU activity, also GPU activity + VRAM usage in the usual range. Also RAM.
The slot in the project folder is also filled with some 8.25 GB.

Still, I am not sure whether this task has perhaps hung itself up in some way.
Could this still be a valid task, or had I better terminate it?
ID: 59576
Erich56

Joined: 1 Jan 15
Posts: 1168
Credit: 12,317,898,501
RAC: 91,654
Level
Trp
Message 59577 - Posted: 11 Nov 2022, 16:58:30 UTC - in response to Message 59576.  

On one of my hosts, a Python task has now been running for almost 3 times as long as all the "long" ones before.
There is CPU activity, also GPU activity + VRAM usage in the usual range. Also RAM.
The slot in the project folder is also filled with some 8.25 GB.

Still, I am not sure whether this task has perhaps hung itself up in some way.
Could this still be a valid task, or had I better terminate it?

I now looked up the task history - it failed on 7 other hosts.
So I'd better cancel it :-)
ID: 59577
kotenok2000

Joined: 18 Jul 13
Posts: 79
Credit: 218,778,292
RAC: 12,880
Level
Leu
Message 59578 - Posted: 11 Nov 2022, 19:38:42 UTC - in response to Message 59577.  
Last modified: 11 Nov 2022, 19:38:56 UTC

Can you check whether wrapper_run.out changes, and the number of samples collected?
There should be a config file in the slot directory that contains the start sample number and the end sample number. You can subtract one from the other to determine the target number of samples.
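The subtraction suggested above is trivial once you find the file; a sketch with made-up file contents and key names (the real slot files may be named and structured differently):

```python
import json

# Hypothetical config contents as read from the slot directory;
# the actual file name and field names may differ.
config = json.loads('{"start_sample": 120000, "end_sample": 180000}')

# The target number of samples for this task is just the difference.
target = config["end_sample"] - config["start_sample"]
print("samples to collect:", target)  # samples to collect: 60000
```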
ID: 59578


©2026 Universitat Pompeu Fabra