Experimental Python tasks (beta) - task description

Message boards : News : Experimental Python tasks (beta) - task description
Message board moderation

To post messages, you must log in.

Previous · 1 . . . 6 · 7 · 8 · 9 · 10 · 11 · 12 . . . 50 · Next

AuthorMessage
Keith Myers
Avatar

Send message
Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 614,515
Level
Tyr
Scientific publications
watwatwatwatwat
Message 58310 - Posted: 21 Jan 2022, 17:41:41 UTC - in response to Message 58300.  

I've one task now that had the same timeout issue getting python. The host was running fine on these tasks before and I don't know what has changed.

I've aborted a couple tasks now that are not making any progress after 20 hours or so and are stuck at 13% completion. Similar series tasks are showing much more progress after only a few minutes. Most complete in 5-6 hours.

I reset the project thinking something got corrupted in the downloaded libraries but that has not fixed anything.

Need to figure out how to debug the tasks on this host.
ID: 58310 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Keith Myers
Avatar

Send message
Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 614,515
Level
Tyr
Scientific publications
watwatwatwatwat
Message 58311 - Posted: 21 Jan 2022, 17:42:23 UTC - in response to Message 58309.  

You might look into schedtool as an alternative.
ID: 58311 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Aurum
Avatar

Send message
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Level
Trp
Scientific publications
watwatwat
Message 58317 - Posted: 29 Jan 2022, 21:23:39 UTC - in response to Message 58301.  
Last modified: 29 Jan 2022, 22:08:45 UTC

I'd like to hear what others are using for ncpus for their Python tasks in their app_config files.

I'm using:

<app>
<name>PythonGPU</name>
<gpu_versions>
<gpu_usage>1.0</gpu_usage>
<cpu_usage>5.0</cpu_usage>
</gpu_versions>
</app>

for all my hosts and they seem to like that. Haven't had any issues.
Very interesting. Does this actually limit PythonGPU to using at most 5 cpu threads?
Does it work better than:
<app_config>
<!-- i9-7980XE 18c36t 32 GB L3 Cache 24.75 MB -->
<app>
<name>PythonGPU</name>
<plan_class>cuda1121</plan_class>
<gpu_versions>
<cpu_usage>1.0</cpu_usage>
<gpu_usage>1.0</gpu_usage>
</gpu_versions>
<avg_ncpus>5</avg_ncpus>
<cmdline>--nthreads 5</cmdline>
<fraction_done_exact/>
</app>
</app_config>
Edit 1: To answer my own question I changed cpu_usage to 5 and am running a single PythonGPU WU with nothing else going on. The System Monitor shows 5 CPUs are running in the 60 to 80% range with all othe CPU running in the 10 to 40% range.
Is there any way to stop it from taking over ones entire computer?
Edit 2: I turned on WCG and the group of 5 went up to 100% and all the rest went to OPN in the 80 to 95% range.
ID: 58317 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Level
Trp
Scientific publications
wat
Message 58318 - Posted: 30 Jan 2022, 5:24:25 UTC - in response to Message 58317.  

No. Setting that value won’t change how much CPU is actually used. It just tells BOINC how much of the CPU is being used so that it can probably account resources.

This app will use 32 threads and there’s nothing you can do in BOINC configuration to change that. This has always been the case though.
ID: 58318 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile ServicEnginIC
Avatar

Send message
Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 998,578
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58320 - Posted: 2 Feb 2022, 22:06:09 UTC

This morning, in a routine system update, I noticed that BOINC Client / Manager was updated from Version 7.16.17 to Version 7.18.1.
It would be interesting to know if PrivateTmp=true is set as a default at this new version, thus in some way helping for Python GPU task to succeed...
ID: 58320 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58321 - Posted: 2 Feb 2022, 23:06:32 UTC - in response to Message 58320.  

Which distro/repository are you using? I have Mint with Gianfranco Costamagna's PPA: that's usually the fastest to update, and I see v7.18.1 is being offered there as well - although I haven't installed it yet.

I'll check it out in the morning. v7.18.1 should be pretty good (it's been available for Android since August last year), but I don't yet know the answer to your specific question - there hasn't been any chatter about testing or new releases in the usual places.
ID: 58321 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Jim1348

Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58322 - Posted: 2 Feb 2022, 23:47:29 UTC - in response to Message 58321.  
Last modified: 2 Feb 2022, 23:50:53 UTC

Which distro/repository are you using? I have Mint with Gianfranco Costamagna's PPA: that's usually the fastest to update, and I see v7.18.1 is being offered there as well - although I haven't installed it yet.

I'll check it out in the morning. v7.18.1 should be pretty good

It bombed out on the Rosetta pythons; they did not run at all (a VBox problem undoubtedly). And it failed all the validations on QuChemPedIA, which does not use VirtualBox on the Linux version. But it works OK on CPDN, WCG/ARP and Einstein/FGRBP (GPU). All were on Ubuntu 20.04.3.

So be prepared to bail out if you have to.
ID: 58322 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile ServicEnginIC
Avatar

Send message
Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 998,578
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58324 - Posted: 3 Feb 2022, 6:29:43 UTC - in response to Message 58321.  

Which distro/repository are you using?

I'm using the regular repository for Ubuntu 20.04.3 LTS
I took screenshot of offered updates before updating.
ID: 58324 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58325 - Posted: 3 Feb 2022, 9:25:23 UTC - in response to Message 58324.  

My PPA gives slightly more information on the available update:



I know that it's auto-generated from the Debian package maintenance sources, which is probably the ultimate source of the Ubuntu LTS package as well. I've had a quick look round, but there's no sign so far that this release was originated by BOINC developers: in particular, no mention was made of it during the BOINC projects conference call on January 14th 2022. I'll keep digging.
ID: 58325 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58327 - Posted: 3 Feb 2022, 12:13:36 UTC
Last modified: 3 Feb 2022, 12:34:19 UTC

OK, I've taken a deep breath and enough coffee - applied all updates.

WARNING - the BOINC update appears to break things.

The new systemd file, in full, is

[Unit]
Description=Berkeley Open Infrastructure Network Computing Client
Documentation=man:boinc(1)
After=network-online.target

[Service]
Type=simple
ProtectHome=true
ProtectSystem=strict
ProtectControlGroups=true
ReadWritePaths=-/var/lib/boinc -/etc/boinc-client
Nice=10
User=boinc
WorkingDirectory=/var/lib/boinc
ExecStart=/usr/bin/boinc
ExecStop=/usr/bin/boinccmd --quit
ExecReload=/usr/bin/boinccmd --read_cc_config
ExecStopPost=/bin/rm -f lockfile
IOSchedulingClass=idle
# The following options prevent setuid root as they imply NoNewPrivileges=true
# Since Atlas requires setuid root, they break Atlas
# In order to improve security, if you're not using Atlas,
# Add these options to the [Service] section of an override file using
# sudo systemctl edit boinc-client.service
#NoNewPrivileges=true
#ProtectKernelModules=true
#ProtectKernelTunables=true
#RestrictRealtime=true
#RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
#RestrictNamespaces=true
#PrivateUsers=true
#CapabilityBoundingSet=
#MemoryDenyWriteExecute=true
#PrivateTmp=true #Block X11 idle detection

[Install]
WantedBy=multi-user.target

Note the line I've picked out. That starts with a # sign, for comment, so it has no effect: PrivateTmp is undefined in this file.

New work became available just as I was preparing to update, so I downloaded a task and immediately suspended it. After the updates, and enough reboots to get my NVidia drivers functional again (it took three this time), I restarted BOINC and allowed the task to run.

Task 32736884

Our old enemy "INTERNAL ERROR: cannot create temporary directory!" is back. Time for a systemd over-ride file, and to go fishing for another task.

Edit - updated the file, as described in message 58312, and got task 32736938. That seems to be running OK, having passed the 10% danger point. Result will be in sometime after midnight.
ID: 58327 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Keith Myers
Avatar

Send message
Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 614,515
Level
Tyr
Scientific publications
watwatwatwatwat
Message 58328 - Posted: 3 Feb 2022, 23:34:25 UTC

I see your task completed normally with the PrivateTmp=true uncommented in the service file.

But is the repeating warning:

wandb: WARNING Path /var/lib/boinc-client/slots/11/.config/wandb/wandb/ wasn't writable, using system temp directory

a normal entry for those using the standard BOINC location installation?
ID: 58328 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58329 - Posted: 4 Feb 2022, 9:04:58 UTC - in response to Message 58328.  

No, that's the first time I've seen that particular warning. The general structure is right for this machine, but it does't usually reach as high as 11 - GPUGrid normally gets slot 7. Whatever - there were some tasks left waiting after the updates and restarts.

I think this task must have run under a revised version of the app - the next stage in testing. The output is slightly different in other ways, and the task ran for a significantly shorter time than other recent tasks. My other machine, which hasn't been updated yet, got the same warnings in a task running at the same time.
ID: 58329 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
abouh

Send message
Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 58330 - Posted: 4 Feb 2022, 9:14:25 UTC - in response to Message 58328.  
Last modified: 4 Feb 2022, 9:23:48 UTC

Oh, I was not aware of this warning.

"/var/lib/boinc-client/slots/11/.config/wandb/wandb/" is the directory where the training logs are stored. Yes, it changed in the last batch because of a problem detected earlier, in which the logs were stored in a directory outside boinc-client.

I could actually change it to any other location. I just thought that any location inside "/var/lib/boinc-client/slots/11/" was fine.

Maybe it is just a warning because .config is a hidden directory. I will change it again anyway, so that the logs are stored in "/var/lib/boinc-client/slots/11/" directly. The next batches will still contains the warning, but will disappear for the next experiment.
ID: 58330 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
abouh

Send message
Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 58331 - Posted: 4 Feb 2022, 9:25:40 UTC - in response to Message 58329.  

Yes, this experiments is with a slightly modified version of the algorithm, which should be faster. It runs the same number of interactions with the reinforcement learning environment, so the credits amount is the same.
ID: 58331 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58332 - Posted: 4 Feb 2022, 9:38:39 UTC - in response to Message 58330.  

I'll take a look at the contents of the slot directory, next time I see a task running. You're right - the entire '/var/lib/boinc-client/slots/n/...' structure should be writable, to any depth, by any program running under the boinc user account.

How is the '.config/wandb/wandb/' component of the path created? The doubled '/wandb' looks unusual.
ID: 58332 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
abouh

Send message
Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 58333 - Posted: 4 Feb 2022, 9:44:30 UTC - in response to Message 58332.  
Last modified: 4 Feb 2022, 9:55:30 UTC

The directory paths are defined as environment variables in the python script.

# Set wandb paths
os.environ["WANDB_CONFIG_DIR"] = os.getcwd()
os.environ["WANDB_DIR"] = os.path.join(os.getcwd(), ".config/wandb")


Then the directories are created by the wandb python package (which handles logging of relevant training data). I suspect it could be in the creation that the permissions are defined. So it is not a BOINC problem. I will change the paths in future jobs to:

# Set wandb paths
os.environ["WANDB_CONFIG_DIR"] = os.getcwd()
os.environ["WANDB_DIR"] = os.getcwd()


Note that "os.getcwd()" is the working directory, so "/var/lib/boinc-client/slots/11/" in this case
ID: 58333 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Level
Trp
Scientific publications
wat
Message 58334 - Posted: 4 Feb 2022, 13:32:42 UTC - in response to Message 58330.  

Oh, I was not aware of this warning.

"/var/lib/boinc-client/slots/11/.config/wandb/wandb/" is the directory where the training logs are stored. Yes, it changed in the last batch because of a problem detected earlier, in which the logs were stored in a directory outside boinc-client.

I could actually change it to any other location. I just thought that any location inside "/var/lib/boinc-client/slots/11/" was fine.

Maybe it is just a warning because .config is a hidden directory. I will change it again anyway, so that the logs are stored in "/var/lib/boinc-client/slots/11/" directly. The next batches will still contains the warning, but will disappear for the next experiment.


what happens if that directory doesn't exist? several of us run BOINC in a different location. since it's in /var/lib/ the process wont have permissions to create the directory, unless maybe if BOINC is run as root.
ID: 58334 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58335 - Posted: 4 Feb 2022, 14:22:26 UTC - in response to Message 58334.  

'/var/lib/boinc-client/' is the default BOINC data directory for Ubuntu BOINC service (systemd) installations. It most certainly exists, and is writable, on my machine, which is where Keith first noticed the error message in the report of a successful run. During that run, much will have been written to .../slots/11

Since abouh is using code to retrieve the working (i.e. BOINC slot) directory, the correct value should be returned for non-default data locations - otherwise BOINC wouldn't be able to run at all.
ID: 58335 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Level
Trp
Scientific publications
wat
Message 58336 - Posted: 4 Feb 2022, 15:33:49 UTC - in response to Message 58335.  
Last modified: 4 Feb 2022, 15:39:39 UTC

I'm aware it's the default location on YOUR computer, and others running the standard ubuntu repository installer. but the message from abouh sounded like this directory was hard coded since he put the entire path. and for folks running BOINC in another location, this directory will not be the same. if it uses a relative file path, then it's fine, but I was seeking clarification.

/var/lib/boinc-client/ does not exist on my system. /var/lib is write protected, creating a directory there requires elevated privileges, which I'm sure happens during install from the repository.
ID: 58336 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 58337 - Posted: 4 Feb 2022, 15:59:00 UTC - in response to Message 58336.  
Last modified: 4 Feb 2022, 16:21:03 UTC

Hard path coding was removed before this most recent test batch.

edit - see message 58292: "Should be easy to fix".
ID: 58337 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Previous · 1 . . . 6 · 7 · 8 · 9 · 10 · 11 · 12 . . . 50 · Next

Message boards : News : Experimental Python tasks (beta) - task description

©2025 Universitat Pompeu Fabra