Experimental Python tasks (beta) - task description
Author | Message |
---|---|
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 |
I've one task now that had the same timeout issue getting python. The host was running fine on these tasks before and I don't know what has changed. I've aborted a couple tasks now that are not making any progress after 20 hours or so and are stuck at 13% completion. Similar series tasks are showing much more progress after only a few minutes. Most complete in 5-6 hours. I reset the project thinking something got corrupted in the downloaded libraries but that has not fixed anything. Need to figure out how to debug the tasks on this host. |
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 |
You might look into schedtool as an alternative. |
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 2 |
I'd like to hear what others are using for ncpus for their Python tasks in their app_config files. Very interesting. Does this actually limit PythonGPU to using at most 5 CPU threads? Does it work better than: `<app_config> <!-- i9-7980XE 18c36t 32 GB L3 Cache 24.75 MB --> <app> <name>PythonGPU</name> <plan_class>cuda1121</plan_class> <gpu_versions> <cpu_usage>1.0</cpu_usage> <gpu_usage>1.0</gpu_usage> </gpu_versions> <avg_ncpus>5</avg_ncpus> <cmdline>--nthreads 5</cmdline> <fraction_done_exact/> </app> </app_config>` Edit 1: To answer my own question, I changed cpu_usage to 5 and am running a single PythonGPU WU with nothing else going on. The System Monitor shows 5 CPUs running in the 60 to 80% range, with all the other CPUs running in the 10 to 40% range. Is there any way to stop it from taking over one's entire computer? Edit 2: I turned on WCG and the group of 5 went up to 100% and all the rest went to OPN in the 80 to 95% range. |
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,909,595 RAC: 4,232,576 |
No. Setting that value won't change how much CPU is actually used. It just tells BOINC how much of the CPU is being used so that it can properly account for resources. This app will use 32 threads and there's nothing you can do in BOINC configuration to change that. This has always been the case though. |
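Since BOINC configuration cannot cap the threads, any limit has to be applied from outside BOINC. One option on Linux is CPU affinity, which is among the controls schedtool (mentioned earlier in the thread) offers. The following is a minimal sketch only, assuming Linux; the helper name `pin_to_cores` and the choice of cores are hypothetical, and nobody in this thread has reported testing this against PythonGPU tasks.

```python
import os


def pin_to_cores(pid, cores):
    """Restrict process `pid` (0 means the calling process) to `cores`.

    Returns the affinity mask in effect afterwards. Linux only.
    """
    allowed = os.sched_getaffinity(pid)   # cores the process may use right now
    wanted = set(cores) & allowed         # never request an unavailable core
    if not wanted:
        raise ValueError("none of the requested cores are available")
    os.sched_setaffinity(pid, wanted)
    return os.sched_getaffinity(pid)
```

For example, `pin_to_cores(0, range(5))` would confine the calling process to cores 0 to 4 where those are available. An affinity mask set this way is inherited by threads and child processes started afterwards; threads that already exist keep their old mask, so a task that is already running would need each of its threads pinned separately.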
Joined: 24 Sep 10 Posts: 592 Credit: 11,972,186,510 RAC: 998,578 |
This morning, in a routine system update, I noticed that BOINC Client / Manager was updated from Version 7.16.17 to Version 7.18.1. It would be interesting to know whether PrivateTmp=true is set as a default in this new version, which might in some way help Python GPU tasks to succeed... |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
Which distro/repository are you using? I have Mint with Gianfranco Costamagna's PPA: that's usually the fastest to update, and I see v7.18.1 is being offered there as well - although I haven't installed it yet. I'll check it out in the morning. v7.18.1 should be pretty good (it's been available for Android since August last year), but I don't yet know the answer to your specific question - there hasn't been any chatter about testing or new releases in the usual places. |
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 |
Which distro/repository are you using? I have Mint with Gianfranco Costamagna's PPA: that's usually the fastest to update, and I see v7.18.1 is being offered there as well - although I haven't installed it yet. It bombed out on the Rosetta pythons; they did not run at all (a VBox problem undoubtedly). And it failed all the validations on QuChemPedIA, which does not use VirtualBox on the Linux version. But it works OK on CPDN, WCG/ARP and Einstein/FGRBP (GPU). All were on Ubuntu 20.04.3. So be prepared to bail out if you have to. |
Joined: 24 Sep 10 Posts: 592 Credit: 11,972,186,510 RAC: 998,578 |
Which distro/repository are you using? I'm using the regular repository for Ubuntu 20.04.3 LTS. I took a screenshot of the offered updates before updating. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
My PPA gives slightly more information on the available update. I know that it's auto-generated from the Debian package maintenance sources, which is probably the ultimate source of the Ubuntu LTS package as well. I've had a quick look round, but there's no sign so far that this release was originated by BOINC developers: in particular, no mention was made of it during the BOINC projects conference call on January 14th 2022. I'll keep digging. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
OK, I've taken a deep breath and enough coffee - applied all updates. WARNING - the BOINC update appears to break things. The new systemd file, in full, is [Unit] Note the line I've picked out. That starts with a # sign, for comment, so it has no effect: PrivateTmp is undefined in this file. New work became available just as I was preparing to update, so I downloaded a task and immediately suspended it. After the updates, and enough reboots to get my NVidia drivers functional again (it took three this time), I restarted BOINC and allowed the task to run. Task 32736884 Our old enemy "INTERNAL ERROR: cannot create temporary directory!" is back. Time for a systemd over-ride file, and to go fishing for another task. Edit - updated the file, as described in message 58312, and got task 32736938. That seems to be running OK, having passed the 10% danger point. Result will be in sometime after midnight. |
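The point about the leading `#` can be checked mechanically. The sketch below is a simplified illustration, not a full systemd parser: it ignores section names and drop-in override files, and the helper name `unit_setting` is hypothetical. It treats lines starting with `#` or `;` as comments, so a commented-out `PrivateTmp=true` is reported as not set.

```python
def unit_setting(unit_text, key):
    """Return the effective value of `key` in systemd unit text, or None.

    Comment lines (starting with '#' or ';') and section headers are
    skipped, so '#PrivateTmp=true' counts as undefined.
    """
    value = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;" or line.startswith("["):
            continue
        name, sep, val = line.partition("=")
        if sep and name.strip() == key:
            value = val.strip()  # a later assignment replaces an earlier one
    return value
```

Called on a made-up fragment such as `"[Service]\n#PrivateTmp=true\n"`, the function returns `None`; with the `#` removed it returns `"true"`.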
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 |
I see your task completed normally with the PrivateTmp=true uncommented in the service file. But is the repeating warning: wandb: WARNING Path /var/lib/boinc-client/slots/11/.config/wandb/wandb/ wasn't writable, using system temp directory a normal entry for those using the standard BOINC location installation? |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
No, that's the first time I've seen that particular warning. The general structure is right for this machine, but it doesn't usually reach as high as 11 - GPUGrid normally gets slot 7. Whatever - there were some tasks left waiting after the updates and restarts. I think this task must have run under a revised version of the app - the next stage in testing. The output is slightly different in other ways, and the task ran for a significantly shorter time than other recent tasks. My other machine, which hasn't been updated yet, got the same warnings in a task running at the same time. |
Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0 |
Oh, I was not aware of this warning. "/var/lib/boinc-client/slots/11/.config/wandb/wandb/" is the directory where the training logs are stored. Yes, it changed in the last batch because of a problem detected earlier, in which the logs were stored in a directory outside boinc-client. I could actually change it to any other location. I just thought that any location inside "/var/lib/boinc-client/slots/11/" was fine. Maybe it is just a warning because .config is a hidden directory. I will change it again anyway, so that the logs are stored in "/var/lib/boinc-client/slots/11/" directly. The next batches will still contain the warning, but it will disappear in the next experiment. |
Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0 |
Yes, this experiment uses a slightly modified version of the algorithm, which should be faster. It runs the same number of interactions with the reinforcement learning environment, so the credit amount is the same. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
I'll take a look at the contents of the slot directory, next time I see a task running. You're right - the entire '/var/lib/boinc-client/slots/n/...' structure should be writable, to any depth, by any program running under the boinc user account. How is the '.config/wandb/wandb/' component of the path created? The doubled '/wandb' looks unusual. |
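One way to test the "writable to any depth" expectation is to create the nested path and try to write into it. This is a minimal sketch under assumptions: the helper name `is_writable_dir` is hypothetical, and to be meaningful it would have to run under the same user account as the BOINC client (for example from inside a task).

```python
import os
import tempfile


def is_writable_dir(path):
    """Create `path` if needed, then try to create and remove a file in it."""
    try:
        os.makedirs(path, exist_ok=True)
        with tempfile.TemporaryFile(dir=path):
            pass  # the file is deleted again when the block exits
        return True
    except OSError:
        return False
```

Something like `is_writable_dir(os.path.join(os.getcwd(), ".config", "wandb", "wandb"))` would then show whether the doubled path from the warning can be created and written from the slot directory.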
Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0 |
The directory paths are defined as environment variables in the python script. # Set wandb paths Then the directories are created by the wandb python package (which handles logging of relevant training data). I suspect the permissions are set when the directories are created, so it is not a BOINC problem. I will change the paths in future jobs to: # Set wandb paths Note that "os.getcwd()" is the working directory, so "/var/lib/boinc-client/slots/11/" in this case. |
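The script lines themselves did not survive in the post above, so the following is only a sketch of the general approach, not the project's actual code. It assumes the wandb environment variables `WANDB_DIR`, `WANDB_CONFIG_DIR` and `WANDB_CACHE_DIR` are the ones involved; the subdirectory names and the helper name `set_wandb_paths` are hypothetical.

```python
import os


def set_wandb_paths(base=None):
    """Point wandb's directories at subfolders of `base`.

    `base` defaults to the current working directory, which for a BOINC
    task is the slot directory.
    """
    base = base or os.getcwd()
    paths = {
        "WANDB_DIR": os.path.join(base, "wandb"),                # assumed name
        "WANDB_CONFIG_DIR": os.path.join(base, "wandb_config"),  # assumed name
        "WANDB_CACHE_DIR": os.path.join(base, "wandb_cache"),    # assumed name
    }
    for name, path in paths.items():
        os.makedirs(path, exist_ok=True)  # create up front so failures show early
        os.environ[name] = path
    return paths
```

Because every path is derived from `os.getcwd()`, nothing depends on BOINC being installed under /var/lib/boinc-client. The variables need to be set before wandb is imported and initialised.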
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,909,595 RAC: 4,232,576 |
Oh, I was not aware of this warning. what happens if that directory doesn't exist? several of us run BOINC in a different location. since it's in /var/lib/ the process won't have permissions to create the directory, unless maybe if BOINC is run as root. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
'/var/lib/boinc-client/' is the default BOINC data directory for Ubuntu BOINC service (systemd) installations. It most certainly exists, and is writable, on my machine, which is where Keith first noticed the error message in the report of a successful run. During that run, much will have been written to .../slots/11. Since abouh is using code to retrieve the working (i.e. BOINC slot) directory, the correct value should be returned for non-default data locations - otherwise BOINC wouldn't be able to run at all. |
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,909,595 RAC: 4,232,576 |
I'm aware it's the default location on YOUR computer, and others running the standard ubuntu repository installer. but the message from abouh sounded like this directory was hard coded since he put the entire path. and for folks running BOINC in another location, this directory will not be the same. if it uses a relative file path, then it's fine, but I was seeking clarification. /var/lib/boinc-client/ does not exist on my system. /var/lib is write protected, creating a directory there requires elevated privileges, which I'm sure happens during install from the repository. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
Hard path coding was removed before this most recent test batch. edit - see message 58292: "Should be easy to fix". |
©2025 Universitat Pompeu Fabra