Message boards : News : Experimental Python tasks (beta) - task description
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
My question would be: what is the working directory? The individual line errors concern /home/boinc-client/slots/1/..., but the final failure concerns /var/lib/boinc-client. That sounds like a mixed-up installation of BOINC: 'home' sounds like a location for a user-mode installation of BOINC, but '/var/lib/' would be normal for a service-mode installation. It's reasonable for the two different locations to have different write permissions.

What app is doing the writing in each case, and what account is each running under? Could the final write location be hard-coded, while the others depend on locations supplied by the local BOINC installation?
Joined: 4 Apr 10 · Posts: 50 · Credit: 650,142,596 · RAC: 0
Hi, I have the same issue regarding the BOINC directory (my BOINC dir is set up as ~/boinc). So I cleaned up the ~/.conda directory and re-attached the GPUGRID project to the BOINC client. Now flock detects the right running BOINC directory, but I get this error in task https://www.gpugrid.net/result.php?resultid=32734225:

./gpugridpy/bin/python

(I think this is in the boinc/slots/<N>/ folder.) The WU is running and 0.43% completed, but /home/<user>/boinc/slots/11/gpugridpy is still empty. No data are written.
Joined: 31 May 21 · Posts: 200 · Credit: 0 · RAC: 0
Right, so the working directory is /home/boinc-client/slots/1/..., to which the script has full access. The script tries to create a directory to save the logs, but I guess it should not do it in /var/lib/boinc-client. So I think the problem is just that the package I am using to log results by default saves them outside the working directory. Should be easy to fix.
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
BOINC has the concept of a "data directory". Absolutely everything that has to be written should be written somewhere in that directory or its sub-directories. Everything else must be assumed to be sandboxed and inaccessible.
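For the Python tasks, that usually comes down to pointing whatever logging package the app uses at the slot (working) directory instead of an absolute default. A minimal sketch of the idea, assuming plain Python logging (the "logs" sub-directory and file name are my own illustration, not the actual package or paths the app uses):

    import logging
    import os

    # A BOINC task starts with its slot directory as the working directory,
    # e.g. /home/boinc-client/slots/1/ - everything written must stay inside it.
    log_dir = os.path.join(os.getcwd(), "logs")
    os.makedirs(log_dir, exist_ok=True)

    logging.basicConfig(
        filename=os.path.join(log_dir, "task.log"),  # stays inside the slot directory
        level=logging.INFO,
    )
    logging.info("writing logs under the BOINC data directory, not /var/lib/...")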
Joined: 2 Jul 16 · Posts: 338 · Credit: 7,987,341,558 · RAC: 197,587
The PC now has a 1080 and a 1080 Ti, with the Ti having more VRAM. BOINC shows 2x 1080. The 1080 is GPU 0 in nvidia-smi, and the BOINC-displayed GPUs follow the same order. The Ti is in the first physical slot. This PC happened to pick up two Python tasks. They aren't taking 4 days this time: 5:45 hr:min at 38.8% and 31 min at 11.8%.
Joined: 21 Feb 20 · Posts: 1114 · Credit: 40,838,348,595 · RAC: 4,765,598
What motherboard, and what version of BOINC? Your hosts are hidden, so I cannot inspect them myself.

PCIe enumeration and ordering can be inconsistent on consumer boards. My server boards seem to enumerate starting from the slot furthest from the CPU socket, while most consumer boards are the opposite, with device 0 at the slot closest to the CPU socket.

Or do you perhaps have a locked coproc_info.xml file? That would prevent any GPU changes from being picked up by BOINC if it can't write to the coproc file.

Edit: I also forgot that most versions of BOINC incorrectly detect NVIDIA GPU memory - they all max out at 4 GB due to a bug in BOINC. So to BOINC, your 1080 Ti has the same amount of memory as your 1080. And since the 1080 Ti is still a Pascal card like the 1080, it has the same compute capability, so BOINC sees the same specs across all of them.

To get them to sort properly, you need to fix the BOINC code, or use a GPU with a higher or lower compute capability. Put a Turing card in the system, not in the first slot, and BOINC will pick it up as GPU 0.
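On the enumeration point: nvidia-smi lists GPUs in PCI bus order, while the CUDA runtime defaults to a "fastest first" ordering, which is one common reason the numbering disagrees between tools (whether BOINC's own ordering is affected by this is my assumption, not something I've checked in the code). From inside a Python task you can see what the runtime itself reports, assuming PyTorch is available as in the task's conda environment:

    import os

    # Make the CUDA runtime enumerate devices in PCI bus order, like nvidia-smi,
    # instead of its default "fastest first" ordering; set before CUDA initialises.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

    import torch  # assumed to be present in the gpugridpy environment

    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, "
              f"{props.total_memory / 1024**3:.1f} GiB, "
              f"compute capability {props.major}.{props.minor}")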
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
The tests continue. Just reported e2a13-ABOU_rnd_ppod_baseline_cnn_nophi_2-0-1-RND9761_1, with final stats

<result>
    <name>e2a13-ABOU_rnd_ppod_baseline_cnn_nophi_2-0-1-RND9761_1</name>
    <final_cpu_time>107668.100000</final_cpu_time>
    <final_elapsed_time>46186.399529</final_elapsed_time>

That's an average CPU core count of 2.33 over the entire run - that's high for what is planned to be a GPU application. We can manage with that - I'm sure we all want to help develop and test the application for the coming research run - but I think it would be helpful to put more realistic usage values into the BOINC scheduler.
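For anyone checking the arithmetic, the 2.33 figure is just the reported CPU time divided by the wall-clock time:

    final_cpu_time = 107668.1      # seconds of CPU time, summed across all cores
    final_elapsed_time = 46186.4   # wall-clock seconds for the task

    avg_cores = final_cpu_time / final_elapsed_time
    print(f"average CPU cores in use: {avg_cores:.2f}")   # ~2.33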
Joined: 14 Mar 07 · Posts: 1958 · Credit: 629,356 · RAC: 0
It's not a GPU application. It uses both CPU and GPU.
Joined: 31 May 21 · Posts: 200 · Credit: 0 · RAC: 0
Do you mean changing some of the BOINC parameters, as was done in the case of <rsc_fpops_est>? Is that to better define the resources required by the tasks?
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
It would need to be done in the plan class definition. Toni said that you define your plan classes in C++ code, so there are some examples in "Specifying plan classes in C++". Unfortunately, the BOINC developers didn't consider your use-case of mixing CPU elements and GPU elements in the same task, so none of the examples really match - your app is a mixture of the MT and CUDA classes. What we need (or at least would like to see) at this end are realistic values for <avg_ncpus> and <coproc><count>.
Joined: 7 Apr 15 · Posts: 17 · Credit: 2,978,057,945 · RAC: 55,974
It seems to work better now, but I've reached the time limit after 1800 sec: https://www.gpugrid.net/result.php?resultid=32734648

19:39:23 (6124): task /usr/bin/flock reached time limit 1800
application ./gpugridpy/bin/python missing
Joined: 13 Dec 17 · Posts: 1416 · Credit: 9,119,446,190 · RAC: 678,713
I'd like to hear what others are using for ncpus for their Python tasks in their app_config files. I'm using:

<app>
    <name>PythonGPU</name>
    <gpu_versions>
        <gpu_usage>1.0</gpu_usage>
        <cpu_usage>5.0</cpu_usage>
    </gpu_versions>
</app>

for all my hosts and they seem to like that. Haven't had any issues.
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
I'm still running them at 1 CPU plus 1 GPU. They run fine, but when they are busy on the CPU-only sections, they steal time from the CPU tasks that are running at the same time - most obviously from CPDN. Because these tasks are defined as GPU tasks, and GPU tasks are given a higher run priority than CPU tasks by BOINC ('below normal' versus 'idle'), the real CPU project will always come off worst.
Joined: 13 Dec 17 · Posts: 1416 · Credit: 9,119,446,190 · RAC: 678,713
You could employ Process Lasso on the apps and raise their priority, I suppose. When I ran Windows, I really made use of that utility to make the apps run the way I wanted them to, and not how BOINC sets them up on its own agenda.
Joined: 24 Sep 10 · Posts: 592 · Credit: 11,972,186,510 · RAC: 1,102,898
I'd like to hear what others are using for ncpus for their Python tasks in their app_config files.

I think the Python GPU app is very efficient in adapting to any number of CPU cores, and in taking advantage of available CPU resources. This seems to be somewhat independent of the ncpus parameter in the GPUGRID app_config.xml.

Setup at my twin-GPU system is as follows:

<app>
    <name>PythonGPU</name>
    <gpu_versions>
        <gpu_usage>1.0</gpu_usage>
        <cpu_usage>0.49</cpu_usage>
    </gpu_versions>
</app>

And setup for my triple-GPU system is as follows:

<app>
    <name>PythonGPU</name>
    <gpu_versions>
        <gpu_usage>1.0</gpu_usage>
        <cpu_usage>0.33</cpu_usage>
    </gpu_versions>
</app>

The purpose of this is to be able to run two or three concurrent Python GPU tasks, respectively, without reaching a full "1" CPU core (2 x 0.49 = 0.98; 3 x 0.33 = 0.99). Then I manually control CPU usage by setting "Use at most XX % of the CPUs" in BOINC Manager for each system, according to its number of CPU cores. This allows me to run "N" Python GPU tasks concurrently plus a fixed number of other CPU tasks, as desired. But as said, the GPUGRID Python GPU app seems to take CPU resources as needed to successfully process its tasks... at the cost of slowing down the other CPU applications.
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
Yes, I use Process Lasso on all my Windows machines, but I haven't explored its use under Linux. Remember that ncpus and similar settings have no effect whatsoever on the actual running of a BOINC project app - there is no 'control' element to their operation. The only effect they have is on BOINC's scheduling - how many tasks are allowed to run concurrently.
Joined: 31 May 21 · Posts: 200 · Credit: 0 · RAC: 0
This message

19:39:23 (6124): task /usr/bin/flock reached time limit 1800

indicates that, after 30 minutes, the installation of miniconda and the task environment setup had not finished. Consequently, python is not found later on to execute the task, since it is one of the requirements of the miniconda environment:

application ./gpugridpy/bin/python missing

Therefore it is not an error in itself; it just means that the miniconda setup went too slowly for some reason (in theory 30 minutes should be enough time). Maybe the machine is slower than usual for some reason, or the connection is slow and dependencies are not being downloaded. We could extend this timeout, but normally if 30 minutes is not enough for the miniconda setup, another underlying problem could exist.
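Not the actual wrapper code, just a sketch of the behaviour described above: the environment setup gets a fixed time budget, and overrunning it means ./gpugridpy/bin/python never appears, so the later "missing" message follows from the timeout rather than being a separate error (the command name below is a placeholder):

    import subprocess

    SETUP_TIME_LIMIT = 1800  # seconds - the 30-minute budget mentioned above

    try:
        # Placeholder standing in for the miniconda / task environment setup step.
        subprocess.run(["./setup_environment.sh"], timeout=SETUP_TIME_LIMIT, check=True)
    except subprocess.TimeoutExpired:
        # Setup overran its budget, so ./gpugridpy/bin/python was never created.
        print("task setup reached time limit", SETUP_TIME_LIMIT)
        raise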
Joined: 21 Feb 20 · Posts: 1114 · Credit: 40,838,348,595 · RAC: 4,765,598
It seems to be a reasonably fast system. My guess is another type of permissions issue that blocks the Python install until it hits the timeout, or the CPUs are being too heavily used and not giving enough resources to the extraction process.
Joined: 13 Dec 17 · Posts: 1416 · Credit: 9,119,446,190 · RAC: 678,713
There is no Linux equivalent of Process Lasso, but there is a Linux equivalent of the Windows Process Explorer: https://github.com/wolfc01/procexp (screenshots of the application are at the old SourceForge repo, https://sourceforge.net/projects/procexp/). It can dynamically change the nice value of an application. There is also the command line schedtool utility, which can easily be used from a bash script; I used to run that all the time in my gpuoverclock.sh script for the Seti CPU and GPU apps.
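For anyone who would rather do the renice from Python than from a bash script, the standard library can do the same thing (the PID below is a placeholder, and raising priority, i.e. lowering the nice value, generally needs root):

    import os

    pid = 12345    # placeholder: PID of the science app you want to re-prioritise
    new_nice = 10  # lower value = higher priority; negative values require root

    current = os.getpriority(os.PRIO_PROCESS, pid)
    os.setpriority(os.PRIO_PROCESS, pid, new_nice)
    print(f"nice value for {pid}: {current} -> {new_nice}")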
Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 326,008
Well, that got me a long way. There are dependencies listed for Mint 18.3 - I'm running Mint 20.2. The apt-get for the older version of Mint returns

E: Unable to locate package python-qwt5-qt4
E: Unable to locate package python-configobj

Unsurprisingly, the next step returns

Traceback (most recent call last):
  File "./procexp.py", line 27, in <module>
    from PyQt5 import QtCore, QtGui, QtWidgets, uic
ModuleNotFoundError: No module named 'PyQt5'

htop, however, shows about 30 multitasking processes spawned from main, each using around 2% of a CPU core (varying by the second) at nice 19 - at the time of inspection, that is. I'll go away and think about that.