Experimental Python tasks (beta) - task description

Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 58290 - Posted: 17 Jan 2022, 9:36:10 UTC - in response to Message 58289.  

My question would be: what is the working directory?

The individual line errors concern

/home/boinc-client/slots/1/...

but the final failure concerns

/var/lib/boinc-client

That sounds like a mixed-up installation of BOINC: 'home' sounds like a location for a user-mode installation of BOINC, but '/var/lib/' would be normal for a service mode installation. It's reasonable for the two different locations to have different write permissions.

What app is doing the writing in each case, and what account are they running under?

Could the final write location be hard-coded, but the others dependent on locations supplied by the local BOINC installation?
ID: 58290 · Rating: 0
[VENETO] sabayonino

Joined: 4 Apr 10
Posts: 50
Credit: 650,142,596
RAC: 0
Level
Lys
Message 58291 - Posted: 17 Jan 2022, 12:51:27 UTC

Hi

I have the same issue with the BOINC directory (my BOINC directory is set to ~/boinc).

So I cleaned up the ~/.conda directory and reinstalled the gpugridnet project in the BOINC client.

Now flock detects the right running BOINC directory, but I get this task error:

https://www.gpugrid.net/result.php?resultid=32734225

./gpugridpy/bin/python (I think this is in the boinc/slots/<N>/ folder)

The WU is running and 0.43% completed, but /home/<user>/boinc/slots/11/gpugridpy is still empty. No data has been written.
ID: 58291 · Rating: 0
abouh

Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 58292 - Posted: 17 Jan 2022, 15:28:21 UTC - in response to Message 58290.  
Last modified: 17 Jan 2022, 15:55:31 UTC

Right, so the working directory is

/home/boinc-client/slots/1/...


to which the script has full access. The script tries to create a directory to save the logs, but I guess it should not do it in

/var/lib/boinc-client


So I think the problem is just that the package I am using to log results by default saves them outside the working directory. Should be easy to fix.
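
For illustration only, here is a minimal sketch of the kind of fix described above: pointing a logger at a directory inside the current working directory instead of relying on a package default. The function name `make_task_logger` and the `logs` sub-directory are hypothetical, and this uses Python's standard `logging` module, which may not be the logging package the task actually uses.

```python
import logging
import os
from pathlib import Path


def make_task_logger(name: str = "task", log_subdir: str = "logs") -> logging.Logger:
    """Return a logger that writes to <working directory>/<log_subdir>/<name>.log."""
    # Resolve the log directory relative to the working directory (the slot
    # directory in a BOINC run) instead of a library default elsewhere.
    log_dir = Path(os.getcwd()) / log_subdir
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.FileHandler(log_dir / f"{name}.log")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Calling `make_task_logger("demo_task").info("started")` would then create `logs/demo_task.log` under whatever directory the script was started in.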
ID: 58292 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 58293 - Posted: 17 Jan 2022, 15:55:05 UTC - in response to Message 58292.  

BOINC has the concept of a "data directory". Absolutely everything that has to be written should be written somewhere in that directory or its sub-directories. Everything else must be assumed to be sandboxed and inaccessible.
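
As a rough sketch of that rule, a script could check that a path it is about to write to really sits under a given base directory before creating anything. The helper `is_inside` is hypothetical, and treating `/home/boinc-client` as the data directory is only an assumption taken from the slot path quoted earlier in the thread.

```python
from pathlib import Path


def is_inside(base_dir: str, target: str) -> bool:
    """Return True if target resolves to base_dir or to a path beneath it."""
    base = Path(base_dir).resolve()
    candidate = Path(target).resolve()
    return candidate == base or base in candidate.parents


# Paths taken from the error report discussed above.
print(is_inside("/home/boinc-client", "/home/boinc-client/slots/1/logs"))
print(is_inside("/home/boinc-client", "/var/lib/boinc-client"))
```

Resolving both paths first means that `..` components and symlinks cannot make an outside location look like it is inside the base directory.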
ID: 58293 · Rating: 0
mmonnin

Joined: 2 Jul 16
Posts: 338
Credit: 7,987,341,558
RAC: 178,897
Level
Tyr
Message 58294 - Posted: 17 Jan 2022, 16:17:56 UTC - in response to Message 58282.  



It's often described as the "Best" card, but it's just the 1st
https://www.gpugrid.net/show_host_detail.php?hostid=475308

This host has a 1070 and 1080 but just shows 2x 1070s as the 1070 is in the 1st slot. Any way to check for a "best" would come up with the 1080. Or the 1070Ti that used to be there with the 1070.


In your case, the metrics that BOINC is looking at are identical between the two cards (actually all three of the 1070, 1070Ti, and 1080 have identical specs as far as BOINC ranking is concerned). All have the same amount of VRAM and have the same compute capability. So the tie goes to device number I guess. If you were to swap the 1080 for even a weaker card with a better CC (like a GTX 1650) then that would get picked up instead, even when not in the first slot.


The PC now has a 1080 and a 1080Ti, with the Ti having more VRAM. BOINC shows 2x 1080. The 1080 is GPU 0 in nvidia-smi, as were the other GPUs that BOINC has displayed. The Ti is in the physical 1st slot.

This PC happened to pick up two Python tasks. They aren't taking 4 days this time. 5:45 hr:min at 38.8% and 31 min at 11.8%.
ID: 58294 · Rating: 0
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,722,595
RAC: 4,266,994
Level
Trp
Message 58295 - Posted: 17 Jan 2022, 21:07:22 UTC - in response to Message 58294.  
Last modified: 17 Jan 2022, 21:52:59 UTC



It's often described as the "Best" card, but it's just the 1st
https://www.gpugrid.net/show_host_detail.php?hostid=475308

This host has a 1070 and 1080 but just shows 2x 1070s as the 1070 is in the 1st slot. Any way to check for a "best" would come up with the 1080. Or the 1070Ti that used to be there with the 1070.


In your case, the metrics that BOINC is looking at are identical between the two cards (actually all three of the 1070, 1070Ti, and 1080 have identical specs as far as BOINC ranking is concerned). All have the same amount of VRAM and have the same compute capability. So the tie goes to device number I guess. If you were to swap the 1080 for even a weaker card with a better CC (like a GTX 1650) then that would get picked up instead, even when not in the first slot.


The PC now has a 1080 and a 1080Ti, with the Ti having more VRAM. BOINC shows 2x 1080. The 1080 is GPU 0 in nvidia-smi, as were the other GPUs that BOINC has displayed. The Ti is in the physical 1st slot.

This PC happened to pick up two Python tasks. They aren't taking 4 days this time. 5:45 hr:min at 38.8% and 31 min at 11.8%.


what motherboard? and what version of BOINC? your hosts are hidden so I cannot inspect them myself. PCIe enumeration and ordering can be inconsistent across consumer boards. My server boards seem to enumerate starting from the slot furthest from the CPU socket, while most consumer boards are the opposite with device0 at the slot closest to the CPU socket.

or do you perhaps run a locked coproc_info.xml file? this would prevent any GPU changes from being picked up by BOINC, since it can't write to the coproc file.

edit:

also I forgot that most versions of BOINC incorrectly detect nvidia GPU memory. they will all max out at 4GB due to a bug in BOINC. So to BOINC your 1080Ti has the same amount of memory as your 1080. and since the 1080Ti is still a pascal card like the 1080, it has the same compute capability, so you're running into the same specs between them all still

to get it to sort properly, you need to fix BOINC code, or use a GPU with higher or lower compute capability. put a Turing card in the system not in the first slot and BOINC will pick it up as GPU0
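
A toy model of the selection behaviour described in this post, for illustration only: it ranks by compute capability, then by VRAM capped at 4 GB, and breaks ties by device number. This is a simplification of what the thread describes, not BOINC's actual code, which may compare further properties. The `Gpu` class and `pick_best` function are hypothetical names.

```python
from dataclasses import dataclass

FOUR_GB = 4 * 1024**3  # memory cap described in the post above


@dataclass(frozen=True)
class Gpu:
    device: int                 # device number as enumerated by the driver
    name: str
    compute_capability: tuple   # (major, minor)
    vram_bytes: int


def pick_best(gpus):
    """Highest compute capability wins, then capped VRAM, then lowest device number."""
    return min(
        gpus,
        key=lambda g: (
            tuple(-part for part in g.compute_capability),
            -min(g.vram_bytes, FOUR_GB),
            g.device,
        ),
    )


GB = 1024**3
pascal_pair = [
    Gpu(0, "GTX 1080", (6, 1), 8 * GB),
    Gpu(1, "GTX 1080 Ti", (6, 1), 11 * GB),
]
with_turing = [
    Gpu(0, "GTX 1080", (6, 1), 8 * GB),
    Gpu(1, "GTX 1650", (7, 5), 4 * GB),
]
print(pick_best(pascal_pair).name)   # tie after the 4 GB cap, so device 0
print(pick_best(with_turing).name)   # higher compute capability wins
```

Under this model the 1080 Ti's extra memory is invisible once the cap is applied, which matches the "2x 1080" display reported above.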
ID: 58295 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 58296 - Posted: 18 Jan 2022, 19:03:55 UTC

The tests continue. Just reported e2a13-ABOU_rnd_ppod_baseline_cnn_nophi_2-0-1-RND9761_1, with final stats

<result>
    <name>e2a13-ABOU_rnd_ppod_baseline_cnn_nophi_2-0-1-RND9761_1</name>
    <final_cpu_time>107668.100000</final_cpu_time>
    <final_elapsed_time>46186.399529</final_elapsed_time>

That's an average CPU core count of 2.33 over the entire run - that's high for what is planned to be a GPU application. We can manage with that - I'm sure we all want to help develop and test the application for the coming research run - but I think it would be helpful to put more realistic usage values into the BOINC scheduler.
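
The 2.33 figure is simply CPU time divided by elapsed (wall-clock) time. A minimal sketch of that arithmetic, using the two values reported above (the function name `average_cores` is just for illustration):

```python
def average_cores(final_cpu_time: float, final_elapsed_time: float) -> float:
    """Average number of CPU cores kept busy over the whole run."""
    if final_elapsed_time <= 0:
        raise ValueError("elapsed time must be positive")
    return final_cpu_time / final_elapsed_time


avg = average_cores(107668.100000, 46186.399529)
print(f"{avg:.2f}")  # prints 2.33
```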
ID: 58296 · Rating: 0
GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Level
Gly
Message 58297 - Posted: 19 Jan 2022, 9:17:03 UTC - in response to Message 58296.  

It's not a GPU application. It uses both CPU and GPU.
ID: 58297 · Rating: 0
abouh

Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 58298 - Posted: 19 Jan 2022, 9:49:39 UTC - in response to Message 58296.  

Do you mean changing some of the BOINC parameters like it was done in the case of <rsc_fpops_est>?

Is that to better define the resources required by the tasks?
ID: 58298 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 58299 - Posted: 19 Jan 2022, 11:03:54 UTC - in response to Message 58298.  

It would need to be done in the plan class definition. Toni said that you define your plan classes in C++ code, so there are some examples in "Specifying plan classes in C++".

Unfortunately, the BOINC developers didn't consider your use-case of mixing CPU elements and GPU elements in the same task, so none of the examples really match - your app is a mixture of MT and CUDA classes. What we need (or at least, would like to see) at this end are realistic values for <avg_ncpus> and <coproc><count>.
ID: 58299 · Rating: 0
FritzB

Joined: 7 Apr 15
Posts: 17
Credit: 2,978,057,945
RAC: 50,679
Level
Phe
Message 58300 - Posted: 19 Jan 2022, 19:00:18 UTC

it seems to work better now but I've reached time limit after 1800sec
https://www.gpugrid.net/result.php?resultid=32734648

19:39:23 (6124): task /usr/bin/flock reached time limit 1800
application ./gpugridpy/bin/python missing
ID: 58300 · Rating: 0
Keith Myers

Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 614,515
Level
Tyr
Message 58301 - Posted: 19 Jan 2022, 20:55:08 UTC

I'd like to hear what others are using for ncpus for their Python tasks in their app_config files.

I'm using:

<app>
    <name>PythonGPU</name>
    <gpu_versions>
        <gpu_usage>1.0</gpu_usage>
        <cpu_usage>5.0</cpu_usage>
    </gpu_versions>
</app>

for all my hosts and they seem to like that. Haven't had any issues.
ID: 58301 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 58302 - Posted: 19 Jan 2022, 22:28:41 UTC - in response to Message 58301.  

I'm still running them at 1 CPU plus 1 GPU. They run fine, but when they are busy on the CPU-only sections, they steal time from the CPU tasks that are running at the same time - most obviously from CPDN.

Because these tasks are defined as GPU tasks, and GPU tasks are given a higher run priority than CPU tasks by BOINC ('below normal' against 'idle'), the real CPU project will always come off worst.
ID: 58302 · Rating: 0
Keith Myers

Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 614,515
Level
Tyr
Message 58303 - Posted: 20 Jan 2022, 0:27:39 UTC - in response to Message 58302.  
Last modified: 20 Jan 2022, 0:28:14 UTC

You could employ ProcessLasso on the apps and up their priority I suppose.

When I ran Windows, I really utilized that utility to make the apps run the way I wanted them to, and not how BOINC sets them up on its own agenda.
ID: 58303 · Rating: 0
ServicEnginIC

Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 998,578
Level
Trp
Message 58304 - Posted: 20 Jan 2022, 6:46:45 UTC - in response to Message 58301.  

I'd like to hear what others are using for ncpus for their Python tasks in their app_config files.

I think that the Python GPU app is very efficient at adapting to any number of CPU cores and taking advantage of available CPU resources.
This seems to be in some way independent of the ncpus parameter in the GPUGRID app_config.xml.

Setup at my twin GPU system is as follows:

<app>
    <name>PythonGPU</name>
    <gpu_versions>
        <gpu_usage>1.0</gpu_usage>
        <cpu_usage>0.49</cpu_usage>
    </gpu_versions>
</app>

And setup for my triple GPU system is as follows:

<app>
    <name>PythonGPU</name>
    <gpu_versions>
        <gpu_usage>1.0</gpu_usage>
        <cpu_usage>0.33</cpu_usage>
    </gpu_versions>
</app>

The purpose of this is to be able to run two or three concurrent Python GPU tasks, respectively, without reaching a full "1" CPU core (2 x 0.49 = 0.98; 3 x 0.33 = 0.99). Then, I manually control CPU usage by setting "Use at most XX % of the CPUs" in BOINC Manager for each system, according to its number of CPU cores.
This allows me to run concurrently "N" Python GPU tasks and a fixed number of other CPU tasks as desired.
But as said, Gpugrid Python GPU app seems to take CPU resources as needed for successfully processing its tasks... at the cost of slowing down the other CPU applications.
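
The budgeting arithmetic above can be checked with a few lines. This is only a sketch of the sum quoted in this post, not of how the BOINC scheduler itself accounts for CPU usage; `budgeted_cpus` is a hypothetical helper name.

```python
def budgeted_cpus(n_tasks: int, cpu_usage: float) -> float:
    """Total CPU budget declared for n_tasks concurrent tasks."""
    return n_tasks * cpu_usage


for n_tasks, cpu_usage in ((2, 0.49), (3, 0.33)):
    total = budgeted_cpus(n_tasks, cpu_usage)
    # In both setups the declared total stays just below one full core.
    print(f"{n_tasks} x {cpu_usage} = {total:.2f}, below one core: {total < 1.0}")
```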
ID: 58304 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 58305 - Posted: 20 Jan 2022, 7:44:41 UTC

Yes, I use Process Lasso on all my Windows machines, but I haven't explored its use under Linux.

Remember that ncpus and similar settings have no effect whatsoever on the actual running of a BOINC project app - there is no 'control' element to their operation. The only effect they have is on BOINC's scheduling - how many tasks are allowed to run concurrently.
ID: 58305 · Rating: 0
abouh

Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 58306 - Posted: 20 Jan 2022, 15:58:45 UTC - in response to Message 58300.  

This message

19:39:23 (6124): task /usr/bin/flock reached time limit 1800


indicates that, after 30 minutes, the installation of miniconda and the task environment setup had not finished.

Consequently, python is not found later on to execute the task since it is one of the requirements of the miniconda environment.

application ./gpugridpy/bin/python missing


Therefore, it is not an error in itself; it just means that the miniconda setup was too slow for some reason (in theory 30 minutes should be enough time). Maybe the machine is slower than usual, or the connection is slow and dependencies are not being downloaded.

We could extend this timeout, but normally, if 30 minutes is not enough for the miniconda setup, there is probably another underlying problem.
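
To make the failure mode concrete, here is a rough Python sketch of a setup step with a time limit. It is not the project's actual wrapper (the real job runs `/usr/bin/flock` under a limit of 1800 seconds); `run_with_time_limit` is a hypothetical helper built on the standard `subprocess` module.

```python
import subprocess
import sys


def run_with_time_limit(cmd, limit_seconds):
    """Run cmd; return True if it exits with code 0 within the limit, else False."""
    try:
        completed = subprocess.run(cmd, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        print(f"task {cmd[0]} reached time limit {limit_seconds}", file=sys.stderr)
        return False
    return completed.returncode == 0
```

If the setup step returns False, anything it was supposed to install (here, `./gpugridpy/bin/python`) will be missing for the next step, which is the second message seen in the log.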
ID: 58306 · Rating: 0
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,722,595
RAC: 4,266,994
Level
Trp
Message 58307 - Posted: 20 Jan 2022, 16:18:58 UTC - in response to Message 58306.  

it seems to be a reasonably fast system. my guess is another type of permissions issue which is blocking the python install and it hits the timeout, or the CPUs are being too heavily used and not giving enough resources to the extraction process.
ID: 58307 · Rating: 0
Keith Myers

Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 614,515
Level
Tyr
Message 58308 - Posted: 20 Jan 2022, 22:15:20 UTC - in response to Message 58305.  

There is no Linux equivalent of Process Lasso.

But there is a Linux equivalent of Windows Process-Explorer

https://github.com/wolfc01/procexp

Screenshots of the application at the old SourceForge repo.

https://sourceforge.net/projects/procexp/

Can dynamically change the nice value of the application.

There is also the command line schedtool utility that can be easily implemented in a bash file. I used to run that all the time in my gpuoverclock.sh script for Seti cpu and gpu apps.
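
For anyone who would rather script this than use a GUI, the nice value of a process can also be changed from Python's standard library on Unix-like systems. This is a minimal sketch, not a replacement for schedtool: `renice` is a hypothetical helper, and lowering a nice value (raising priority) normally needs elevated privileges.

```python
import os


def renice(pid: int, nice_value: int) -> int:
    """Set the nice value of process pid and return the value now in effect."""
    os.setpriority(os.PRIO_PROCESS, pid, nice_value)
    return os.getpriority(os.PRIO_PROCESS, pid)


# Example: read this process's current nice value and re-apply it unchanged.
current = os.getpriority(os.PRIO_PROCESS, os.getpid())
print(renice(os.getpid(), current))
```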
ID: 58308 · Rating: 0
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 295,172
Level
Trp
Message 58309 - Posted: 21 Jan 2022, 12:14:55 UTC - in response to Message 58308.  

Well, that got me a long way.

There are dependencies listed for Mint 18.3 - I'm running Mint 20.2

The apt-get for the older version of Mint returns

E: Unable to locate package python-qwt5-qt4
E: Unable to locate package python-configobj

Unsurprisingly, the next step returns

Traceback (most recent call last):
  File "./procexp.py", line 27, in <module>
    from PyQt5 import QtCore, QtGui, QtWidgets, uic
ModuleNotFoundError: No module named 'PyQt5'

htop, however, shows about 30 multitasking processes spawned from main, each using around 2% of a CPU core (varying by the second) at nice 19. At the time of inspection, that is. I'll go away and think about that.
ID: 58309 · Rating: 0

©2025 Universitat Pompeu Fabra