Experimental Python tasks (beta) - task description

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 59652 - Posted: 25 Dec 2022, 20:27:03 UTC
Last modified: 25 Dec 2022, 20:52:26 UTC

Merry Christmas, friends!

Odd thing about the Pythons using the GPU: they seem inconsistent in their time reporting.

I see them finish in around 10-12 hours, but the reported CPU time is much greater, often around 80 hours.

Looking at the properties of a running task, I see:


Application: Python apps for GPU hosts 4.04 (cuda1131)
Name: e00007a01485-ABOU_rnd_ppod_expand_demos30_2_test2-0-1-RND0975
State: Running
Received: 12/25/2022 1:49:40 AM
Report deadline: 12/30/2022 1:49:40 AM
Resources: 0.988 CPUs + 1 NVIDIA GPU
Estimated computation size: 1,000,000,000 GFLOPs
CPU time: 3d 00:26:17
CPU time since checkpoint: 00:04:01
Elapsed time: 09:17:00
Estimated time remaining: 07:04:18
Fraction done: 96.160%
Virtual memory size: 6.91 GB
Working set size: 1.66 GB
Directory: slots/0
Process ID: 16952
Progress rate: 10.440% per hour
Executable: wrapper_6.1_windows_x86_64.exe
________________

Notice the CPU time vs. the elapsed time.

I also see that the estimated time remaining runs ridiculously high.

Though the wrapper claims 0.988 CPUs, it actually uses up to 70% of the processor on machines with fewer cores when nothing else is running. The more CPU time slices it can get from the available threads, the faster it seems to run, up to the maximum the wrapper can use. It also eats up as much as 50 GB of commit charge (total memory), and more.

It seems to be immune to the BOINC manager's limits on CPU usage, so it can easily peg your processor with other projects running. Setting max processor usage at 25-33% should ensure that the WUs finish within the 105,000-point bonus time frame if you are running other projects simultaneously.
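(For reference, the "max processor usage" knob is BOINC's "Use at most X% of the CPUs" computing preference; in a global_prefs_override.xml it would look roughly like this - a minimal sketch, the 33% value is illustrative:)

<global_preferences>
   <max_ncpus_pct>33.0</max_ncpus_pct>
</global_preferences>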

Another observation I made is that dual graphics cards don't seem to work with this wrapper on my hosts. GPU1 always stayed at around 4% utilization while GPU0's WU ran at reduced speed.
This limitation is coincidentally identical to my experience on MLC@Home. In addition, I am seeing that very modest GPUs (1050s and such) are just as effective as the latest models at producing points when the CPU runs unconstrained. That was also noted at MLC, in my experience.
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation
ID: 59652
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 59653 - Posted: 25 Dec 2022, 21:43:07 UTC

This has been commented on extensively in this thread, if you had read it.

The cpu_time is not calculated correctly because BOINC has no mechanism to deal with these one-of-a-kind tasks, which use machine learning and have a dual CPU-GPU nature.

The tasks spawn 32 processes on your CPU and will use a significant amount of CPU resources, as well as main and virtual memory.
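(A quick way to verify the process count yourself, as a sketch using the third-party psutil package; the PID is the wrapper's, taken from the task properties shown above:)

import psutil

# Walk the process tree under the BOINC wrapper. Replace 16952 with
# the wrapper's actual process ID from the task properties.
wrapper = psutil.Process(16952)
children = wrapper.children(recursive=True)
print(len(children), "child processes")
for child in children:
    rss_mib = child.memory_info().rss / 2**20
    print(child.pid, child.name(), f"{rss_mib:.0f} MiB")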

They use your GPU sporadically, in brief spurts of computation, before handing work back to the CPU.

As long as the GPU has 4 GB of VRAM, the tasks can run on very modest GPU hardware.
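(To check how much VRAM your cards have from the tasks' own PyTorch environment, a minimal sketch:)

import torch

# Report each CUDA device's total VRAM; the tasks reportedly need
# at least 4 GB.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(props.name, f"{props.total_memory / 2**30:.1f} GB")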
ID: 59653
kotenok2000

Joined: 18 Jul 13
Posts: 79
Credit: 210,528,292
RAC: 0
Message 59655 - Posted: 26 Dec 2022, 10:42:37 UTC - in response to Message 59652.  
Last modified: 26 Dec 2022, 10:43:03 UTC

You can create an app_config.xml with:
<app_config>
    <app>
       <name>PythonGPU</name>
       <fraction_done_exact/>
    </app>
</app_config>

It should make BOINC display a more accurate time estimate.
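(The file goes in the project's directory under the BOINC data folder, e.g. projects/www.gpugrid.net, and is picked up via Options → Read config files in the BOINC Manager.)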
ID: 59655
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 59656 - Posted: 26 Dec 2022, 11:21:30 UTC - in response to Message 59655.  

kotenok2000 wrote:
You can create an app_config.xml with:
<app_config>
    <app>
       <name>PythonGPU</name>
       <fraction_done_exact/>
    </app>
</app_config>

It should make BOINC display a more accurate time estimate.

For me, this worked well with all the ACEMD tasks. It does NOT work with the Pythons.
I am talking about Windows; maybe it works on Linux, no idea.
ID: 59656
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 59657 - Posted: 26 Dec 2022, 18:53:30 UTC
Last modified: 26 Dec 2022, 18:54:20 UTC

As I mentioned, BOINC has no idea how to display these tasks because they do not fit in ANY category that BOINC is coded for.

So no existing BOINC mechanism can properly display the CPU usage or come even close with its time estimates.

It does not matter whether the host is Windows-, Mac-, or Linux-based; the OS has nothing to do with the issue.

The issue is BOINC.
ID: 59657
Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 59658 - Posted: 26 Dec 2022, 22:51:48 UTC

Thanks for the tips, guys. Sorry about being Captain Obvious there; I just rejoined this project and should have caught up on the thread before reporting my observations.
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation
ID: 59658
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 59676 - Posted: 4 Jan 2023, 8:09:36 UTC

Right now there are ~14,200 "unsent" Python tasks in the queue.

I guess it will take a while until they are all processed.
ID: 59676
Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 59677 - Posted: 4 Jan 2023, 22:00:56 UTC
Last modified: 4 Jan 2023, 22:39:03 UTC

Looks good, but I'm getting some bad WUs.

I had 3 errors in a row on the same host and thought it was something about the machine, until I checked who else had run them. They were already on their last-chance runs.
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation
ID: 59677
abouh

Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 59678 - Posted: 5 Jan 2023, 7:00:23 UTC - in response to Message 59677.  

Could you provide the name of the task? I will take a look at the errors.
ID: 59678
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 59680 - Posted: 5 Jan 2023, 12:37:46 UTC - in response to Message 59678.  

abouh wrote:
Could you provide the name of the task? I will take a look at the errors.

In case you don't want to wait until he reads your post and replies, look here:

http://www.gpugrid.net/results.php?hostid=602606
ID: 59680
kotenok2000

Joined: 18 Jul 13
Posts: 79
Credit: 210,528,292
RAC: 0
Message 59681 - Posted: 5 Jan 2023, 13:41:30 UTC - in response to Message 59680.  

It seems some of them need a larger pagefile than usual.
ID: 59681
Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 59682 - Posted: 5 Jan 2023, 18:00:53 UTC

I see that the tasks my host and some others crashed were eventually finished successfully. Sorry to have assumed before checking.
I suspect my host had errors because it was running Mapping Cancer Markers concurrently with Python. Since I suspended the WCG tasks, it has run error-free.

Thanks to Erich56 for providing the host link.
Sorry for the misinformation.
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation
ID: 59682
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Message 59683 - Posted: 5 Jan 2023, 18:40:57 UTC

abouh,

can you confirm which section of the code the task spends the most time on?

Is it here?

while not learner.done():
    learner.step()
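
(For what it's worth, here is roughly how one could confirm that with the standard-library profiler, assuming run.py can be launched manually; a sketch, not the project's code:)

import cProfile
import pstats

# Wrap the training loop in the profiler to see where the time
# actually goes; `learner` is the object built earlier in run.py.
profiler = cProfile.Profile()
profiler.enable()
while not learner.done():
    learner.step()
profiler.disable()

# Show the 20 most expensive call sites by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)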


I'm still trying to track down why AMD systems use so much more CPU than Intel systems. I even went so far as to rebuild the numpy module against MKL (yours is using the default BLAS, not MKL or OpenBLAS) and inject it into the environment package, but it made no difference. Probably because numpy is barely used in the code anyway, and not in the main loop.
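(Anyone who wants to check which BLAS their numpy build is linked against can do so from the task's Python environment; this is a standard numpy call:)

import numpy as np

# Prints the build configuration, including which BLAS/LAPACK
# implementation numpy was linked against (reference BLAS, OpenBLAS,
# or MKL).
np.show_config()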
ID: 59683
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 59684 - Posted: 6 Jan 2023, 17:48:38 UTC - in response to Message 59682.  

Pop Piasa wrote:

I suspect my host had errors because it was running Mapping Cancer Markers concurrently with Python. Since I suspended the WCG tasks, it has run error-free.

I had the same experience when I began crunching Pythons.
It is best not to run anything else.
ID: 59684
theBigAl

Joined: 4 Oct 22
Posts: 4
Credit: 2,297,295,306
RAC: 0
Message 59685 - Posted: 7 Jan 2023, 1:05:39 UTC - in response to Message 59684.  
Last modified: 7 Jan 2023, 1:52:20 UTC

I've been running WCG (CPU-only tasks, though) and GPUGrid concurrently for the past few days, and it's working out fine so far.
ID: 59685
Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 59686 - Posted: 7 Jan 2023, 18:38:56 UTC - in response to Message 59685.  

My Intel hosts seem to have no problems, only my Ryzen 5 5600X. Same memory in all of them. That is indeed odd, because theBigAl is using the exact same processor without errors. One difference: theBigAl is running Windows 11, whereas I have Win 10 on my host.

Erich56 is spot-on that Python likes to have the machine (or virtual machine) to itself for these integrated GPU tasks. I have seen them drop from around 14 hours to under 12 hours completion time by stopping concurrent projects.

How does one get two or more of these to run with multiple GPUs in a host?
I took a second card back out of one of my hosts because it just slowed things down when running these.
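(Side note: instead of physically pulling a card, a GPU can also be hidden from one project with an exclude_gpu block in the BOINC client's cc_config.xml; a sketch, the device number is illustrative:)

<cc_config>
  <options>
    <exclude_gpu>
      <url>http://www.gpugrid.net/</url>
      <device_num>1</device_num>
    </exclude_gpu>
  </options>
</cc_config>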
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation
ID: 59686
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 59689 - Posted: 8 Jan 2023, 0:35:56 UTC - in response to Message 59686.  

Because of the unique issue with virtual memory on Windows compared to Linux, I don't know if running more than a single task is doable there, let alone running multiple GPUs.

And yes, in Linux it is possible to run these tasks on more than a single GPU.

My teammate Ian has been running three tasks concurrently on his 2x RTX 3060s, and now on 2x RTX A4000 GPUs.
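(Running several tasks per GPU is normally arranged with a gpu_versions block in app_config.xml; a minimal sketch assuming the PythonGPU app name, with three tasks sharing each card:)

<app_config>
    <app>
       <name>PythonGPU</name>
       <gpu_versions>
          <gpu_usage>0.33</gpu_usage>
          <cpu_usage>1.0</cpu_usage>
       </gpu_versions>
    </app>
</app_config>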
ID: 59689
theBigAl

Joined: 4 Oct 22
Posts: 4
Credit: 2,297,295,306
RAC: 0
Message 59690 - Posted: 8 Jan 2023, 3:04:22 UTC - in response to Message 59686.  

Pop Piasa wrote:
My Intel hosts seem to have no problems, only my Ryzen 5 5600X. Same memory in all of them. That is indeed odd, because theBigAl is using the exact same processor without errors. One difference: theBigAl is running Windows 11, whereas I have Win 10 on my host.


I don't know if it'll help, but I have allocated 100 GB of virtual memory (pagefile) for the computer, which is probably overkill, but it doesn't hurt to try if you have the space.

I'll up that to 140 GB when my eagerly awaited 3060 Ti arrives tomorrow, and I'll test whether it can run multiple GPU tasks on Win11 (probably not, and even if it does, it'll run a lot slower since it'll be CPU-bound then).
ID: 59690
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 59691 - Posted: 8 Jan 2023, 6:39:09 UTC - in response to Message 59689.  

Keith Myers wrote:
Because of the unique issue with virtual memory on Windows compared to Linux, I don't know if running more than a single task is doable, let alone running multiple gpus.

On my host with 1 GTX 980 Ti and 1 Intel i7-4930K, I run 2 Pythons concurrently.
On my host with 2 RTX 3070s and 1 i9-10900KF, I run 4 Pythons concurrently.
On my host with 1 Quadro P5000 and 2 Xeon E5-2667 v4 CPUs, I run 4 Pythons concurrently.

All Windows 10.
No problems at all (except that I don't get below 24 hours with any task).
ID: 59691
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Message 59693 - Posted: 8 Jan 2023, 14:20:12 UTC
Last modified: 8 Jan 2023, 14:30:40 UTC

abouh,

as a follow-up to my previous post, I think I've narrowed down the issue in your script/setup that causes unnecessarily high CPU use on newer, high-core-count hosts. I was able to reduce CPU use from 100% to 40% and speed up task execution at the same time (thanks to far fewer scheduling conflicts among so many running processes). I managed to connect with someone who understands these tools, and they helped me figure out what's wrong; I'll paraphrase their comments and findings below.

The basic answer is that the thread pool isn't configured correctly for wandb. (It's only configured for the parser, so it's likely not limiting the number of threads correctly, and there's likely a soft error somewhere.)

Line 447 & 448
spawns threads, but doesn't specify them anywhere.

Line 373 defines how many thread processes will be used, but it doesn't seem to work correctly. It's defined as 16, but changing this value does nothing: on my 64-core system, 64 processes are spun up for each running task, in addition to the 32 spawned agents. A 16-core CPU will spin up 16+32 processes, and so on. Trying to run 10 concurrent tasks on my 64-core system results in a staggering 960 processes, which seems to cripple the system and slow things down as a result.

https://docs.wandb.ai/guides/track/advanced/distributed-training
(The end of that page shows how to configure this correctly.)

Do you get the error log in the npz output? Is this sent back with the tasks? I tried to read that file but could not; it's compressed or encrypted. It may contain more information about what is set up wrong with the wandb multiprocessing pool.

I was able to work around the issue by setting environment variables that put hard limits on the number of threads used. I edited run.py at line 445 with:

    # Cap the worker threads spawned by the common numeric backends;
    # these variables must be set before the libraries initialize
    # their thread pools (assumes os is imported earlier in run.py).
    NUM_THREADS = "8"
    os.environ["OMP_NUM_THREADS"] = NUM_THREADS
    os.environ["OPENBLAS_NUM_THREADS"] = NUM_THREADS
    os.environ["MKL_NUM_THREADS"] = NUM_THREADS
    os.environ["CNN_NUM_THREADS"] = NUM_THREADS
    os.environ["VECLIB_MAXIMUM_THREADS"] = NUM_THREADS
    os.environ["NUMEXPR_NUM_THREADS"] = NUM_THREADS


But it's not a proper fix. I added further workarounds to make this a little more persistent for myself, but it will need to be fixed by the project to fix it for everyone. The proper fix would be to investigate the soft error in the error log file, with full access to the job (which we don't have, and we cannot implement proper multiprocessing without it).

You could band-aid it with the same edits I made to run.py, but it might cause issues if you have fewer than 8 threads, I guess? Or maybe it's fine, since the script launches so many processes anyway. I'm still testing to see if there's a point where fewer threads in run.py actually slows the task down; on these fast CPUs I might be able to run as few as 4.
ID: 59693