Message boards : News : Experimental Python tasks (beta) - task description
| Author | Message |
|---|---|
|
Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0
|
Merry Christmas, friends! There is an odd thing about the Pythons using the GPU: their time reporting seems inconsistent. I see them finish in around 10-12 hours, but the CPU time is much greater, often around 80 hours. Looking at the properties of a running task, I see:

Application: Python apps for GPU hosts 4.04 (cuda1131)
Name: e00007a01485-ABOU_rnd_ppod_expand_demos30_2_test2-0-1-RND0975
State: Running
Received: 12/25/2022 1:49:40 AM
Report deadline: 12/30/2022 1:49:40 AM
Resources: 0.988 CPUs + 1 NVIDIA GPU
Estimated computation size: 1,000,000,000 GFLOPs
CPU time: 3d 00:26:17
CPU time since checkpoint: 00:04:01
Elapsed time: 09:17:00
Estimated time remaining: 07:04:18
Fraction done: 96.160%
Virtual memory size: 6.91 GB
Working set size: 1.66 GB
Directory: slots/0
Process ID: 16952
Progress rate: 10.440% per hour
Executable: wrapper_6.1_windows_x86_64.exe

Notice the CPU time versus the elapsed time. I also see that the estimated time remaining runs ridiculously high. Though the wrapper claims 0.988 CPUs, it actually uses up to 70% of the CPU on machines with fewer cores when nothing else is running. The more CPU time slices it can get from the available threads, the faster it seems to run, up to the maximum the wrapper can use. It also eats up as much as 50 GB of commit charge (total memory), and sometimes more. It seems immune to the BOINC manager's limits on CPU usage, so it can easily peg your processor when other projects are running. Setting max processor usage at 25-33% should ensure the WUs finish within the 105,000-point bonus time frame if you are running other projects simultaneously.

Another observation: dual graphics cards don't seem to work with this wrapper on my hosts. GPU1 always stayed at 4% or so while GPU0's WU ran at reduced speed. This limitation is coincidentally identical to my experience on MLC@Home.

In addition, I am seeing that very modest GPUs (1050s and such) are just as effective as the latest models at producing points when the CPU runs unconstrained. I also noted that at MLC. "Together we crunch / To check out a hunch / And wish all our credit / Could just buy us lunch" Piasa Tribe - Illini Nation |
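The CPU-time-versus-elapsed-time gap in the task properties above can be quantified directly. A small Python sketch (editorial illustration, using only the numbers quoted in the post: 3d 00:26:17 of CPU time over 09:17:00 elapsed) shows roughly how many cores the wrapper's process tree keeps busy on average:

```python
def hms_to_seconds(d=0, h=0, m=0, s=0):
    """Convert a days/hours/minutes/seconds reading to total seconds."""
    return ((d * 24 + h) * 60 + m) * 60 + s

cpu_time = hms_to_seconds(d=3, m=26, s=17)   # "CPU time 3d 00:26:17"
elapsed = hms_to_seconds(h=9, m=17)          # "Elapsed time 09:17:00"

# Average number of cores kept busy over the task's lifetime so far.
print(round(cpu_time / elapsed, 1))  # → 7.8
```

So despite the nominal "0.988 CPUs", the task has averaged close to 8 busy cores, consistent with the ~70% CPU usage reported on smaller hosts.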
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
This has been commented on extensively in this thread, if you had read it. The cpu_time is not calculated correctly because BOINC has no mechanism to deal with these one-of-a-kind tasks, which do machine learning and are of a dual CPU-GPU nature. The tasks spawn 32 processes on your CPU and use a significant amount of CPU resources and main and virtual memory. They sporadically use your GPU in brief spurts of computation before passing computation back to the CPU. As long as the GPU has 4 GB of VRAM, the tasks can run on very modest GPU hardware. |
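The inflated cpu_time follows from this: BOINC sums CPU seconds over the wrapper's whole process tree, so 32 busy workers on one task accumulate many CPU hours per wall-clock hour. A toy stdlib sketch of a parent spawning concurrent workers (scaled down from 32; `ps --forest` on Linux or Process Explorer on Windows will show the real wrapper's tree):

```python
import subprocess
import sys

# Spawn a few short-lived Python workers, a miniature of the ~32
# agent processes each task launches under the wrapper.
workers = [
    subprocess.Popen([sys.executable, "-c", "sum(i * i for i in range(10**6))"])
    for _ in range(4)
]

# Each worker burns CPU concurrently, so the tree's total CPU time
# grows several times faster than the parent's elapsed time.
exit_codes = [w.wait() for w in workers]
print(exit_codes)  # → [0, 0, 0, 0]
```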
|
Joined: 18 Jul 13 Posts: 79 Credit: 210,528,292 RAC: 0
|
You can create app_config.xml with:

<app_config>
  <app>
    <name>PythonGPU</name>
    <fraction_done_exact/>
  </app>
</app_config>

It should make BOINC display a more accurate time estimate. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
"You can create app_config.xml with ..." For me, this worked well with all the ACEMD tasks. It does NOT work with the Pythons. I am talking about Windows; maybe it works with Linux, no idea. |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
As I mentioned, BOINC has no idea how to display these tasks because they do not fit in ANY category that BOINC is coded for. So no existing BOINC mechanism can properly display the CPU usage or come even close with time estimates. It does not matter whether the host is Windows, Mac, or Linux based; the OS has nothing to do with the issue. The issue is BOINC. |
|
Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0
|
Thanks for the tips, guys. Sorry about being Captain Obvious there; I just rejoined this project and should have caught up on the thread before reporting my observations. "Together we crunch / To check out a hunch / And wish all our credit / Could just buy us lunch" Piasa Tribe - Illini Nation |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
Right now there are ~14,200 "unsent" Python tasks in the queue. I guess it will take a while until they are all processed. |
|
Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0
|
Looks good, but I'm getting some bad WUs. I had 3 errors in a row on the same host and thought it was something about the machine, until I checked who else had run them: they were already on their last-chance runs. "Together we crunch / To check out a hunch / And wish all our credit / Could just buy us lunch" Piasa Tribe - Illini Nation |
|
Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0
|
Could you provide the name of the task? I will take a look at the errors. |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
"Could you provide the name of the task? I will take a look at the errors." In case you don't want to wait until he reads your posting to reply, look here: http://www.gpugrid.net/results.php?hostid=602606 |
|
Joined: 18 Jul 13 Posts: 79 Credit: 210,528,292 RAC: 0
|
It seems some of them need a larger pagefile than usual. |
|
Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0
|
I see the tasks that my host and some others crashed were eventually finished successfully. Sorry to have assumed before the facts were in. I suspect my host had errors because it was running Mapping Cancer Markers concurrently with Python; once I suspended the WCG tasks, it ran error-free. Thanks to Eric for providing the host link, and sorry for the misinformation. "Together we crunch / To check out a hunch / And wish all our credit / Could just buy us lunch" Piasa Tribe - Illini Nation |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 5,269
|
abouh, can you confirm which section of code the task spends the most time on? Is it here?

while not learner.done():
    learner.step()

I'm still trying to track down why AMD systems use so much more CPU than Intel systems. I even went so far as to rebuild the numpy module against MKL (yours is using the default BLAS, not MKL or OpenBLAS) and inject it into the environment package, but again it made no difference. Probably because numpy appears to be barely used in the code anyway, and not in the main loop.
|
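For anyone wanting to check which BLAS their own numpy build is linked against (as noted above, the task's environment reportedly ships the default BLAS rather than MKL or OpenBLAS), numpy exposes its build configuration. A short sketch, assuming numpy is importable:

```python
import numpy as np

# Print the BLAS/LAPACK libraries numpy was compiled against;
# look for "mkl", "openblas", or the reference "blas" in the output.
np.show_config()

# Matrix multiplication is the operation most sensitive to the BLAS
# backend, so a large matmul makes a reasonable smoke test.
a = np.random.rand(500, 500)
b = np.random.rand(500, 500)
c = a @ b
print(c.shape)  # → (500, 500)
```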
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
Pop Piasa wrote: "I suspect my host had errors because it was running Mapping Cancer Markers concurrent with Python. Once I suspended WCG tasks it has run error free." I had the same experience when I began crunching Pythons. It's best not to run anything else. |
|
Joined: 4 Oct 22 Posts: 4 Credit: 2,297,295,306 RAC: 0
|
I've been running WCG (CPU-only tasks, though) and GPUGrid concurrently for the past few days, and it's working out fine so far. |
|
Joined: 8 Aug 19 Posts: 252 Credit: 458,054,251 RAC: 0
|
My Intel hosts seem to have no problems, only my Ryzen 5 5600X, and all of them have the same memory. That is indeed odd, because theBigAl is using the exact same processor without errors. One difference is that theBigAl is running Windows 11 where I have Windows 10 on my host. Erich56 is spot-on that Python likes to have the machine (or virtual machine) to itself for these integrated GPU tasks. I have seen completion times drop from around 14 hours to under 12 hours by stopping concurrent projects. How does one get two or more of these to run with multiple GPUs in a host? I took a second card back out of one of my hosts because it just slowed things down running these. "Together we crunch / To check out a hunch / And wish all our credit / Could just buy us lunch" Piasa Tribe - Illini Nation |
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 731
|
Because of the unique issue with virtual memory on Windows compared to Linux, I don't know if running more than a single task is doable there, let alone using multiple GPUs. And yes, it is possible to run these tasks on more than a single GPU in Linux: my teammate Ian has been running 3 concurrently, first on his 2x RTX 3060s and now on 2x RTX A4000 GPUs. |
|
Joined: 4 Oct 22 Posts: 4 Credit: 2,297,295,306 RAC: 0
|
"My Intel hosts seem to have no problems, only my Ryzen5-5600X. Same memory in all of them. That is indeed odd because theBigAl is using the exact same processor without errors. one difference is that theBigAl is running Windows 11 where I have Win 10 on my host." I don't know if it'll help, but I have allocated 100 GB of virtual memory swap on that computer, which is probably overkill, but it doesn't hurt to try if you have the space. I'll raise that to 140 GB when I receive my 3060 Ti tomorrow, eagerly, and test whether it can run multiple GPU tasks on Win11 (probably not, and even if it does, it'll run a lot slower since it'll be CPU-bound then). |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
Keith Myers wrote: "Because of the unique issue with virtual memory on Windows compared to Linux, I don't know if running more than a single task is doable, let alone running multiple gpus." On my host with 1 GTX 980 Ti and 1 Intel i7-4930K, I run 2 Pythons concurrently. On my host with 2 RTX 3070s and 1 i9-10900KF, I run 4 Pythons concurrently. On my host with 1 Quadro P5000 and 2 Xeon E5-2667 v4, I run 4 Pythons concurrently. All Windows 10. No problems at all (except that I don't get below 24 hours with any task). |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 5,269
|
abouh, as a follow-up to my previous post, I think I've narrowed down the issue in your script/setup that causes unnecessarily high CPU use on newer, high-core-count hosts. I was able to reduce CPU use from 100% to 40% and speed up task execution at the same time (due to far fewer scheduling conflicts among so many running processes). I connected with someone who understands these tools, and they helped me figure out what's wrong; I'll paraphrase their comments and findings below.

The basic answer is that the thread pool isn't configured correctly for wandb. (It's only configured for the parser, so it's likely not correctly limiting the number of threads, and there's probably a soft error somewhere.) Lines 447 and 448 spawn threads but don't specify them anywhere. Line 373 defines how many thread processes will be used, but it doesn't seem to work correctly: it's set to 16, yet changing this value does nothing, and on my 64-core system 64 processes are spun up for each running task, in addition to the 32 spawned agents. A 16-core CPU will spin up 16+32 processes, and so on. Trying to run 10 concurrent tasks on my 64-core system results in a staggering 960 processes, which seems to cripple the system and slows things down as a result. See https://docs.wandb.ai/guides/track/advanced/distributed-training (the end of the page shows how they should be configured).

Do you get the error log in the npz output? Is this sent back with the tasks? I tried to read that file but could not; it's compressed or encrypted. It may contain more information about what is set up wrong with the wandb mp pool.

I was able to work around the issue by setting environment variables that put hard limits on the number of processes used. I edited run.py at line 445 with:

NUM_THREADS = "8"
os.environ["OMP_NUM_THREADS"] = NUM_THREADS
os.environ["OPENBLAS_NUM_THREADS"] = NUM_THREADS
os.environ["MKL_NUM_THREADS"] = NUM_THREADS
os.environ["CNN_NUM_THREADS"] = NUM_THREADS
os.environ["VECLIB_MAXIMUM_THREADS"] = NUM_THREADS
os.environ["NUMEXPR_NUM_THREADS"] = NUM_THREADS

But it's not a proper fix. I added further workarounds to make this a little more persistent for myself, but the project will need to fix it for everyone. The proper fix would be investigating the soft error in the error log file, with full access to the job (which we don't have, and we cannot implement proper mp without it). You could band-aid it with the same edits I made to run.py, but that might cause issues on hosts with fewer than 8 threads, I guess? Or maybe it's fine, since the script launches so many processes anyway. I'm still testing to see whether there's a point where fewer threads in run.py actually slows the task down; on these fast CPUs I might be able to run with as few as 4.
|
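One caveat with the env-var workaround above: these variables only take effect if they are set before numpy, torch, and friends initialize their thread pools, i.e. before the imports at the top of run.py. A minimal stdlib sketch of the ordering (CNN_NUM_THREADS from the original edit is omitted here, as it is not a variable the common math libraries document):

```python
import os

# Cap the math-library thread pools. This must happen BEFORE importing
# numpy/torch, because those libraries size their pools at import time.
NUM_THREADS = "8"
for var in (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
    "NUMEXPR_NUM_THREADS",
):
    os.environ[var] = NUM_THREADS

# Only now import the heavy libraries, e.g.:
# import torch
# torch.set_num_threads(int(NUM_THREADS))  # for torch, this also works post-import

print(os.environ["OMP_NUM_THREADS"])  # → 8
```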
©2025 Universitat Pompeu Fabra