Experimental Python tasks (beta) - task description

Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Message 59804 - Posted: 25 Jan 2023, 22:24:59 UTC - in response to Message 59800.  

I don't see any parameter in the jobin page that allocates the number of cpus the task will tie up.

I don't know how the cpu resource is calculated. Must be internal in BOINC.

Richard Haselgrove probably knows the answer.

You're right - it doesn't belong there. It will be set in the <app_version>, through the plan class - see https://boinc.berkeley.edu/trac/wiki/AppPlanSpec.

And to amplify Ian's point: not only does BOINC not control the application, it merely allows it the specified amount of free resources to run in.
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 59805 - Posted: 25 Jan 2023, 23:12:32 UTC - in response to Message 59804.  

Since these apps aren't proper BOINC MT or multi-threaded apps using a MT plan class, you wouldn't be using the <max_threads>N [M]</max_threads> parameter.

Seems like the proper parameter to use would be <cpu_frac>x</cpu_frac>.

Do you concur?
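For anyone following along, a plan-class entry along those lines could look roughly like the sketch below. This is purely illustrative: the values are invented, and the element names are the ones documented on the AppPlanSpec page Richard linked, not GPUGRID's actual server configuration.

```xml
<plan_classes>
    <plan_class>
        <name>cuda1121</name>
        <gpu_type>nvidia</gpu_type>
        <cuda/>
        <!-- Average number of CPU instances the app uses (may be fractional) -->
        <avg_ncpus>4</avg_ncpus>
        <!-- Alternatively, for GPU apps that use the CPU only part of the
             time: the fraction of processing done on the CPU -->
        <cpu_frac>0.5</cpu_frac>
    </plan_class>
</plan_classes>
```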
Keith Myers
Message 59806 - Posted: 25 Jan 2023, 23:18:39 UTC

A bunch of the standard Python 4.03 tasks have been going out and erroring out. I've had five so far today.

Problems in the main learner step with the pytorchrl packages.

https://www.gpugrid.net/result.php?resultid=33268830
Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 59807 - Posted: 26 Jan 2023, 2:52:42 UTC - in response to Message 59806.  

Maybe this will help abouh with the scripting; I'm too green at Python to know.

https://stackoverflow.com/questions/58666537/error-while-feeding-tf-dataset-to-fit-keyerror-embedding-input
KAMasud

Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Message 59808 - Posted: 26 Jan 2023, 3:06:26 UTC

I was allocated two tasks of "ATM: Free energy calculations of protein-ligand binding v1.11 (cuda1121)", and both of them were cancelled by the server during transmission. What are these tasks about, and why were they cancelled?
Keith Myers
Message 59809 - Posted: 26 Jan 2023, 4:43:05 UTC - in response to Message 59808.  

The researcher cancelled them because they recognized a problem with how the package was put together, and the tasks would have failed.

So it was better to cancel them in the pipeline than to waste download bandwidth and crunchers' resources.

You can thank them for being responsible and diligent.
Keith Myers
Message 59810 - Posted: 26 Jan 2023, 4:49:01 UTC

I successfully ran one of the beta Python tasks after the first cruncher errored out the task.

https://www.gpugrid.net/result.php?resultid=33268305
abouh

Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 59811 - Posted: 26 Jan 2023, 7:48:45 UTC - in response to Message 59800.  
Last modified: 26 Jan 2023, 8:06:21 UTC

The beta tasks were the same size as the normal ones, so if they run faster, hopefully the future PythonGPU tasks will too.
abouh

Message 59812 - Posted: 26 Jan 2023, 7:52:53 UTC - in response to Message 59806.  

Thank you very much for pointing it out. I will look at the error this morning!
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Message 59813 - Posted: 26 Jan 2023, 13:26:50 UTC
Last modified: 26 Jan 2023, 14:16:29 UTC

Finally got some more beta tasks, and they seem to be running fine - now limited to only 4 threads on the main run.py process.

But I did notice that VRAM use has increased by about 30%: tasks are now using more than 4GB on the GPU, whereas before it was about 3.2GB. Was that intentional?

Are these beta tasks going to be the same as the new batch? Beta is running fine, but the small number of new non-beta tasks going out seem to be failing.
KAMasud

Message 59814 - Posted: 26 Jan 2023, 13:50:56 UTC - in response to Message 59809.  

I never blamed anyone; I just asked a question for my own knowledge. Anyway, thank you. Now I wish I could get a task.
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 59815 - Posted: 26 Jan 2023, 15:26:43 UTC - in response to Message 59813.  

...
but i did notice that VRAM use has increased by about 30%. tasks are now using more than 4GB on the GPU, before it was about 3.2GB. was that intentional?
...

this is definitely bad news for GPUs with 8GB of VRAM, like the two RTX 3070s in my case. Before, I could run 2 tasks on each GPU. It was quite tight, but it worked (with some 70-100MB left on the GPU the monitor is connected to).
abouh

Message 59816 - Posted: 26 Jan 2023, 15:37:07 UTC
Last modified: 26 Jan 2023, 15:40:53 UTC

Yes, these latest beta tasks use a little more GPU memory; the AI agent has a bigger neural network. I hope it is not too big and most machines can still handle it.

What about the number of threads? Is it any better?

I also fixed the problems with the non-beta tasks (the queue was empty, but I guess some failed jobs were added back to it after the new software version was released). Please let me know if more errors occur.
Ian&Steve C.

Message 59817 - Posted: 26 Jan 2023, 16:09:33 UTC - in response to Message 59816.  
Last modified: 26 Jan 2023, 16:38:39 UTC

I have 4 of the beta tasks running. The number of threads looks good: 4 threads per task, as specified in the run.py script.

I just got a resend of an older non-beta task, and it's working fine so far (after I manually edited the threads again).

The setup of the beta tasks seems viable to push out to non-beta now.

About VRAM use: so far the tasks use about 3800MB when they first get going, but that rises over time; at about 50% completion they are up to ~4600MB each. Not sure how high they will go. The old tasks did not show this behavior of VRAM increasing as the task progressed. Is it necessary, or is something leaking and not being cleaned up?
abouh

Message 59819 - Posted: 27 Jan 2023, 8:47:47 UTC - in response to Message 59817.  
Last modified: 27 Jan 2023, 9:03:35 UTC

Great - very helpful feedback, Ian, thanks.

Since the scripts seem to run correctly, I will start sending tasks to the PythonGPU app with the current Python script version.

In parallel, I will look into the VRAM increase by running a few more tests in PythonGPUbeta. I don't think it is a memory leak, but maybe there is a way to use memory more efficiently in the code. I will dig into it and post updates.
Ryan Munro

Joined: 6 Mar 18
Posts: 38
Credit: 1,340,042,080
RAC: 27
Message 59820 - Posted: 27 Jan 2023, 10:11:43 UTC

Got one of the new betas; it's using about 28% of my 16-core 5950X on average under Windows 11, so roughly 9 threads?
abouh

Message 59821 - Posted: 27 Jan 2023, 10:23:36 UTC - in response to Message 59820.  

The scripts still spawn 32 Python threads. But I think before, with wandb and perhaps without some environment variables fixed, even more were spawned.

However, note that 32 cores are not needed to run the scripts. I'm not sure what the optimal number is, but it is much lower than 32.
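For anyone tuning this locally, the usual way to cap those pools is to set the standard threading environment variables before the numerical libraries are imported, since most OpenMP/MKL/OpenBLAS-backed packages size their pools once, at import time. A minimal sketch (the cap value is illustrative, and the variable names are the generic library ones, nothing GPUGRID-specific):

```python
import os

THREAD_CAP = "4"  # illustrative value, not the project's actual setting

# Must run *before* importing numpy/torch and friends: these libraries
# read the variables once, when they are first imported.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS",
            "OPENBLAS_NUM_THREADS", "NUMEXPR_NUM_THREADS"):
    os.environ[var] = THREAD_CAP

# Libraries imported after this point size their thread pools accordingly.
```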
Ryan Munro

Message 59822 - Posted: 27 Jan 2023, 11:50:54 UTC

Yeah, it definitely uses less overall CPU time than before. I've capped the apps at 10 cores now, which seems like the sweet spot to allow me to also run other apps.
KAMasud

Message 59823 - Posted: 27 Jan 2023, 15:41:59 UTC
Last modified: 27 Jan 2023, 15:44:27 UTC

task 33269102

Eight attempts, and it was a success. Does someone want to post-mortem it?
Ian&Steve C.

Message 59824 - Posted: 27 Jan 2023, 20:54:29 UTC - in response to Message 59819.  

Great - very helpful feedback, Ian, thanks.

Since the scripts seem to run correctly, I will start sending tasks to the PythonGPU app with the current Python script version.

In parallel, I will look into the VRAM increase by running a few more tests in PythonGPUbeta. I don't think it is a memory leak, but maybe there is a way to use memory more efficiently in the code. I will dig into it and post updates.


Seeing up to 5.6GB of VRAM use per task, but it doesn't seem consistent: some tasks go up to ~4.8GB, others 4.5GB, and so on. There doesn't seem to be a clear pattern to it.

The previous tasks were very consistent and always used exactly the same amount of VRAM.
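One rough way to tell allocator caching from a genuine leak is whether periodic VRAM samples plateau after an initial warm-up or keep climbing until the task ends. A small stand-alone sketch of that heuristic; the sample traces below are invented for illustration, not measurements from these tasks:

```python
def keeps_climbing(samples_mb, warmup=3, tolerance_mb=64):
    """Return True if VRAM samples still trend upward after a warm-up
    period (leak-like); a flat tail suggests allocator caching."""
    tail = samples_mb[warmup:]
    if len(tail) < 2:
        return False
    # Still climbing if the final sample exceeds the first post-warmup
    # sample by more than the tolerance.
    return tail[-1] - tail[0] > tolerance_mb

# Invented traces (MB), polled at equal intervals over a task's run:
plateau = [3200, 3700, 3800, 3810, 3805, 3812]   # caching-like
climbing = [3200, 3800, 4000, 4200, 4400, 4600]  # leak-like
```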

©2025 Universitat Pompeu Fabra