Experimental Python tasks (beta) - task description

Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Message 59804 - Posted: 25 Jan 2023, 22:24:59 UTC - in response to Message 59800.  

I don't see any parameter in the jobin page that allocates the number of cpus the task will tie up.

I don't know how the cpu resource is calculated. Must be internal in BOINC.

Richard Haselgrove probably knows the answer.

You're right - it doesn't belong there. It will be set in the <app_version>, through the plan class - see https://boinc.berkeley.edu/trac/wiki/AppPlanSpec.

And to amplify Ian's point: not only does BOINC not control the application, it merely allows it the specified amount of free resources to run in.
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Message 59805 - Posted: 25 Jan 2023, 23:12:32 UTC - in response to Message 59804.  

Since these apps aren't proper BOINC MT or multi-threaded apps using a MT plan class, you wouldn't be using the <max_threads>N [M]</max_threads> parameter.

Seems like the proper parameter to use would be <cpu_frac>x</cpu_frac>.

Do you concur?
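For anyone following along, a plan-class entry along those lines could look roughly like the sketch below. This is purely illustrative: the values are invented, and the element names are the ones documented on the AppPlanSpec page Richard linked, not GPUGRID's actual server configuration.

```xml
<plan_classes>
    <plan_class>
        <name>cuda1121</name>
        <gpu_type>nvidia</gpu_type>
        <cuda/>
        <!-- Average number of CPU instances the app uses (may be fractional) -->
        <avg_ncpus>4</avg_ncpus>
        <!-- Alternatively, for GPU apps that use the CPU only part of the
             time: the fraction of processing done on the CPU -->
        <cpu_frac>0.5</cpu_frac>
    </plan_class>
</plan_classes>
```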
Keith Myers
Message 59806 - Posted: 25 Jan 2023, 23:18:39 UTC

A bunch of the standard Python 4.03 tasks have been going out and erroring out. I've had five so far today.

Problems in the main learner step with the pytorchrl packages.

https://www.gpugrid.net/result.php?resultid=33268830
Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 59807 - Posted: 26 Jan 2023, 2:52:42 UTC - in response to Message 59806.  

Maybe this will help abouh with the scripting; I'm too green at Python to know.

https://stackoverflow.com/questions/58666537/error-while-feeding-tf-dataset-to-fit-keyerror-embedding-input
KAMasud

Joined: 27 Jul 11
Posts: 138
Credit: 539,953,398
RAC: 0
Message 59808 - Posted: 26 Jan 2023, 3:06:26 UTC

I was allocated two tasks of "ATM: Free energy calculations of protein-ligand binding v1.11 (cuda1121)", and both of them were cancelled by the server during transmission. What are these tasks about, and why were they cancelled?
Keith Myers
Message 59809 - Posted: 26 Jan 2023, 4:43:05 UTC - in response to Message 59808.  

The researcher cancelled them because they recognized a problem with how the package was put together, and the tasks would have failed.

So it was better to cancel them in the pipeline than to waste download bandwidth and crunchers' resources.

You can thank them for being responsible and diligent.
Keith Myers
Message 59810 - Posted: 26 Jan 2023, 4:49:01 UTC

I successfully ran one of the beta Python tasks after the first cruncher errored out the task.

https://www.gpugrid.net/result.php?resultid=33268305
abouh

Joined: 31 May 21
Posts: 200
Credit: 0
RAC: 0
Message 59811 - Posted: 26 Jan 2023, 7:48:45 UTC - in response to Message 59800.  
Last modified: 26 Jan 2023, 8:06:21 UTC

The beta tasks were the same size as the normal ones, so if they run faster, hopefully the future PythonGPU tasks will too.
abouh

Message 59812 - Posted: 26 Jan 2023, 7:52:53 UTC - in response to Message 59806.  

Thank you very much for pointing it out. I will look at the error this morning!
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Message 59813 - Posted: 26 Jan 2023, 13:26:50 UTC
Last modified: 26 Jan 2023, 14:16:29 UTC

Finally got some more beta tasks, and they seem to be running fine - now limited to only 4 threads on the main run.py process.

But I did notice that VRAM use has increased by about 30%: tasks are now using more than 4GB on the GPU, whereas before it was about 3.2GB. Was that intentional?

Are these beta tasks going to be the same as the new batch? Beta is running fine, but the small number of new non-beta tasks going out seem to be failing.
KAMasud

Message 59814 - Posted: 26 Jan 2023, 13:50:56 UTC - in response to Message 59809.  

I never blamed anyone; I just asked a question for my own knowledge. Anyway, thank you. Now I wish I could get a task.
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 59815 - Posted: 26 Jan 2023, 15:26:43 UTC - in response to Message 59813.  

...
but i did notice that VRAM use has increased by about 30%. tasks are now using more than 4GB on the GPU, before it was about 3.2GB. was that intentional?
...

this is definitely bad news for GPUs with 8GB of VRAM, like the two RTX 3070s in my case. Before, I could run 2 tasks on each GPU. It was quite tight, but it worked (with some 70-100MB left on the GPU the monitor is connected to).
abouh

Message 59816 - Posted: 26 Jan 2023, 15:37:07 UTC
Last modified: 26 Jan 2023, 15:40:53 UTC

Yes, these latest beta tasks use a little more GPU memory; the AI agent has a bigger neural network. I hope it is not too big and most machines can still handle it.

What about the number of threads? Is it any better?

I also fixed the problems with the non-beta tasks (the queue was empty, but I guess some failed jobs were added back to it after the new software version was released). Please let me know if more errors occur.
Ian&Steve C.

Message 59817 - Posted: 26 Jan 2023, 16:09:33 UTC - in response to Message 59816.  
Last modified: 26 Jan 2023, 16:38:39 UTC

I have 4 of the beta tasks running. The number of threads looks good: 4 threads per task, as specified in the run.py script.

I just got a resend of an older non-beta task, and it's working fine so far (after I manually edited the threads again).

The setup of the beta tasks seems viable to push out to non-beta now.

About VRAM use: so far the tasks use about 3800MB when they first get going, but that rises over time; at about 50% completion they are up to ~4600MB each. Not sure how high they will go. The old tasks did not show this behavior of VRAM increasing as the task progressed. Is it necessary, or is something leaking and not being cleaned up?
abouh

Message 59819 - Posted: 27 Jan 2023, 8:47:47 UTC - in response to Message 59817.  
Last modified: 27 Jan 2023, 9:03:35 UTC

Great - very helpful feedback, Ian, thanks.

Since the scripts seem to run correctly, I will start sending tasks to the PythonGPU app with the current Python script version.

In parallel, I will look into the VRAM increase by running a few more tests in PythonGPUbeta. I don't think it is a memory leak, but maybe there is a way to use memory more efficiently in the code. I will dig into it and post updates.
Ryan Munro

Joined: 6 Mar 18
Posts: 38
Credit: 1,340,042,080
RAC: 27
Message 59820 - Posted: 27 Jan 2023, 10:11:43 UTC

Got one of the new betas; it's using about 28% of my 16-core 5950X on average under Windows 11, so roughly 9 threads?
abouh

Message 59821 - Posted: 27 Jan 2023, 10:23:36 UTC - in response to Message 59820.  

The scripts still spawn 32 Python threads. But I think before, with wandb and perhaps without some environment variables fixed, even more were spawned.

However, note that 32 cores are not needed to run the scripts. I'm not sure what the optimal number is, but it is much lower than 32.
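For anyone tuning this locally, the usual way to cap those pools is to set the standard threading environment variables before the numerical libraries are imported, since most OpenMP/MKL/OpenBLAS-backed packages size their pools once, at import time. A minimal sketch (the cap value is illustrative, and the variable names are the generic library ones, nothing GPUGRID-specific):

```python
import os

THREAD_CAP = "4"  # illustrative value, not the project's actual setting

# Must run *before* importing numpy/torch and friends: these libraries
# read the variables once, when they are first imported.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS",
            "OPENBLAS_NUM_THREADS", "NUMEXPR_NUM_THREADS"):
    os.environ[var] = THREAD_CAP

# Libraries imported after this point size their thread pools accordingly.
```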
Ryan Munro

Message 59822 - Posted: 27 Jan 2023, 11:50:54 UTC

Yeah, it definitely uses less overall CPU time than before. I've capped the apps at 10 cores now, which seems like the sweet spot to allow me to also run other apps.
KAMasud

Message 59823 - Posted: 27 Jan 2023, 15:41:59 UTC
Last modified: 27 Jan 2023, 15:44:27 UTC

task 33269102

Eight attempts, and it was a success. Does someone want to post-mortem it?
Ian&Steve C.

Message 59824 - Posted: 27 Jan 2023, 20:54:29 UTC - in response to Message 59819.  

Great - very helpful feedback, Ian, thanks.

Since the scripts seem to run correctly, I will start sending tasks to the PythonGPU app with the current Python script version.

In parallel, I will look into the VRAM increase by running a few more tests in PythonGPUbeta. I don't think it is a memory leak, but maybe there is a way to use memory more efficiently in the code. I will dig into it and post updates.


Seeing up to 5.6GB of VRAM use per task, but it doesn't seem consistent: some tasks go up to ~4.8GB, others 4.5GB, and so on. There doesn't seem to be a clear pattern to it.

The previous tasks were very consistent and always used exactly the same amount of VRAM.
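One rough way to tell allocator caching from a genuine leak is whether periodic VRAM samples plateau after an initial warm-up or keep climbing until the task ends. A small stand-alone sketch of that heuristic; the sample traces below are invented for illustration, not measurements from these tasks:

```python
def keeps_climbing(samples_mb, warmup=3, tolerance_mb=64):
    """Return True if VRAM samples still trend upward after a warm-up
    period (leak-like); a flat tail suggests allocator caching."""
    tail = samples_mb[warmup:]
    if len(tail) < 2:
        return False
    # Still climbing if the final sample exceeds the first post-warmup
    # sample by more than the tolerance.
    return tail[-1] - tail[0] > tolerance_mb

# Invented traces (MB), polled at equal intervals over a task's run:
plateau = [3200, 3700, 3800, 3810, 3805, 3812]   # caching-like
climbing = [3200, 3800, 4000, 4200, 4400, 4600]  # leak-like
```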

©2025 Universitat Pompeu Fabra