Experimental Python tasks (beta) - task description

Erich56
Message 59440 - Posted: 12 Oct 2022, 5:05:45 UTC

I notice a big difference in VRAM use between various Python tasks and/or systems, e.g.:

- GPU running 3 tasks simultaneously: 5,250 MB
- GPU running 2 tasks simultaneously: 5,012 MB
- GPU running 2 tasks simultaneously: 8,055 MB

With the third one I was lucky: the GPU has 8,142 MB of VRAM.

(FYI, all values include a few hundred MB for the monitor.)

Has anyone else had the same experience?

abouh
Message 59441 - Posted: 12 Oct 2022, 10:38:02 UTC - in response to Message 59430.  

Hello Aleksey,

Yes, I struggled a bit with the single-command solution. A BOINC job requires tasks to be specified in the following way:

<task>
<application>XXXXXX.exe</application>
<command_line>XXXXXXXXXXXXX</command_line>
</task>


And this is the command that should work, right?

7za x "X:\BOINC\projects\www.gpugrid.net\pythongpu_windows_x86_64__cuda1131.txz.1a152f102cdad20f16638f0f269a5a17" -so | 7za x -aoa -si -ttar



Isn't it actually using 7za 2 times? After some testing, the conclusion I arrived at is that in principle it actually requires 2 BOINC tasks to do it, because 7za first decompresses the .txz to a .tar, and then unpacks the .tar into plain files. The only way to do it in one task would be to compress the files into a format that 7za can decompress in a single call (like zip, but we already discussed that zipped files are too big).

Does anyone know if that reasoning is correct? Can BOINC wrappers execute commands like the one Aleksey suggested?
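
To illustrate the two layers described above (an xz-compressed stream wrapping a tar archive), here is a minimal Python sketch. It uses the standard-library `tarfile` module rather than 7za, so it is not what the BOINC wrapper runs; the file names are made up for the demonstration.

```python
import io
import os
import tarfile
import tempfile


def extract_txz(archive_path, dest_dir):
    """Decompress the xz layer and unpack the tar layer in one streaming pass."""
    os.makedirs(dest_dir, exist_ok=True)
    # "r|xz" reads the archive as a stream, so no intermediate .tar file is written.
    with tarfile.open(archive_path, mode="r|xz") as archive:
        archive.extractall(dest_dir)


# Build a tiny .txz and extract it again (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "demo.txz")
    payload = b"hello"
    with tarfile.open(archive_path, mode="w:xz") as archive:
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))

    out_dir = os.path.join(tmp, "out")
    extract_txz(archive_path, out_dir)
    with open(os.path.join(out_dir, "hello.txt"), "rb") as f:
        extracted = f.read()
```

The point is only that a .txz needs two steps (decompress, then unpack); a tool that streams one step into the other avoids writing the intermediate .tar, which is what the piped 7za command tries to achieve.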

abouh
Message 59442 - Posted: 12 Oct 2022, 10:54:24 UTC - in response to Message 59439.  
Last modified: 12 Oct 2022, 14:21:04 UTC

Hello, of course, let me explain.

The task names "demos25" and "demos25_2" belong to 2 different variants of the same experiment. In particular, the selection of the agents sent to GPUGrid is different.

In both experiments the AI agents sent to GPUGrid learn using Reinforcement Learning, a machine learning technique that allows them to learn specific behaviours from interactions with their simulated environment (in fact, to make it faster, they interact with 32 copies of the environment at the same time: the famous 32 threads). Also in both cases, when the agents "discover" something relevant, the job finishes and the info is sent back to be shared with the rest of the population.

The difference between the "demos25" and "demos25_2" experiments is that in "demos25_2" I am experimenting with a more careful selection of the environment regions each agent is targeted to explore. I try to direct each agent to explore a different region of the environment (or one with little overlap with the rest). The result is that agents in "demos25_2" are more likely to find something relevant that the rest of the population has not found yet, and therefore more likely to finish earlier. The "demos25" experiment, by contrast, uses a more "brute force" approach, and as the population grows it becomes more difficult for new agents to discover new things.

I hope the explanation makes sense. Let me know if you have any other questions and I will try to answer them as well. There is also an experiment "demos25_3" in progress, which is similar to "demos25_2".
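
As a rough illustration of "one agent interacting with 32 copies of the environment", here is a toy Python sketch. `ToyEnv`, the policy and the reward rule are all hypothetical; the real jobs use pytorchrl and run the copies in parallel, whereas this sketch steps them one after another for simplicity.

```python
NUM_ENVS = 32  # number of environment copies mentioned in the post


class ToyEnv:
    """Hypothetical stand-in for one copy of the simulated environment."""

    def __init__(self):
        self.position = 0

    def step(self, action):
        # Move by `action`; a reward appears only once a "relevant" state is reached.
        self.position += action
        reward = 1.0 if self.position >= 5 else 0.0
        return self.position, reward


def collect_batch(envs, policy):
    """Step every environment copy once and return one transition per copy."""
    batch = []
    for env in envs:
        action = policy(env.position)
        observation, reward = env.step(action)
        batch.append((observation, action, reward))
    return batch


def always_right(observation):
    """Trivial policy used only for the demonstration."""
    return 1


envs = [ToyEnv() for _ in range(NUM_ENVS)]
for _ in range(5):
    batch = collect_batch(envs, always_right)

# The job would finish once some copy reports a relevant discovery.
discovered = any(reward > 0 for _, _, reward in batch)
```

Each call to `collect_batch` yields 32 transitions instead of one, which is why experience is gathered faster with many copies.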

kotenok2000
Message 59443 - Posted: 12 Oct 2022, 11:33:22 UTC - in response to Message 59442.  
Last modified: 12 Oct 2022, 11:34:44 UTC

Each task patches several DLLs to disable ASLR and make the .nv_fatb sections read-only, and leaves 1.93 GB of backup files:
05.01.2022 10:28 70 403 584 cudnn_ops_train64_8.dll_bak
05.01.2022 10:23 88 405 504 cudnn_ops_infer64_8.dll_bak
03.08.2022 04:04 1 329 664 torch_cuda_cpp.dll_bak
05.01.2022 11:21 81 487 360 cudnn_cnn_train64_8.dll_bak
05.01.2022 10:36 129 872 896 cudnn_adv_infer64_8.dll_bak
05.01.2022 10:46 97 293 824 cudnn_adv_train64_8.dll_bak
03.08.2022 05:05 871 934 464 torch_cuda_cu.dll_bak
05.01.2022 11:15 736 718 848 cudnn_cnn_infer64_8.dll_bak
Can the patched DLLs be included in pythongpu_windows_x86_64__cuda1131.txz?
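
The 1.93 GB figure can be checked by summing the sizes in the listing above. A minimal Python sketch, with the byte counts copied from the listing and the result expressed in binary gigabytes (GiB):

```python
# Backup file sizes in bytes, copied from the directory listing in the post.
backup_sizes = {
    "cudnn_ops_train64_8.dll_bak": 70_403_584,
    "cudnn_ops_infer64_8.dll_bak": 88_405_504,
    "torch_cuda_cpp.dll_bak": 1_329_664,
    "cudnn_cnn_train64_8.dll_bak": 81_487_360,
    "cudnn_adv_infer64_8.dll_bak": 129_872_896,
    "cudnn_adv_train64_8.dll_bak": 97_293_824,
    "torch_cuda_cu.dll_bak": 871_934_464,
    "cudnn_cnn_infer64_8.dll_bak": 736_718_848,
}

total_bytes = sum(backup_sizes.values())
total_gib = total_bytes / 2**30  # bytes -> GiB

print(f"{total_bytes:,} bytes = {total_gib:.2f} GiB")
```

The total is 2,077,446,144 bytes, which matches the quoted 1.93 GB when a gigabyte is taken as 2^30 bytes.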

Ian&Steve C.
Message 59444 - Posted: 12 Oct 2022, 12:22:35 UTC - in response to Message 59440.  

"I notice a big difference in VRAM use between various Python tasks and/or systems, e.g.:

- GPU running 3 tasks simultaneously: 5,250 MB
- GPU running 2 tasks simultaneously: 5,012 MB
- GPU running 2 tasks simultaneously: 8,055 MB

With the third one I was lucky: the GPU has 8,142 MB of VRAM.

(FYI, all values include a few hundred MB for the monitor.)

Has anyone else had the same experience?"


more powerful GPUs will use more VRAM than less powerful GPUs, it scales roughly with core count of the GPU. so a 3090 would use more VRAM than say a 1050Ti on the same exact task. it's just the way it works when the GPU sets up the task, if the task has to scale to 10,000 cores instead of 2,000, it needs to use more memory.

Erich56
Message 59445 - Posted: 12 Oct 2022, 14:02:02 UTC - in response to Message 59444.  

"more powerful GPUs will use more VRAM than less powerful GPUs, it scales roughly with core count of the GPU."

Okay, I see. Many thanks for explaining :-)

One thing that's a pity here is that the GPU with the most VRAM (Quadro P5000: 16 GB) has the lowest number of cores (2,560) :-(

But, as so often: one cannot have everything in life :-)

kotenok2000
Message 59446 - Posted: 12 Oct 2022, 15:14:53 UTC - in response to Message 59445.  

Is there anyone here with an NVIDIA A100 80GB?

Ian&Steve C.
Message 59447 - Posted: 12 Oct 2022, 16:06:25 UTC - in response to Message 59446.  

"Is there anyone here with an NVIDIA A100 80GB?"


only those with $10,000 to spare to use for free on DC. so likely no one ;) lol

faster GPUs don't provide much benefit for these tasks since they are so CPU bound. sure there's a lot of VRAM on this card, and maybe you could theoretically spin up 10-15 tasks on a single card, but unless you have A LOT of CPU power and bandwidth to feed it, you're gonna hit another bottleneck before you can hope to benefit from running that many tasks.

just 6x tasks maxes out my EPYC 7443P 48 threads @ 3.9GHz.

maybe in the future the project can get these tasks to the point where they lean more on the GPU tensor cores and a more GPU only environment, but for now it's mostly a CPU environment with a small contribution by the GPU.
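
A back-of-the-envelope Python sketch of the CPU bottleneck described above. The two input numbers come from the post; the linear scaling of CPU demand with task count is an assumption made here for illustration, not something measured in the thread.

```python
CPU_THREADS = 48         # EPYC 7443P threads, from the post
TASKS_AT_SATURATION = 6  # number of tasks that max out that CPU, from the post

threads_per_task = CPU_THREADS / TASKS_AT_SATURATION


def threads_needed(num_tasks):
    """CPU threads needed for num_tasks, assuming demand scales linearly (an assumption)."""
    return num_tasks * threads_per_task


for n in (10, 15):
    print(f"{n} tasks -> about {threads_needed(n):.0f} CPU threads")
```

Under that assumption, the 10 to 15 tasks that might fit in 80 GB of VRAM would want roughly 80 to 120 CPU threads, far more than the 48 available on this host.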

Erich56
Message 59449 - Posted: 13 Oct 2022, 5:57:44 UTC
Last modified: 13 Oct 2022, 6:03:37 UTC

I just wanted to download another Python task, but the BOINC event log tells me the following:

13.10.2022 07:49:38 | GPUGRID | Message from server: Python apps for GPU hosts needs 1296.10MB more disk space. You currently have 32082.50 MB available and it needs 33378.60 MB.

I wonder why a Python task needs 33,378 MB of free disk space.
Experience has shown that a Python task takes some 8 GB of disk space while being processed. So how come it says it needs 33 GB?

[CSF] Aleksey Belkov
Message 59450 - Posted: 13 Oct 2022, 11:42:16 UTC - in response to Message 59449.  
Last modified: 13 Oct 2022, 11:46:15 UTC


"Experience has shown that a Python task takes some 8 GB of disk space while being processed. So how come it says it needs 33 GB?"

See my previous post about disk usage during the PythonGPU startup stage.
Previously: tar.gz >> slotX (2.66 GiB) >> tar (5.48 GiB) >> app files (~8.13 GiB) = 16.27 GiB (since the archives, tar.gz and tar, were not deleted).
Now, after some improvements were implemented, peak consumption is about 13.61 GiB, and after the startup stage it drops to ~8.13 GiB.
In any case, it seems to require adjustment.
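
The figures above can be reproduced with a few lines of Python. The stage sizes are taken from this post and the server requirement from the event-log message quoted earlier in the thread. Two things are assumptions: that the current peak corresponds to the .tar and the unpacked files coexisting (the post does not say which files remain), and that BOINC's "MB" means 2^20 bytes.

```python
# Sizes in GiB, taken from the post.
TAR_GZ = 2.66     # compressed archive copied into the slot directory
TAR = 5.48        # intermediate .tar
APP_FILES = 8.13  # unpacked application files (approximate)

previous_peak = TAR_GZ + TAR + APP_FILES  # both archives kept next to the app files
current_peak = TAR + APP_FILES            # assumption: only the .tar remains at peak
steady_state = APP_FILES                  # after the startup stage

# Server-side requirement from the event log, converted assuming 1 MB = 2**20 bytes.
REQUIRED_MB = 33378.60
required_gib = REQUIRED_MB / 1024

print(f"previous peak: {previous_peak:.2f} GiB")
print(f"current peak:  {current_peak:.2f} GiB")
print(f"required:      {required_gib:.1f} GiB")
```

Under these assumptions the configured requirement (about 32.6 GiB) is more than twice the observed peak, which supports the conclusion that the setting needs adjusting.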

Erich56
Message 59451 - Posted: 13 Oct 2022, 12:04:58 UTC - in response to Message 59450.  
Last modified: 13 Oct 2022, 12:05:21 UTC

"In any case, it seems to require adjustment."

I agree

[CSF] Aleksey Belkov
Message 59452 - Posted: 13 Oct 2022, 12:10:16 UTC - in response to Message 59441.  
Last modified: 13 Oct 2022, 12:14:24 UTC


"Isn't it actually using 7za 2 times? After some testing, the conclusion I arrived at is that in principle it actually requires 2 BOINC tasks to do it"

Yeah, it seems you are right.

Try using this:
<task>
<application>C:\Windows\System32\cmd.exe</application>
<command_line>/C ".\7za.exe x pythongpu_windows_x86_64__cuda1131.txz -so | .\7za.exe x -aoa -si -ttar"</command_line>
</task>
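
The suggestion wraps both 7za calls in cmd.exe because the `|` is interpreted by a shell, not by 7za itself. The Python sketch below shows what such a pipe does: the first process's standard output becomes the second process's standard input. The two commands are hypothetical stand-ins (small Python one-liners) for `7za x ... -so` and `7za x -si`, so that the example runs without 7za installed.

```python
import subprocess
import sys

# Stand-in for `7za x archive -so`: writes a byte stream to stdout.
producer_cmd = [sys.executable, "-c", "import sys; sys.stdout.write('tar stream')"]
# Stand-in for `7za x -si -ttar`: consumes the stream from stdin.
consumer_cmd = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]

producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
consumer = subprocess.Popen(
    consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE, text=True
)
# Close our copy of the pipe so the producer is notified if the consumer exits early.
producer.stdout.close()

output, _ = consumer.communicate()
producer.wait()
```

Both processes run at the same time and no intermediate file is written, which is why a single task that starts a shell can replace two separate decompression tasks.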

abouh
Message 59453 - Posted: 14 Oct 2022, 9:21:33 UTC - in response to Message 59443.  

Patching seemed to be required to run as many threads with pytorchrl as these jobs do. Otherwise Windows used a lot of memory for every new thread. The script that does the patching is relatively fast, so doing it locally would not save a lot of time.

However, are you saying that after the patching some files could be deleted to further optimise memory use? If this is the case, I can look into it. Do you mean these .dll_bak files? I am not very used to Windows...

abouh
Message 59454 - Posted: 14 Oct 2022, 9:27:18 UTC - in response to Message 59449.  

Does anyone know if these requirements are estimated by BOINC and adjusted over time, like completion time? Or is manual adjustment required?

Ian&Steve C.
Message 59455 - Posted: 14 Oct 2022, 12:33:28 UTC - in response to Message 59454.  

my runtime estimates have come down to basically reasonable and real levels now. so i think it will adjust on its own over time.

Richard Haselgrove
Message 59456 - Posted: 14 Oct 2022, 14:53:44 UTC - in response to Message 59455.  

abouh's message 59454 was in response to a question about disk storage requirements. No, they won't adjust themselves over time: the amount of disk space required by the task is set by the server, and the amount available to the client is calculated from readings taken of the current state of the host computer. They will only change if the user adjusts the hardware or BOINC client options, or the project staff adjust the job specifications passed to the workunit generator.

On the subject of runtimes: the (calculated) runtime estimate relies on just three things:
- The job speed (sent by the server in the <app_version> specification)
- The job size (again set on the server)
- The Duration Correction Factor (dynamically adjusted by the client)

SPEED seems to have fallen by almost half over the last month, but I haven't currently got a job I can verify that for.
SIZE has remained the same while I've been monitoring it.
DCF will have fallen dramatically - mine is now below 1
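
How the three quantities combine can be sketched in Python as size divided by speed, scaled by the DCF. The numbers below are hypothetical and chosen only to show the effect of each factor; they are not values from any GPUGRID job.

```python
def estimated_runtime_seconds(size_flops, speed_flops_per_sec, dcf):
    """Runtime estimate from job size, job speed and Duration Correction Factor."""
    return size_flops / speed_flops_per_sec * dcf


# Hypothetical values for illustration only.
size = 1.0e18   # estimated floating-point operations for the job (set on the server)
speed = 5.0e13  # floating-point operations per second (sent with the app version)

baseline = estimated_runtime_seconds(size, speed, dcf=1.0)
halved_speed = estimated_runtime_seconds(size, speed / 2, dcf=1.0)
lower_dcf = estimated_runtime_seconds(size, speed, dcf=0.5)

print(baseline, halved_speed, lower_dcf)
```

Halving SPEED doubles the estimate, while a DCF below 1 shrinks it, so the two changes described above pull the estimate in opposite directions.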

kotenok2000
Message 59457 - Posted: 15 Oct 2022, 19:13:21 UTC

What does this output mean?
e00003a00008-ABOU_rnd_ppod_expand_demos25_9-0-1-RND2053
Update 464, num samples collected 118784, FPS 344
  Algorithm: loss 0.1224, value_loss 0.0002, ivalue_loss 0.0113, rnd_loss 0.0307, action_loss 0.0846, entropy_loss 0.0043, mean_intrinsic_rewards 0.0421, min_intrinsic_rewards 0.0084, max_intrinsic_rewards 0.1857, mean_embed_dist 0.0000, max_embed_dist 0.0000, min_embed_dist 0.0000, min_external_reward 0.0000
  Episodes: TrainReward 0.0000, l 360.6000, t 649.8340, UnclippedReward 0.0000, VisitedRooms 1.0000

REWARD DEMOS 25, INTRINSIC DEMOS 25, RHO 0.05, PHI 0.05, REWARD THRESHOLD 0.0, MAX DEMO REWARD -inf, INTRINSIC THRESHOLD 1000

FRAMES TO AVOID: 0

Update 465, num samples collected 122880, FPS 347
  Algorithm: loss 0.1329, value_loss 0.0002, ivalue_loss 0.0098, rnd_loss 0.0317, action_loss 0.0955, entropy_loss 0.0043, mean_intrinsic_rewards 0.0414, min_intrinsic_rewards 0.0082, max_intrinsic_rewards 0.1516, mean_embed_dist 0.0000, max_embed_dist 0.0000, min_embed_dist 0.0000, min_external_reward 0.0000
  Episodes: TrainReward 0.0000, l 341.3529, t 658.7952, UnclippedReward 0.0000, VisitedRooms 1.00000

Keith Myers
Message 59458 - Posted: 15 Oct 2022, 22:06:06 UTC - in response to Message 59457.  

Nothing of any meaning or consequence for you. Pertinent only to the researcher.

abouh
Message 59459 - Posted: 16 Oct 2022, 7:34:24 UTC - in response to Message 59457.  

These are just the logs of the algorithm, printing out the relevant metrics during agent training.
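
For anyone who wants to follow these metrics, the "Update" header lines can be pulled apart with a regular expression. A small Python sketch, using the two header lines from the log excerpt posted in this thread:

```python
import re

# Header lines copied from the log excerpt in the thread.
log_lines = [
    "Update 464, num samples collected 118784, FPS 344",
    "Update 465, num samples collected 122880, FPS 347",
]

pattern = re.compile(r"Update (\d+), num samples collected (\d+), FPS (\d+)")

records = []
for line in log_lines:
    match = pattern.match(line)
    if match:
        update, samples, fps = (int(group) for group in match.groups())
        records.append({"update": update, "samples": samples, "fps": fps})

# Number of new samples gathered between the two consecutive updates.
samples_per_update = records[1]["samples"] - records[0]["samples"]
print(samples_per_update)
```

The difference is 4096 samples per update. That equals 32 x 128, which would be consistent with 32 environment copies contributing 128 steps each, though the thread does not confirm that breakdown.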

Erich56
Message 59460 - Posted: 17 Oct 2022, 16:23:51 UTC

I have now had 5 tasks in a row fail after some 2,100 seconds each, one after the other, within about half an hour.

https://www.gpugrid.net/result.php?resultid=33098926
https://www.gpugrid.net/result.php?resultid=33100629
https://www.gpugrid.net/result.php?resultid=33100675
https://www.gpugrid.net/result.php?resultid=33100715
https://www.gpugrid.net/result.php?resultid=33100745

Does anyone have any idea what the problem is?

On the same host, another task has been running for 22 hours now, but I have stopped downloading new tasks until it's clear what's going on.