Message boards :
News :
Experimental Python tasks (beta) - task description
Message board moderation
| Author | Message |
|---|---|
|
Send message Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
I notice a big difference in VRAM use between various Python tasks and/or systems, e.g.:
- GPU running 3 tasks simultaneously: 5,250 MB
- GPU running 2 tasks simultaneously: 5,012 MB
- GPU running 2 tasks simultaneously: 8,055 MB
With the third one cited above I was lucky: the VRAM of that GPU is 8,142 MB (FYI, all values include a few hundred MB for the monitor). Has anyone else had the same experience? |
|
Send message Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0
|
Hello Aleksey, yes, I struggled a bit with the single-command solution. A BOINC job requires specifying tasks in the following way. <task> And this is the command that should work, right?

7za x "X:\BOINC\projects\www.gpugrid.net\pythongpu_windows_x86_64__cuda1131.txz.1a152f102cdad20f16638f0f269a5a17" -so | 7za x -aoa -si -ttar

Isn't it actually using 7za twice? After some testing, the conclusion I arrived at is that in principle it actually requires 2 BOINC tasks, because 7za decompresses .txz to .tar, and then .tar to plain files. The only way to do it in one task would be to compress the files into a format that 7za can decompress in a single call (like zip, but we already discussed that zipped files are too big). Does anyone know if that reasoning is correct? Can BOINC wrappers execute commands like the one Aleksey suggested? |
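For comparison, outside the wrapper's constraints a single streaming pass is possible: here is a minimal Python sketch (illustrative, not the project's actual wrapper) using the standard tarfile module, which handles the xz decompression and the untarring of a .txz in one call:

```python
import tarfile

def extract_txz(archive_path, dest_dir):
    """Decompress xz and untar in one streaming pass (no intermediate .tar on disk)."""
    with tarfile.open(archive_path, mode="r:xz") as tar:
        tar.extractall(path=dest_dir)
```

This avoids both the second 7za invocation and the temporary .tar, at the cost of needing a Python interpreter available at extraction time.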
|
Send message Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0
|
Hello, of course, let me explain. The task names "demos25" and "demos25_2" belong to 2 different variants of the same experiment; in particular, the selection of the agents sent to GPUGrid is different. In both experiments the AI agents sent to GPUGrid learn using Reinforcement Learning, a machine learning technique that allows them to learn specific behaviours from interactions with their simulated environment (actually, to make it faster, they interact with 32 copies of the environment at the same time, the famous 32 threads). Also in both cases, when the agents "discover" something relevant, the job finishes and the info is sent back to be shared with the rest of the population. The difference is that in "demos25_2" I am experimenting with a more careful selection of the environment regions each agent is targeted to explore: I try to direct each agent to explore a different region of the environment (or one with little overlap with the rest). The result is that agents in "demos25_2" are more likely to find something relevant that the rest of the population has not found yet, and therefore more likely to finish earlier. The "demos25" experiment, by contrast, uses a more "brute force" approach, and as the population grows it becomes more difficult for new agents to discover new things. I hope the explanation makes sense; let me know if you have any other questions and I will try to answer them as well. There is also an experiment "demos25_3" in progress which is similar to "demos25_2". |
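The parallel-copies idea can be sketched in a few lines of Python (a toy illustration with made-up names and probabilities, not the project's actual pytorchrl code):

```python
import random

class ToyEnv:
    """Stand-in environment: returns reward 1.0 when the agent 'discovers' something."""
    def __init__(self, seed):
        self.rng = random.Random(seed)

    def step(self, action):
        # hypothetical 1-in-100 chance per step of a relevant discovery
        return 1.0 if self.rng.random() < 0.01 else 0.0

def step_population(envs, actions):
    """Step all environment copies; the job can finish as soon as any copy discovers something."""
    rewards = [env.step(a) for env, a in zip(envs, actions)]
    done = any(r > 0 for r in rewards)
    return rewards, done

envs = [ToyEnv(seed=i) for i in range(32)]  # the "famous 32 threads"
rewards, done = step_population(envs, [0] * 32)
```

The "demos25_2" refinement would correspond to seeding or biasing each copy toward a different region of the environment so their discoveries overlap less.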
|
Send message Joined: 18 Jul 13 Posts: 79 Credit: 210,528,292 RAC: 0
|
Each task patches several DLLs to disable ASLR and make .nv_fatb sections read-only, and leaves 1.93 GB of backup files:

05.01.2022 10:28 70 403 584 cudnn_ops_train64_8.dll_bak
05.01.2022 10:23 88 405 504 cudnn_ops_infer64_8.dll_bak
03.08.2022 04:04 1 329 664 torch_cuda_cpp.dll_bak
05.01.2022 11:21 81 487 360 cudnn_cnn_train64_8.dll_bak
05.01.2022 10:36 129 872 896 cudnn_adv_infer64_8.dll_bak
05.01.2022 10:46 97 293 824 cudnn_adv_train64_8.dll_bak
03.08.2022 05:05 871 934 464 torch_cuda_cu.dll_bak
05.01.2022 11:15 736 718 848 cudnn_cnn_infer64_8.dll_bak

Can patched DLLs be included in pythongpu_windows_x86_64__cuda1131.txz? |
|
Send message Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 4,772
|
I notice a big difference in VRAM use between various Python tasks and/or systems, eg: More powerful GPUs will use more VRAM than less powerful GPUs; it scales roughly with the core count of the GPU, so a 3090 would use more VRAM than, say, a 1050 Ti on the same exact task. It's just the way it works: when the GPU sets up the task, if the task has to scale to 10,000 cores instead of 2,000, it needs to use more memory.
|
|
Send message Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
more powerful GPUs will use more VRAM than less powerful GPUs, it scales roughly with core count of the GPU. Okay, I see. Many thanks for explaining :-) One thing that's a pity here is that the GPU with the largest VRAM (Quadro P5000: 16 GB) has the lowest number of cores (2,560) :-( But, as so often: one cannot have everything in life :-) |
|
Send message Joined: 18 Jul 13 Posts: 79 Credit: 210,528,292 RAC: 0
|
Is there anyone here with an NVIDIA A100 80GB? |
|
Send message Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 4,772
|
Is there anyone here with an NVIDIA A100 80GB? Only those with $10,000 to spare to use for free on DC, so likely no one ;) lol. Faster GPUs don't provide much benefit for these tasks since they are so CPU-bound. Sure, there's a lot of VRAM on this card, and maybe you could theoretically spin up 10-15 tasks on a single card, but unless you have A LOT of CPU power and bandwidth to feed it, you're gonna hit another bottleneck before you can hope to benefit from running that many tasks. Just 6 tasks max out my EPYC 7443P's 48 threads @ 3.9 GHz. Maybe in the future the project can get these tasks to the point where they lean more on the GPU tensor cores and a more GPU-only environment, but for now it's mostly a CPU environment with a small contribution by the GPU.
|
|
Send message Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
I just wanted to download another Python task, but the BOINC event log tells me the following:

13.10.2022 07:49:38 | GPUGRID | Message from server: Python apps for GPU hosts needs 1296.10 MB more disk space. You currently have 32082.50 MB available and it needs 33378.60 MB.

I wonder why a Python task needs 33,378 MB of free disk space. Experience has shown that a Python task takes some 8 GB of disk space while being processed. So how come it says it needs 33 GB? |
|
Send message Joined: 26 Dec 13 Posts: 86 Credit: 1,292,358,731 RAC: 0
|
Check my previous post about space usage at the PythonGPU startup stage. Previously: tar.gz >> slotX (2.66 GiB) >> tar (5.48 GiB) >> app files (~8.13 GiB) = 16.27 GiB (since the archives (tar.gz & tar) were not deleted). Now, after implementation of some improvements, peak consumption is about 13.61 GiB, and then (after the startup stage) ~8.13 GiB. In any case, it seems to require adjustment. |
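The figures above can be checked with a little arithmetic (values taken from the post; the assumption that the improvement is deleting the tar.gz before untarring is mine):

```python
# Disk use during PythonGPU startup, in GiB, per the post above
tar_gz = 2.66      # compressed archive copied into the slot
tar = 5.48         # intermediate .tar after xz decompression
app_files = 8.13   # extracted application files

old_peak = tar_gz + tar + app_files  # archives not deleted during extraction
new_peak = tar + app_files           # assumed: tar.gz removed before untarring
steady = app_files                   # after the startup stage

print(f"old peak ~ {old_peak:.2f} GiB, new peak ~ {new_peak:.2f} GiB, steady ~ {steady:.2f} GiB")
```

This reproduces the 16.27 GiB and 13.61 GiB peaks quoted, both well below the ~33 GB the server currently requests.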
|
Send message Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
In any case, it seems to require adjustment. I agree |
|
Send message Joined: 26 Dec 13 Posts: 86 Credit: 1,292,358,731 RAC: 0
|
Yeah, it seems you are right. Try using this:

<task>
<application>C:\Windows\System32\cmd.exe</application>
<command_line>/C ".\7za.exe x pythongpu_windows_x86_64__cuda1131.txz -so | .\7za.exe x -aoa -si -ttar"</command_line>
</task> |
|
Send message Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0
|
Patching seemed to be required to run as many threads with pytorchrl as these jobs do; otherwise Windows used a lot of memory for every new thread. The script that does the patching is relatively fast, so doing it locally would not save a lot of time. However, are you saying that after the patching some files could be deleted to further optimise disk use? If that is the case, I can look into it. These .dll_bak files? I am not very used to Windows... |
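If deleting the backups turns out to be safe, the cleanup could be sketched like this (a hypothetical helper, assuming the *_bak copies really are unneeded once the patched DLLs work):

```python
from pathlib import Path

def remove_patch_backups(app_dir, dry_run=True):
    """Find the *.dll_bak copies left by the DLL-patching step.

    With dry_run=True only reports what would be removed and how many
    bytes would be freed; with dry_run=False actually deletes them.
    """
    backups = sorted(Path(app_dir).rglob("*.dll_bak"))
    freed = sum(f.stat().st_size for f in backups)
    if not dry_run:
        for f in backups:
            f.unlink()
    return backups, freed
```

Running it with dry_run=True first would let a volunteer confirm the ~1.93 GB figure before anything is removed.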
|
Send message Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0
|
Does anyone know if these requirements are estimated by BOINC and adjusted over time, like completion time? Or is manual adjustment required? |
|
Send message Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 4,772
|
my runtime estimates have come down to basically reasonable and realistic levels now, so I think it will adjust on its own over time.
|
|
Send message Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 318
|
abouh's message 59454 was in response to a question about disk storage requirements. No, they won't adjust themselves over time: the amount of disk space required by the task is set by the server, and the amount available to the client is calculated from readings taken of the current state of the host computer. They will only change if the user adjusts the hardware or BOINC client options, or the project staff adjust the job specifications passed to the workunit generator.

On the subject of runtimes: the (calculated) runtime estimate relies on just three things:
- the job speed (sent by the server in the <app_version> specification),
- the job size (again set on the server), and
- the Duration Correction Factor (dynamically adjusted by the client).

SPEED seems to have fallen by approaching a half over the last month, but I haven't currently got a job I can verify that for. SIZE has remained the same while I've been monitoring it. DCF will have fallen dramatically - mine is now below 1. |
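The three ingredients combine as size / speed, scaled by DCF; a minimal sketch with illustrative names and numbers (not BOINC's actual code):

```python
def estimated_runtime_seconds(job_size_flops, job_speed_flops_per_sec, dcf):
    """BOINC-style runtime estimate: job size divided by app_version speed,
    scaled by the client's Duration Correction Factor."""
    return job_size_flops / job_speed_flops_per_sec * dcf

# e.g. a job sized at 1e15 FLOPs on an app_version rated at 1e11 FLOPS,
# with a DCF that has converged below 1 (all numbers made up):
est = estimated_runtime_seconds(1e15, 1e11, 0.8)
```

This also shows why estimates shrink over time even with SIZE fixed: a falling SPEED raises the estimate, while the client steadily pulling DCF down lowers it.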
|
Send message Joined: 18 Jul 13 Posts: 79 Credit: 210,528,292 RAC: 0
|
What can this output mean?

e00003a00008-ABOU_rnd_ppod_expand_demos25_9-0-1-RND2053
Update 464, num samples collected 118784, FPS 344
Algorithm: loss 0.1224, value_loss 0.0002, ivalue_loss 0.0113, rnd_loss 0.0307, action_loss 0.0846, entropy_loss 0.0043, mean_intrinsic_rewards 0.0421, min_intrinsic_rewards 0.0084, max_intrinsic_rewards 0.1857, mean_embed_dist 0.0000, max_embed_dist 0.0000, min_embed_dist 0.0000, min_external_reward 0.0000
Episodes: TrainReward 0.0000, l 360.6000, t 649.8340, UnclippedReward 0.0000, VisitedRooms 1.0000
REWARD DEMOS 25, INTRINSIC DEMOS 25, RHO 0.05, PHI 0.05, REWARD THRESHOLD 0.0, MAX DEMO REWARD -inf, INTRINSIC THRESHOLD 1000
FRAMES TO AVOID: 0
Update 465, num samples collected 122880, FPS 347
Algorithm: loss 0.1329, value_loss 0.0002, ivalue_loss 0.0098, rnd_loss 0.0317, action_loss 0.0955, entropy_loss 0.0043, mean_intrinsic_rewards 0.0414, min_intrinsic_rewards 0.0082, max_intrinsic_rewards 0.1516, mean_embed_dist 0.0000, max_embed_dist 0.0000, min_embed_dist 0.0000, min_external_reward 0.0000
Episodes: TrainReward 0.0000, l 341.3529, t 658.7952, UnclippedReward 0.0000, VisitedRooms 1.00000 |
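For anyone curious enough to track these numbers across updates, the "name value" pairs in such log lines can be pulled into a dict with a simple regex (a sketch; the field names are just those visible in the output above):

```python
import re

def parse_metrics(line):
    """Extract 'name value' pairs from a training-log line into a dict of floats."""
    return {name: float(value)
            for name, value in re.findall(r"(\w+) (-?[\d.]+)", line)}

# Example line taken from the log output above
log = ("Algorithm: loss 0.1224, value_loss 0.0002, rnd_loss 0.0307, "
       "entropy_loss 0.0043, mean_intrinsic_rewards 0.0421")
metrics = parse_metrics(log)
```

Plotting, say, `loss` and `mean_intrinsic_rewards` over successive updates would show whether training is progressing, even if the values themselves only mean something to the researcher.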
|
Send message Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 662
|
Nothing of any meaning or consequence for you. Pertinent only to the researcher. |
|
Send message Joined: 31 May 21 Posts: 200 Credit: 0 RAC: 0
|
These are just the logs of the algorithm, printing out the relevant metrics during agent training. |
|
Send message Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
I have now had 5 tasks in a row which failed after some 2,100 seconds, one after the other, within about half an hour:

https://www.gpugrid.net/result.php?resultid=33098926
https://www.gpugrid.net/result.php?resultid=33100629
https://www.gpugrid.net/result.php?resultid=33100675
https://www.gpugrid.net/result.php?resultid=33100715
https://www.gpugrid.net/result.php?resultid=33100745

Does anyone have any idea what the problem is? On the same host, another task has been running for 22 hours now, but I have stopped downloading new tasks until it's clear what's going on. |
©2025 Universitat Pompeu Fabra