Message boards : News : Experimental Python tasks (beta) - task description
*Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 662*

The latest Python tasks I've completed today have awarded 105,000 credits, compared to 75,000 credits for all the previous tasks. Looking back through the job_log, the estimated computation size has been 1B GFLOPs for quite a while now, and nothing has changed in the current task parameters as far as I can tell:

Estimated computation size: 1,000,000,000 GFLOPs

So I assume that Abouh has decided to award more credits for the work done. Has anyone else noticed this new award level? The new tasks are generally taking longer to crunch than the previous ones, so maybe it is just scaling.
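The estimated computation size lives in the client's per-project job log (here, job_log_www.gpugrid.net.txt) as the `fe` field of each line. A minimal sketch for pulling those values out, assuming the standard BOINC job-log layout of a timestamp followed by space-separated key/value pairs (`ue`, `ct`, `fe`, `nm`, `et`):

```python
# Sketch: print each task's name and its rsc_fpops_est ("fe" field)
# from a BOINC job log. Assumes lines shaped like:
#   <unix_time> ue <est_sec> ct <cpu_sec> fe <fpops_est> nm <task_name> et <elapsed_sec> ...
with open("job_log_www.gpugrid.net.txt") as log:
    for line in log:
        tokens = line.split()
        entry = dict(zip(tokens[1::2], tokens[2::2]))  # pair up key/value tokens
        if "fe" in entry and "nm" in entry:
            gflops = float(entry["fe"]) / 1e9
            print(f"{entry['nm']}: {gflops:,.0f} GFLOPs estimated")
```

A task with rsc_fpops_est = 1e18 prints as the 1,000,000,000 GFLOPs figure quoted above.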
*Joined: 28 Jul 12 · Posts: 819 · Credit: 1,591,285,971 · RAC: 0*

> Anyone notice this new award level?

I just got my first one: http://www.gpugrid.net/workunit.php?wuid=27270757. But not all of the new ones receive that; a subsequent one received the usual 75,000 credits.
*Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 662*

Thanks for your report. It doesn't really track with scaling now that I examine my tasks: some are getting the new higher reward for 2 hours of computation, while some are still getting the lower reward for 8 hours of computation. I was getting the old standard reward for tasks taking as little as 20 minutes of computation time, so the 75K was a little excessive in my opinion. These new ones are trending at 2-3 hours of computation time, but I also had one take 11 hours and it was still rewarded with only 105K. Maybe we are finally getting into the meat of the AI/ML investigation after all the initial training we have been doing.

I'm still sitting on 3 new acemd3 tasks that haven't been looked at for two days; they will only get the standard reward, since the client scheduler feels no need to push them to the front, as their APR and estimated completion times are correct and reasonable. I really would like the Python tasks to get realistic APRs and estimated completion times, but since they are predominantly a CPU task with a little bit of GPU computation, BOINC has no clue how to handle them. Maybe Abouh can post some insight as to what the current investigation is doing.
*Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 318*

My first 'high rate' task (105K credits) came from a workunit created at 10 Aug 2022, 2:03:51 UTC. Since then, I've only received one 75K task: my copy was issued to me at 10 Aug 2022, 21:15:47 UTC, but the underlying workunit was created at 9 Aug 2022, 13:44:09 UTC; I got a resend after two previous failures by other crunchers. My take is that the 'tariff' for GPUGrid tasks is set when the underlying workunit is created, and all subsequent tasks issued from that workunit inherit the same value.
*Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 662*

That implies the current release candidates are being assigned 105K credits based, I assume, on harder-to-crunch datasets. I don't think it depends on a recent release date either: I just had a task created on 12 August (an _0 task) that awarded only 75K after passing through one other host before I got it.
*Joined: 12 Jul 17 · Posts: 404 · Credit: 17,408,899,587 · RAC: 0*

Which apps are running these days? The apps page is missing the column that shows how much is running: https://www.gpugrid.net/apps.php

How many CPU threads do I need to reserve to finish Python WUs in a reasonable time on, say, an i9-9980XE? I'm trying to update my app_config to give it a go; the last one I found was pretty old. Here's what I've cobbled together. Suggestions welcome.

```xml
<app_config>
<!-- i9-10980XE 18c36t 32 GB L3 Cache 24.75 MB -->
<app>
<name>acemd3</name>
<plan_class>cuda1121</plan_class>
<gpu_versions>
<cpu_usage>1.0</cpu_usage>
<gpu_usage>1.0</gpu_usage>
</gpu_versions>
<fraction_done_exact/>
</app>
<app>
<name>acemd4</name>
<plan_class>cuda1121</plan_class>
<gpu_versions>
<cpu_usage>1.0</cpu_usage>
<gpu_usage>1.0</gpu_usage>
</gpu_versions>
<fraction_done_exact/>
</app>
<app>
<name>PythonGPU</name>
<plan_class>cuda1121</plan_class>
<gpu_versions>
<cpu_usage>4.0</cpu_usage>
<gpu_usage>1.0</gpu_usage>
</gpu_versions>
<app_version>
<app_name>PythonGPU</app_name>
<avg_ncpus>4</avg_ncpus>
<ngpus>1</ngpus>
<cmdline>--nthreads 4</cmdline>
</app_version>
<fraction_done_exact/>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>PythonGPUbeta</name>
<plan_class>cuda1121</plan_class>
<gpu_versions>
<cpu_usage>4.0</cpu_usage>
<gpu_usage>1.0</gpu_usage>
</gpu_versions>
<app_version>
<app_name>PythonGPU</app_name>
<avg_ncpus>4</avg_ncpus>
<ngpus>1</ngpus>
<cmdline>--nthreads 4</cmdline>
</app_version>
<fraction_done_exact/>
<max_concurrent>1</max_concurrent>
</app>
<app>
<name>Python</name>
<plan_class>cuda1121</plan_class>
<cpu_usage>4</cpu_usage>
<gpu_versions>
<cpu_usage>4</cpu_usage>
<gpu_usage>1</gpu_usage>
</gpu_versions>
<app_version>
<app_name>PythonGPU</app_name>
<avg_ncpus>4</avg_ncpus>
<ngpus>1</ngpus>
<cmdline>--nthreads 4</cmdline>
</app_version>
<fraction_done_exact/>
<max_concurrent>1</max_concurrent>
</app>
</app_config>
```
*Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 662*

I get away with reserving only 3 CPU threads. That reservation does not affect what the actual task does when it runs; it only matters for BOINC's CPU scheduling of other projects' work. The task will always spawn 32 independent Python processes when running.

You really should update the plan class statements for Python on GPU, since your plan_class is incorrect: the current plan_class is cuda1131, NOT cuda1121. You can also clean up your app_config, as there is only a PythonGPU application; there is no Python or PythonGPUbeta application.
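Putting that advice together, a cleaned-up file might look something like the sketch below, following the documented app_config.xml format (plan_class belongs in an app_version block, not in app). The 3-thread reservation is the local preference described above, not a requirement:

```xml
<app_config>
    <app>
        <name>acemd3</name>
        <gpu_versions>
            <cpu_usage>1.0</cpu_usage>
            <gpu_usage>1.0</gpu_usage>
        </gpu_versions>
        <fraction_done_exact/>
    </app>
    <app>
        <name>PythonGPU</name>
        <max_concurrent>1</max_concurrent>
        <gpu_versions>
            <cpu_usage>3.0</cpu_usage>
            <gpu_usage>1.0</gpu_usage>
        </gpu_versions>
        <fraction_done_exact/>
    </app>
    <app_version>
        <app_name>PythonGPU</app_name>
        <plan_class>cuda1131</plan_class>
        <avg_ncpus>3</avg_ncpus>
        <ngpus>1</ngpus>
    </app_version>
</app_config>
```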
*Joined: 26 Dec 13 · Posts: 86 · Credit: 1,292,358,731 · RAC: 0*

Hi, guys! I have not particularly followed the Python GPU app (for Windows) or this thread, so perhaps this issue has already been discussed somewhere on the forum. It seems I only tried it once, and all the tasks I received crashed almost immediately after starting. I was surprised that, at WU startup, the system's limit on virtual memory (commit charge) was reached. Today I tried to understand the problem in more detail and was surprised again to find that the application addresses ~42 GiB of virtual memory in total, while its total consumption of physical memory is about 4 times less (~10 GiB).

So the question is: is that intended? I had to create a 30 GiB swap file to cover the difference so that I could run anything else on the system besides a single Python GPU WU. -_-
*Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 662*

Yes: because of flaws in Windows memory management, that effect cannot be worked around. You need to increase the size of your pagefile to the 50 GB range to be safe. Linux does not have this problem, and no changes are necessary there to run the tasks.

The project primarily develops the Linux applications first, as the development process is simpler, and then tackles the difficulties of developing a Windows application with all the necessary workarounds. Just the way it is. For the reason why, read this post: https://www.gpugrid.net/forum_thread.php?id=5322&nowrap=true#58908
*Joined: 26 Dec 13 · Posts: 86 · Credit: 1,292,358,731 · RAC: 0*

Thank you for the clarification. I was not familiar with the subtleties of the memory allocation mechanism in Windows; that was useful. I have already increased the swap file to match my RAM (64 GB) to be sure. ;)

Update: the reward system for this app clearly begs for revision... :/
*Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 662*

Task credits are fixed; pay no attention to the running times. BOINC completely mishandles those, since it has no recognition of the dual CPU-GPU nature of these application tasks. They should be thought of as primarily a CPU application with a little GPU use thrown in occasionally.

[Edit] Look at the delta between the sent time and the returned time to determine the actual runtime of a task. In your example, the first listed task took only 20 minutes to finish, the second took 4 1/2 hours, and the last took 4 hours. It all depends on the parameter set of each task, which is the criterion for the reinforcement learning on the GPU.
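For anyone who wants to script that sent-to-returned delta: the task pages print both times in a fixed format, so the arithmetic is trivial. A small sketch in which the timestamps are made-up examples, not real task data:

```python
from datetime import datetime

FMT = "%d %b %Y | %H:%M:%S UTC"  # timestamp format used on the task pages

sent     = datetime.strptime("10 Aug 2022 | 21:15:47 UTC", FMT)
returned = datetime.strptime("11 Aug 2022 | 1:45:47 UTC", FMT)

print(returned - sent)  # 4:30:00, i.e. the actual wall-clock runtime
```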
*Joined: 1 Jan 15 · Posts: 1166 · Credit: 12,260,898,501 · RAC: 1*

Can anyone tell me what happened to this task, which failed after 301.281 seconds? :-(((
https://www.gpugrid.net/result.php?resultid=32997605
*Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 318*

> RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:76] data. DefaultCPUAllocator: not enough memory: you tried to allocate 3612672 bytes.

It's possibly the Windows swap file settings, again.
*Joined: 1 Jan 15 · Posts: 1166 · Credit: 12,260,898,501 · RAC: 1*

Thanks, Richard, for the quick reply. I have now changed the page file size to a maximum of 65 GB. I did it on both drives: the system drive C: and drive F: (a separate SSD) from which BOINC runs. It would probably have been enough to change it on only one drive, right? If so, which one?
*Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 662*

The Windows one.
*Joined: 28 Jul 12 · Posts: 819 · Credit: 1,591,285,971 · RAC: 0*

I am a bit surprised that I am able to run the Python tasks without problems under Ubuntu 20.04.4 on a GTX 1060. It has 3 GB of video memory, of which 2.8 GB is in use thus far. The CPU side is currently running on two cores (down from the previous four), using about 3.7 GB of memory, though reserving 19 GB.

Even on Win10, my GTX 1650 Super has had no problems, though it has 4 GB of memory, of which 3.6 GB is in use. But I have 32 GB of system memory, and for once I let Windows manage the virtual memory itself; it is reserving 42 GB. I usually set it to a lower value.
*Joined: 1 Jan 15 · Posts: 1166 · Credit: 12,260,898,501 · RAC: 1*

> The Windows one.

thx :-)
*Joined: 11 Dec 08 · Posts: 26 · Credit: 648,944,294 · RAC: 434*

Can the CPU usage be set correctly? It's fine for a task to use a number of cores, but currently it claims less than one core while actually using more than one.
*Joined: 31 May 21 · Posts: 200 · Credit: 0 · RAC: 0*

Hello! Sorry for the late reply.

I adjusted the maximum length of some of the tasks and consequently also adjusted the credits for completing them. What I mean by that is that each of my tasks contains an agent interacting with its environment and learning from a fixed number of total interaction steps. Previously I set that number to 25M steps; now I have increased it to 35M for some tasks and consequently also increased the reward.

This increase in the number of steps does not necessarily increase the completion time of a task, because if an agent discovers something relevant before reaching the maximum number of steps, the task ends and the "new information" is sent back to be shared with the other agents in the population. Whether that happens is random, but on average the task completion time will increase a bit because of the tasks that do reach 35M steps, so the reward has to increase as well. This change does not affect hardware requirements.

This randomness also explains why some tasks are shorter but still receive the same reward (credits per task are fixed). However, the average credit reward should be similar across hosts as they solve more and more tasks, and the average task completion time should remain stable.

As I have mentioned, I work with populations of AI agents that try to cooperatively solve a single complex problem. Note that as more things are discovered by the agents in a population, the harder it becomes to keep discovering new ones. In general, early tasks in an experiment return quite fast, while as the experiment progresses the 35M-step mark gets hit more and more often (and tasks take longer to complete).
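The "fixed step budget with early exit on discovery" behavior can be modeled in a few lines. The sketch below is a toy numerical illustration, not the project's code; every name in it is hypothetical, and the discovery probability is arbitrary:

```python
import random

def run_task(discovery_prob: float, max_steps: int, seed: int) -> int:
    """Toy model of one task: interact for at most max_steps steps,
    ending early as soon as something 'relevant' is discovered."""
    rng = random.Random(seed)
    for step in range(1, max_steps + 1):
        if rng.random() < discovery_prob:  # rare 'relevant discovery'
            return step                    # early exit: a short task
    return max_steps                       # budget exhausted: a long task

# Scaled-down demo (1M steps instead of 35M so it runs quickly).
# Step counts vary widely across seeds even though the budget is fixed,
# which mirrors why task durations differ while credits per task do not.
print([run_task(3e-6, 1_000_000, seed) for seed in range(5)])
```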
*Joined: 31 May 21 · Posts: 200 · Credit: 0 · RAC: 0*

The current value of rsc_fpops_est is 1e18, with 10e18 as the limit. I remember we had to increase it because otherwise it produced false "task aborted by host" errors on some users' side. Do you think we should change it again?

Regarding cpu_usage, I remember having this discussion with Toni, and I think the reason we set the number of cores to that value is that the jobs can actually be executed with a single core, even though they create 32 threads; they definitely do not require 32 cores. Is there an advantage to setting it to an arbitrary number higher than 1? Couldn't that cause some allocation problems? Sorry, this is a bit outside of my knowledge zone...
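For context on those false aborts: as I understand the BOINC client's guard, a running task is killed with "maximum elapsed time exceeded" roughly when its elapsed time passes the task's FLOP limit divided by the client's speed estimate for the app version. A simplified back-of-the-envelope sketch; the speed figure is an arbitrary example, and the real client also applies correction factors:

```python
# Simplified model of BOINC's elapsed-time guard:
# abort once elapsed > rsc_fpops_bound / estimated_flops.
def abort_threshold_hours(rsc_fpops_bound: float, est_flops: float) -> float:
    return rsc_fpops_bound / est_flops / 3600

bound = 10e18  # current limit per the post above

# If the client's speed estimate is GPU-derived (say 2e13 FLOPS) while the
# task is really CPU-bound, the threshold can come out dangerously low:
print(abort_threshold_hours(1e18, 2e13))   # ~13.9 h with a 10x smaller bound
print(abort_threshold_hours(bound, 2e13))  # ~139 h with the current bound
```

With the smaller bound, the 11-hour tasks mentioned upthread would sit uncomfortably close to the abort threshold, which is consistent with the false "task aborted by host" reports.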