Message boards : Graphics cards (GPUs) : Python apps for GPU hosts errors

Joined: 21 Oct 21 · Posts: 4 · Credit: 223,165,413 · RAC: 43
I'm fairly certain I've been running these "Python apps for GPU hosts" tasks successfully before. Now I see 85-90% of them ending with an "Error while computing" status. When I check, I am one of 4-8 hosts with the same status, although not necessarily the same underlying error. Examples: http://www.gpugrid.net/workunit.php?wuid=27392690 and http://www.gpugrid.net/result.php?resultid=33277602. Anyway, the error I'm seeing is:
Define learner
Created Learner.
Look for a progress_last_chk file - if exists, adjust target_env_steps
Define train loop
Traceback (most recent call last):
File "C:\ProgramData\BOINC\slots\3\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 196, in get_data
self.next_batch = self.batches.__next__()
StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Last in the traceback is the following, and I'm not sure whether it is the original exception. If it is, can I adjust max_split_size_mb (how and where), and what would be a good value for it?

RuntimeError: CUDA out of memory. Tried to allocate 202.00 MiB (GPU 0; 2.00 GiB total capacity; 1.23 GiB already allocated; 0 bytes free; 1.69 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
19:02:23 (17760): python.exe exited; CPU time 1095.984375

Thoughts, suggestions... Thanks in advance.
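For anyone wondering how max_split_size_mb is normally applied: PyTorch reads it from the PYTORCH_CUDA_ALLOC_CONF environment variable before the first CUDA allocation, so it would have to be present in the environment the BOINC science app inherits (for example a system-wide variable) - the project does not expose it as a setting, and no allocator option can make a 2 GB card hold a workload that needs more. A minimal sketch, with 128 MiB as a purely illustrative value:

```python
# Illustrative only: PYTORCH_CUDA_ALLOC_CONF must be set before the first CUDA
# allocation, so set it in the environment (or at the very top of the script)
# before torch touches the GPU.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # imported after the variable is set

if torch.cuda.is_available():
    free_b, total_b = torch.cuda.mem_get_info(0)  # free/total VRAM in bytes
    print(f"GPU 0: {free_b / 2**20:.0f} MiB free of {total_b / 2**20:.0f} MiB")
```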

Joined: 1 Jan 15 · Posts: 1166 · Credit: 12,260,898,501 · RAC: 1
The answer is simple: GPUs with only 2 GB VRAM are too small for processing Python tasks.

Joined: 21 Oct 21 · Posts: 4 · Credit: 223,165,413 · RAC: 43
> GPUs with only 2 GB VRAM are too small for processing Python tasks.

OK, sure. Then what has changed? I was running these tasks successfully up until 2 (or maybe 3) weeks ago, and my system is the same. Did I miss something?

Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 891
The latest series of 1000 tasks uses more VRAM, as posted by the researcher.

Joined: 21 Oct 21 · Posts: 4 · Credit: 223,165,413 · RAC: 43
> The latest series of 1000 tasks uses more VRAM as posted by the researcher.

Thanks, that figures... I have missed something. Is there a link or forum post from the researcher that you could point me to?

Joined: 1 Jan 15 · Posts: 1166 · Credit: 12,260,898,501 · RAC: 1
> GPUs with only 2 GB VRAM are too small for processing Python tasks.

About 3 weeks ago, ACEMD3 tasks were distributed for a while, but no Pythons. Maybe you crunched ACEMD3 tasks at that time? They need nowhere near as much VRAM as the Pythons do.

Joined: 1 Jan 15 · Posts: 1166 · Credit: 12,260,898,501 · RAC: 1
> The latest series of 1000 tasks uses more VRAM as posted by the researcher.

Hm, that's strange - here it seems to be the other way round. From what I can see on my Quadro P5000, with 4 Pythons running concurrently, VRAM use was nearly 16 GB before; now it's below 12 GB.

Joined: 1 Jan 15 · Posts: 1166 · Credit: 12,260,898,501 · RAC: 1
> The latest series of 1000 tasks uses more VRAM as posted by the researcher.

Most recently, 4 Pythons running concurrently on the P5000 use roughly 9.8 GB of VRAM - so it keeps getting lower.

Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 6,423
> The latest series of 1000 tasks uses more VRAM as posted by the researcher.

You should check which stage the tasks are at for a more insightful picture of what's happening. When a task first starts, for about the first 5 minutes, it is only extracting the archive and uses no VRAM during that time. From about 5 to 10 minutes it uses a reduced amount, 2-3 GB. Then, after 10-15 minutes or so, it reaches the main process and uses the full amount, about 3-4 GB.

So far I have noticed two main sizes in the new batches. I have some tasks using about 3 GB (which is the same as a few weeks ago) and some tasks using about 4 GB, which lines up more with the recent tasks. I have not noticed any key indicator in the file names to determine which tasks use the lower amount of VRAM and which use more.
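If you want to watch those stage transitions yourself, a rough sketch along these lines (assuming the nvidia-ml-py / pynvml package is installed and the card of interest is GPU 0) logs the device-wide VRAM use every 30 seconds; note that it counts everything on the card, not just the GPUGRID task:

```python
# Minimal VRAM logger using pynvml (the NVML bindings that nvidia-smi also uses).
# GPU index 0 and the 30-second interval are assumptions - adjust as needed.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
try:
    while True:
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)  # .used/.free/.total in bytes
        print(f"{time.strftime('%H:%M:%S')}  "
              f"used {mem.used / 2**30:.2f} GiB of {mem.total / 2**30:.2f} GiB")
        time.sleep(30)
finally:
    pynvml.nvmlShutdown()
```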

Joined: 21 Oct 21 · Posts: 4 · Credit: 223,165,413 · RAC: 43
> about 3 weeks ago, ACEMD3 tasks were distributed for a while, but no Pythons.

Fair enough - I can't say for sure which application(s) were shown in my GPUGRID task list. I think my assumption was based on the processes seen in (Windows) Task Manager, where I would see dozens of Python processes while a GPUGRID task was running; maybe applications other than "Python apps for GPU hosts" use Python too(?). And, going back further than 3 weeks, never until now have I seen so many tasks failing.

As luck would have it, I processed one "Python apps for GPU hosts" task successfully overnight, and another is currently running past the usual failure point. It would still be nice to see a link or forum post from the researcher(s) with requirements and release notes for the applications.

Joined: 1 Jan 15 · Posts: 1166 · Credit: 12,260,898,501 · RAC: 1
> you should check at which stage of running the tasks are on for a more insightful picture on what's happening.

On the Quadro P5000, the status at this moment is as follows:
task 1: 82% - 19:58 hrs
task 2: 31% - 7:09 hrs
task 3: 14% - 2:43 hrs
task 4: 22% - 4:36 hrs
VRAM use: 9,834 MB - and this even includes a few hundred MB for the monitor.
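To separate what the four tasks use from what the desktop itself uses, a per-process breakdown is one option - sketched below with pynvml, GPU 0 assumed. Be aware that Windows WDDM drivers often cannot attribute memory per process, in which case only the device-wide total is meaningful:

```python
# Per-process VRAM attribution via pynvml. Under Windows WDDM the per-process
# figure is frequently unavailable and is reported as None.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
    used = proc.usedGpuMemory  # bytes, or None when the driver cannot report it
    shown = f"{used / 2**20:.0f} MiB" if used is not None else "N/A (WDDM)"
    print(f"pid {proc.pid}: {shown}")
pynvml.nvmlShutdown()
```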

Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 891
All the pertinent information about the Python tasks is always posted in the main thread in News: https://www.gpugrid.net/forum_thread.php?id=5233

The statement about the memory reduction for the next series is here: https://www.gpugrid.net/forum_thread.php?id=5233&nowrap=true#59838

Joined: 1 Jan 15 · Posts: 1166 · Credit: 12,260,898,501 · RAC: 1
> This statement about the memory reduction for the next series is here.

From what I can see on all my hosts that crunch Pythons, the VRAM requirement of the recent tasks has dropped considerably.

Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 6,423
Maybe this is some change affecting Windows only. All my tasks are still using 3-4 GB each.

Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 891
I'm still seeing 3-4 GB each for the Python tasks on my Linux Ubuntu hosts as well.

Joined: 8 Aug 19 · Posts: 252 · Credit: 458,054,251 · RAC: 0
> Maybe this is some change affecting windows only.

No guys, my Windows hosts are using the same ~4 GB of graphics memory on the latest released WUs. Earlier I noticed that some "exp" tasks used over 6 GB, so there must be some variance among tasks. I wonder if he maybe saw some ACEMD tasks go through and mistook them for PythonGPUs. Running a PythonGPU on 2 GB seems almost impossible to me.

"Together we crunch / To check out a hunch / And wish all our credit / Could just buy us lunch" - Piasa Tribe, Illini Nation

Joined: 1 Jan 15 · Posts: 1166 · Credit: 12,260,898,501 · RAC: 1
> you should check at which stage of running the tasks are on for a more insightful picture on what's happening.

The 4 Pythons that have each been running for several hours right now are using even less VRAM than the ones reported above from 2 days ago - total VRAM use is 8,840 MB. So there seems to be quite some variance between these Pythons.

Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 891
I don't believe your numbers. Whatever utility you are using in Windows is not reporting correctly, or, more likely, you are misinterpreting what it displays or looking at the wrong numbers. I will believe what nvidia-smi.exe shows.

Joined: 1 Jan 15 · Posts: 1166 · Credit: 12,260,898,501 · RAC: 1
> I don't believe your numbers. Whatever utility you are using in Windows is not reporting correctly or more likely you are interpreting what it displays or looking at the wrong numbers.

The utility I use is GPU-Z, so maybe it does indeed show wrong figures; I cannot tell for sure, of course. As I said in another thread about a week ago, nvidia-smi unfortunately does not work here, no idea why - it fails with an "access denied" error.

Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 891
There must be some way to run the command in a Windows terminal with elevated rights. nvidia-smi is a user-level application that Nvidia provides with all driver distributions. https://www.minitool.com/news/elevated-command-prompt.html
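For what it's worth, once nvidia-smi runs at all, its query mode is easy to script. A rough sketch - the fallback path below is a common but not universal install location on Windows, so adjust to wherever nvidia-smi.exe lives, or rely on PATH:

```python
# Query per-GPU memory counters through nvidia-smi's machine-readable output.
import shutil
import subprocess

exe = (shutil.which("nvidia-smi")
       or r"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe")

out = subprocess.run(
    [exe, "--query-gpu=memory.used,memory.total", "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
).stdout

for i, line in enumerate(out.strip().splitlines()):
    used_mib, total_mib = (int(x) for x in line.split(","))
    print(f"GPU {i}: {used_mib} MiB used of {total_mib} MiB total")
```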