Python apps for GPU hosts errors

Message boards : Graphics cards (GPUs) : Python apps for GPU hosts errors

r_podl

Joined: 21 Oct 21
Posts: 4
Credit: 223,165,413
RAC: 43
Message 59861 - Posted: 2 Feb 2023, 2:47:15 UTC

I'm fairly certain I had been running these "Python apps for GPU hosts" successfully before. Now I see 85-90% of them fail with "Error while computing" status. When I check a workunit, I am one of 4-8 hosts with the same status, although not necessarily the same underlying error.
http://www.gpugrid.net/workunit.php?wuid=27392690
http://www.gpugrid.net/result.php?resultid=33277602

Anyway the error I'm seeing is:
Define learner
Created Learner.
Look for a progress_last_chk file - if exists, adjust target_env_steps
Define train loop
Traceback (most recent call last):
  File "C:\ProgramData\BOINC\slots\3\lib\site-packages\pytorchrl\scheme\gradients\g_worker.py", line 196, in get_data
    self.next_batch = self.batches.__next__()
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

The last entry in the traceback is the following, though I'm not sure whether it is the original exception. If it is, can I adjust max_split_size_mb (how, and where?), and what would be a good value for it?
RuntimeError: CUDA out of memory. Tried to allocate 202.00 MiB (GPU 0; 2.00 GiB total capacity; 1.23 GiB already allocated; 0 bytes free; 1.69 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
19:02:23 (17760): python.exe exited; CPU time 1095.984375
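For anyone else hitting this: max_split_size_mb is not a project setting; it is an option of PyTorch's CUDA caching allocator, read from the PYTORCH_CUDA_ALLOC_CONF environment variable at the first CUDA allocation. A minimal sketch (the value 128 is only an illustrative starting point, not a recommendation from the researchers):

```python
import os

# The allocator reads PYTORCH_CUDA_ALLOC_CONF at the first CUDA allocation,
# so it must be set before torch touches the GPU (in practice, before
# "import torch"). 128 MiB is an illustrative value, not a tuned one.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
# import torch  # torch would pick the setting up from here on
```

Note this only mitigates fragmentation; it cannot create memory a 2 GB card does not have, and getting the variable into a BOINC task's environment is a separate problem.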

Thoughts, suggestions...
Thanks in advance.
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 59864 - Posted: 2 Feb 2023, 6:29:50 UTC - in response to Message 59861.  

the answer is simple:

GPUs with only 2 GB VRAM are too small for processing Python tasks.
r_podl

Joined: 21 Oct 21
Posts: 4
Credit: 223,165,413
RAC: 43
Message 59865 - Posted: 2 Feb 2023, 22:17:15 UTC - in response to Message 59864.  
Last modified: 2 Feb 2023, 22:17:45 UTC

GPUs with only 2 GB VRAM are too small for processing Python tasks.

Ok sure. Then what has changed? I had been running these tasks successfully until 2 (or maybe 3) weeks ago. My system is the same. Did I miss something?
Keith Myers

Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 891
Message 59866 - Posted: 2 Feb 2023, 23:41:17 UTC - in response to Message 59865.  

The latest series of 1000 tasks uses more VRAM, as posted by the researcher.
r_podl

Joined: 21 Oct 21
Posts: 4
Credit: 223,165,413
RAC: 43
Message 59867 - Posted: 3 Feb 2023, 2:54:01 UTC - in response to Message 59866.  

The latest series of 1000 tasks uses more VRAM as posted by the researcher.

Thanks, figures... I've missed something. Is there a link or forum post from the researcher that you could point me to?
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 59868 - Posted: 3 Feb 2023, 6:30:06 UTC - in response to Message 59865.  

GPUs with only 2 GB VRAM are too small for processing Python tasks.

Ok sure. Then what has changed since I've been running these tasks successfully prior to 2 (or maybe 3) weeks ago? My system is the same. Did I miss something?

About 3 weeks ago, ACEMD3 tasks were distributed for a while, but no Pythons.
Maybe you crunched ACEMD3 tasks at that time? They need nowhere near as much VRAM as the Pythons do.
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 59869 - Posted: 3 Feb 2023, 6:46:15 UTC - in response to Message 59866.  

The latest series of 1000 tasks uses more VRAM as posted by the researcher.

hm, that's strange - here it seems to be the other way round:
from what I can see, e.g. on my Quadro P5000, with 4 Pythons running concurrently: VRAM use was nearly 16 GB before, now it's below 12 GB.
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 59870 - Posted: 3 Feb 2023, 14:49:05 UTC - in response to Message 59869.  

The latest series of 1000 tasks uses more VRAM as posted by the researcher.

hm, that's strange - here it seems to be the other way round:
from what I can see, e.g. on my Quadro P5000, with 4 Pythons running concurrently: VRAM use was nearly 16 GB before, now it's below 12 GB.

Most recently, 4 Pythons running concurrently on the P5000 use roughly 9.8 GB VRAM - so it keeps getting lower all the time.
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Message 59871 - Posted: 3 Feb 2023, 15:01:48 UTC - in response to Message 59870.  

The latest series of 1000 tasks uses more VRAM as posted by the researcher.

hm, that's strange - here it seems to be the other way round:
from what I can see, e.g. on my Quadro P5000, with 4 Pythons running concurrently: VRAM use was nearly 16 GB before, now it's below 12 GB.

Most recently, 4 Pythons running concurrently on the P5000 use roughly 9.8 GB VRAM - so it keeps getting lower all the time.


you should check which stage the tasks are in, for a more insightful picture of what's happening.

when a task first starts, for about the first 5 minutes, it's only extracting the archive. It will use no VRAM during this time.

from about 5 minutes to 10 minutes or so, it will use a reduced amount, 2-3 GB. Then after 10-15 minutes or so, it gets to the main process and will use the full VRAM amount, about 3-4 GB.

so far I have noticed two main sizes in the new batches. I have some tasks using about 3 GB (the same as a few weeks ago) and some tasks using about 4 GB, which lines up more with the recent tasks. I have not noticed any key indicator in the file names to determine which tasks use less VRAM and which use more.
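One way to watch those stages is to poll VRAM use while a task runs. A sketch using the nvidia-ml-py ("pynvml") bindings, assuming they are installed (pip install nvidia-ml-py); it returns None when they (or a driver) are unavailable:

```python
# Hypothetical monitoring sketch built on the nvidia-ml-py ("pynvml")
# bindings; everything here assumes those bindings are installed.
def vram_used_mib(gpu_index=0):
    """Return VRAM currently in use on the given GPU, in MiB, or None
    when pynvml (or an NVIDIA driver) is unavailable."""
    try:
        import pynvml
        pynvml.nvmlInit()
    except Exception:
        return None
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return info.used // (1024 * 1024)
    finally:
        pynvml.nvmlShutdown()

print(vram_used_mib())  # call once a minute to see the three stages
```

Sampling this every minute over a task's first quarter hour would show the extraction, reduced, and full-VRAM phases described above.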
r_podl

Joined: 21 Oct 21
Posts: 4
Credit: 223,165,413
RAC: 43
Message 59872 - Posted: 3 Feb 2023, 15:18:13 UTC - in response to Message 59868.  

about 3 weeks ago, ACEMD3 tasks were distributed for a while, but no Pythons.
Maybe you crunched ACEMD3 tasks at that time? They do not nearly need as much VRAM as the Pythons do.

Fair enough, as I can't say for sure which application(s) appeared in my GPUGRID task list. I think my assumption was based on the processes seen in (Windows) Task Manager, where I would see dozens of Python processes while a GPUGRID task was running - i.e. maybe applications other than "Python apps for GPU hosts" use Python(?). And, going back further than 3 weeks, never until now have I seen so many tasks failing.

As luck would have it, I processed one "Python apps for GPU hosts" task overnight, and another is currently running longer than the usual failure point.

It still would be nice to see a link or forum post from researcher(s) with requirements and release notes for the applications.
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 59873 - Posted: 3 Feb 2023, 15:45:00 UTC - in response to Message 59871.  
Last modified: 3 Feb 2023, 15:49:17 UTC

you should check which stage the tasks are in, for a more insightful picture of what's happening.

on the Quadro P5000, the status of this moment is as follows:

task 1: 82% - 19:58 hrs
task 2: 31% - 7:09 hrs
task 3: 14% - 2:43 hrs
task 4: 22% - 4:36 hrs

VRAM use: 9,834 MB - and this even includes a few hundred MB for the monitor.
Keith Myers

Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 891
Message 59874 - Posted: 3 Feb 2023, 17:48:39 UTC - in response to Message 59872.  


It still would be nice to see a link or forum post from researcher(s) with requirements and release notes for the applications.


All the pertinent information about the Python tasks is always posted in the main thread in News.
https://www.gpugrid.net/forum_thread.php?id=5233

The statement about the memory reduction for the next series is here:
https://www.gpugrid.net/forum_thread.php?id=5233&nowrap=true#59838
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 59875 - Posted: 3 Feb 2023, 20:57:27 UTC - in response to Message 59874.  

This statement about the memory reduction for the next series is here.
https://www.gpugrid.net/forum_thread.php?id=5233&nowrap=true#59838

From what I can see on all my hosts that crunch Pythons: the VRAM requirement of the recent tasks has dropped considerably.
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Message 59876 - Posted: 4 Feb 2023, 3:10:19 UTC - in response to Message 59875.  

Maybe this is some change affecting Windows only. All my tasks are still using 3-4 GB each.
Keith Myers

Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 891
Message 59877 - Posted: 4 Feb 2023, 4:57:24 UTC

I'm still seeing 3-4 GB each for the Python tasks on my Linux Ubuntu hosts as well.
Pop Piasa

Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 59878 - Posted: 5 Feb 2023, 1:15:26 UTC - in response to Message 59876.  

Maybe this is some change affecting windows only.


No guys, my Windows hosts are using the same ~4 GB of graphics memory on the latest released WUs. Earlier I noticed some "exp" tasks using over 6 GB, so there must be some variance among tasks.

I wonder if he maybe saw some ACEMD tasks go through and mistook them for PythonGPUs. Running a PythonGPU on 2 GB seems almost impossible to me.
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 59880 - Posted: 5 Feb 2023, 15:34:01 UTC - in response to Message 59873.  

you should check which stage the tasks are in, for a more insightful picture of what's happening.

on the Quadro P5000, the status of this moment is as follows:

task 1: 82% - 19:58 hrs
task 2: 31% - 7:09 hrs
task 3: 14% - 2:43 hrs
task 4: 22% - 4:36 hrs

VRAM use: 9,834 MB - and this even includes a few hundred MB for the monitor.


The 4 Pythons that have each been running for several hours right now are using even less VRAM than the ones reported above from 2 days ago - total VRAM use is 8,840 MB.

So there seems to be quite some variance between these Pythons.
Keith Myers

Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 891
Message 59881 - Posted: 5 Feb 2023, 20:48:45 UTC - in response to Message 59880.  

I don't believe your numbers. Whatever utility you are using in Windows is not reporting correctly, or more likely you are misinterpreting what it displays or looking at the wrong numbers.

I will believe what nvidia-smi.exe shows.
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Message 59882 - Posted: 5 Feb 2023, 20:56:29 UTC - in response to Message 59881.  
Last modified: 5 Feb 2023, 20:57:14 UTC

I don't believe your numbers. Whatever utility you are using in Windows is not reporting correctly, or more likely you are misinterpreting what it displays or looking at the wrong numbers.

I will believe what nvidia-smi.exe shows.

The utility I use is GPU-Z. So maybe it indeed shows wrong figures; I cannot tell for sure, of course.

As already said in another thread about a week ago: nvidia-smi unfortunately does not work here, no idea why. I get an "access denied" error.
Keith Myers

Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 891
Message 59884 - Posted: 5 Feb 2023, 21:41:17 UTC - in response to Message 59882.  
Last modified: 5 Feb 2023, 21:47:29 UTC

There must be some way to run the command in a Windows terminal with elevated rights.

nvidia-smi is a user-level application that Nvidia provides in all driver distributions.

https://www.minitool.com/news/elevated-command-prompt.html
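For reference, the memory query itself is a one-liner once nvidia-smi runs; --query-gpu and --format are standard nvidia-smi options. A sketch that degrades gracefully when the tool is missing (on Windows, start the prompt via "Run as administrator" if a normal one reports "access denied"):

```shell
# Print per-GPU memory use; fall back to a message when nvidia-smi is not
# on the PATH (e.g. a machine without an NVIDIA driver installed).
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.used,memory.total --format=csv
else
    echo "nvidia-smi not found on PATH"
fi
```

Running this beside BOINC while a Python task progresses shows exactly how much VRAM each stage takes, without relying on a third-party monitor.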


©2025 Universitat Pompeu Fabra