PYSCFbeta: Quantum chemistry calculations on GPU

Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU
Previous · 1 . . . 4 · 5 · 6 · 7 · 8 · 9 · 10 . . . 14 · Next
Profile Retvari Zoltan
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 0
Message 61154 - Posted: 2 Feb 2024, 10:54:03 UTC
Last modified: 2 Feb 2024, 11:32:19 UTC

I've disabled getting new GPUGrid tasks on my host with a "small" amount (below 24 GB) of GPU memory.
This gigantic memory requirement is ridiculous in my opinion.
This is not a user error; if the workunits can't be changed, then the project should not send these tasks to hosts that have less than ~20 GB of GPU memory.
Alternatively, the workunits could allocate memory in a less careless way.
I've started a task on my RTX 4090 (it has 24 GiB of RAM), and I've monitored the memory usage:
           idle:   305 MiB
  task starting:   895 MiB
GPU usage rises:  6115 MiB
GPU usage drops:  7105 MiB
 GPU usage 100%:  7205 MiB
GPU usage drops:  8495 MiB
GPU usage rises:  9961 MiB
GPU usage drops: 14327 MiB (it would have failed on my GTX 1080 Ti at this point)
GPU usage rises:  6323 MiB
GPU usage drops: 15945 MiB
 GPU usage 100%:  6205 MiB
...and so on
So the memory usage briefly doubles at some points during processing, and this causes the workunits to fail on GPUs that have a "small" amount of memory. If this behaviour could be eliminated, many more hosts could process these workunits.
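The doubling pattern can be checked mechanically from a trace like the one above. Below is a minimal Python sketch (my own illustration, not project code) that parses nvidia-smi-style CSV log lines, finds the peak VRAM figure, and asks whether that peak, plus a driver/desktop reserve, would have fit on a given card. The 500 MiB reserve and the sample lines are assumptions for the example.

```python
# Parse lines in the shape produced by:
#   nvidia-smi --query-gpu=timestamp,temperature.gpu,utilization.gpu,\
#              utilization.memory,memory.used --format=csv,noheader -l 1

def peak_vram_mib(log_lines):
    """Return the largest 'memory.used' value (MiB) seen in the log."""
    peak = 0
    for line in log_lines:
        fields = [f.strip() for f in line.split(",")]
        if not fields or not fields[-1].endswith("MiB"):
            continue  # skip malformed lines
        used = int(fields[-1].removesuffix("MiB").strip())
        peak = max(peak, used)
    return peak

def fits_on_card(log_lines, card_mib, reserve_mib=500):
    """True if the observed peak plus a desktop/driver reserve fits in VRAM."""
    return peak_vram_mib(log_lines) + reserve_mib <= card_mib

sample = [
    "2024/02/02 08:07:10.775, 71, 22 %, 1 %, 8989 MiB",
    "2024/02/02 08:07:11.775, 70, 96 %, 2 %, 10209 MiB",
    "2024/02/02 08:07:12.775, 71, 98 %, 7 %, 10721 MiB",
]
print(peak_vram_mib(sample))        # 10721
print(fits_on_card(sample, 11264))  # True (an 11 GiB card would just cope here)
```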
ID: 61154

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 1,447
Message 61155 - Posted: 2 Feb 2024, 11:59:29 UTC - in response to Message 61154.  

Nothing to do at this time for my GPUs currently working on PYSCFbeta tasks:
5× GTX 1650 4GB, 1× GTX 1650 SUPER 4GB, 1× GTX 1660 Ti 6GB.
100% errors with the current PYSCFbeta tasks; now I realize why...
I've disabled "Quantum chemistry on GPU (beta)" in my project preferences while waiting for a fix, if any.
Conversely, they are performing fine with ATMbeta tasks.
ID: 61155

Freewill
Joined: 18 Mar 10
Posts: 28
Credit: 41,810,583,419
RAC: 13,276
Message 61156 - Posted: 2 Feb 2024, 12:09:55 UTC

I agree, it does seem these tasks have a spike in memory usage. I "rented" an RTX A5000 GPU, which also has 24 GB of memory, and ran 1 task at a time; at least the first task completed:
https://www.gpugrid.net/workunit.php?wuid=27678500
I will try a few more.
ID: 61156

roundup
Joined: 11 May 10
Posts: 68
Credit: 12,293,491,875
RAC: 3,176
Message 61157 - Posted: 2 Feb 2024, 12:16:07 UTC - in response to Message 61155.  
Last modified: 2 Feb 2024, 12:17:30 UTC


I've disabled Quantum chemistry on GPU (beta) at my project preferences in the wait for a correction, if any.
Conversely, they are performing right with ATMbeta tasks.

Exactly the same here. After 29 consecutive errors on an RTX 4070 Ti, I have disabled 'Quantum chemistry on GPU (beta)'.
ID: 61157

gemini8
Joined: 3 Jul 16
Posts: 31
Credit: 2,248,809,169
RAC: 0
Message 61158 - Posted: 2 Feb 2024, 12:25:43 UTC

I have one machine still taking on GPUGrid tasks.
The others are using their GPUs only for the Tour de Primes over at PrimeGrid.
If there really is a driver issue with this machine (see earlier post and answers), I'd like to know which one, as its GPU is running fine on other BOINC projects apart from SRBase. Not being able to run SRBase is related to libc, not the GPU driver.
- - - - - - - - - -
Greetings, Jens
ID: 61158

Pascal
Joined: 15 Jul 20
Posts: 95
Credit: 2,550,803,412
RAC: 248
Message 61159 - Posted: 2 Feb 2024, 12:37:34 UTC
Last modified: 2 Feb 2024, 12:38:16 UTC

Hello,
Is there a way under Linux to simulate VRAM for the GPU using RAM or an SSD?
That would avoid the computation errors.
I increased the swap file to 50 GB, as under Windows, but it does not work.
Thanks
ID: 61159

Boca Raton Community HS
Joined: 27 Aug 21
Posts: 38
Credit: 7,254,068,306
RAC: 0
Message 61160 - Posted: 2 Feb 2024, 13:36:47 UTC - in response to Message 61151.  
Last modified: 2 Feb 2024, 13:42:31 UTC

Boca,

How much VRAM do you see actually being used on some of these tasks? Mind watching a few? You'll have to run a watch command to see continuous output of VRAM utilization, since the usage isn't constant; it spikes up and down. I'm just curious how much is actually needed. Most of the tasks I was running would spike up to about 8GB, but I assume the tasks that needed more just failed instead, so I can't know how much they were trying to use. Even though these Titan Vs are great DP performers, they only have 12GB of VRAM. Even most of the 16GB cards like the V100 and P100 are seeing very high error rates.

MPS helps. But not enough with this current batch. I was getting good throughput with running 3x tasks at once on the batches last week.


This was wild...

For a single work unit:

Hovers around 3-4GB.
Rises to 8-9GB.
Spikes to ~11GB regularly.

Highest spike (seen): 12.5GB
Highest spike (estimated, based on Psensor): ~20GB. Additionally, Psensor caught a peak memory usage of 76% of the RTX A6000's 48GB for one workunit, but I did not see when this happened, or whether it happened at all.
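For anyone wanting to capture this kind of trace themselves, nvidia-smi can log the relevant fields continuously in its built-in loop mode (this is the stock CLI; the one-second interval and the output filename are just example choices):

```shell
# Log timestamp, temperature, GPU/memory utilisation and VRAM in use, once per second
nvidia-smi --query-gpu=timestamp,temperature.gpu,utilization.gpu,utilization.memory,memory.used \
           --format=csv,noheader -l 1 | tee vram.log
```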

I graphically captured the VRAM usage for one workunit. I have no idea how to embed images here, so here is a Google Doc:

https://docs.google.com/document/d/1xpOpNJ93finciJQW7U07dMHOycSVlbYq9G6h0Xg7GtA/edit?usp=sharing

EDIT: I think they just purged these work units from the server?
ID: 61160

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Message 61161 - Posted: 2 Feb 2024, 14:02:10 UTC - in response to Message 61160.  
Last modified: 2 Feb 2024, 14:06:34 UTC

Thanks, that's kind of what I expected was happening.

And yeah, they must have seen the problems and just abandoned the remainder of this run to reassess how to tweak them.

It seems like they tweaked the input files to give the assertion error instead of just hanging like the earlier ones (index numbers below ~1000). The early tasks would hang with the fallback-to-CPU issue; after that, it changed to the assertion error if a task ran out of VRAM. That was better behaviour for the user, since a quick failure is better than hanging for hours on end doing nothing. But they were probably getting back a majority of errors as the VRAM requirements grew beyond what most people have available in hardware.
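The fail-fast behaviour described here can be sketched as a preflight check: estimate the peak (steady-state use times a spike factor) and abort immediately if it exceeds free VRAM, rather than hanging for hours. This is a hypothetical Python illustration; the 2x factor and the numbers are assumptions drawn from observations in this thread, not the app's actual logic.

```python
class InsufficientVRAM(RuntimeError):
    """Raised when the expected peak VRAM demand exceeds what is free."""

def preflight(free_mib, steady_mib, spike_factor=2.0):
    """Fail fast: abort before work starts if the projected spike won't fit."""
    needed = steady_mib * spike_factor
    if needed > free_mib:
        raise InsufficientVRAM(
            f"need ~{needed:.0f} MiB at peak, only {free_mib} MiB free")
    return needed

# A 12 GB card with ~10.7 GiB free vs. a task idling around 7.2 GiB:
try:
    preflight(free_mib=10700, steady_mib=7205)
except InsufficientVRAM as exc:
    print("aborting early:", exc)
```

The point of the pattern is simply that the error surfaces in seconds instead of after hours of a hung task, which matches the improvement described above.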
ID: 61161

Boca Raton Community HS
Joined: 27 Aug 21
Posts: 38
Credit: 7,254,068,306
RAC: 0
Message 61162 - Posted: 2 Feb 2024, 15:30:46 UTC

A new batch just came through; I'm seeing the same VRAM spikes and patterns.
ID: 61162

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Message 61163 - Posted: 2 Feb 2024, 15:32:14 UTC - in response to Message 61162.  
Last modified: 2 Feb 2024, 15:39:40 UTC

I'm seeing the same spikes, but so far so good. The biggest spike I saw was ~9GB.

No errors... yet.

Spoke too soon; I did get one failure:

https://gpugrid.net/result.php?resultid=33801391
ID: 61163

Steve
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 21 Dec 23
Posts: 51
Credit: 0
RAC: 0
Message 61164 - Posted: 2 Feb 2024, 15:37:15 UTC - in response to Message 61163.  

Hi. I have been tweaking settings. All WUs I have tried now work on my 1080 (8GB).

Sending a new batch of smaller WUs out now. On our end, we will need to see how to assign WUs based on GPU memory. (Previous apps have been compute-bound rather than GPU-memory-bound and have only been assigned based on driver version.)
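The memory-based assignment Steve describes could look something like the following sketch: keep a per-WU peak-memory estimate and only schedule it to hosts whose reported GPU memory covers it with some headroom. This is purely illustrative Python; the host names, the numbers, and the 1.2 safety margin are made-up assumptions, not GPUGrid's actual scheduler.

```python
def eligible_hosts(hosts_mib, wu_peak_mib, margin=1.2):
    """hosts_mib: {host name: GPU memory in MiB}.
    Return the hosts whose VRAM covers the WU's peak with headroom."""
    needed = wu_peak_mib * margin
    return sorted(name for name, mem in hosts_mib.items() if mem >= needed)

hosts = {"rtx4090": 24564, "titan_v": 12288, "gtx1650": 4096}
print(eligible_hosts(hosts, wu_peak_mib=16000))  # ['rtx4090']
print(eligible_hosts(hosts, wu_peak_mib=9000))   # ['rtx4090', 'titan_v']
```

The margin exists because of the transient spikes reported in this thread: a WU whose steady usage fits can still fail at its peak.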
ID: 61164

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Message 61165 - Posted: 2 Feb 2024, 16:07:04 UTC - in response to Message 61164.  

I'm seeing some errors on the Titan V (12GB); not a huge amount, but certainly a noteworthy one. Maybe you can correlate these specific WUs and see why this kind (number of atoms or molecules?) might be requesting more VRAM than the ones you tried on your 1080.

Most of the ones I've observed running hover around ~3-4GB of constant VRAM use, with spikes into the 8-11GB range.

https://gpugrid.net/result.php?resultid=33802055
https://gpugrid.net/result.php?resultid=33801492
https://gpugrid.net/result.php?resultid=33801447
https://gpugrid.net/result.php?resultid=33801391
https://gpugrid.net/result.php?resultid=33801238
ID: 61165

pututu
Joined: 8 Oct 16
Posts: 27
Credit: 4,153,801,869
RAC: 0
Message 61166 - Posted: 2 Feb 2024, 16:08:36 UTC

Still seeing VRAM spikes above 8GB:

2024/02/02 08:07:08.774, 71, 100 %, 40 %, 8997 MiB
2024/02/02 08:07:09.774, 71, 100 %, 34 %, 8999 MiB
2024/02/02 08:07:10.775, 71, 22 %, 1 %, 8989 MiB
2024/02/02 08:07:11.775, 70, 96 %, 2 %, 10209 MiB
2024/02/02 08:07:12.775, 71, 98 %, 7 %, 10721 MiB
2024/02/02 08:07:13.775, 71, 93 %, 8 %, 5023 MiB
2024/02/02 08:07:14.775, 72, 96 %, 24 %, 5019 MiB
2024/02/02 08:07:15.776, 72, 100 %, 0 %, 5019 MiB
2024/02/02 08:07:16.776, 72, 100 %, 0 %, 5019 MiB

Seems like credit has gone down from 150K to 15K.
ID: 61166

Boca Raton Community HS
Joined: 27 Aug 21
Posts: 38
Credit: 7,254,068,306
RAC: 0
Message 61167 - Posted: 2 Feb 2024, 16:20:20 UTC - in response to Message 61166.  

Agreed; it seems that there are fewer spikes and most of them are in the 8-9GB range, with a few higher, though seemingly less frequent. It's difficult to quantify an actual difference, since the workunits can be so different. Is there a real difference in VRAM usage, or do these particular workunits just happen to need less VRAM?
ID: 61167

Pascal
Joined: 15 Jul 20
Posts: 95
Credit: 2,550,803,412
RAC: 248
Message 61168 - Posted: 2 Feb 2024, 16:40:21 UTC

Seems like credit has gone down from 150K to 15K.
ID: 61168

pututu
Joined: 8 Oct 16
Posts: 27
Credit: 4,153,801,869
RAC: 0
Message 61169 - Posted: 2 Feb 2024, 17:33:47 UTC
Last modified: 2 Feb 2024, 17:34:29 UTC

Occasionally 8GB of VRAM is not sufficient; I'm still seeing errors on these cards.

Example: two of the hosts below have 8GB of VRAM, while the one that returned successfully has 16GB.
http://gpugrid.net/workunit.php?wuid=27683202
ID: 61169

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Message 61171 - Posted: 2 Feb 2024, 17:55:00 UTC - in response to Message 61169.  

Even that 16GB GPU had one failure with the new v3 batch

http://gpugrid.net/result.php?resultid=33802340
ID: 61171

Boca Raton Community HS
Joined: 27 Aug 21
Posts: 38
Credit: 7,254,068,306
RAC: 0
Message 61172 - Posted: 2 Feb 2024, 18:47:46 UTC - in response to Message 61171.  

Even that 16GB GPU had one failure with the new v3 batch

http://gpugrid.net/result.php?resultid=33802340



Based on the task times, it looks like those were running at 1x?

ID: 61172

Pascal
Joined: 15 Jul 20
Posts: 95
Credit: 2,550,803,412
RAC: 248
Message 61173 - Posted: 2 Feb 2024, 18:52:03 UTC
Last modified: 2 Feb 2024, 18:55:03 UTC

Good evening. At my place it works well now.
I just finished 5 workunits without a problem with my GTX 1650 and my RTX 4060.
Let's hope this continues.
I reformatted my PC today and reinstalled Linux Mint 21.3, once again.

https://www.gpugrid.net/results.php?userid=563937
ID: 61173

roundup
Joined: 11 May 10
Posts: 68
Credit: 12,293,491,875
RAC: 3,176
Message 61174 - Posted: 2 Feb 2024, 19:00:05 UTC - in response to Message 61168.  

14 tasks of the latest batch completed successfully, without any errors.
Great progress!

Seems like credit has gone down from 150K to 15K.

Perhaps 150k was a little too generous, but 15k is not on par with other GPU projects. I expect there will be fairer credit again soon, perhaps with the next batch?
ID: 61174

©2025 Universitat Pompeu Fabra