Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU
---
**Retvari Zoltan** · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
I've disabled getting new GPUGrid tasks on my hosts with a "small" amount (below 24 GB) of GPU memory. This gigantic memory requirement is ridiculous in my opinion. This is not a user error: if the workunits can't be changed, then the project should not send these tasks to hosts that have less than ~20 GB of GPU memory. Another solution would be for the workunit to allocate memory in a less careless way. I started a task on my RTX 4090 (it has 24 GiB of VRAM) and monitored the memory usage:

idle: 305 MiB
task starting: 895 MiB
GPU usage rises: 6115 MiB
GPU usage drops: 7105 MiB
GPU usage 100%: 7205 MiB
GPU usage drops: 8495 MiB
GPU usage rises: 9961 MiB
GPU usage drops: 14327 MiB (it would have failed on my GTX 1080 Ti at this point)
GPU usage rises: 6323 MiB
GPU usage drops: 15945 MiB
GPU usage 100%: 6205 MiB
...and so on

So the memory usage briefly doubles at some points during processing, and this causes the workunits to fail on GPUs that have a "small" amount of memory. If this behaviour could be eliminated, many more hosts could process these workunits.
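For anyone who wants to reproduce this kind of measurement, here is a minimal polling sketch using the pynvml bindings (the `nvidia-ml-py` package); device index 0 and the one-second interval are my assumptions, adjust them for your setup:

```python
# Minimal VRAM polling sketch using pynvml (pip install nvidia-ml-py).
# Assumes the GPU running the task is device 0; polls once per second.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

peak_mib = 0
try:
    while True:
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        used_mib = info.used // (1024 * 1024)
        peak_mib = max(peak_mib, used_mib)
        print(f"used: {used_mib} MiB   peak: {peak_mib} MiB")
        time.sleep(1)  # note: very short spikes may fall between samples
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
    print(f"highest observed usage: {peak_mib} MiB")
```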
---
**ServicEnginIC** · Joined: 24 Sep 10 · Posts: 592 · Credit: 11,972,186,510 · RAC: 1,447
Nothing to do at this time for my currently working GPUs: five GTX 1650 4 GB, one GTX 1650 SUPER 4 GB, one GTX 1660 Ti 6 GB. They get 100% errors with the current PYSCFbeta tasks, and now I realize why... I've disabled "Quantum chemistry on GPU (beta)" in my project preferences while waiting for a fix, if any. The same cards are running ATMbeta tasks without problems.
---
Joined: 18 Mar 10 · Posts: 28 · Credit: 41,810,583,419 · RAC: 13,276
I agree, these tasks do seem to have a spike in memory usage. I "rented" an RTX A5000 GPU, which also has 24 GB of memory, and running one task at a time, at least the first task completed: https://www.gpugrid.net/workunit.php?wuid=27678500 I will try a few more.
---
Joined: 11 May 10 · Posts: 68 · Credit: 12,293,491,875 · RAC: 3,176
Exactly the same here. After 29 consecutive errors on an RTX 4070 Ti, I have disabled 'Quantum chemistry on GPU (beta)'.
---
Joined: 3 Jul 16 · Posts: 31 · Credit: 2,248,809,169 · RAC: 0
I have one machine still taking on GPUGrid tasks; the others are using their GPUs only for the Tour de Primes over at PrimeGrid. If there really is a driver issue with this machine (see earlier post and answers), I'd like to know which one, as its GPU runs fine on other BOINC projects apart from SRBase. Not being able to run SRBase is related to libc, not the GPU driver.
- - - - - - - - - -
Greetings, Jens
---
Joined: 15 Jul 20 · Posts: 95 · Credit: 2,550,803,412 · RAC: 248
Hello. Is there a way under Linux to simulate VRAM for the GPU using RAM or an SSD? That would avoid the computation errors. I increased the swap file to 50 GB, as under Windows, but it does not work. Thanks.
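As an illustration of why swap does not help here: ordinary CUDA allocations live in device memory and cannot be backed by the system swap file. Oversubscribing VRAM into host RAM is only possible when the application itself allocates through CUDA managed (unified) memory, so this would have to happen inside the project's app, not in a volunteer's configuration. A minimal sketch of what that looks like, assuming a CuPy-based application (gpu4pyscf builds on CuPy, but whether it would tolerate managed memory is untested here):

```python
# Sketch only: route CuPy allocations through cudaMallocManaged so the
# driver can page between VRAM and host RAM under pressure (much slower,
# but it avoids hard out-of-memory failures on Linux).
import cupy as cp

pool = cp.cuda.MemoryPool(cp.cuda.malloc_managed)
cp.cuda.set_allocator(pool.malloc)

# This array may now exceed the GPU's physical VRAM (at a performance cost).
x = cp.zeros((4096, 4096, 64), dtype=cp.float64)  # ~8 GiB
print(x.nbytes / 2**30, "GiB allocated")
```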
---
Joined: 27 Aug 21 · Posts: 38 · Credit: 7,254,068,306 · RAC: 0
Boca, this was wild... For a single work unit:

- Hovers around 3-4 GB
- Rises to 8-9 GB
- Spikes to ~11 GB regularly
- Highest spike (seen): 12.5 GB
- Highest spike (estimated, based on Psensor): ~20 GB

Additionally, Psensor caught a peak memory usage of 76% of the RTX A6000's 48 GB for one work unit, but I did not see when this happened, or whether it happened at all. I graphically captured the VRAM usage for one work unit. I have no idea how to embed images here, so here is a Google Doc: https://docs.google.com/document/d/1xpOpNJ93finciJQW7U07dMHOycSVlbYq9G6h0Xg7GtA/edit?usp=sharing

EDIT: I think they just purged these work units from the server?
---
Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 6,423
Thanks, that's kind of what I expected was happening. And yeah, they must have seen the problems and just abandoned the remainder of this run to reassess how to tweak the tasks. It seems like they tweaked the input files to give the assertion error instead of just hanging like the earlier ones (index numbers below ~1000). The early tasks would hang with the fallback-to-CPU issue; after that, it changed to the assertion error when a task ran out of VRAM. That was better behavior for the user, since a quick failure is better than hanging for hours on end doing nothing. But they were probably getting back a majority of errors as the VRAM requirements grew beyond what most people have in available hardware.
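A fail-fast check of the kind described here could, in principle, look like the following sketch. This is my own illustration, not the project's actual code, and the 20 GB threshold is an assumption taken from the spikes reported above:

```python
# Illustrative fail-fast sketch (not the project's code): abort before
# starting work if the GPU cannot provide the estimated peak VRAM,
# instead of hanging or silently falling back to a CPU path for hours.
import sys
import pynvml

ESTIMATED_PEAK_MIB = 20 * 1024  # assumption, based on spikes reported in this thread

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
free_mib = pynvml.nvmlDeviceGetMemoryInfo(handle).free // (1024 * 1024)
pynvml.nvmlShutdown()

if free_mib < ESTIMATED_PEAK_MIB:
    # Exit immediately with a clear message rather than failing mid-run.
    sys.exit(f"insufficient free VRAM: {free_mib} MiB < {ESTIMATED_PEAK_MIB} MiB")
```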
---
Joined: 27 Aug 21 · Posts: 38 · Credit: 7,254,068,306 · RAC: 0
A new batch just came through. I'm seeing the same VRAM spikes and patterns.
---
Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 6,423
I'm seeing the same spikes, but so far so good. The biggest spike I saw was ~9 GB. No errors... yet.

Spoke too soon; I did get one failure: https://gpugrid.net/result.php?resultid=33801391
---
Joined: 21 Dec 23 · Posts: 51 · Credit: 0 · RAC: 0
Hi. I have been tweaking settings. All WUs I have tried now work on my 1080 (8 GB). I'm sending a new batch of smaller WUs out now. From our end, we will need to see how to assign WUs based on GPU memory. (Previous apps have been compute-bound rather than GPU-memory-bound and have only been assigned based on driver version.)
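For illustration only, here is a hypothetical sketch of the kind of memory-based assignment being described. All names in it are invented; BOINC's real scheduler implements this sort of filtering via plan classes in C++:

```python
# Hypothetical sketch of assigning work by GPU memory (invented names;
# not BOINC's actual scheduler code).
from dataclasses import dataclass

@dataclass
class Host:
    gpu_ram_mib: int
    driver_version: int

@dataclass
class Workunit:
    est_peak_vram_mib: int
    min_driver: int

def eligible(host: Host, wu: Workunit, headroom: float = 1.2) -> bool:
    """Send a WU only if the host's GPU RAM covers the estimated peak
    plus some headroom for the transient spikes reported in this thread."""
    return (host.driver_version >= wu.min_driver
            and host.gpu_ram_mib >= wu.est_peak_vram_mib * headroom)

# Example: a 12 GiB card should not receive a WU that spikes to ~11 GiB.
print(eligible(Host(12288, 535), Workunit(11264, 470)))  # False with 1.2x headroom
```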
---
Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 6,423
I'm seeing some errors on a Titan V (12 GB). Not a huge amount, but certainly a noteworthy amount. Maybe you can correlate these specific WUs and see why this kind (number of atoms or molecules?) might be requesting more VRAM than the ones you tried on your 1080. Most of the ones I've observed running hover around 3-4 GB of constant VRAM use, with spikes into the 8-11 GB range.

https://gpugrid.net/result.php?resultid=33802055
https://gpugrid.net/result.php?resultid=33801492
https://gpugrid.net/result.php?resultid=33801447
https://gpugrid.net/result.php?resultid=33801391
https://gpugrid.net/result.php?resultid=33801238
---
Joined: 8 Oct 16 · Posts: 27 · Credit: 4,153,801,869 · RAC: 0
Still seeing a VRAM spike above 8 GB:

2024/02/02 08:07:08.774, 71, 100 %, 40 %, 8997 MiB
2024/02/02 08:07:09.774, 71, 100 %, 34 %, 8999 MiB
2024/02/02 08:07:10.775, 71, 22 %, 1 %, 8989 MiB
2024/02/02 08:07:11.775, 70, 96 %, 2 %, 10209 MiB
2024/02/02 08:07:12.775, 71, 98 %, 7 %, 10721 MiB
2024/02/02 08:07:13.775, 71, 93 %, 8 %, 5023 MiB
2024/02/02 08:07:14.775, 72, 96 %, 24 %, 5019 MiB
2024/02/02 08:07:15.776, 72, 100 %, 0 %, 5019 MiB
2024/02/02 08:07:16.776, 72, 100 %, 0 %, 5019 MiB

Seems like credit has gone down from 150K to 15K.
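The log above looks like `nvidia-smi` CSV output (timestamp, temperature, GPU %, memory %, memory used). A small sketch for pulling the peak out of such a log; the filename and the assumption that memory used is the last column are mine:

```python
# Sketch: find the peak VRAM reading in an nvidia-smi-style CSV log whose
# last field looks like "10209 MiB". Column layout is an assumption.
def peak_vram(path: str) -> int:
    peak = 0
    with open(path) as f:
        for line in f:
            fields = [x.strip() for x in line.split(",")]
            if fields and fields[-1].endswith("MiB"):
                peak = max(peak, int(fields[-1].split()[0]))
    return peak

print(peak_vram("gpu_log.csv"), "MiB peak")
```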
---
Joined: 27 Aug 21 · Posts: 38 · Credit: 7,254,068,306 · RAC: 0
Agreed. It seems there are fewer spikes now, and most of them are in the 8-9 GB range; a few go higher, but less frequently? It's difficult to quantify an actual difference, since the work units can be so different. Is there a real difference in VRAM usage, or do these particular work units just happen to need less VRAM?
---
Joined: 15 Jul 20 · Posts: 95 · Credit: 2,550,803,412 · RAC: 248
"Seems like credit has gone down from 150K to 15K."
---
Joined: 8 Oct 16 · Posts: 27 · Credit: 4,153,801,869 · RAC: 0
Occasionally, a card with 8 GB of VRAM is not sufficient; I'm still seeing errors on these cards. Example: two of the hosts below have 8 GB of VRAM, while the one that returned successfully has 16 GB. http://gpugrid.net/workunit.php?wuid=27683202
---
Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 6,423
Even that 16 GB GPU had one failure with the new v3 batch: http://gpugrid.net/result.php?resultid=33802340
---
Joined: 27 Aug 21 · Posts: 38 · Credit: 7,254,068,306 · RAC: 0
"Even that 16 GB GPU had one failure with the new v3 batch."

Based on the task times, it looks like those were running at 1x?
---
Joined: 15 Jul 20 · Posts: 95 · Credit: 2,550,803,412 · RAC: 248
Good evening. It works well for me now; I just finished 5 work units without problems with my GTX 1650 and my RTX 4060. Let's hope this continues. I reformatted my PC today and reinstalled Linux Mint 21.3, once again. https://www.gpugrid.net/results.php?userid=563937
---
Joined: 11 May 10 · Posts: 68 · Credit: 12,293,491,875 · RAC: 3,176
14 tasks of the latest batch completed successfully, without any error. Great progress!

"Seems like credit has gone down from 150K to 15K."

Perhaps 150K was a little too generous, but 15K is not on par with other GPU projects. I expect there will be fairer credit again soon, perhaps with the next batch?