PYSCFbeta: Quantum chemistry calculations on GPU

Ian&Steve C.

Message 61098 - Posted: 26 Jan 2024, 16:00:09 UTC - in response to Message 61097.  

The work-units require a lot of GPU memory.


How much is "a lot" exactly? I have a Pascal card, so it meets the compute capability requirement, but it has only 2 GB of VRAM. Without knowing the amount of VRAM required, I am not sure whether it will work.

The highest being used today on my Pascal cards is 795 MB.


Might want to watch that on a longer time scale; the VRAM use is not static, it fluctuates up and down.
Aurum

Message 61099 - Posted: 26 Jan 2024, 16:18:23 UTC
Last modified: 26 Jan 2024, 16:32:42 UTC

Retraction: I'm monitoring with BoincTasks Js 2.4.2.2, and it has bugs.
I loaded NVITOP, and the task does use 2 GB of VRAM at 100% GPU utilization.

BTW, if anyone wants to try NVITOP, here are my notes to install it on Ubuntu 22.04:
sudo apt update
sudo apt upgrade -y
sudo apt install python3-pip -y
python3 -m pip install --user pipx
python3 -m pip install --user --upgrade pipx
python3 -m pipx ensurepath
# if requested: sudo apt install python3.8-venv -y
# for Linux Mint 21.3: sudo apt install python3.10-venv -y
# open a new terminal, then:
pip3 install --upgrade nvitop
pipx run nvitop --colorful -m full
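
Since the VRAM use fluctuates, a single instantaneous reading can miss the peaks. As a rough sketch (my own suggestion, not part of the project; the 1-second interval and 5-minute window are arbitrary), you can also poll nvidia-smi from Python and record the peak:

# Poll nvidia-smi once a second and track the peak VRAM use, since an
# instantaneous value misses the short spikes discussed above.
import subprocess, time

peak = 0
for _ in range(300):  # ~5 minutes at 1-second intervals
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    used = max(int(v) for v in out.split())  # MiB, max across GPUs
    peak = max(peak, used)
    time.sleep(1)

print(f"Peak VRAM observed: {peak} MiB")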
Ian&Steve C.

Message 61100 - Posted: 26 Jan 2024, 16:26:34 UTC - in response to Message 61099.  

I'm not seeing any different behavior on my Titan Vs. The VRAM use still exceeds 3 GB at times, but it's spiky: you have to watch it for a few minutes, since instantaneous measurements might not catch it.
Boca Raton Community HS

Message 61101 - Posted: 26 Jan 2024, 17:04:26 UTC - in response to Message 61100.  

I am seeing spikes to ~7.6 GB with these. Not long-lasting (in the context of the whole work unit), but consistently elevated during that part of the work unit. I want to say that I saw that spike at about 5% complete and then again at 95% complete, but that could also be coincidence rather than a real pattern.
Ian&Steve C.

Message 61102 - Posted: 26 Jan 2024, 17:11:14 UTC - in response to Message 61101.  
Last modified: 26 Jan 2024, 17:14:12 UTC

I am seeing spikes to ~7.6 GB with these. Not long-lasting (in the context of the whole work unit), but consistently elevated during that part of the work unit. I want to say that I saw that spike at about 5% complete and then again at 95% complete, but that could also be coincidence rather than a real pattern.


To add on to this, for everyone's info:

these tasks (and a lot of CUDA applications in general) do not require any set absolute amount of VRAM. VRAM use scales with the individual GPU: generally, the more SMs you have, the more VRAM will be used. It's not linear, but some portion of the allocated VRAM scales directly with how many SMs are being used.

To put it simply, different GPUs with different core counts will show different amounts of VRAM utilization.

So even if a powerful GPU like an RTX 4090, with 100+ SMs on the die, needs 7+ GB, that doesn't mean something much smaller like a GTX 1070 needs that much. It has to be evaluated on a case-by-case basis.
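
As a rough illustration of that scaling (my own sketch with made-up numbers, not the project's actual allocator), you can read the SM count with CuPy and see how a per-SM workspace model plays out:

# Hypothetical model: many CUDA libraries size scratch buffers per SM, so
# total VRAM use ~= fixed overhead + per-SM workspace * SM count.
import cupy

dev = cupy.cuda.Device()
n_sm = dev.attributes["MultiProcessorCount"]  # SM count of the active GPU

FIXED_MB = 600   # assumed fixed overhead (context, kernels, constants)
PER_SM_MB = 48   # assumed per-SM workspace; purely illustrative

print(f"SMs: {n_sm}")
print(f"Estimated VRAM: {FIXED_MB + PER_SM_MB * n_sm} MB")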
Boca Raton Community HS

Message 61103 - Posted: 26 Jan 2024, 17:20:59 UTC - in response to Message 61102.  

I am seeing spikes to ~7.6 GB with these. Not long-lasting (in the context of the whole work unit), but consistently elevated during that part of the work unit. I want to say that I saw that spike at about 5% complete and then again at 95% complete, but that could also be coincidence rather than a real pattern.


To add on to this, for everyone's info:

these tasks (and a lot of CUDA applications in general) do not require any set absolute amount of VRAM. VRAM use scales with the individual GPU: generally, the more SMs you have, the more VRAM will be used. It's not linear, but some portion of the allocated VRAM scales directly with how many SMs are being used.

To put it simply, different GPUs with different core counts will show different amounts of VRAM utilization.

So even if a powerful GPU like an RTX 4090, with 100+ SMs on the die, needs 7+ GB, that doesn't mean something much smaller like a GTX 1070 needs that much. It has to be evaluated on a case-by-case basis.



Thanks for this! I did not know about the scaling; the correlation between SM count and VRAM usage is not something I had ever thought about.
bibi

Message 61108 - Posted: 29 Jan 2024, 13:55:27 UTC

Why do I always get a segmentation fault
on Windows/WSL2/Ubuntu 22.04.3 LTS?
12 processors, 28 GB memory, 16 GB swap, RTX 4070 Ti Super GPU with 16 GB, driver version 551.23

https://www.gpugrid.net/result.php?resultid=33759912
https://www.gpugrid.net/result.php?resultid=33758940
https://www.gpugrid.net/result.php?resultid=33759139
https://www.gpugrid.net/result.php?resultid=33759328
Ian&Steve C.

Message 61109 - Posted: 29 Jan 2024, 14:01:46 UTC - in response to Message 61108.  

Why do I always get a segmentation fault
on Windows/WSL2/Ubuntu 22.04.3 LTS?
12 processors, 28 GB memory, 16 GB swap, RTX 4070 Ti Super GPU with 16 GB, driver version 551.23

https://www.gpugrid.net/result.php?resultid=33759912
https://www.gpugrid.net/result.php?resultid=33758940
https://www.gpugrid.net/result.php?resultid=33759139
https://www.gpugrid.net/result.php?resultid=33759328


Likely something wrong with your environment or drivers.

Try running a native Linux OS install; WSL might not be well supported.
Ian&Steve C.

Message 61117 - Posted: 30 Jan 2024, 12:54:49 UTC - in response to Message 61109.  
Last modified: 30 Jan 2024, 13:15:58 UTC

Steve,

these TEST units you have out right now seem to be using a ton of reserved memory. One process right now is using 30+ GB, which seems much higher than usual, and I even have another one reserving 64 GB of memory. That's way too high.


Freewill

Message 61118 - Posted: 30 Jan 2024, 13:17:30 UTC
Last modified: 30 Jan 2024, 13:19:20 UTC

Here's one that died on my Ubuntu system, which has 32 GB of RAM:
https://www.gpugrid.net/result.php?resultid=33764282
Ian&Steve C.

Message 61119 - Posted: 30 Jan 2024, 14:33:26 UTC - in response to Message 61117.  
Last modified: 30 Jan 2024, 15:13:20 UTC

I see v3 being deployed now.

The memory limiting you're trying isn't working; I'm seeing it spike to near 100%.

I see you put export CUPY_GPU_MEMORY_LIMIT=50%

A quick Google search seems to indicate that you need to put the percentage in quotes, like this: export CUPY_GPU_MEMORY_LIMIT="50%". Alternatively, you can set a discrete memory amount as the limit, for example export CUPY_GPU_MEMORY_LIMIT="1073741824" to limit it to 1 GB.
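
For what it's worth, the same cap can also be set from inside Python through CuPy's default memory pool, which is what that environment variable controls; a minimal sketch, assuming the stock CuPy allocator is in use:

# In-process equivalent of CUPY_GPU_MEMORY_LIMIT: cap the default
# CuPy memory pool, by absolute size or by fraction of device memory.
import cupy

pool = cupy.get_default_memory_pool()
pool.set_limit(size=1 * 1024**3)    # hard cap: 1 GiB
# or: pool.set_limit(fraction=0.5)  # 50% of total device memory

print("limit bytes:", pool.get_limit())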

And the system memory use is still a little high, around 10 GB each. EDIT: system memory use still climbed to ~30 GB by the end.
Ian&Steve C.

Message 61120 - Posted: 30 Jan 2024, 16:01:04 UTC - in response to Message 61119.  
Last modified: 30 Jan 2024, 16:01:30 UTC

v4 report.

I see you attempted to add some additional VRAM limiting, but the task is still trying to allocate more VRAM; instead of using more, the process gets killed for trying to allocate beyond the limit.

https://gpugrid.net/result.php?resultid=33764464
https://gpugrid.net/result.php?resultid=33764469
Steve (Project scientist)

Message 61121 - Posted: 30 Jan 2024, 16:11:32 UTC

Yes, I was doing some testing to see how large a molecule we can compute properties for.

The previous batches have been for small molecules, which all work very well.

The memory use scales very quickly with increasing molecule size.
This test today had molecules 3 to 4 times the size of the previous batches. As you can see, I have not solved the memory-limiting issue yet. It should be possible to limit instantaneous GPU memory use (at the cost of runtime performance and increased CPU memory use), but due to the different levels of CUDA libraries in play in this code it is rather complicated. I will work on this locally for now and resume sending out the batches that were working well tomorrow!
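
For context, the kind of calculation each task runs is a single-point DFT job of the sort PySCF does on the GPU. The following is a minimal illustrative sketch only (made-up water molecule, basis set, and functional, assuming gpu4pyscf is installed), not the project's actual input:

# Minimal sketch of a GPU DFT single-point energy with PySCF + gpu4pyscf.
# Molecule, basis, and functional are illustrative, not the project's inputs;
# GPU memory use grows steeply with atom count and basis size.
from pyscf import gto
from gpu4pyscf.dft import rks

mol = gto.M(
    atom="O 0 0 0; H 0 0.757 0.587; H 0 -0.757 0.587",  # water, Angstrom
    basis="def2-tzvpp",
)
mf = rks.RKS(mol, xc="b3lyp")
energy = mf.kernel()
print(f"E(B3LYP) = {energy:.6f} Hartree")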

Thank you for the assistance and compute availability, it is much appreciated!
Ian&Steve C.

Message 61122 - Posted: 30 Jan 2024, 16:13:47 UTC - in response to Message 61121.  

No problem! Glad to see you were monitoring my feedback and making changes.

Looking forward to another stable batch tomorrow :) It should be similar to previous runs like yesterday's, right?
Steve (Project scientist)

Message 61123 - Posted: 30 Jan 2024, 16:18:55 UTC - in response to Message 61122.  

Yes, it will be the same as yesterday, but with roughly 10x the work units released.

Each work unit contains 100 small molecules.
Ian&Steve C.

Message 61124 - Posted: 30 Jan 2024, 16:19:50 UTC - in response to Message 61123.  

looking forward to it :)

Richard Haselgrove

Message 61126 - Posted: 31 Jan 2024, 12:38:25 UTC

I have Task 33765246 running on an RTX 3060 Ti under Linux Mint 21.3.

It's running incredibly slowly, and with zero GPU usage. I've found this in stderr.txt:

+ python compute_dft.py
/hdd/boinc-client/slots/5/lib/python3.11/site-packages/pyscf/dft/libxc.py:771: UserWarning: Since PySCF-2.3, B3LYP (and B3P86) are changed to the VWN-RPA variant, corresponding to the original definition by Stephens et al. (issue 1480) and the same as the B3LYP functional in Gaussian. To restore the VWN5 definition, you can put the setting "B3LYP_WITH_VWN5 = True" in pyscf_conf.py
warnings.warn('Since PySCF-2.3, B3LYP (and B3P86) are changed to the VWN-RPA variant, '
/hdd/boinc-client/slots/5/lib/python3.11/site-packages/gpu4pyscf/lib/cutensor.py:174: UserWarning: using cupy as the tensor contraction engine.
warnings.warn(f'using {contract_engine} as the tensor contraction engine.')
/hdd/boinc-client/slots/5/lib/python3.11/site-packages/pyscf/gto/mole.py:1280: UserWarning: Function mol.dumps drops attribute charge because it is not JSON-serializable
warnings.warn(msg)
Exception:
Fallback to CPU
Exception:
Fallback to CPU
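
The repeated "Exception: / Fallback to CPU" lines suggest the GPU path failed to initialize, so the task grinds on in CPU mode. A quick check that CuPy can actually see and use the GPU (my own diagnostic sketch, not part of the project's app):

# If either of these lines raises, gpu4pyscf's GPU path would fail
# and the code would fall back to the CPU, as in the log above.
import cupy

print("CUDA devices:", cupy.cuda.runtime.getDeviceCount())
print("device result:", (cupy.arange(10) ** 2).sum())  # forces a kernel launch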
Ian&Steve C.

Message 61127 - Posted: 31 Jan 2024, 12:40:53 UTC - in response to Message 61124.  
Last modified: 31 Jan 2024, 13:08:03 UTC

Steve,

this new batch, right off the bat, is loading up the GPU VRAM nearly full again.

edit: that was for a v1 task; I'll check out the v2s.
Ian&Steve C.

Message 61128 - Posted: 31 Jan 2024, 13:12:40 UTC - in response to Message 61127.  

OK, looks like the v2 tasks are back to normal. It was only that v1 task that was using lots of VRAM.

Steve (Project scientist)

Message 61129 - Posted: 31 Jan 2024, 13:19:52 UTC - in response to Message 61127.  

OK, my previous post was incorrect.

It turns out the previous large batch was not a representative test set: it only contained very small molecules, which is why the GPU RAM usage was low. As per my previous post, these tasks use a lot of GPU memory. You can see more detail in this post: http://gpugrid.org/forum_thread.php?id=5428&nowrap=true#60945

The work units are now just 10 molecules each. They vary in size from 10 to 20 atoms per molecule; all molecules in a WU are the same size. Test WUs (smallest and largest molecule sizes) pass on my GTX 1080 (8 GB) test machine without failing.

The CPU fallback part was left over from testing; it should have been removed, but it appears it was not.