PYSCFbeta: Quantum chemistry calculations on GPU

Boca Raton Community HS

Joined: 27 Aug 21
Posts: 38
Credit: 7,254,068,306
RAC: 0
Level
Tyr
Message 61175 - Posted: 2 Feb 2024, 19:04:15 UTC - in response to Message 61174.  

14 tasks of the latest batch completed successfully without any error.
Great progress!

Seems like credit has gone down from 150K to 15K.

Perhaps 150k was a little too generous. But 15k is not on par with other GPU projects. I expect there will be fairer credits again soon - with the next batch?


Are you running them at 1x, and with how much VRAM? I'm trying to get a feel for what the actual "cutoff" is for these tasks right now. I still feel 24 GB of VRAM is needed for success running 1x, and double that for 2x.
ID: 61175
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Level
Trp
Message 61176 - Posted: 2 Feb 2024, 19:13:14 UTC - in response to Message 61175.  

Sometimes more than 12 GB: about 4% (16 out of 372) of my tasks failed, all on GPUs with 12 GB, all running at 1x, in the v3 batch. Not sure how much VRAM is needed to be 100% successful. I did have one success that was a resend of one of your errors, from a 4090 with 24 GB, so I'm guessing you were running that one at 2x and got unlucky with two big tasks at the same time.
ID: 61176
Boca Raton Community HS

Joined: 27 Aug 21
Posts: 38
Credit: 7,254,068,306
RAC: 0
Level
Tyr
Message 61177 - Posted: 2 Feb 2024, 19:30:11 UTC - in response to Message 61176.  

Sometimes more than 12 GB: about 4% (16 out of 372) of my tasks failed, all on GPUs with 12 GB, all running at 1x, in the v3 batch. Not sure how much VRAM is needed to be 100% successful. I did have one success that was a resend of one of your errors, from a 4090 with 24 GB, so I'm guessing you were running that one at 2x and got unlucky with two big tasks at the same time.

Correct - I was playing around with the two 4090 systems running these to make some comparisons. And you are also correct - it seems that even with 24 GB, running 2x is still not really ideal. Those random, huge spikes seem to find each other when running 2x.
ID: 61177
roundup

Joined: 11 May 10
Posts: 68
Credit: 12,293,491,875
RAC: 2,606
Level
Trp
Message 61178 - Posted: 2 Feb 2024, 19:40:54 UTC - in response to Message 61175.  

Are you running them at 1x, and with how much VRAM? I'm trying to get a feel for what the actual "cutoff" is for these tasks right now. I still feel 24 GB of VRAM is needed for success running 1x, and double that for 2x.

The GPU is an MSI 4070 Ti GAMING X SLIM with 12 GB of GDDR6X, run at 1x. Evidently sufficient for the latest batch to run flawlessly.
ID: 61178
pututu

Joined: 8 Oct 16
Posts: 27
Credit: 4,153,801,869
RAC: 0
Level
Arg
Message 61179 - Posted: 2 Feb 2024, 19:43:15 UTC - in response to Message 61174.  

14 tasks of the latest batch completed successfully without any error.
Great progress!

Seems like credit has gone down from 150K to 15K.

Perhaps 150k was a little too generous. But 15k is not on par with other GPU projects. I expect there will be fairer credits again soon - with the next batch?


For someone with a 3080 Ti card, it would be better to run ATMbeta tasks first and fall back to Quantum chemistry if the former has no tasks available, at least if granted credit is an important factor.

For me, I have a 3080 Ti and a P100, so I will likely run ATMbeta on the 3080 Ti and Quantum chemistry on the P100, if both types of tasks are available.
ID: 61179
Boca Raton Community HS

Joined: 27 Aug 21
Posts: 38
Credit: 7,254,068,306
RAC: 0
Level
Tyr
Message 61180 - Posted: 2 Feb 2024, 19:49:01 UTC - in response to Message 61178.  

Are you running them at 1x, and with how much VRAM? I'm trying to get a feel for what the actual "cutoff" is for these tasks right now. I still feel 24 GB of VRAM is needed for success running 1x, and double that for 2x.

The GPU is an MSI 4070 Ti GAMING X SLIM with 12 GB of GDDR6X, run at 1x. Evidently sufficient for the latest batch to run flawlessly.

Thanks for the info. If you don't mind me asking - how many ran (in a row) without any errors?
ID: 61180
CallMeFoxie

Joined: 6 Jan 21
Posts: 2
Credit: 56,925,024
RAC: 0
Level
Thr
Message 61181 - Posted: 2 Feb 2024, 22:28:30 UTC

I have a rig with nine P106 cards, which are slightly modified GTX 1060 6GB cards used for Ethereum mining back in the day. I can run only two GPUGrid tasks at once (the main CPU is only a dual-core Celeron), but so far I have had one error and several tasks finish and validate. Hoping for good results on the rest!
ID: 61181
roundup

Joined: 11 May 10
Posts: 68
Credit: 12,293,491,875
RAC: 2,606
Level
Trp
Message 61182 - Posted: 3 Feb 2024, 0:00:22 UTC - in response to Message 61180.  
Last modified: 3 Feb 2024, 0:02:16 UTC

Are you running them at 1x, and with how much VRAM? I'm trying to get a feel for what the actual "cutoff" is for these tasks right now. I still feel 24 GB of VRAM is needed for success running 1x, and double that for 2x.

The GPU is an MSI 4070 Ti GAMING X SLIM with 12 GB of GDDR6X, run at 1x. Evidently sufficient for the latest batch to run flawlessly.

Thanks for the info. If you don't mind me asking - how many ran (in a row) without any errors?

14 consecutive tasks without any error.
ID: 61182
CallMeFoxie

Joined: 6 Jan 21
Posts: 2
Credit: 56,925,024
RAC: 0
Level
Thr
Message 61183 - Posted: 3 Feb 2024, 11:04:43 UTC - in response to Message 61181.  

I have a rig with nine P106 cards, which are slightly modified GTX 1060 6GB cards used for Ethereum mining back in the day. I can run only two GPUGrid tasks at once (the main CPU is only a dual-core Celeron), but so far I have had one error and several tasks finish and validate. Hoping for good results on the rest!

So I managed to get 11 tasks, of which 9 passed and validated and 2 failed some way into the process.
ID: 61183
ServicEnginIC

Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 1,187
Level
Trp
Message 61184 - Posted: 3 Feb 2024, 13:59:03 UTC - in response to Message 61164.  

...From our end we will need to see how to assign WU's based on GPU memory. (Previous apps have been compute bound rather than GPU memory bound and have only been assigned based on driver version)

Possibly (I don't know if it is viable) a better solution would be to include some code to limit peak VRAM use according to the device actually assigned.
The reason, based on an example:
My host #482132 is shown by BOINC as [2] NVIDIA NVIDIA GeForce GTX 1660 Ti (5928MB) driver: 550.40.
This is true for Device 0: an NVIDIA GeForce GTX 1660 Ti (5928MB), driver 550.40.
But Device 1 in this host is actually an NVIDIA GeForce GTX 1650 SUPER (3895MB), driver 550.40.
Tasks sent according to Device 0's VRAM (6 GB) would likely run out of memory when they land on Device 1 (4 GB VRAM).
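Something along these lines could do it with CuPy, which the app's environment already bundles. This is only a rough sketch of the idea, not the project's actual code, and the 0.9 headroom factor is an arbitrary choice:

import cupy

# Inside a task, CUDA_VISIBLE_DEVICES makes the assigned card appear as
# device 0, so querying the current device reports the VRAM actually available.
free_bytes, total_bytes = cupy.cuda.runtime.memGetInfo()
limit = int(free_bytes * 0.9)  # keep ~10% headroom for the desktop/other processes
cupy.get_default_memory_pool().set_limit(size=limit)
print(f"CuPy pool capped at {limit / 2**30:.2f} GiB "
      f"of {total_bytes / 2**30:.2f} GiB total")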
ID: 61184
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Level
Trp
Message 61185 - Posted: 3 Feb 2024, 14:24:40 UTC - in response to Message 61184.  
Last modified: 3 Feb 2024, 14:25:45 UTC

...From our end we will need to see how to assign WU's based on GPU memory. (Previous apps have been compute bound rather than GPU memory bound and have only been assigned based on driver version)

Possibly (I don't know if it is viable) a better solution would be to include some code to limit peak VRAM use according to the device actually assigned.
The reason, based on an example:
My host #482132 is shown by BOINC as [2] NVIDIA NVIDIA GeForce GTX 1660 Ti (5928MB) driver: 550.40.
This is true for Device 0: an NVIDIA GeForce GTX 1660 Ti (5928MB), driver 550.40.
But Device 1 in this host is actually an NVIDIA GeForce GTX 1650 SUPER (3895MB), driver 550.40.
Tasks sent according to Device 0's VRAM (6 GB) would likely run out of memory when they land on Device 1 (4 GB VRAM).


The only caveat with this is that the application or project doesn't have any ability to select which GPUs you have or which GPU will run the task. In your example, if a task was sent that required more than 4 GB, the project has no idea that GPU 1 only has 4 GB. The project can only see the "first/best" GPU in the system, which is communicated via your BOINC client, and the BOINC client is the one that selects which tasks go to which GPU. The science application is called after the GPU selection has already been made. Similarly, BOINC has no mechanism to assign tasks based on GPU VRAM use.

You will have to manage things yourself after observing behavior. If you notice one GPU consistently has too little VRAM, you can exclude that GPU from running the QChem app via an <exclude_gpu> statement in the cc_config.xml file:

<cc_config>
  <options>
    <exclude_gpu>
      <url>https://www.gpugrid.net/</url>
      <app>PYSCFbeta</app>
      <device_num>1</device_num>
    </exclude_gpu>
  </options>
</cc_config>
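cc_config.xml lives in the BOINC data directory. After editing it, tell the client to re-read the config files (or simply restart it); GPU exclusions in particular may need a full client restart before they take effect.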
ID: 61185
ServicEnginIC

Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 1,187
Level
Trp
Message 61186 - Posted: 3 Feb 2024, 18:49:12 UTC - in response to Message 61185.  
Last modified: 3 Feb 2024, 18:53:24 UTC

You will have to manage things yourself after observing behavior.

Certainly.
Your advice is always much appreciated.
An update of the minimum requirements when the PySCF tasks reach the production stage would be helpful, to aid in excluding hosts/GPUs that don't meet them.
ID: 61186
pututu

Joined: 8 Oct 16
Posts: 27
Credit: 4,153,801,869
RAC: 0
Level
Arg
Message 61187 - Posted: 3 Feb 2024, 21:04:18 UTC - in response to Message 61186.  

You will have to manage things yourself after observing behavior.

Certainly.
Your advice is always much appreciated.
An update of the minimum requirements when the PySCF tasks reach the production stage would be helpful, to aid in excluding hosts/GPUs that don't meet them.

Something like what WCG posted may be useful, showing system requirements such as memory, disk space, one-time download file size, etc.: https://www.worldcommunitygrid.org/help/topic.s?shortName=minimumreq.
Setting aside WCG not running smoothly since the IBM migration, I notice that the WCG system requirements are outdated. I guess it takes effort to maintain such information and keep it up to date.

So far, this is my limited knowledge about the quantum chemistry tasks, as I'm still learning. Anyone is welcome to chime in on the system requirements.
1) The one-time download is about 2 GB. Be prepared to wait for hours if you have a very slow internet connection.
2) The more GPU VRAM the better. It seems that cards with 24 GB or more perform the best (see the sketch below for a quick way to check a card's free VRAM).
3) GPUs with faster memory bandwidth and faster FP64 have the advantage of shorter run times. Typically these are datacenter/server/workstation cards.
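As a quick way to check how much VRAM is actually free on a given card before deciding between 1x and 2x, a rough sketch using CuPy (which the app's environment already bundles; device 0 here is just an example):

import cupy

with cupy.cuda.Device(0):  # 0 = first visible GPU; change as needed
    free_b, total_b = cupy.cuda.runtime.memGetInfo()
print(f"free: {free_b / 2**30:.1f} GiB of {total_b / 2**30:.1f} GiB")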
ID: 61187
gemini8

Joined: 3 Jul 16
Posts: 31
Credit: 2,248,809,169
RAC: 0
Level
Phe
Message 61188 - Posted: 4 Feb 2024, 8:18:18 UTC

Implementing a way to choose work with certain hardware demands through the project preferences would be nice as well.
After lots of problems with the ECM subproject claiming too much system memory, yoyo@home divided that subproject into smaller and bigger tasks, which can each be ticked (or left unticked) in the project preferences.
So my suggestion is to hand out work in 4, 6, 8, 12, 16 and 24 GB flavours which the user can choose from.
As the machine's system also claims some GPU memory, it should naturally be considered to leave about half a gig untouched by the GPUGrid tasks.
- - - - - - - - - -
Greetings, Jens
ID: 61188
Steve
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 21 Dec 23
Posts: 51
Credit: 0
RAC: 0
Level
Message 61189 - Posted: 5 Feb 2024, 11:04:22 UTC - in response to Message 61188.  

OK, so it seems like things have improved with the latest settings.

I am keeping the WUs short (10 molecule configurations per WU) to minimize the effect of the errors.

I am going to send out some batches of WUs to get through a large dataset we have.

I think this:
After lots of problems with the ECM subproject claiming too much system memory, yoyo@home divided that subproject into smaller and bigger tasks, which can each be ticked (or left unticked) in the project preferences.
So my suggestion is to hand out work in 4, 6, 8, 12, 16 and 24 GB flavours which the user can choose from.

might be the most workable solution for the future, once the current batch of work is done.

The memory use is mainly determined by the size of the molecule and the number of heavy elements, so before WUs are sent out we can make a rough estimate of the memory use. There is an element of randomness that comes from high memory use for specific physical configurations that are harder to converge. We cannot estimate this before sending; it only shows up during the calculation.
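As a very rough illustration of that scaling (a back-of-envelope sketch, not the actual estimator used by the project): the dominant GPU array in a density-fitted calculation is on the order of naux x nao*(nao+1)/2 float64 values, where nao is the number of basis functions (for example, the failed task reported further down this page shows nao = 570) and naux is the auxiliary basis size, which is only a guessed figure here.

def df_bytes(nao: int, naux: int) -> int:
    # Rough size of a (naux x nao*(nao+1)/2) float64 tensor of fitted integrals.
    nao_pair = nao * (nao + 1) // 2
    return naux * nao_pair * 8

nao, naux = 570, 2500  # nao from the failed task below; naux is only a guess
print(f"~{df_bytes(nao, naux) / 2**30:.1f} GiB for the fitted integrals alone")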
ID: 61189
Pascal

Joined: 15 Jul 20
Posts: 95
Credit: 2,550,803,412
RAC: 203
Level
Phe
Message 61190 - Posted: 5 Feb 2024, 11:35:20 UTC - in response to Message 61189.  

Seems like credit has gone down from 150K to 15K?
ID: 61190
Freewill

Joined: 18 Mar 10
Posts: 28
Credit: 41,810,583,419
RAC: 10,891
Level
Trp
Message 61191 - Posted: 5 Feb 2024, 11:42:50 UTC - in response to Message 61190.  

Seems like credit has gone down from 150K to 15K?

Yes, and the memory use this morning seems to require running 1 at a time on GPUs with less than 16 GB, which hurts performance even more.

Steve, what determines the point value for a task?
ID: 61191
Pascal

Joined: 15 Jul 20
Posts: 95
Credit: 2,550,803,412
RAC: 203
Level
Phe
Message 61192 - Posted: 5 Feb 2024, 12:03:37 UTC

For the moment it's going not too badly with the new work units.
One error in four.





Name inputs_v3_ace_pch_ms_gc_filt_af05_index_64000_to_64000-SFARR_PYSCF_ace_pch_ms_gc_filt_af05_v4-0-1-RND0521_0
Workunit 27684102
Created 5 Feb 2024 | 10:40:37 UTC
Sent 5 Feb 2024 | 10:47:37 UTC
Received 5 Feb 2024 | 10:49:50 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 195 (0xc3) EXIT_CHILD_FAILED
Computer ID 617458
Report deadline 10 Feb 2024 | 10:47:37 UTC
Run time 45.93
CPU time 9.59
Validate state Invalid
Credit 0.00
Application version Quantum chemistry calculations on GPU v1.04 (cuda1121)
Stderr output

<core_client_version>7.20.5</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
11:47:47 (5931): wrapper (7.7.26016): starting
11:48:16 (5931): wrapper (7.7.26016): starting
11:48:16 (5931): wrapper: running bin/python (bin/conda-unpack)
11:48:17 (5931): bin/python exited; CPU time 0.157053
11:48:17 (5931): wrapper: running bin/tar (xjvf input.tar.bz2)
11:48:18 (5931): bin/tar exited; CPU time 0.002953
11:48:18 (5931): wrapper: running bin/bash (run.sh)
+ echo 'Setup environment'
+ source bin/activate
++ _conda_pack_activate
++ local _CONDA_SHELL_FLAVOR
++ '[' -n x ']'
++ _CONDA_SHELL_FLAVOR=bash
++ local script_dir
++ case "$_CONDA_SHELL_FLAVOR" in
+++ dirname bin/activate
++ script_dir=bin
+++ cd bin
+++ pwd
++ local full_path_script_dir=/home/pascal/slots/3/bin
+++ dirname /home/pascal/slots/3/bin
++ local full_path_env=/home/pascal/slots/3
+++ basename /home/pascal/slots/3
++ local env_name=3
++ '[' -n '' ']'
++ export CONDA_PREFIX=/home/pascal/slots/3
++ CONDA_PREFIX=/home/pascal/slots/3
++ export _CONDA_PACK_OLD_PS1=
++ _CONDA_PACK_OLD_PS1=
++ PATH=/home/pascal/slots/3/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
++ PS1='(3) '
++ case "$_CONDA_SHELL_FLAVOR" in
++ hash -r
++ local _script_dir=/home/pascal/slots/3/etc/conda/activate.d
++ '[' -d /home/pascal/slots/3/etc/conda/activate.d ']'
+ export PATH=/home/pascal/slots/3:/home/pascal/slots/3/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
+ PATH=/home/pascal/slots/3:/home/pascal/slots/3/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
+ echo 'Create a temporary directory'
+ export TMP=/home/pascal/slots/3/tmp
+ TMP=/home/pascal/slots/3/tmp
+ mkdir -p /home/pascal/slots/3/tmp
+ export OMP_NUM_THREADS=1
+ OMP_NUM_THREADS=1
+ export CUDA_VISIBLE_DEVICES=1
+ CUDA_VISIBLE_DEVICES=1
+ export CUPY_CUDA_LIB_PATH=/home/pascal/slots/3/cupy
+ CUPY_CUDA_LIB_PATH=/home/pascal/slots/3/cupy
+ echo 'Running PySCF'
+ python compute_dft.py
/home/pascal/slots/3/lib/python3.11/site-packages/gpu4pyscf/lib/cutensor.py:174: UserWarning: using cupy as the tensor contraction engine.
warnings.warn(f'using {contract_engine} as the tensor contraction engine.')
/home/pascal/slots/3/lib/python3.11/site-packages/pyscf/dft/libxc.py:771: UserWarning: Since PySCF-2.3, B3LYP (and B3P86) are changed to the VWN-RPA variant, corresponding to the original definition by Stephens et al. (issue 1480) and the same as the B3LYP functional in Gaussian. To restore the VWN5 definition, you can put the setting "B3LYP_WITH_VWN5 = True" in pyscf_conf.py
warnings.warn('Since PySCF-2.3, B3LYP (and B3P86) are changed to the VWN-RPA variant, '
nao = 570
/home/pascal/slots/3/lib/python3.11/site-packages/pyscf/gto/mole.py:1280: UserWarning: Function mol.dumps drops attribute charge because it is not JSON-serializable
warnings.warn(msg)
Traceback (most recent call last):
File "/home/pascal/slots/3/lib/python3.11/site-packages/pyscf/lib/misc.py", line 1094, in __exit__
handler.result()
File "/home/pascal/slots/3/lib/python3.11/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/home/pascal/slots/3/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/home/pascal/slots/3/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/pascal/slots/3/lib/python3.11/site-packages/gpu4pyscf/df/df_jk.py", line 52, in build_df
rsh_df.build(omega=omega)
File "/home/pascal/slots/3/lib/python3.11/site-packages/gpu4pyscf/df/df.py", line 102, in build
self._cderi = cholesky_eri_gpu(intopt, mol, auxmol, self.cd_low, omega=omega)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/pascal/slots/3/lib/python3.11/site-packages/gpu4pyscf/df/df.py", line 256, in cholesky_eri_gpu
if lj>1: ints_slices = cart2sph(ints_slices, axis=1, ang=lj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/pascal/slots/3/lib/python3.11/site-packages/gpu4pyscf/lib/cupy_helper.py", line 333, in cart2sph
t_sph = contract('min,ip->mpn', t_cart, c2s, out=out)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/pascal/slots/3/lib/python3.11/site-packages/gpu4pyscf/lib/cutensor.py", line 177, in contract
return cupy.asarray(einsum(pattern, a, b), order='C')
^^^^^^^^^^^^^^^^^^^^^
File "/home/pascal/slots/3/lib/python3.11/site-packages/cupy/linalg/_einsum.py", line 676, in einsum
arr_out, sub_out = reduced_binary_einsum(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/pascal/slots/3/lib/python3.11/site-packages/cupy/linalg/_einsum.py", line 418, in reduced_binary_einsum
tmp1, shapes1 = _flatten_transpose(arr1, [bs1, cs1, ts1])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/pascal/slots/3/lib/python3.11/site-packages/cupy/linalg/_einsum.py", line 298, in _flatten_transpose
a.transpose(transpose_axes).reshape(
File "cupy/_core/core.pyx", line 752, in cupy._core.core._ndarray_base.reshape
File "cupy/_core/_routines_manipulation.pyx", line 81, in cupy._core._routines_manipulation._ndarray_reshape
File "cupy/_core/_routines_manipulation.pyx", line 357, in cupy._core._routines_manipulation._reshape
File "cupy/_core/core.pyx", line 611, in cupy._core.core._ndarray_base.copy
File "cupy/_core/core.pyx", line 570, in cupy._core.core._ndarray_base.astype
File "cupy/_core/core.pyx", line 132, in cupy._core.core.ndarray.__new__
File "cupy/_core/core.pyx", line 220, in cupy._core.core._ndarray_base._init
File "cupy/cuda/memory.pyx", line 740, in cupy.cuda.memory.alloc
File "cupy/cuda/memory.pyx", line 1426, in cupy.cuda.memory.MemoryPool.malloc
File "cupy/cuda/memory.pyx", line 1447, in cupy.cuda.memory.MemoryPool.malloc
File "cupy/cuda/memory.pyx", line 1118, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
File "cupy/cuda/memory.pyx", line 1139, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
File "cupy/cuda/memory.pyx", line 1346, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
File "cupy/cuda/memory.pyx", line 1358, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 595,413,504 bytes (allocated so far: 3,207,694,336 bytes, limit set to: 3,684,158,668 bytes).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/pascal/slots/3/compute_dft.py", line 121, in <module>
e,f,dip,q = compute_gpu(mol)
^^^^^^^^^^^^^^^^
File "/home/pascal/slots/3/compute_dft.py", line 24, in compute_gpu
e_dft = mf.kernel() # compute total energy
^^^^^^^^^^^
File "<string>", line 2, in kernel
File "/home/pascal/slots/3/lib/python3.11/site-packages/gpu4pyscf/scf/hf.py", line 586, in scf
_kernel(self, self.conv_tol, self.conv_tol_grad,
File "/home/pascal/slots/3/lib/python3.11/site-packages/gpu4pyscf/scf/hf.py", line 393, in _kernel
mf.init_workflow(dm0=dm)
File "/home/pascal/slots/3/lib/python3.11/site-packages/gpu4pyscf/df/df_jk.py", line 56, in init_workflow
with lib.call_in_background(build_df) as build:
File "/home/pascal/slots/3/lib/python3.11/site-packages/pyscf/lib/misc.py", line 1096, in __exit__
raise ThreadRuntimeError('Error on thread %s:\n%s' % (self, e))
pyscf.lib.misc.ThreadRuntimeError: Error on thread <pyscf.lib.misc.call_in_background object at 0x7fec06934850>:
Out of memory allocating 595,413,504 bytes (allocated so far: 3,207,694,336 bytes, limit set to: 3,684,158,668 bytes).
11:48:31 (5931): bin/bash exited; CPU time 11.139443
11:48:31 (5931): app exit status: 0x1
11:48:31 (5931): called boinc_finish(195)

</stderr_txt>
]]>

ID: 61192
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Level
Trp
Message 61193 - Posted: 5 Feb 2024, 12:32:37 UTC

I'm seeing about a 10% failure rate with 12 GB cards.
ID: 61193
Steve
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 21 Dec 23
Posts: 51
Credit: 0
RAC: 0
Level
Message 61194 - Posted: 5 Feb 2024, 12:55:51 UTC - in response to Message 61193.  

Credits should now be at 75k for the rest of the batch. They should be consistent with the other apps, based on comparisons of runtime on our test machines, but this is complicated for this new memory-intensive app. I will investigate before sending the next batch.

ID: 61194