PYSCFbeta: Quantum chemistry calculations on GPU

Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU

Steve
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Message 60963 - Posted: 12 Jan 2024, 13:03:21 UTC

Hello GPUGRID!

We are deploying a new app, "PYSCFbeta: Quantum chemistry calculations on GPU". It is currently in the testing/beta stage and only runs on Linux at the moment.

The app performs quantum chemistry calculations. At the moment we are using it specifically for Density Functional Theory calculations: http://en.wikipedia.org/wiki/Density_functional_theory

These types of calculations allow us to accurately compute specific properties of small molecules.


The current test work units have a runtime on the order of 1 hour (very much dependent on GPU speed and the size of the molecule). Each work unit currently contains 1 molecule with ~10 configurations.

The app will not work on GPUs with compute capability less than 6.0. The server should not be sending work units to these cards, but I think this check is not working properly at the moment.
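
For anyone who wants to check a card against the 6.0 minimum before enabling test applications, here is a minimal Python sketch. It is not part of the app, and the helper names are made up for illustration. It assumes you already have the compute capability as a string such as 8.6 (recent drivers can report it through nvidia-smi's compute_cap query field, but treat that as an assumption about your driver version). The comparison is done on integer pairs because comparing the strings directly would rank 10.0 below 6.0.

```python
# Hypothetical helpers for illustration; not taken from the PYSCFbeta app.
MIN_CAPABILITY = (6, 0)  # minimum stated in the announcement


def parse_compute_capability(text):
    """Turn a string like '8.6' into an integer pair like (8, 6)."""
    major, _, minor = text.strip().partition(".")
    return int(major), int(minor or 0)


def is_supported(capability):
    """True if the capability string meets the 6.0 minimum."""
    return parse_compute_capability(capability) >= MIN_CAPABILITY


if __name__ == "__main__":
    for cap in ("5.2", "6.0", "7.0", "8.6", "10.0"):
        status = "ok" if is_supported(cap) else "too old"
        print(cap, status)
```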

The work units require a lot of GPU memory. The app works best if the work unit is the only thing running on the GPU. If other programs are using significant GPU memory, the work unit might fail.

Looking forward to hearing feedback from you.

Steve

Skillz
Message 60964 - Posted: 12 Jan 2024, 13:32:48 UTC

When can we expect to start getting these new tasks?

Steve
Message 60965 - Posted: 12 Jan 2024, 13:56:16 UTC

Now, if you are using Linux and have "run test applications?" selected.

roundup
Message 60966 - Posted: 12 Jan 2024, 13:57:24 UTC - in response to Message 60964.  
Last modified: 12 Jan 2024, 14:19:26 UTC

They are being distributed RIGHT NOW.
The first 6 WU have arrived here.

bormolino
Message 60967 - Posted: 12 Jan 2024, 14:00:40 UTC

I only get "No tasks sent".

Test applications are allowed, and I have a compute capability 8.6 GPU with 12 GB of memory, running Ubuntu.

Ian&Steve C.
Message 60968 - Posted: 12 Jan 2024, 15:14:43 UTC

Steve,

there is an issue with this application that will only be apparent on multi-GPU systems.

the application seems to be hard-coded in some way to always use GPU0, or the BOINC device assignment is somehow not being correctly communicated to the app.

this results in all tasks running on the same GPU when they should be split across different GPUs. due to the high VRAM use, this fills the VRAM on most GPUs and causes errors.

see here:
GLaDOS:~$ nvidia-smi
Fri Jan 12 10:05:59 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA TITAN V                 On  | 00000000:21:00.0  On |                  N/A |
| 80%   55C    P2              88W / 150W |   9453MiB / 12288MiB |    100%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA TITAN V                 On  | 00000000:22:00.0 Off |                  N/A |
| 80%   34C    P2              36W / 150W |     42MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA TITAN V                 On  | 00000000:42:00.0 Off |                  N/A |
| 80%   42C    P2              39W / 150W |     42MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA TITAN V                 On  | 00000000:61:00.0 Off |                  N/A |
| 80%   35C    P2              36W / 150W |     42MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1612      G   /usr/lib/xorg/Xorg                           94MiB |
|    0   N/A  N/A      1961    C+G   ...libexec/gnome-remote-desktop-daemon      311MiB |
|    0   N/A  N/A      2000      G   /usr/bin/gnome-shell                         67MiB |
|    0   N/A  N/A      5931      C   nvidia-cuda-mps-server                       30MiB |
|    0   N/A  N/A    223543    M+C   python                                     4490MiB |
|    0   N/A  N/A    223769    M+C   python                                     4462MiB |
|    1   N/A  N/A      1612      G   /usr/lib/xorg/Xorg                            6MiB |
|    1   N/A  N/A      5931      C   nvidia-cuda-mps-server                       30MiB |
|    2   N/A  N/A      1612      G   /usr/lib/xorg/Xorg                            6MiB |
|    2   N/A  N/A      5931      C   nvidia-cuda-mps-server                       30MiB |
|    3   N/A  N/A      1612      G   /usr/lib/xorg/Xorg                            6MiB |
|    3   N/A  N/A      5931      C   nvidia-cuda-mps-server                       30MiB |
+---------------------------------------------------------------------------------------+


both python processes (PIDs 223543 and 223769) are running on the same GPU, GPU 0.
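
A rough way to spot this situation automatically is to group compute processes by GPU. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes CSV text with the fields bus ID, PID, process name and used memory, which is roughly what nvidia-smi's --query-compute-apps option can produce with --format=csv,noheader. The sample text is hand-written from the table above, not real command output.

```python
from collections import defaultdict

# Hand-written sample based on the process table in this post (assumed layout:
# gpu_bus_id, pid, process_name, used_memory).
SAMPLE = """\
00000000:21:00.0, 5931, nvidia-cuda-mps-server, 30 MiB
00000000:21:00.0, 223543, python, 4490 MiB
00000000:21:00.0, 223769, python, 4462 MiB
00000000:22:00.0, 5931, nvidia-cuda-mps-server, 30 MiB
"""


def group_processes_by_gpu(csv_text, name=None):
    """Map GPU bus ID to a list of (pid, used MiB), optionally filtered by process name."""
    by_gpu = defaultdict(list)
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        bus_id, pid, proc_name, used = [field.strip() for field in line.split(",")]
        if name is not None and proc_name != name:
            continue
        # Accepts "4490" as well as "4490 MiB"; other values would raise ValueError.
        by_gpu[bus_id].append((int(pid), int(used.split()[0])))
    return dict(by_gpu)


def crowded_gpus(by_gpu, limit=1):
    """Return only the GPUs that host more than `limit` matching processes."""
    return {gpu: procs for gpu, procs in by_gpu.items() if len(procs) > limit}


if __name__ == "__main__":
    for gpu, procs in crowded_gpus(group_processes_by_gpu(SAMPLE, name="python")).items():
        print(gpu, procs)
```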

Ian&Steve C.
Message 60969 - Posted: 12 Jan 2024, 15:19:15 UTC

also, could you please add an explicit QChem for GPU selection to the project preferences page? currently it is only possible to get this app if you have ALL apps selected plus test apps. I want to exclude some apps but still get this one.

Steve
Message 60970 - Posted: 12 Jan 2024, 15:21:46 UTC - in response to Message 60968.  

Ah yes, thank you for confirming this! This is an omission in the scripts on my end. My test machine has one GPU, so I missed it. This can be fixed, thank you.

Steve
Message 60971 - Posted: 12 Jan 2024, 15:23:29 UTC

I will try to get the web interface updated, but this will take longer due to my unfamiliarity with it. Thanks.

Ian&Steve C.
Message 60972 - Posted: 12 Jan 2024, 16:20:22 UTC

just a hunch but I think the problem is with your export command in the run.sh

you have:
export CUDA_VISIBLE_DEVICES=$CUDA_DEVICE


which, if I'm reading it right, will restrict the visible devices to just one GPU. I think this will have a bad impact on any other tasks running in the BOINC environment.

normally on my 4x GPU system, I have CUDA_VISIBLE_DEVICES=0,1,2,3, and if you override that to just the single CUDA device it seems to shuffle all tasks there instead.

Ian&Steve C.
Message 60973 - Posted: 12 Jan 2024, 17:45:08 UTC - in response to Message 60972.  

I guess this wasn't the problem after all :) I see a new small batch went out, and I downloaded some and they are working fine now.

Steve
Message 60974 - Posted: 12 Jan 2024, 18:02:58 UTC - in response to Message 60973.  

Hello, can you confirm the latest WUs are getting assigned to different GPUs in the way you would expect?


The line in the script you mentioned (export CUDA_VISIBLE_DEVICES=$CUDA_DEVICE) is actually the fix I just made. In the first round I had forgotten to include this line.

When the BOINC client runs the app via the wrapper mechanism, it specifies the GPU device, which we capture in the variable CUDA_DEVICE. The Python CUDA code in our app uses the CUDA_VISIBLE_DEVICES variable to choose the GPU. When it is not set (as in the first round of jobs), it defaults to device zero, so all jobs end up on GPU zero. With this fix, the WUs will run on the device specified by the BOINC client.
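
The mechanism described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's run.sh or the BOINC wrapper: the function names are hypothetical, and the device index is simply passed in as an argument rather than read from BOINC. The child process here only prints the variable it inherited, as a stand-in for the real app.

```python
import os
import subprocess
import sys


def env_for_device(device_index, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES pinned to one GPU."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(int(device_index))
    return env


def run_on_device(command, device_index):
    """Run a command so that it can only see the GPU with the given index."""
    return subprocess.run(command, env=env_for_device(device_index), check=True)


if __name__ == "__main__":
    # Stand-in for the real application: report which device was made visible.
    child = [sys.executable, "-c",
             "import os; print('visible:', os.environ['CUDA_VISIBLE_DEVICES'])"]
    run_on_device(child, 2)
```

Because the variable is set only in the child's environment, other tasks started by the BOINC client keep their own view of the GPUs. Inside the child, the single visible GPU is enumerated as device 0, which is why code that defaults to device zero ends up on the intended card.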

Ian&Steve C.
Message 60975 - Posted: 12 Jan 2024, 18:09:30 UTC - in response to Message 60974.  

yup. I just ran 4 tasks on the same 4-GPU system and each one went to a different GPU as it should.

I see in the stderr that the device was selected properly.

Steve
Message 60976 - Posted: 12 Jan 2024, 18:12:10 UTC - in response to Message 60975.  

Thanks very much for the help!

Ian&Steve C.
Message 60977 - Posted: 12 Jan 2024, 18:13:10 UTC - in response to Message 60975.  
Last modified: 12 Jan 2024, 18:31:33 UTC

also, does this app make much use of FP64? I'm noticing very fast runtimes on a Titan V, even faster than something like an RTX 3090. the Titan V is slower in FP32, but something like 14x faster in FP64.

it's hard to follow the code, but I did see that you use CuPy a lot, and maybe something in CuPy is able to accelerate the Titan V in some way.

or maybe it's a Tensor core difference? does this QChem app use the Tensor cores?

Steve
Message 60978 - Posted: 12 Jan 2024, 19:07:31 UTC - in response to Message 60977.  

Yes, this app does make use of some double precision arithmetic. High precision is needed in QM calculations. The bulk of the crunching is done by Nvidia's cuSOLVER library, which I believe uses tensor cores when available.
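
To give a feel for why single precision can fall short, here is a small Python sketch. It is a generic numerical illustration, not code from the app: it emulates float32 with the standard struct module (the helper names are made up) and shows that a small contribution added to a much larger value is kept in double precision but rounded away in single precision.

```python
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_float32(values):
    """Accumulate values while rounding every intermediate result to float32."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total


if __name__ == "__main__":
    values = [1.0] + [1e-8] * 10000  # one large term plus many small ones
    print("float64 sum:", sum(values))
    print("float32 sum:", sum_float32(values))
```

In the single-precision run every small term is lost as soon as it is added to the running total, so the result stays at 1.0, while the double-precision sum comes out close to 1.0001.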

Ian&Steve C.
Message 60979 - Posted: 12 Jan 2024, 19:10:08 UTC - in response to Message 60978.  

Awesome, thanks for that info.

Looking forward to you re-releasing all the tasks you had to pull back earlier :)

Steve
Message 60980 - Posted: 12 Jan 2024, 19:16:45 UTC - in response to Message 60979.  

Yes, we will restart the large-scale test next week!

Keith Myers
Message 60981 - Posted: 12 Jan 2024, 20:52:58 UTC - in response to Message 60980.  

+1

GWGeorge007
Message 60982 - Posted: 13 Jan 2024, 11:51:35 UTC

+1

©2025 Universitat Pompeu Fabra