
Message boards : Number crunching : Quantum chemistry calculations on GPU

Author Message
roundup
Joined: 11 May 10
Posts: 63
Credit: 9,115,555,193
RAC: 54,272,784
Level
Tyr
Message 60938 - Posted: 8 Jan 2024 | 19:26:02 UTC

I have received 4 'Quantum chemistry calculations on GPU' WUs; 3 of them completed successfully on 2 Linux machines with a 4080 and a 4070 Ti.
Example here:
https://www.gpugrid.net/result.php?resultid=33727490
150 credits per successful WU? Seems odd.

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,573,724
RAC: 13,216,170
Level
Tyr
Message 60939 - Posted: 8 Jan 2024 | 20:33:53 UTC - in response to Message 60938.

I've had 3 failures and 3 successes. Seems to be test tasks. Interesting bit is that they seem to be employing Nvidia Tensor core calculation paths.

Very little written into the result.txt file and they use very little of the card's resources.

Let's hope that these precursor test tasks are an indication of more substantive QC tasks. Similar to what we see on the ATMbeta app.

Steve
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 21 Dec 23
Posts: 46
Credit: 0
RAC: 0
Message 60941 - Posted: 8 Jan 2024 | 21:00:27 UTC

Hello,

Steve here from the computational science lab.

This is indeed a new test app (just for Linux at the moment). You may be getting some test jobs for this app if you have selected the "run test applications" option.

There will be an in-depth post about the new app soon, and then, once we have it running properly, some substantial work!

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 60942 - Posted: 8 Jan 2024 | 21:08:16 UTC - in response to Message 60941.

Hello,

Steve here from the computational science lab.

This is indeed a new test app (just for linux at the moment). You may be getting some test jobs for this app if you have selected the run test applications option.

There will be in depth post about the new app soon and then once we have it running properly some substantial work!



Thanks Steve.

is it intended that these tasks do not use the GPU right now? most are reporting that they run a process on the GPU, but no GPU utilization and no significant power draw over idle.

will they use tensor cores on RTX cards?
if so, are they necessary? what about older GTX cards?
____________

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,573,724
RAC: 13,216,170
Level
Tyr
Message 60943 - Posted: 8 Jan 2024 | 23:18:03 UTC

I did now catch the card using VRAM and power, once I ignored which GPU nvidia-smi says the task is running on. nvidia-smi gets confused by these QC test tasks but reports properly for the ATMbeta Python tasks.

About 7.6 GB of VRAM in use on the GPU, with brief bursts of full power draw. Up to 76 GB of virtual memory used by the Python process on the CPU.
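For anyone who wants to watch per-card memory and power directly rather than rely on the process list, here is a minimal Python sketch. It assumes nvidia-smi is on the PATH and uses its CSV query mode; the helper names (parse_gpu_csv, query_gpus) are made up for illustration and are not part of any project tooling.

```python
import subprocess

# Fields requested from nvidia-smi's --query-gpu interface.
QUERY_FIELDS = ["index", "name", "utilization.gpu", "memory.used", "power.draw"]


def parse_gpu_csv(text):
    """Turn 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(QUERY_FIELDS):
            continue  # skip lines that do not match the requested fields
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed per-GPU readings."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

Calling query_gpus() in a loop every second or so should make the brief bursts of load visible on whichever card is actually doing the work.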


Thanks for the QC task news Steve. Looking forward to the real work to come.

Steve
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 21 Dec 23
Posts: 46
Credit: 0
RAC: 0
Message 60945 - Posted: 9 Jan 2024 | 8:40:08 UTC
Last modified: 9 Jan 2024 | 8:44:41 UTC

Thanks for the feedback and thanks all for running the tests!

Some of the older test runs will have been using CPU and no GPU. This was for debugging purposes.

All of the most recent tests will have been using the GPU. There should be high utilisation, high memory use, and high power draw. Running the test locally on an RTX 3090, the test work unit takes 6 minutes at 100% GPU utilisation, uses up to 10 GB of memory, and draws maximum power. Here are the charts:



https://i.imgur.com/5ceCoW1.png

This test workunit represents calculating the forces and energy of a single small molecule with Quantum Chemistry methods. In future, a work unit will comprise several of these calculations.


This application will only work on GPUs with compute capability 6.0 or newer. So it works fine on 10XX cards. I can see a few failures on older cards.
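As a rough illustration of the compute capability 6.0 cut-off, the following Python sketch compares a capability string against the minimum. The function names are hypothetical. The capability value itself has to come from elsewhere, for example NVIDIA's published CUDA GPU tables or, on recent drivers, what I believe is the compute_cap field of nvidia-smi's --query-gpu option.

```python
# Minimum compute capability stated for the app in this thread.
MIN_CAPABILITY = (6, 0)


def parse_capability(text):
    """Convert a string such as '6.1' into a (major, minor) tuple."""
    major, _, minor = text.strip().partition(".")
    return int(major), int(minor or 0)


def is_supported(capability_text, minimum=MIN_CAPABILITY):
    """Return True if the card meets the minimum compute capability."""
    return parse_capability(capability_text) >= minimum
```

For example, is_supported("6.1") covers the 10XX (Pascal) cards mentioned above, while a GTX 9xx card at 5.2 would be rejected.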

The code uses the NVIDIA linear algebra libraries so on newer cards you may see more utilisation. (Nvidia spend more time optimising these libraries for the most recent generation cards).

The other error I am now seeing, and you may see, is a CUDA out of memory error. This does not seem to be due to the maximum memory of the GPU. I can run the test successfully on a GTX1080 with 8GB and I am seeing successful results from hosts with GPUs with less memory. I believe this error occurs when the workunit is sharing the GPU with another process.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,829,016,430
RAC: 19,679,200
Level
Tyr
Message 60946 - Posted: 9 Jan 2024 | 9:03:05 UTC - in response to Message 60945.
Last modified: 9 Jan 2024 | 9:04:29 UTC



(You can't use https for images at this project - the web software is too old)

Boca Raton Community HS
Joined: 27 Aug 21
Posts: 36
Credit: 6,741,331,809
RAC: 47,200,234
Level
Tyr
Message 61024 - Posted: 18 Jan 2024 | 15:29:37 UTC

Saw a big batch of work units yesterday for this project and all of them were successful on our end. For a relatively early beta, that's fantastic. What are others seeing with that big batch?

No issues running; we ran at 2x on our systems. Are there no checkpoints for these work units? I noticed this when I had to exit BOINC a few times to make some changes to the app config.
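For readers unfamiliar with running tasks "at 2x": this is normally done with an app_config.xml file in the project's directory, where gpu_usage is the fraction of a GPU each task claims. The Python sketch below generates such a file's contents. The app name passed in is a placeholder (the real short name of the QC app is not given in this thread and must be looked up in your client), and build_app_config is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpu_per_task=1.0):
    """Build app_config.xml text that lets several tasks share one GPU."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # placeholder, see lead-in
    gpu = ET.SubElement(app, "gpu_versions")
    # Each task claims 1/N of a GPU, so N tasks fit on one card.
    ET.SubElement(gpu, "gpu_usage").text = str(round(1.0 / tasks_per_gpu, 3))
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_per_task)
    return ET.tostring(root, encoding="unicode")
```

Something like build_app_config("EXAMPLE_APP_NAME", 2) yields a gpu_usage of 0.5. After saving the result as app_config.xml, the client needs to re-read its config files for the change to take effect.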

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 61027 - Posted: 18 Jan 2024 | 15:49:56 UTC - in response to Message 61024.
Last modified: 18 Jan 2024 | 15:55:40 UTC

I think no checkpoints right now. but they at least restart from the beginning without an error.

VRAM utilization was greatly reduced from the previous batch (which I like since it lets you more easily run more than one at a time). runtime was roughly twice as long as the previous batch as the admin indicated that the tasks were twice as large (100 molecules instead of 50)

the application makes use of FP64 hardware and high performing FP64 Nvidia cards show great benefit. Titan V, V100, P100 will perform the best here. the FP64 performance of most other GeForce/Quadro cards is much lower. My Titan V runtimes look to be about 3x faster than your 4090 for example. power draw fluctuated between 80-150W per card, probably about 120W average per card.

I processed about 1500 of the tasks from yesterday on my 16x Titan Vs, would have been more but I was having a lot of issues getting enough work due to the task download limits, the rate i was completing them per host, and the limits in how often a single IP is allowed to make requests at this project.
____________

Boca Raton Community HS
Joined: 27 Aug 21
Posts: 36
Credit: 6,741,331,809
RAC: 47,200,234
Level
Tyr
Message 61030 - Posted: 18 Jan 2024 | 17:03:05 UTC - in response to Message 61027.

Makes sense. Were you running them 1x or 2x?

1,500 tasks is incredible- with no invalids?

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 61031 - Posted: 18 Jan 2024 | 17:14:46 UTC - in response to Message 61030.
Last modified: 18 Jan 2024 | 17:15:32 UTC

i was running them mostly at 3x. one system I was running them at 4x actually to see if the VRAM was sufficient. it was fastest overall at 4x with ~2000s runtimes. so the fastest tasks were completing in about 8.5 minutes effective.
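The "effective" figure is just the wall-clock runtime divided by the number of tasks sharing the card. A tiny Python sketch of that arithmetic follows (effective_minutes is a made-up name); with a round 2000 s at 4x it gives a little over 8 minutes, in line with the approximate figure quoted.

```python
def effective_minutes(runtime_seconds, concurrent_tasks):
    """Per-task throughput time when several tasks share one GPU."""
    return runtime_seconds / concurrent_tasks / 60.0


# 2000 s of wall-clock time shared by 4 concurrent tasks.
per_task = effective_minutes(2000, 4)
```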

no invalids.
____________

Steve
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 21 Dec 23
Posts: 46
Credit: 0
RAC: 0
Message 61038 - Posted: 19 Jan 2024 | 14:18:23 UTC - in response to Message 61031.

Wow that is impressive!

This app is one of the cases where the professional grade nvidia cards with their double precision performance really shine.

Drago
Joined: 3 May 20
Posts: 18
Credit: 836,994,060
RAC: 4,119,062
Level
Glu
Message 61460 - Posted: 12 Apr 2024 | 13:32:15 UTC

I would really appreciate it if the task requirements and properties, such as VRAM requirement, lack of checkpointing, Linux-only or Windows-only, etc., were highlighted in the preferences section right where you mark the subprojects that you would like to support. That would save us volunteers a lot of time compared with eventually finding out that our GPU isn't capable of handling them, or digging through pages and pages of forum entries.

Keith Myers
Joined: 13 Dec 17
Posts: 1340
Credit: 7,653,573,724
RAC: 13,216,170
Level
Tyr
Message 61462 - Posted: 12 Apr 2024 | 16:45:04 UTC - in response to Message 61460.

You should PM Gianni or Toni and point them at your request. Steve, the developer of the science app discussed here, has nothing to do with the project web pages.

But if you bring this request nicely to Gianni, he may decide to add these project and subproject requirements to the subproject selection page in Project Preferences.

Drago
Joined: 3 May 20
Posts: 18
Credit: 836,994,060
RAC: 4,119,062
Level
Glu
Message 61463 - Posted: 13 Apr 2024 | 1:06:34 UTC

Ok Keith. Thanks for the info. I will ask nicely. :-)

Erich56
Joined: 1 Jan 15
Posts: 1132
Credit: 10,210,882,676
RAC: 29,470,062
Level
Trp
Message 61468 - Posted: 16 Apr 2024 | 9:34:03 UTC

For quite a while now, only QC tasks have been available, with sometimes more than 100,000 unsent tasks, as seen on the project status page.

All other subprojects have obviously been stopped, which excludes all Windows crunchers from participating in GPUGRID :-(
Is this the way GPUGRID will go now?

tomaras
Joined: 4 Mar 20
Posts: 15
Credit: 2,077,250,079
RAC: 11,714,682
Level
Phe
Message 61600 - Posted: 16 Jul 2024 | 23:27:23 UTC

What computer/OS does it take to run the Quantum chemistry calculations on GPU? I've got a powerful Windows 11 machine with a top-end i9 processor and NVIDIA RTX 4090 sitting here idle.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 485
Credit: 11,083,903,479
RAC: 15,586,791
Level
Trp
Message 61601 - Posted: 17 Jul 2024 | 1:01:13 UTC - in response to Message 61600.

What computer/os does it take to run the Quantum chemistry calculations on GPU? I've got a powerful Windows 11 machine with a top end I-9 processor and NVIDIA RTX 4090 sitting here idle.


Linux.


Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 61603 - Posted: 17 Jul 2024 | 2:07:34 UTC - in response to Message 61600.

What computer/os does it take to run the Quantum chemistry calculations on GPU? I've got a powerful Windows 11 machine with a top end I-9 processor and NVIDIA RTX 4090 sitting here idle.


an old P100 or V100 is many times faster than a 4090 for these tasks.

but yeah. only available for Linux anyway.
____________

pututu
Joined: 8 Oct 16
Posts: 25
Credit: 4,153,801,869
RAC: 11,857,495
Level
Arg
Message 61613 - Posted: 20 Jul 2024 | 5:23:43 UTC
Last modified: 20 Jul 2024 | 5:24:38 UTC

Got one task where all eight hosts are failing due to "Nuclear gradients of %s not converged" at step #6
https://www.gpugrid.net/workunit.php?wuid=28944368

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,829,016,430
RAC: 19,679,200
Level
Tyr
Message 61811 - Posted: 17 Sep 2024 | 10:51:11 UTC

Heads up: problem with new Linux BOINC installation script.

This will affect new users only - existing users need not make any adjustments.

BOINC makes substantial use of a BOINC data directory, which for many Linux flavours has become established at '/var/lib/boinc-client/'. Instead, the new BOINC installer creates it at '/var/lib/boinc/'.

This breaks the current Quantum Chemistry application, which fails with

FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/boinc-client/.cupy'

(see errors for host 625407)

If you have sudo access to your machine (and I assume you do, if you are installing your own software), you should find /var/lib/boinc and create a symlink that points to it, named /var/lib/boinc-client.
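The workaround described here amounts to a single symlink. A minimal Python sketch is given below, with a hypothetical helper name (link_data_dir); it refuses to touch anything that already exists at the expected path. Creating the link under /var/lib needs root, so in practice this would be run with sudo, or replaced by the equivalent one-line `ln -s` command.

```python
import os


def link_data_dir(real_dir, expected_dir):
    """Create expected_dir as a symlink to real_dir unless it already exists."""
    if os.path.islink(expected_dir) or os.path.exists(expected_dir):
        return False  # leave any existing file, directory, or link alone
    os.symlink(real_dir, expected_dir, target_is_directory=True)
    return True


# Intended use for the case in this post (requires root privileges):
# link_data_dir("/var/lib/boinc", "/var/lib/boinc-client")
```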

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 61812 - Posted: 17 Sep 2024 | 12:34:27 UTC - in response to Message 61811.
Last modified: 17 Sep 2024 | 12:47:47 UTC

Heads up: problem with new Linux BOINC installation script.

This will affect new users only - existing users need not make any adjustments.

BOINC makes substantial use of a BOINC data directory, which for many Linux flavours has become established at '/var/lib/boinc-client/'. Instead, the new BOINC installer creates it at '/var/lib/boinc/'.

This breaks the current Quantum Chemistry application, which fails with

FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/boinc-client/.cupy'

(see errors for host 625407)

If you have sudo access to your machine (and I assume you do, if you are installing your own software), you should find /var/lib/boinc and create a symlink that points to it, named /var/lib/boinc-client.


i'm willing to bet that the directory listed is just a relative file path definition, not hard coded. something like assuming you're in the running slot. then going to "../../.cupy" or maybe utilizing an environment variable with $PATH or even BOINC's internal path variables.

I don't even install boinc to that directory, nor do i have anything related to boinc in my /var/lib directory. i have it in my home folder. if it were hard coded, no one with a standalone install would be doing work, and no one has reported any issues, so...

if you migrated an existing system with apps and stuff already downloaded, you might have some lingering configurations from the old setup? try resetting the project.
____________

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,829,016,430
RAC: 19,679,200
Level
Tyr
Message 61813 - Posted: 17 Sep 2024 | 15:58:11 UTC - in response to Message 61812.

This is a brand-new machine (less than 3 weeks old) - supplied with no OS installed, so I've installed Linux Mint and BOINC from cold - no history.

BOINC is running as a service, but the data file is actually called 'boinc-data' and is located on a different SSD - I've learned how to manage folder redirects!

A GPUGrid contributor has joined the parallel conversation at BOINC, and we'll resolve it there. But the work-round works in the meantime, if you need it.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 61814 - Posted: 17 Sep 2024 | 16:55:00 UTC - in response to Message 61813.
Last modified: 17 Sep 2024 | 17:39:35 UTC

This is a brand-new machine (less than 3 weeks old) - supplied with no OS installed, so I've installed Linux Mint and BOINC from cold - no history.

BOINC is running as a service, but the data file is actually called 'boinc-data' and is located on a different SSD - I've learned how to manage folder redirects!

A GPUGrid contributor has joined the parallel conversation at BOINC, and we'll resolve it there. But the work-round works in the meantime, if you need it.


i think you running it on a separate SSD is likely the issue, or a problem with Linux Mint. and it won't impact most people.

it's clearly not hard coded, since I do not have /var/lib/boinc, /var/lib/boinc-client, /var/lib/boinc-data or anything similar in my directories at all. in fact, the .cupy directory in use on my system is just put in the user's home directory (/home/ian/.cupy) and this works perfectly fine for QChem.

you seem to have something else going on with the system to get the environment variables confused between $HOME and /var/lib/boinc-client, cause it's not hard coded to that on GPUGRID's end.
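One way to test the $HOME theory is to check where "~/.cupy" would resolve for the account that runs BOINC. The Python sketch below is a diagnostic under that assumption only; the function names are hypothetical, and it would need to be run as the BOINC service user (whatever that account is called on your system) to be meaningful.

```python
import os


def cupy_dir_for_home(home):
    """Path that a '$HOME/.cupy' default would resolve to for this home."""
    return os.path.join(home, ".cupy")


def describe(home=None):
    """Report the resolved .cupy path and whether the home dir is usable."""
    if home is None:
        home = os.path.expanduser("~")  # what the current user resolves to
    return {
        "home": home,
        "cupy_dir": cupy_dir_for_home(home),
        "home_writable": os.path.isdir(home) and os.access(home, os.W_OK),
        "cupy_dir_exists": os.path.isdir(cupy_dir_for_home(home)),
    }
```

If describe() reports a home directory that does not exist or is not writable, that would be consistent with the FileNotFoundError quoted earlier, although the thread does not confirm the root cause.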
____________

Steve
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 21 Dec 23
Posts: 46
Credit: 0
RAC: 0
Message 61815 - Posted: 17 Sep 2024 | 20:32:49 UTC - in response to Message 61814.
Last modified: 17 Sep 2024 | 20:33:52 UTC

Yes that temp dir used by cupy should be located at $HOME/.cupy by default as I mentioned here: https://github.com/BOINC/boinc/discussions/5811#discussioncomment-10670615
And as shown by Ian’s path. I don’t quite know why it would be different for your setup. Maybe a side effect of the new client version you have installed combined with our older wrapper script.

Although, as mentioned by the BOINC developers, it is better if this runtime folder is instead located in the BOINC slot or project directory. I will add this change to the next app update.
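As an illustration of what relocating the runtime folder could look like, here is a Python sketch that points CuPy's kernel cache at a directory under the current working directory (which, for a running task, would be the slot directory). This is not the project's actual change: it assumes CuPy's documented CUPY_CACHE_DIR environment variable is enough to cover what the app writes, and use_local_cupy_cache is a hypothetical helper.

```python
import os


def use_local_cupy_cache(base_dir=None):
    """Point CuPy's kernel cache at <base_dir>/.cupy/kernel_cache."""
    base_dir = base_dir or os.getcwd()  # the slot directory for a running task
    cache_dir = os.path.join(base_dir, ".cupy", "kernel_cache")
    os.makedirs(cache_dir, exist_ok=True)
    # Set this before importing cupy so the cache location is picked up.
    os.environ["CUPY_CACHE_DIR"] = cache_dir
    return cache_dir
```

A wrapper script would call this once at start-up and only then import cupy, so nothing depends on what $HOME happens to be for the BOINC service account.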

Aurum
Joined: 12 Jul 17
Posts: 401
Credit: 16,755,010,632
RAC: 220,113
Level
Trp
Message 61818 - Posted: 22 Sep 2024 | 14:25:38 UTC - in response to Message 61813.

This is a brand-new machine (less than 3 weeks old) - supplied with no OS installed, so I've installed Linux Mint and BOINC from cold - no history.

BOINC is running as a service, but the data file is actually called 'boinc-data' and is located on a different SSD - I've learned how to manage folder redirects!

A GPUGrid contributor has joined the parallel conversation at BOINC, and we'll resolve it there. But the work-round works in the meantime, if you need it.

I've got what may be the same problem with a fresh install of Linux Mint 22 Ubuntu 24.04 Noble. Can't run BOINC on it following these instructions:
https://isaac.ssl.berkeley.edu/linux_install.php?os_num=6&build=alpha
Please post a link to your "parallel conversation at BOINC" or better yet the solution. TIA

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1069
Credit: 40,231,533,983
RAC: 527
Level
Trp
Message 61819 - Posted: 22 Sep 2024 | 16:09:46 UTC - in response to Message 61818.

This is a brand-new machine (less than 3 weeks old) - supplied with no OS installed, so I've installed Linux Mint and BOINC from cold - no history.

BOINC is running as a service, but the data file is actually called 'boinc-data' and is located on a different SSD - I've learned how to manage folder redirects!

A GPUGrid contributor has joined the parallel conversation at BOINC, and we'll resolve it there. But the work-round works in the meantime, if you need it.

I've got what may be the same problem with a fresh install of Linux Mint 22 Ubuntu 24.04 Noble. Can't run BOINC on it following these instructions:
https://isaac.ssl.berkeley.edu/linux_install.php?os_num=6&build=alpha
Please post a link to your "parallel conversation at BOINC" or better yet the solution. TIA


Sounds like Richard's problem was in relation to the QChem tasks specifically, not BOINC as a whole. if you're having a problem running BOINC, you have a separate issue.
____________

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,829,016,430
RAC: 19,679,200
Level
Tyr
Message 61820 - Posted: 22 Sep 2024 | 16:48:41 UTC - in response to Message 61819.

Sounds like Richard's problem was in relation to the QChem tasks specifically, not BOINC as a whole. if you're having a problem running BOINC, you have a separate issue.

Yes, it was specific to Quantum Chemistry tasks. As soon as I saw the results of the first night's run (all very quick errors), I switched to ATMML and they ran flawlessly. Then I investigated the error message about the missing file and the location it was looking in - I knew that didn't match the installation I'd only just completed.

So I devised the workround I posted before, and it worked with no other changes. It's an easy fix, so I suggested BOINC cover it - but after discussion, it was decided to fix it at the project end.

Unfortunately, the apps page still shows an installation date of 9 Jul 2024, so the problem probably still exists.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1620
Credit: 8,829,016,430
RAC: 19,679,200
Level
Tyr
Message 61821 - Posted: 22 Sep 2024 | 17:17:39 UTC - in response to Message 61818.

I've got what may be the same problem with a fresh install of Linux Mint 22 Ubuntu 24.04 Noble. Can't run BOINC on it following these instructions:
https://isaac.ssl.berkeley.edu/linux_install.php?os_num=6&build=alpha

I've had a quick look at your host 624473 - that's the only Mint 22 I can see in the list. You only let two tasks run to the point where they reported a failure.

Both were ACEMD 3 tasks. One was 'Particle coordinate is nan'; the other was 'Cannot use a restart file on a different device!'. Both are well-known processing errors here, and not related to the version of BOINC used.

Please post a link to your "parallel conversation at BOINC" or better yet the solution.

It's at https://github.com/BOINC/boinc/discussions/5811

Pascal
Joined: 15 Jul 20
Posts: 77
Credit: 1,563,272,434
RAC: 11,391,156
Level
His
Message 61822 - Posted: 23 Sep 2024 | 7:57:20 UTC - in response to Message 61821.

https://www.gpugrid.net/hosts_user.php?userid=563937


PC running Linux Mint 22
____________
