Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU
---

Joined: 8 Oct 16 · Posts: 27 · Credit: 4,153,801,869 · RAC: 0
There are some tasks that spike over 10 GB. It seems nvidia-smi doesn't allow a logging interval shorter than 1 s; does anyone have a workaround? The momentary spike is likely even higher than the 10 GB recorded.

```
2024/02/05 07:06:39.675, 88 %, 1328 MHz, 5147 MiB, 115.28 W, 65
2024/02/05 07:06:40.678, 96 %, 1278 MHz, 5147 MiB, 117.58 W, 65
2024/02/05 07:06:41.688, 100 %, 1328 MHz, 5177 MiB, 111.94 W, 65
2024/02/05 07:06:42.691, 100 %, 1328 MHz, 6647 MiB, 70.23 W, 64
2024/02/05 07:06:43.694, 30 %, 1328 MHz, 8475 MiB, 69.65 W, 64
2024/02/05 07:06:44.697, 100 %, 1328 MHz, 9015 MiB, 81.81 W, 64
2024/02/05 07:06:45.700, 100 %, 1328 MHz, 9007 MiB, 46.32 W, 63
2024/02/05 07:06:46.705, 98 %, 1278 MHz, 9941 MiB, 46.08 W, 63
2024/02/05 07:06:47.708, 99 %, 1328 MHz, 10251 MiB, 57.06 W, 63
2024/02/05 07:06:48.711, 97 %, 1088 MHz, 4553 MiB, 133.72 W, 65
2024/02/05 07:06:49.714, 95 %, 1075 MHz, 4553 MiB, 132.99 W, 65
```
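One possible workaround, not mentioned in the thread, is to query NVML directly instead of shelling out to nvidia-smi. A minimal sketch, assuming the nvidia-ml-py package (imported as `pynvml`) and GPU index 0; note that NVML itself may still sample some counters more coarsely than the polling interval:

```python
# Sketch: poll NVML every 100 ms for utilization, SM clock, memory, power, temp.
# Assumes the nvidia-ml-py package (import name: pynvml) and GPU index 0.
import time
from datetime import datetime

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
try:
    while True:
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)      # percent
        clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)              # bytes
        watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0   # mW -> W
        temp = pynvml.nvmlDeviceGetTemperature(
            handle, pynvml.NVML_TEMPERATURE_GPU)
        print(f"{datetime.now():%Y/%m/%d %H:%M:%S.%f}, {util.gpu} %, "
              f"{clock} MHz, {mem.used // 2**20} MiB, {watts:.2f} W, {temp}")
        time.sleep(0.1)  # 100 ms interval, well below nvidia-smi's 1 s floor
finally:
    pynvml.nvmlShutdown()
```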
---

Joined: 8 Oct 16 · Posts: 27 · Credit: 4,153,801,869 · RAC: 0
Got a biggie: this one is 14.6 GB. I'm running a 16 GB card, one task per GPU.

```
2024/02/05 08:20:03.043, 100 %, 1328 MHz, 9604 MiB, 107.19 W, 71
2024/02/05 08:20:04.046, 94 %, 1328 MHz, 11970 MiB, 97.69 W, 71
2024/02/05 08:20:05.049, 99 %, 1328 MHz, 12130 MiB, 123.24 W, 70
2024/02/05 08:20:06.052, 100 %, 1316 MHz, 12130 MiB, 122.21 W, 71
2024/02/05 08:20:07.055, 100 %, 1328 MHz, 12130 MiB, 121.26 W, 71
2024/02/05 08:20:08.058, 100 %, 1328 MHz, 12130 MiB, 118.64 W, 71
2024/02/05 08:20:09.061, 17 %, 1328 MHz, 12116 MiB, 56.48 W, 70
2024/02/05 08:20:10.064, 95 %, 1189 MHz, 14646 MiB, 73.99 W, 71
2024/02/05 08:20:11.071, 99 %, 1139 MHz, 14646 MiB, 194.84 W, 71
2024/02/05 08:20:12.078, 96 %, 1316 MHz, 14650 MiB, 65.82 W, 70
2024/02/05 08:20:13.081, 85 %, 1328 MHz, 8952 MiB, 84.32 W, 70
2024/02/05 08:20:14.084, 100 %, 1075 MHz, 8952 MiB, 130.53 W, 71
```
---

Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 6,423
Yeah, I think you'll only ever see the spike if you actually have the VRAM for it; if you don't have enough, the task will error out before hitting it and you'll never see it. I'm just going to deal with the errors. Cost of doing business, lol.

I have my system set to a 70% active thread percentage (ATP) through MPS, with QChem gpu_usage set to 0.55 and ATMbeta gpu_usage set to 0.44. This way, when both kinds of tasks are available, a GPU will run either ATMbeta+ATMbeta or ATMbeta+QChem, but never two QChem tasks at once. I do this because ATMbeta uses a really small amount of GPU VRAM and can soak up some of the spare compute cycles without adding much to QChem's VRAM pressure.

When a card is running only a single QChem task, it isn't using quite all the compute it could (only 70%), so it may be a little slower, but Titan Vs are fast enough anyway: most tasks finish in about 6 minutes, with some outliers around 18 minutes.
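For reference, per-app GPU fractions like these live in BOINC's app_config.xml in the project directory. The sketch below shows how the 0.55/0.44 split might look; it is an assumption, not the poster's actual file, and the `<name>` values in particular must match the app names in the project's client_state.xml:

```xml
<!-- Sketch of an app_config.xml implementing the 0.55 / 0.44 split above.
     App names are assumptions; check client_state.xml for the real ones. -->
<app_config>
  <app>
    <name>PYSCFbeta</name>          <!-- QChem -->
    <gpu_versions>
      <gpu_usage>0.55</gpu_usage>   <!-- 2 x 0.55 exceeds 1: never two per GPU -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>ATMbeta</name>
    <gpu_versions>
      <gpu_usage>0.44</gpu_usage>   <!-- 0.44 + 0.55 fits in 1: can pair with QChem -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

BOINC only starts a GPU task if the running tasks' gpu_usage fractions sum to at most 1, which is what makes 0.44 + 0.55 and 0.44 + 0.44 legal while 0.55 + 0.55 is not.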
---

Joined: 27 Aug 21 · Posts: 38 · Credit: 7,254,068,306 · RAC: 0
pututu, have you had any failed tasks? Ian&Steve C. reports a ~10% failure rate with 12 GB, so I'm curious about 16 GB. I'm guessing that's about the minimum for error-free processing (as far as memory limits go) of the current work.
---

Joined: 27 Aug 21 · Posts: 38 · Credit: 7,254,068,306 · RAC: 0
We did the same thing this morning on the 4090 GPUs, since they have 24 GB, but paired with E@H work: too little VRAM to run QChem at 2x, yet too much compute power left on the table running it at 1x.
---

Joined: 8 Oct 16 · Posts: 27 · Credit: 4,153,801,869 · RAC: 0
> pututu, have you had any failed tasks? Ian&Steve C. reports a ~10% failure rate with 12 GB, so I'm curious about 16 GB.

0 failures after 19 completed tasks on one P100 with 16 GB. So far 14.6 GB is the highest I've seen with 1-second-interval monitoring. More than half of the tasks processed momentarily hit 8 GB or more; I didn't record any actual data, just watched nvidia-smi from time to time.

Edit: another task with more than 12 GB, but with an ominous 6666 MiB, lol:

```
2024/02/05 09:17:58.869, 99 %, 1328 MHz, 10712 MiB, 131.69 W, 70
2024/02/05 09:17:59.872, 100 %, 1328 MHz, 10712 MiB, 101.87 W, 70
2024/02/05 09:18:00.877, 100 %, 1328 MHz, 10700 MiB, 50.15 W, 69
2024/02/05 09:18:01.880, 92 %, 1240 MHz, 11790 MiB, 54.34 W, 69
2024/02/05 09:18:02.883, 95 %, 1240 MHz, 12364 MiB, 53.20 W, 69
2024/02/05 09:18:03.886, 83 %, 1126 MHz, 6666 MiB, 137.77 W, 70
2024/02/05 09:18:04.889, 100 %, 1075 MHz, 6666 MiB, 130.53 W, 71
2024/02/05 09:18:05.892, 92 %, 1164 MHz, 6666 MiB, 129.84 W, 71
2024/02/05 09:18:06.902, 100 %, 1063 MHz, 6666 MiB, 129.82 W, 71
```
---

Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 6,423
> pututu, have you had any failed tasks? Ian&Steve C. reports a ~10% failure rate with 12 GB, so I'm curious about 16 GB.

I've been running all day across my 18x Titan Vs and the effective error rate is right around 5%, so 5% of the tasks needed more than 12 GB. Running only one task per GPU.

I also rented an A100 40GB for the day. Running 3x on that GPU with MPS set to 40%, it's done about 300 tasks and only one failed from out of memory. The highest spike I saw was 39 GB, but it usually stays around 20 GB utilized.
---

Joined: 27 Aug 21 · Posts: 38 · Credit: 7,254,068,306 · RAC: 0
Wow, the A100 is powerful. I can't believe how fast it can chew through these (well, I can believe it, but it's still amazing).

I'm somewhat new to MPS. I understand the general concept, but what do you mean when you say it is set to 40%?
---

Joined: 15 Jul 20 · Posts: 95 · Credit: 2,550,803,412 · RAC: 248
Well, I've given up; too many errors.
---

Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 6,423
> I'm somewhat new to MPS. I understand the general concept, but what do you mean when you say it is set to 40%?

CUDA MPS has a setting called the active thread percentage. It basically limits how many of the GPU's SMs each process gets. Without MPS, every process claims all available SMs all the time, each in its own context (MPS also shares a single context between processes). I set that percentage to 40, so each task only uses 40% of the available SMs. With 3x running, that slightly over-provisions the GPU, but it usually works well and runs faster than 3x without MPS. It also tends to reduce VRAM use, though it doesn't seem to limit these tasks much.

The only caveat is that when you run low on work, the remaining one or two tasks won't expand to fill the GPU; each still uses only its 40% and leaves the rest idle.
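For anyone wanting to try this, the percentage is set through the MPS control daemon (nvidia-cuda-mps-control). Below is a minimal sketch in Python wrapping that tool; the pipe/log directories are assumptions for illustration, and the default percentage must be set before client processes start:

```python
# Sketch: start the CUDA MPS control daemon and cap each client's SM usage
# at 40%. The pipe/log directories below are assumptions; adjust as needed.
# Clients must see the same CUDA_MPS_PIPE_DIRECTORY in their environment.
import os
import subprocess

env = os.environ.copy()
env["CUDA_MPS_PIPE_DIRECTORY"] = "/tmp/nvidia-mps"
env["CUDA_MPS_LOG_DIRECTORY"] = "/tmp/nvidia-mps-log"

# Start the daemon (-d), then feed it the active-thread-percentage command.
subprocess.run(["nvidia-cuda-mps-control", "-d"], env=env, check=True)
subprocess.run(
    ["nvidia-cuda-mps-control"],
    input="set_default_active_thread_percentage 40\n",
    text=True,
    env=env,
    check=True,
)
# To stop the daemon later: pipe "quit" into nvidia-cuda-mps-control.
```

Already-running clients keep the percentage they launched with; only new clients pick up the new default.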
---

Joined: 12 Jul 17 · Posts: 404 · Credit: 17,408,899,587 · RAC: 0
> It seems nvidia-smi doesn't allow a logging interval shorter than 1 s; does anyone have a workaround?

Have you tried nvitop? https://github.com/XuehaiPan/nvitop
---

Joined: 8 Oct 16 · Posts: 27 · Credit: 4,153,801,869 · RAC: 0
> Have you tried nvitop? https://github.com/XuehaiPan/nvitop

No. A quick search suggests it uses the nvidia-smi command underneath, so it likely has a similar limitation. Anyway, after a day of running (100+ tasks) I didn't see any failures on the 16 GB card, so I'm good, at least for now.
---

Joined: 27 Aug 21 · Posts: 38 · Credit: 7,254,068,306 · RAC: 0
> I'm somewhat new to MPS. I understand the general concept, but what do you mean when you say it is set to 40%?

Thank you for the explanation!
---

Joined: 15 Jul 20 · Posts: 95 · Credit: 2,550,803,412 · RAC: 248
Good evening. Are there Windows work units to compute, or do I have to switch back to Linux? Thanks.
---

Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 6,423
> Good evening. Are there Windows work units to compute, or do I have to switch back to Linux?

Only Linux still.
---

Joined: 1 Jan 15 · Posts: 1166 · Credit: 12,260,898,501 · RAC: 1
> Only Linux still.

:-( :-( :-(
---

Joined: 15 Jul 20 · Posts: 95 · Credit: 2,550,803,412 · RAC: 248
I just switched back to Linux and it's up and running again. Bye bye, Windows 10.
---

Joined: 27 Aug 21 · Posts: 38 · Credit: 7,254,068,306 · RAC: 0
We have definitely noticed a sharp decrease in "errors" with these tasks.

Steve (or anyone), can you offer some insight into the filenames? For example:

inputs_v3_ace_pch_ms_gc_filt_af05_index_263591_to_263591-SFARR_PYSCF_ace_pch_ms_gc_filt_af05_v4-0-1-RND5514_2

Are there two different references to a version? I see a "_v3_" and then a "_v4-0-1". Then there's the app version, v1.04. I thought "_v4-0-1" would equate to the app version, but it doesn't look like it does. Thanks!
---

Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 6,423
The "0-1" notation in all GPUGRID task names seems to indicate which segment you are on and how many segments there are in total. So here, 0 = the segment you are on and 1 = the total number of segments. The segment index always seems to be zero-based.

We see/saw the same behavior with ATM, where you'd get tasks like 0-5, 1-5, 2-5, and so on, stopping at 4-5; one batch had ten segments, 0-10 through 9-10. They likely have some process on the server side that stitches the results together based on these (and other) numbers.
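If that reading is right, the fields can be pulled out of a task name mechanically. A hypothetical sketch; the pattern and field meanings follow the interpretation above, not any published naming spec:

```python
# Hypothetical sketch: extract the batch version and segment fields from a
# GPUGRID task name, assuming the "..._v<batch>-<segment>-<total>-RND..."
# interpretation described above.
import re

name = ("inputs_v3_ace_pch_ms_gc_filt_af05_index_263591_to_263591-"
        "SFARR_PYSCF_ace_pch_ms_gc_filt_af05_v4-0-1-RND5514_2")

m = re.search(r"_v(\d+)-(\d+)-(\d+)-RND", name)
if m:
    batch_version, segment, total = map(int, m.groups())
    print(f"batch v{batch_version}: segment {segment} of {total}")
    # -> batch v4: segment 0 of 1
```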
---

Joined: 13 Dec 17 · Posts: 1419 · Credit: 9,119,446,190 · RAC: 891
Looks like they transitioned from v3-0-1 on Feb 2 to a test result on Feb 3, and then started the v4-0-1 run on Feb 5. That's from looking back through 360 validated tasks. I had two errors on the v4-0-1 tasks right at their beginning; everything has validated since then. All were run on two 2080 Ti cards.