Update acemd3 app

Ian&Steve C. · Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,876,970,595 · RAC: 8,067
Message 57123 - Posted: 4 Jul 2021, 18:16:56 UTC - in response to Message 57120.

Aurum wrote:
But since Nvidia eliminated the nvidia-settings options -a [gpu:0]/GPUGraphicsClockOffset & -a [gpu:0]/GPUMemoryTransferRateOffset that I used I haven't found a good way to do it using Linux.

These options still work. I use them for my 3080 Ti, so I'm not sure what you mean. This is exactly what I use for my 3080 Ti (same on my Turing hosts):

/usr/bin/nvidia-smi -pm 1
/usr/bin/nvidia-smi -acp UNRESTRICTED
/usr/bin/nvidia-smi -i 0 -pl 320
/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=500" -a "[gpu:0]/GPUGraphicsClockOffset[4]=100"

It works as desired.

Aurum wrote:
It seems Nvidia chooses a performance level but I can't see how to force it to a desired level:

What do you mean by "performance level"? If you mean forcing a certain P-state, no, you can't do that. These cards will not go into the P0 state unless you're running a 3D application; any compute application will at best get P2. This has been the case ever since Maxwell, and the workarounds to force P0 stopped working with Pascal, so this isn't new.

If you mean the PowerMizer preferred mode (analogous to the power management setting in Windows), you can select that easily in Linux too. I always run mine at "Prefer Maximum Performance", which you set with the following command:

/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"

I'm unsure whether this really makes much difference, though, apart from increasing idle power consumption (by forcing higher clocks). The GPU seems to detect load properly and clock up even when left on the default "Auto" setting.

Aurum wrote:
Nvidia also eliminated GPULogoBrightness so the baby-blinkie lights never turn off.

I'm not sure this was intentional. It's probably something that fell through the cracks, and not enough people have complained about it for them to dedicate resources to fixing it; there's no gain for Nvidia in disabling this function. But again, this stopped working with Turing, so it's been this way for about 3 years; it's not something new. I have mostly EVGA cards, so when I want to change the lighting, I just put the card on my test bench, boot into Windows, change the LED settings there, and then put it back in the crunching rig. The settings are stored on the card itself (for my cards), so they stay at whatever I left them. You can probably do the same.
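The commands in the post above can be collected into one reusable shell function, for example to apply the same settings to several GPUs. This is a minimal sketch, not the poster's actual script: the function name apply_gpu_settings, its argument order and the DRY_RUN switch are hypothetical, while the flags, the [4] performance-level index and the example values (320 W, +500, +100) are the ones quoted in the post. It assumes it is run as root (needed for nvidia-smi) with an X server running on the GPU (needed for nvidia-settings).

```shell
#!/usr/bin/env bash
# Sketch: apply the power/clock settings quoted in the post to one GPU.
# Hypothetical: function names, argument order, DRY_RUN switch.
# From the post: the flags, the [4] performance level and the example values.

# Print the command instead of running it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# apply_gpu_settings GPU_INDEX POWER_LIMIT_W MEM_OFFSET GFX_OFFSET
apply_gpu_settings() {
    local gpu="$1" pl="$2" mem="$3" gfx="$4"
    run nvidia-smi -i "$gpu" -pm 1                # persistence mode on
    run nvidia-smi -i "$gpu" -acp UNRESTRICTED    # clock-change permission
    run nvidia-smi -i "$gpu" -pl "$pl"            # power limit in watts
    # nvidia-settings needs a running X server; several -a may be combined.
    run nvidia-settings -a "[gpu:$gpu]/GPUPowerMizerMode=1" \
        -a "[gpu:$gpu]/GPUMemoryTransferRateOffset[4]=$mem" \
        -a "[gpu:$gpu]/GPUGraphicsClockOffset[4]=$gfx"
}
```

After sourcing the file, DRY_RUN=1 apply_gpu_settings 0 320 500 100 only prints the four commands for GPU 0, which is a cheap way to review them before running the function for real as root.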

Aurum · Joined: 12 Jul 17 · Posts: 404 · Credit: 17,408,899,587 · RAC: 0
Message 57124 - Posted: 4 Jul 2021, 18:26:11 UTC

It sure does not look like running multiple GG WUs on the same GPU has any benefit. My 3080 is stuck in P2. I'd like to try it in P3 and P4, but I can't make it change. I tried:

nvidia-smi -lmc 9251
Memory clocks set to "(memClkMin 9501, memClkMax 9501)" for GPU 00000000:65:00.0
All done.

nvidia-smi -lgc 240,2130
GPU clocks set to "(gpuClkMin 240, gpuClkMax 2130)" for GPU 00000000:65:00.0
All done.

But it's still in P2.
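Given the explanation earlier in the thread that compute loads top out at P2, locking clocks with -lgc/-lmc would not be expected to change the P-state. A minimal sketch for checking what each GPU is actually doing, and for undoing the locks afterwards, could look like this. It assumes the installed driver's nvidia-smi supports the query fields index, pstate, clocks.sm and clocks.mem (nvidia-smi --help-query-gpu lists them) and the -rgc/-rmc reset options; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report the P-state and clocks of each GPU, and remove clock locks.
# Function names are hypothetical; flags assume a reasonably recent driver.

# Turn a CSV line like "0, P2, 1905, 9501" (index, pstate, SM MHz, mem MHz)
# into a readable summary.
parse_pstate_csv() {
    awk -F', *' '{ printf "GPU %s: %s (SM %s MHz, memory %s MHz)\n", $1, $2, $3, $4 }'
}

# Query every GPU once, without header or units so the fields are bare.
show_pstates() {
    nvidia-smi --query-gpu=index,pstate,clocks.sm,clocks.mem \
        --format=csv,noheader,nounits | parse_pstate_csv
}

# Undo earlier -lgc / -lmc locks on one GPU (needs root).
reset_clock_locks() {
    nvidia-smi -i "$1" -rgc
    nvidia-smi -i "$1" -rmc
}
```

Running show_pstates while a task is crunching shows whether the locks had any effect on the P-state; reset_clock_locks 0 returns GPU 0 to driver-managed clocks.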

Aurum · Joined: 12 Jul 17 · Posts: 404 · Credit: 17,408,899,587 · RAC: 0
Message 57125 - Posted: 4 Jul 2021, 18:34:34 UTC - in response to Message 57123.

Ian&Steve C. wrote:
these options still work. I use them for my 3080Ti. not sure what you mean? this is exactly what I use for my 3080Ti (same on my Turing hosts)
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=500" -a "[gpu:0]/GPUGraphicsClockOffset[4]=100"
it works as desired.

How do you prove to yourself they work? They don't even exist any more. Run

nvidia-settings -q all | grep -C 10 -i GPUMemoryTransferRateOffset

and you will not find either of them.

Ian&Steve C. · Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,876,970,595 · RAC: 8,067
Message 57126 - Posted: 4 Jul 2021, 18:44:18 UTC

But all the slightly off-topic stuff aside: it was a great first step toward getting the app working on Ampere. It has been long awaited, the new app is much appreciated, and now many more cards can contribute to the project, especially with the long-running tasks we've had lately. We need powerful cards to handle these tasks. I think the two priorities now should be:

1. Remedy the dependency on boost. Either include the necessary library in the package distributed to clients, or recompile the app with boost statically linked. Otherwise only the hosts whose owners recognize the problem and know how to manually install the proper boost package will be able to contribute.
2. Investigate the cause of, and provide a remedy for, the ~30% slowdown in application performance compared with the older cuda100 app. This isn't just affecting Ampere; it seems to affect all GPUs equally. Maybe some optimization flag was omitted, or some undesirable or unintended change was made to the code. Just changing from cuda100 to cuda1121 should not in itself have caused this if there were no other code changes. Sometimes you see slight performance changes of 1-2%, but a 30% reduction is a sign that something is clearly wrong.
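For point 1, a quick way to see whether a host is affected is to ask the dynamic loader which of the app's shared libraries it cannot resolve. This is a minimal sketch assuming a glibc-based distribution where ldd is available; the path to the acemd3 executable in the BOINC project directory is not given in the thread, so it is left as an argument, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the shared libraries a binary cannot resolve.
# Function names are hypothetical; pass the path of the app binary yourself.

# Read `ldd` output on stdin and print the names of unresolved libraries,
# i.e. lines of the form "<tab>libname.so.X => not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Run ldd on a binary and report only what is missing.
check_binary() {
    ldd "$1" | missing_libs
}
```

check_binary /path/to/the/app/binary prints nothing when every library resolves and one missing library name per line otherwise; on the affected hosts that would be a boost library. Note that ldd should only be used on binaries you trust.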

Ian&Steve C. · Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,876,970,595 · RAC: 8,067
Message 57127 - Posted: 4 Jul 2021, 18:54:32 UTC - in response to Message 57125. Last modified: 4 Jul 2021, 18:56:04 UTC

Aurum wrote:
How do you prove to yourself they work? They don't even exist any more. Run nvidia-settings -q all | grep -C 10 -i GPUMemoryTransferRateOffset and you will not find either of them.

I prove they work by opening NVIDIA X Server Settings and observing that the clock offsets have changed in accordance with the commands, and that the commands don't give any error when I run them. And they have: the commands work 100%. I see you're referencing some other command. I have no idea what the command you're trying to use does, but my command works. See for yourself: https://i.imgur.com/UFHbhNt.png

888 · Joined: 28 Jan 21 · Posts: 6 · Credit: 106,022,917 · RAC: 0
Message 57139 - Posted: 5 Jul 2021, 12:12:35 UTC

I'm still getting the CUDA compiler "permission denied" error. I've added the PPA and installed libboost1.74 as above, and reset the project multiple times, but every downloaded task fails after 2 seconds: http://www.gpugrid.net/result.php?resultid=32636087
I'm running Mint 20.1 with RTX 2070 and RTX 3070 cards on the 465.31 driver.

Ian&Steve C. · Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,876,970,595 · RAC: 8,067
Message 57140 - Posted: 5 Jul 2021, 12:31:15 UTC - in response to Message 57139.

888 wrote:
I'm still getting the CUDA compiler permission denied error. I've added the PPA and installed libboost1.74 as above, and reset the project multiple times. But every downloaded task fails after 2 seconds. ...

How did you install the drivers? Have you ever installed the CUDA toolkit? That was my problem. If you have a CUDA toolkit installed, remove it. To be safe, I would also completely purge your Nvidia drivers and reinstall them fresh.
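Since a leftover CUDA toolkit turned out to be the culprit here, a minimal sketch for spotting one may help. It assumes the toolkit was installed from NVIDIA's runfile into the default /usr/local/cuda* location (package-manager installs can live elsewhere); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: look for leftover CUDA toolkit installs.
# Assumes the default runfile location /usr/local/cuda*; pass another
# directory as $1 to search there instead. Function names are hypothetical.

find_cuda_toolkits() {
    local root="${1:-/usr/local}" d
    for d in "$root"/cuda*; do
        # The glob stays literal when nothing matches, so test each entry.
        if [ -d "$d" ]; then
            echo "$d"
        fi
    done
}

# An nvcc on PATH is another sign that a toolkit is still installed.
nvcc_on_path() {
    command -v nvcc
}
```

If find_cuda_toolkits prints anything (a /usr/local/cuda symlink usually shows up next to the versioned directory), or nvcc_on_path finds an nvcc, a toolkit is still present and is worth removing before reinstalling the driver.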

Erich56 · Joined: 1 Jan 15 · Posts: 1168 · Credit: 12,317,898,501 · RAC: 75,187
Message 57142 - Posted: 5 Jul 2021, 13:01:27 UTC - in response to Message 57126.

Ian&Steve C wrote:
It was a great first step to getting the app working for Ampere. ... I think the two priorities now should be:
1. remedy the dependency on boost. either include the necessary library in the package distribution to clients, or recompile the app with boost statically linked. ...
2. investigate the cause and provide a remedy for the ~30% slowdown in application performance from the older cuda100 app.

... and last, but not least: an app for Windows would be nice :-)

888 · Joined: 28 Jan 21 · Posts: 6 · Credit: 106,022,917 · RAC: 0
Message 57143 - Posted: 5 Jul 2021, 13:31:53 UTC - in response to Message 57140.

Ian&Steve C. wrote:
How did you install the drivers? Have you ever installed the CUDA toolkit? This was my problem. If you have a CUDA toolkit installed, remove it. I would also be safe and totally purge your nvidia drivers and re-install fresh.

Thanks for the quick reply. I had CUDA toolkit version 10 installed, but after seeing your previous post about your problem, I had already removed it. I'll try purging and reinstalling my Nvidia drivers, thanks.

Ian&Steve C. · Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,876,970,595 · RAC: 8,067
Message 57145 - Posted: 5 Jul 2021, 13:45:03 UTC - in response to Message 57143.

Did you use the included removal script to remove the toolkit, or did you manually delete some files? Definitely try the removal script if you haven't already. Good luck!
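For a runfile install, the removal script lives in the toolkit's own bin directory. To my knowledge, NVIDIA's Linux installation guides name it bin/uninstall_cuda_X.Y.pl for CUDA 10.0 and earlier and bin/cuda-uninstaller for CUDA 10.1 and later; treat those names as an assumption and check the guide for your version. A minimal sketch that finds whichever one is present (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: find the uninstaller of a runfile-installed CUDA toolkit.
# Assumed names: bin/cuda-uninstaller (10.1+), bin/uninstall_cuda_X.Y.pl (10.0 and earlier).
# The function name is hypothetical.
find_cuda_uninstaller() {
    local dir="$1" f
    for f in "$dir"/bin/cuda-uninstaller "$dir"/bin/uninstall_cuda_*.pl; do
        if [ -x "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1  # nothing found; possibly a package-manager install instead
}
```

For a toolkit in /usr/local/cuda-10.0, find_cuda_uninstaller /usr/local/cuda-10.0 prints the script's path, which can then be run with sudo. If it returns nothing, the toolkit was probably installed through the package manager and should be removed that way.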

trigggl · Joined: 6 Mar 09 · Posts: 25 · Credit: 102,324,681 · RAC: 0
Message 57147 - Posted: 5 Jul 2021, 14:56:33 UTC - in response to Message 57126.

Ian&Steve C. wrote:
... 1. remedy the dependency on boost. either include the necessary library in the package distribution to clients, or recompile the app with boost statically linked. otherwise only those hosts who recognize the problem and know how to manually install the proper boost package will be able to contribute. ...

For those of us who are using the Python app, the correct version is already installed in the miniconda folder:

locate libboost_filesystem
/usr/lib64/libboost_filesystem-mt.so
/usr/lib64/libboost_filesystem.so
/usr/lib64/libboost_filesystem.so.1.76.0
/usr/lib64/cmake/boost_filesystem-1.76.0/libboost_filesystem-variant-shared.cmake
/var/lib/boinc/projects/www.gpugrid.net/miniconda/lib/libboost_filesystem.so
/var/lib/boinc/projects/www.gpugrid.net/miniconda/lib/libboost_filesystem.so.1.74.0
/var/lib/boinc/projects/www.gpugrid.net/miniconda/lib/cmake/boost_filesystem-1.74.0/libboost_filesystem-variant-shared.cmake
/var/lib/boinc/projects/www.gpugrid.net/miniconda/pkgs/boost-cpp-1.74.0-h312852a_4/lib/libboost_filesystem.so
/var/lib/boinc/projects/www.gpugrid.net/miniconda/pkgs/boost-cpp-1.74.0-h312852a_4/lib/libboost_filesystem.so.1.74.0
/var/lib/boinc/projects/www.gpugrid.net/miniconda/pkgs/boost-cpp-1.74.0-h312852a_4/lib/cmake/boost_filesystem-1.74.0/libboost_filesystem-variant-shared.cmake

I definitely don't want to downgrade my system version to run a project. Perhaps GPUGRID could include the libboost that they already supply for a different app. Could the miniconda folder somehow be included in the app?
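The locate output above mixes the system copy with the copies bundled under miniconda. A minimal sketch (the function name is hypothetical) that reduces such a listing to "version directory" pairs, which makes it easier to compare against the 1.74 release discussed in this thread:

```shell
#!/usr/bin/env bash
# Sketch: summarize libboost_filesystem versions from `locate` output.
# Reads one path per line on stdin; keeps only fully versioned .so files
# and prints "<version> <directory>". The function name is hypothetical.
boost_versions() {
    awk '/libboost_filesystem\.so\.[0-9]+\.[0-9]+\.[0-9]+$/ {
        n = split($0, parts, "/")          # parts[n] is the file name
        ver = parts[n]
        sub(/^libboost_filesystem\.so\./, "", ver)
        dir = substr($0, 1, length($0) - length(parts[n]) - 1)
        print ver, dir
    }'
}
```

Piping the listing above through it (locate libboost_filesystem | boost_versions) would give 1.76.0 for /usr/lib64 and 1.74.0 for the two miniconda directories, showing at a glance that only the project's own miniconda tree carries 1.74.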

ServicEnginIC · Joined: 24 Sep 10 · Posts: 595 · Credit: 12,249,686,510 · RAC: 1,140,567
Message 57192 - Posted: 10 Jul 2021, 8:02:28 UTC

Richard Haselgrove said in Message #57177:
Look at that timeout: host 528201. Oh, Mr. Kevvy, where art thou? 156 libboost errors? You can fix that...

Finally, Mr. Kevvy's host #537616 successfully processed these two tasks today:
e4s113_e1s796p0f577-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND7908_0
e5s9_e3s99p0f334-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-0-2-RND8007_4

If it was due to your fix, congratulations Mr. Kevvy, you've found the right way. Or perhaps it was some server-side fix to the tasks? Hard to know until there are plenty of new tasks ready to send. Currently (7:51:20 UTC), the Server status page shows 0 tasks ready to send and 28 tasks in progress.

Richard Haselgrove · Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 0
Message 57193 - Posted: 10 Jul 2021, 8:11:36 UTC - in response to Message 57192.

I got a note back from Mr. K - he saw the errors, and was going to check his machines. I imagine he's applied Ian's workround. Curing the world's diseases, one computer at a time.

It would be better if that bug could be fixed at source, for a universal cure.

ServicEnginIC · Joined: 24 Sep 10 · Posts: 595 · Credit: 12,249,686,510 · RAC: 1,140,567
Message 57222 - Posted: 22 Jul 2021, 21:39:41 UTC

On July 3rd 2021, Ian&Steve C. wrote in Message #57087:
But it’s not just 3000-series being slow. All cards seem to be proportionally slower with 11.2 vs 10.0, by about 30%

While organizing screenshots on one of my hosts, I happened to find comparable images of tasks from the old Linux app V2.11 (CUDA 10.0) and the new app V2.12 (CUDA 11.2):

* ACEMD V2.11 tasks on 14/06/2021: [screenshot]
* ACEMD V2.12 task on 20/07/2021: [screenshot]

Pay attention to device 0, the only comparable one.

- ACEMD V2.11 task: 08:10:18 = 29418 seconds elapsed to reach 15.04% progress. Extrapolating, this leads to 195598 seconds of total processing time (2d 06:19:58).
- ACEMD V2.12 task: 3d 02:51:01 = 269461 seconds elapsed to reach 96.48% progress. Extrapolating, this leads to 279292 seconds of total processing time (3d 05:34:52).

That is, about 42.8% more processing time for this particular host and device 0 (GTX 1650 GPU).
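The extrapolation above is elapsed time divided by the fraction completed, and the excess is the ratio of the two totals minus one. A minimal sketch that reproduces the figures with bash integer arithmetic, scaling progress to hundredths of a percent (15.04% becomes 1504) because $(( )) has no floating point; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the run-time extrapolation with integer arithmetic.
# Function names are hypothetical. Pass numbers without leading zeros
# (write 8, not 08), since bash reads a leading 0 as octal.

to_seconds() {  # D H M S -> seconds
    echo $(( $1 * 86400 + $2 * 3600 + $3 * 60 + $4 ))
}

extrapolate_total() {  # elapsed_seconds progress_in_hundredths_of_percent
    echo $(( $1 * 10000 / $2 ))  # elapsed / (progress / 100%), truncated
}

format_dhms() {  # seconds -> "Dd HH:MM:SS"
    local s=$1
    printf '%dd %02d:%02d:%02d\n' $(( s / 86400 )) $(( s % 86400 / 3600 )) \
        $(( s % 3600 / 60 )) $(( s % 60 ))
}

excess_tenths_percent() {  # old_total new_total -> excess in 0.1 %, rounded
    echo $(( ( ($2 - $1) * 2000 / $1 + 1 ) / 2 ))
}
```

For example, format_dhms "$(extrapolate_total "$(to_seconds 0 8 10 18)" 1504)" prints 2d 06:19:58, the V2.12 figures (3 2 51 1 and 9648) give 3d 05:34:52, and excess_tenths_percent 195598 279292 returns 428, i.e. 42.8%.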

Richard Haselgrove · Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 0
Message 57223 - Posted: 23 Jul 2021, 10:04:15 UTC - in response to Message 57222.

Also bear in mind that your first screenshot shows a D3RBandit task, and your second shows an AdaptiveBandit task. They are different, and not directly comparable. How much of the observed slowdown is due to the data/algorithm, and how much to the new application, will need further examples to unravel.

ServicEnginIC · Joined: 24 Sep 10 · Posts: 595 · Credit: 12,249,686,510 · RAC: 1,140,567
Message 57225 - Posted: 23 Jul 2021, 13:12:06 UTC - in response to Message 57223. Last modified: 23 Jul 2021, 13:13:05 UTC

Richard Haselgrove wrote:
Also bear in mind that your first screenshot shows a D3RBandit task, and your second shows a AdaptiveBandit task.

Keen observation and sharp remark, as always. I agree that the tasks probably aren't fully comparable, but they are the most comparable ones I found: same host, same device, same ADRIA WU family, same base credit granted (450000)...

Now I'm waiting for the next move, and wondering what it will consist of: an amended V2.12 app? A new V2.13 app? A "superstition-proof" new V2.14 app? ... ;-)

RJ The Bike Guy · Joined: 2 Apr 20 · Posts: 20 · Credit: 35,363,533 · RAC: 0
Message 57230 - Posted: 4 Aug 2021, 2:35:51 UTC

Is GPUGRID still doing anything? I haven't gotten any work in like a month or more, and before that it was just sporadic. I used to always have work units. Now, nothing.

Bill F · Joined: 21 Nov 16 · Posts: 36 · Credit: 164,429,114 · RAC: 0
Message 57231 - Posted: 4 Aug 2021, 7:53:39 UTC

I am not receiving Windows tasks anymore. My configuration is:
BOINC 7.16.11
GenuineIntel Intel(R) Xeon(R) CPU E5620 @ 2.40GHz [Family 6 Model 44 Stepping 2] (4 processors)
NVIDIA GeForce GTX 1060 6GB (4095MB) driver: 461.40
Microsoft Windows 10 Professional x64 Edition, (10.00.19043.00)

Am I still within spec to get Windows acemd3 work?

Thanks
Bill F

In October of 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic; There was no expiration date.

Ian&Steve C. · Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,876,970,595 · RAC: 8,067
Message 57232 - Posted: 4 Aug 2021, 13:43:59 UTC - in response to Message 57231.

There hasn't been an appreciable amount of work available for over a month.

Erich56 · Joined: 1 Jan 15 · Posts: 1168 · Credit: 12,317,898,501 · RAC: 75,187
Message 57233 - Posted: 5 Aug 2021, 12:32:19 UTC - in response to Message 57232.

Ian&Steve C. wrote:
There hasn't been an appreciable amount of work available for over a month.

:-( :-( :-(