Update acemd3 app

Ian&Steve C.

Message 57123 - Posted: 4 Jul 2021, 18:16:56 UTC - in response to Message 57120.  

Aurum wrote:
But since Nvidia eliminated the nvidia-settings options -a [gpu:0]/GPUGraphicsClockOffset & -a [gpu:0]/GPUMemoryTransferRateOffset that I used I haven't found a good way to do it using Linux.


these options still work. I use them for my 3080Ti. not sure what you mean?

this is exactly what I use for my 3080Ti (same on my Turing hosts)

/usr/bin/nvidia-smi -pm 1
/usr/bin/nvidia-smi -acp UNRESTRICTED

/usr/bin/nvidia-smi -i 0 -pl 320

/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"

/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=500" -a "[gpu:0]/GPUGraphicsClockOffset[4]=100"


it works as desired.
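A minimal sketch of how the commands above could be generated for several GPUs at once. It only prints the command lines (a dry run) and changes nothing. The function name `gpu_tuning_cmds` and its argument order are made up for illustration, and the `[4]` performance-level index and the values 320/500/100 are simply the ones from this post; other cards may not accept them.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the tuning commands from the post
# for one GPU. Arguments: GPU index, power limit in watts,
# memory transfer rate offset, graphics clock offset.
gpu_tuning_cmds() {
    gpu=$1; watts=$2; mem_offset=$3; core_offset=$4
    printf '/usr/bin/nvidia-smi -i %s -pl %s\n' "$gpu" "$watts"
    printf '/usr/bin/nvidia-settings -a "[gpu:%s]/GPUPowerMizerMode=1"\n' "$gpu"
    printf '/usr/bin/nvidia-settings -a "[gpu:%s]/GPUMemoryTransferRateOffset[4]=%s" -a "[gpu:%s]/GPUGraphicsClockOffset[4]=%s"\n' \
        "$gpu" "$mem_offset" "$gpu" "$core_offset"
}

# Dry run for two GPUs (indices 0 and 1 are assumed here).
for gpu in 0 1; do
    gpu_tuning_cmds "$gpu" 320 500 100
done
```

Once the printed lines look right they can be run by hand. Changing the power limit with nvidia-smi normally needs root, and nvidia-settings needs access to a running X server.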

Aurum wrote:
It seems Nvidia chooses a performance level but I can't see how to force it to a desired level:


what do you mean by "performance level"? if you mean forcing a certain P-state, no, you can't do that. these cards will not go into the P0 state unless you're running a 3D application; any compute application will get P2 at best. this has been the case ever since Maxwell. the workarounds to force P0 stopped working with Pascal, so this isn't new.

if you mean the PowerMizer preferred mode (which is analogous to the power settings in Windows), you can select that easily in Linux too. I always run mine at "prefer max performance". do this with the following command:

/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"


I'm unsure if this really makes much difference, though, other than increasing idle power consumption (by forcing higher clocks). the GPU seems to detect loads properly and clock up even when left on the default "Auto" selection.

Aurum wrote:
Nvidia also eliminated GPULogoBrightness so the baby-blinkie lights never turn off.

I'm not sure this was intentional. it's probably something that fell through the cracks and that not enough people have complained about for them to dedicate resources to fixing. there's no gain for nvidia in disabling this function. but again, this stopped working with Turing, so it's been this way for about 3 years; it's not something new. I have mostly EVGA cards, so when I want to mess with the lighting, I just throw the card on my test bench, boot into Windows, change the LED settings there, and then put it back in the crunching rig. the settings are preserved internally on the card (for my cards), so it stays at whatever I left it as. you can probably do the same.

Aurum
Message 57124 - Posted: 4 Jul 2021, 18:26:11 UTC

It sure does not look like running multiple GG WUs on the same GPU has any benefit.
My 3080 is stuck in P2. I'd like to try it in P3 and P4 but I can't make it change. I tried:
nvidia-smi -lmc 9251
Memory clocks set to "(memClkMin 9501, memClkMax 9501)" for GPU 00000000:65:00.0
All done.
nvidia-smi -lgc 240,2130
GPU clocks set to "(gpuClkMin 240, gpuClkMax 2130)" for GPU 00000000:65:00.0
All done.

But it's still in P2.
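For checking which P-state each card actually sits in after such changes, a sketch along these lines may help. `nvidia-smi --query-gpu=... --format=csv,noheader` is the real query interface; the helper name `not_in_pstate` is hypothetical, and the script only reads state, it does not change any clocks.

```shell
#!/bin/sh
# Hypothetical helper: read "index, pstate, ..." CSV lines on stdin and
# print the index of every GPU whose P-state differs from the wanted one.
not_in_pstate() {
    awk -F', *' -v want="$1" '$2 != want { print $1 }'
}

# On a machine with the NVIDIA driver installed, feed it the live query.
# The guard skips this step where nvidia-smi is not available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,pstate,clocks.sm,clocks.mem --format=csv,noheader \
        | not_in_pstate P2
fi
```

An empty result means every listed GPU reports the wanted state (P2 in this example).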

Aurum
Message 57125 - Posted: 4 Jul 2021, 18:34:34 UTC - in response to Message 57123.  

Ian&Steve C. wrote:
Aurum wrote:
But since Nvidia eliminated the nvidia-settings options -a [gpu:0]/GPUGraphicsClockOffset & -a [gpu:0]/GPUMemoryTransferRateOffset that I used I haven't found a good way to do it using Linux.

these options still work. I use them for my 3080Ti. not sure what you mean?

this is exactly what I use for my 3080Ti (same on my Turing hosts)
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=500" -a "[gpu:0]/GPUGraphicsClockOffset[4]=100"
it works as desired.

How do you prove to yourself they work? They don't even exist any more. Run
nvidia-settings -q all | grep -C 10 -i GPUMemoryTransferRateOffset
and you will not find either of them.

Ian&Steve C.
Message 57126 - Posted: 4 Jul 2021, 18:44:18 UTC

but all the slightly off-topic stuff aside:

It was a great first step to getting the app working for Ampere. it's been long awaited and the new app is much appreciated and now many more cards can help contribute to the project, especially with these newer long running tasks lately. we need powerful cards to handle these tasks.

I think the two priorities now should be:

1. remedy the dependency on boost. either include the necessary library in the package distribution to clients, or recompile the app with boost statically linked. otherwise only those hosts who recognize the problem and know how to manually install the proper boost package will be able to contribute.

2. investigate the cause and provide a remedy for the ~30% slowdown in application performance from the older cuda100 app. this isn't just affecting Ampere, but affecting all GPUs equally it seems. maybe some optimization flag was omitted or some change to the code was made that was undesirable or unintended. just changing from cuda100 to cuda1121 should not in itself have caused this if there were no other code changes. sometimes you can see slight performance changes like 1-2%, but a 30% reduction is a sign that something is clearly wrong.

Ian&Steve C.
Message 57127 - Posted: 4 Jul 2021, 18:54:32 UTC - in response to Message 57125.  
Last modified: 4 Jul 2021, 18:56:04 UTC

Aurum wrote:
Ian&Steve C. wrote:
Aurum wrote:
But since Nvidia eliminated the nvidia-settings options -a [gpu:0]/GPUGraphicsClockOffset & -a [gpu:0]/GPUMemoryTransferRateOffset that I used I haven't found a good way to do it using Linux.

these options still work. I use them for my 3080Ti. not sure what you mean?

this is exactly what I use for my 3080Ti (same on my Turing hosts)
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[4]=500" -a "[gpu:0]/GPUGraphicsClockOffset[4]=100"
it works as desired.

How do you prove to yourself they work? They don't even exist any more. Run
nvidia-settings -q all | grep -C 10 -i GPUMemoryTransferRateOffset
and you will not find either of them.


I prove they work by opening Nvidia X Server Settings and observing that the clock speed offsets have changed in accordance with the commands, which don't give any error when I run them. the commands work 100%. I see you're referencing some other command. I have no idea what the command you're trying to use does, but my command works.

see for yourself:
https://i.imgur.com/UFHbhNt.png
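The same check can probably be made from a terminal instead of the GUI. A sketch, assuming that `nvidia-settings -q` accepts the same bracketed attribute names that are used with `-a` (that is an assumption here, not something confirmed in this thread). `offset_queries` is a made-up helper that only prints the query commands.

```shell
#!/bin/sh
# Hypothetical helper: print the query commands for the two offsets set in
# the post. $1 = GPU index, $2 = performance-level index ([4] in the post).
offset_queries() {
    for attr in GPUGraphicsClockOffset GPUMemoryTransferRateOffset; do
        printf '/usr/bin/nvidia-settings -q "[gpu:%s]/%s[%s]"\n' "$1" "$attr" "$2"
    done
}

offset_queries 0 4
```

Each printed command can then be run by hand from a session with access to the X display; an error from the query would itself be informative.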

888
Message 57139 - Posted: 5 Jul 2021, 12:12:35 UTC

I'm still getting the CUDA compiler permission denied error. I've added the PPA and installed libboost1.74 as above, and reset the project multiple times. But every downloaded task fails after 2 seconds.

http://www.gpugrid.net/result.php?resultid=32636087

I'm running Mint 20.1, with rtx2070 and rtx3070 cards running 465.31 drivers.

Ian&Steve C.
Message 57140 - Posted: 5 Jul 2021, 12:31:15 UTC - in response to Message 57139.  

888 wrote:
I'm still getting the CUDA compiler permission denied error. I've added the PPA and installed libboost1.74 as above, and reset the project multiple times. But every downloaded task fails after 2 seconds.

http://www.gpugrid.net/result.php?resultid=32636087

I'm running Mint 20.1, with rtx2070 and rtx3070 cards running 465.31 drivers.


How did you install the drivers? Have you ever installed the CUDA toolkit? This was my problem. If you have a CUDA toolkit installed, remove it. To be safe, I would also totally purge your nvidia drivers and reinstall them fresh.
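A minimal, non-destructive sketch for checking whether a CUDA toolkit is still lying around before purging anything. It assumes the toolkit was installed under the usual `/usr/local/cuda*` prefix; the helper name `list_cuda_dirs` is hypothetical, and nothing here removes any files.

```shell
#!/bin/sh
# Hypothetical helper: list cuda* directories under a prefix
# (default /usr/local, assumed to be the toolkit's install location).
list_cuda_dirs() {
    prefix=${1:-/usr/local}
    for d in "$prefix"/cuda*; do
        [ -d "$d" ] && printf '%s\n' "$d"
    done
    return 0
}

list_cuda_dirs
# A compiler still on PATH is another hint that a toolkit is installed.
command -v nvcc || echo "nvcc not found on PATH"
```

If this prints nothing but the "not found" line, the toolkit is at least not in its default location.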

Erich56
Message 57142 - Posted: 5 Jul 2021, 13:01:27 UTC - in response to Message 57126.  

Ian&Steve C wrote:

It was a great first step to getting the app working for Ampere. it's been long awaited and the new app is much appreciated and now many more cards can help contribute to the project, especially with these newer long running tasks lately. we need powerful cards to handle these tasks.

I think the two priorities now should be:

1. remedy the dependency on boost. either include the necessary library in the package distribution to clients, or recompile the app with boost statically linked. otherwise only those hosts who recognize the problem and know how to manually install the proper boost package will be able to contribute.

2. investigate the cause and provide a remedy for the ~30% slowdown in application performance from the older cuda100 app. ...

and last, but not least: an app for Windows would be nice :-)

888
Message 57143 - Posted: 5 Jul 2021, 13:31:53 UTC - in response to Message 57140.  

Ian&Steve C. wrote:
888 wrote:
I'm still getting the CUDA compiler permission denied error. I've added the PPA and installed libboost1.74 as above, and reset the project multiple times. But every downloaded task fails after 2 seconds.

http://www.gpugrid.net/result.php?resultid=32636087

I'm running Mint 20.1, with rtx2070 and rtx3070 cards running 465.31 drivers.

How did you install the drivers? Have you ever installed the CUDA toolkit? This was my problem. If you have a CUDA toolkit installed, remove it. To be safe, I would also totally purge your nvidia drivers and reinstall them fresh.



Thanks for the quick reply. I had the CUDA toolkit ver 10 installed, but after seeing your previous post about your problem, I had already removed it. I'll try purging and reinstalling my nvidia drivers, thanks.

Ian&Steve C.
Message 57145 - Posted: 5 Jul 2021, 13:45:03 UTC - in response to Message 57143.  

did you use the included removal script to remove the toolkit, or did you manually delete some files? definitely try the removal script if you haven't already. good luck!

trigggl
Message 57147 - Posted: 5 Jul 2021, 14:56:33 UTC - in response to Message 57126.  

Ian&Steve C. wrote:
...
1. remedy the dependency on boost. either include the necessary library in the package distribution to clients, or recompile the app with boost statically linked. otherwise only those hosts who recognize the problem and know how to manually install the proper boost package will be able to contribute.
...

For those of us who are using the python app, the correct version is installed in the miniconda folder.
locate libboost_filesystem
/usr/lib64/libboost_filesystem-mt.so
/usr/lib64/libboost_filesystem.so
/usr/lib64/libboost_filesystem.so.1.76.0
/usr/lib64/cmake/boost_filesystem-1.76.0/libboost_filesystem-variant-shared.cmake
/var/lib/boinc/projects/www.gpugrid.net/miniconda/lib/libboost_filesystem.so
/var/lib/boinc/projects/www.gpugrid.net/miniconda/lib/libboost_filesystem.so.1.74.0
/var/lib/boinc/projects/www.gpugrid.net/miniconda/lib/cmake/boost_filesystem-1.74.0/libboost_filesystem-variant-shared.cmake
/var/lib/boinc/projects/www.gpugrid.net/miniconda/pkgs/boost-cpp-1.74.0-h312852a_4/lib/libboost_filesystem.so
/var/lib/boinc/projects/www.gpugrid.net/miniconda/pkgs/boost-cpp-1.74.0-h312852a_4/lib/libboost_filesystem.so.1.74.0
/var/lib/boinc/projects/www.gpugrid.net/miniconda/pkgs/boost-cpp-1.74.0-h312852a_4/lib/cmake/boost_filesystem-1.74.0/libboost_filesystem-variant-shared.cmake

I definitely don't want to downgrade my system version to run a project. Perhaps gpugrid could include the libboost that they already supply for a different app.

Could the miniconda folder be somehow included in the app?
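To see which directories actually provide the 1.74 library the app wants, a sketch like the following could be used. The helper `dirs_with_lib` is hypothetical; the library file name and the two directories are the ones from the `locate` output above.

```shell
#!/bin/sh
# Hypothetical helper: print each directory (from the remaining arguments)
# that contains the library file named by the first argument.
dirs_with_lib() {
    lib=$1; shift
    for dir in "$@"; do
        [ -e "$dir/$lib" ] && printf '%s\n' "$dir"
    done
    return 0
}

dirs_with_lib libboost_filesystem.so.1.74.0 \
    /usr/lib64 \
    /var/lib/boinc/projects/www.gpugrid.net/miniconda/lib
```

On the system listed above this should print only the miniconda path, since /usr/lib64 holds version 1.76.0.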

ServicEnginIC
Message 57192 - Posted: 10 Jul 2021, 8:02:28 UTC

Richard Haselgrove said at Message #57177:

Look at that timeout: host 528201. Oh, Mr. Kevvy, where art thou? 156 libboost errors? You can fix that...

Finally, Mr. Kevvy's host #537616 successfully processed these two tasks today:
e4s113_e1s796p0f577-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND7908_0
e5s9_e3s99p0f334-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-0-2-RND8007_4
If it was due to your fix, congratulations Mr. Kevvy, you've found the right way.

Or perhaps it was some fix to the tasks on the server side?
Hard to know until there are plenty of new tasks ready to send.
Currently, at 7:51:20 UTC, there are 0 tasks left ready to send and 28 tasks left in progress, as the Server status page shows.

Richard Haselgrove
Message 57193 - Posted: 10 Jul 2021, 8:11:36 UTC - in response to Message 57192.  

I got a note back from Mr. K - he saw the errors, and was going to check his machines. I imagine he's applied Ian's workaround.

Curing the world's diseases, one computer at a time. It would be better if that bug could be fixed at source, for a universal cure.

ServicEnginIC
Message 57222 - Posted: 22 Jul 2021, 21:39:41 UTC

On July 3rd 2021, Ian&Steve C. wrote at Message #57087:

But it’s not just 3000-series being slow. All cards seem to be proportionally slower with 11.2 vs 10.0, by about 30%

While organizing screenshots on one of my hosts, I happened to find comparative images for tasks of old Linux APP V2.11 (CUDA 10.0) and new APP V2.12 (CUDA 11.2)

* ACEMD V2.11 tasks on 14/06/2021: [screenshot not preserved]

* ACEMD V2.12 task on 20/07/2021: [screenshot not preserved]


Pay attention to device 0, the only comparable one.
- ACEMD V2.11 task: 08:10:18 = 29418 seconds elapsed to process 15.04%. Extrapolating, this leads to 195598 seconds of total processing time (2d 06:19:58)
- ACEMD V2.12 task: 3d 02:51:01 = 269461 seconds elapsed to process 96.48%. Extrapolating, this leads to 279292 seconds of total processing time (3d 05:34:52)
That is, about 42.8% more processing time for this particular host and device 0 (GTX 1650 GPU).
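The extrapolation above can be reproduced with a few lines of shell and awk. This is only a sketch of the arithmetic (elapsed seconds divided by the fraction completed); the function names `extrapolate` and `excess_pct` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: extrapolate total run time in seconds from
# elapsed seconds ($1) and percent completed ($2).
extrapolate() {
    LC_ALL=C awk -v s="$1" -v p="$2" 'BEGIN { printf "%.0f\n", s * 100 / p }'
}

# Hypothetical helper: percent by which total $2 exceeds total $1.
excess_pct() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b / a - 1) * 100 }'
}

old=$(extrapolate 29418 15.04)    # V2.11: 08:10:18 elapsed at 15.04%
new=$(extrapolate 269461 96.48)   # V2.12: 3d 02:51:01 elapsed at 96.48%
echo "V2.11 total: ${old} s"
echo "V2.12 total: ${new} s"
echo "excess: $(excess_pct "$old" "$new")%"
```

With the figures from the two screenshots this gives 195598 s, 279292 s and 42.8%, matching the numbers in the post. It assumes progress is linear in time, which is the same assumption the manual extrapolation makes.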

Richard Haselgrove
Message 57223 - Posted: 23 Jul 2021, 10:04:15 UTC - in response to Message 57222.  

Also bear in mind that your first screenshot shows a D3RBandit task, and your second shows an AdaptiveBandit task.

They are different, and not directly comparable. How much of the observed slowdown is down to the data/algorithm, and how much is down to the new application, will need further examples to unravel.

ServicEnginIC
Message 57225 - Posted: 23 Jul 2021, 13:12:06 UTC - in response to Message 57223.  
Last modified: 23 Jul 2021, 13:13:05 UTC

Richard Haselgrove wrote:
Also bear in mind that your first screenshot shows a D3RBandit task, and your second shows an AdaptiveBandit task.

Keen observation and a sharp remark, as always.
I agree that the tasks probably aren't fully comparable, but they are the most comparable I found: same host, same device, same ADRIA WU family, same base credit amount granted: 450000...
Now I'm waiting for the next move, and wondering what it will consist of: an amended V2.12 APP? a new V2.13 APP? a "superstition-proof" new V2.14 APP? ... ;-)

RJ The Bike Guy
Message 57230 - Posted: 4 Aug 2021, 2:35:51 UTC

Is GPU grid still doing anything? I haven't gotten any work in like a month or more. And before that it was just sporadic. I used to always have work units. Now, nothing.

Bill F
Message 57231 - Posted: 4 Aug 2021, 7:53:39 UTC

I am not receiving Windows tasks anymore. My configuration is
Boinc 7.16.11 GenuineIntel Intel(R) Xeon(R) CPU E5620 @ 2.40GHz [Family 6 Model 44 Stepping 2](4 processors)

NVIDIA GeForce GTX 1060 6GB (4095MB) driver: 461.40

Microsoft Windows 10 Professional x64 Edition, (10.00.19043.00)

Am I still within specs to get Windows acemd3 work?

Thanks
Bill F

Ian&Steve C.
Message 57232 - Posted: 4 Aug 2021, 13:43:59 UTC - in response to Message 57231.  

there hasn't been an appreciable amount of work available for over a month.

Erich56
Message 57233 - Posted: 5 Aug 2021, 12:32:19 UTC - in response to Message 57232.  

Ian&Steve C. wrote:
there hasn't been an appreciable amount of work available for over a month.

:-( :-( :-(