Update acemd3 app

Message boards : News : Update acemd3 app
Message board moderation

To post messages, you must log in.

Previous · 1 · 2 · 3 · 4 · 5 · 6 · 7 . . . 9 · Next

AuthorMessage
Aurum
Avatar

Send message
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Level
Trp
Scientific publications
watwatwat
Message 57102 - Posted: 3 Jul 2021, 19:29:54 UTC

Yea, snagged a WU and it's running. My guesstimate is 19:44:13 on my 3080 dialed down to 230 Watts. Record breaking long heat wave here and summer peak Time-of-Use electric rates (8.5x higher) have started. Summer is not BOINC season in The Great Basin.

Rxn time, now that makes sense. Thx.

Linux Mint repository offers 465.31 and 460.84. Is it actually worth reverting to 460.84??? I wouldn't do it until after this WU completes anyway.
ID: 57102 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,535,595
RAC: 4,302,611
Level
Trp
Scientific publications
wat
Message 57103 - Posted: 3 Jul 2021, 19:56:15 UTC - in response to Message 57102.  

Linux Mint repository offers 465.31 and 460.84. Is it actually worth reverting to 460.84??? I wouldn't do it until after this WU completes anyway.


probably wont matter if the driver you have is working. i don't expect any performance difference between the two. I was just saying that I would use a more recent non-beta driver if i was updating, unless you need some feature in 465 branch specifically.
ID: 57103 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,535,595
RAC: 4,302,611
Level
Trp
Scientific publications
wat
Message 57104 - Posted: 3 Jul 2021, 20:00:01 UTC

second 3080ti task completed in 11hrs

http://gpugrid.net/result.php?resultid=32632580
ID: 57104 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Greger

Send message
Joined: 6 Jan 15
Posts: 76
Credit: 25,499,534,331
RAC: 0
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 57105 - Posted: 3 Jul 2021, 20:59:16 UTC

peak 28,9°C here today so suspend during daytime after 2 task done.
I run evening and night these days if temp i high. Ambient temp was above 35 inside and fan gone up to 80% on gpu i checked.

So i manage to go to 460.84 after a few remove and --purge nvidia*. Apparently there was a libnvidia-compute left and hold it back.

Got name correct but detect vram wrong (4095MB). Lets see if it would work.
ID: 57105 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile ServicEnginIC
Avatar

Send message
Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 998,578
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57106 - Posted: 3 Jul 2021, 21:32:29 UTC

Just take in mind that any change in Nvidia driver version while a GPUgrid task is in progress, will cause it to fail when computing is restarted.
Commented in message #56909
ID: 57106 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,535,595
RAC: 4,302,611
Level
Trp
Scientific publications
wat
Message 57107 - Posted: 3 Jul 2021, 21:58:34 UTC - in response to Message 57105.  

peak 28,9°C here today so suspend during daytime after 2 task done.
I run evening and night these days if temp i high. Ambient temp was above 35 inside and fan gone up to 80% on gpu i checked.

So i manage to go to 460.84 after a few remove and --purge nvidia*. Apparently there was a libnvidia-compute left and hold it back.

Got name correct but detect vram wrong (4095MB). Lets see if it would work.


The VRAM reported wrong is not because of the driver. It’s a problem with BOINC. BOINC uses a detection technique that is only 32-bit (4GB). This can only be fixed by fixing the code in BOINC.
ID: 57107 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Greger

Send message
Joined: 6 Jan 15
Posts: 76
Credit: 25,499,534,331
RAC: 0
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 57108 - Posted: 3 Jul 2021, 23:18:13 UTC - in response to Message 57107.  
Last modified: 3 Jul 2021, 23:50:23 UTC

I went back to my host and driver crashed. smi unable to open and task failed on another project. Restarted it and back on track. Few minutes later it fetch new task from GPUGrid. Let's hope it does not crash again.

https://www.gpugrid.net/result.php?resultid=32634065

# Speed: average 225.91 ns/day, current 226.09 ns/day
That is more like. this much better then my 3070 and 3060Ti got.
ID: 57108 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,535,595
RAC: 4,302,611
Level
Trp
Scientific publications
wat
Message 57109 - Posted: 3 Jul 2021, 23:32:42 UTC - in response to Message 57108.  

GPU detection is handled by BOINC, not any individual projects.

Driver updates always require a reboot to take effect.
ID: 57109 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile ServicEnginIC
Avatar

Send message
Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 998,578
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57110 - Posted: 4 Jul 2021, 9:09:53 UTC

Finally, my first result of a new version 2.12 task came out in my fastest card:
e4s126_e3s248p0f238-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-0-2-RND6347_7
It took 141948 seconds of total processing time. That is: 1 day 15 hours 25 minutes and 48 seconds
Predicted time in table shown at message #57091 was 142074 seconds after 61,439% done.
There is a slight difference of 126 seconds between estimated and true execution time. 0,09% deviation.
For me, it is approximate enough, and validates Ian&Steve C. theory of progress for these tasks being quite linear along their execution.
ID: 57110 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Greger

Send message
Joined: 6 Jan 15
Posts: 76
Credit: 25,499,534,331
RAC: 0
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 57111 - Posted: 4 Jul 2021, 10:08:18 UTC

Compare old and new app on 2070S

old
52,930.87 New version of ACEMD v2.11 (cuda100)
WU 27069210 e130s1888_e70s25p0f44-ADRIA_D3RBandit_batch_nmax5000-0-1-RND2852_1

new
80,484.11 New version of ACEMD v2.12 (cuda1121)
WU: 27077230 e5s177_e4s56p0f117-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND4081_4

Not sure if size of units grown that much to be able compare them.
ID: 57111 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Aurum
Avatar

Send message
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Level
Trp
Scientific publications
watwatwat
Message 57112 - Posted: 4 Jul 2021, 13:40:41 UTC - in response to Message 57102.  
Last modified: 4 Jul 2021, 13:46:54 UTC

My guesstimate is 19:44:13 on my 3080 dialed down to 230 Watts.

16:06:54
https://www.gpugrid.net/workunit.php?wuid=27077289
ID: 57112 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile ServicEnginIC
Avatar

Send message
Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 998,578
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57113 - Posted: 4 Jul 2021, 14:23:15 UTC

At this moment, every of my 7 currently working GPUs have any new version 2.12 task in process.
Two tasks received today completed the quota.
Task e4s120_e3s763p0f798-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND9850_3, hanging from WU #27076712
Task e5s90_e4s138p0f962-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND6130_4, hanging from WU #27077322
Something to remark: These two tasks are repetitive resends of previously failed tasks with the following known problem:

acemd3: error while loading shared libraries: libboost_filesystem.so.1.74.0: cannot open shared object file: No such file or directory

Chance to remember that there is a remedy for this problem, commented at message #57064 in this same thread.

One last update for estimated times to completion on my GPUs:



An editable version of the spreadsheet used can be downloaded from this link
Changes since previous version:
- Lines for two more GPUs are added.
- A new cell is added for seconds to D:H:M:S conversion
ID: 57113 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Aurum
Avatar

Send message
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Level
Trp
Scientific publications
watwatwat
Message 57114 - Posted: 4 Jul 2021, 14:50:02 UTC - in response to Message 57081.  
Last modified: 4 Jul 2021, 14:52:34 UTC

So the number of the usable CUDA cores in the 30xx series are half of the advertised number (just as I expected), as 10240/2=5120, 5120/4352=1.1765 (so the 3080Ti has 17.65% more CUDA cores than the 2080Ti has), the CUDA cores of the 3080Ti are 1.4% faster than of the 2080Ti.


Does using half of CUDA cores have implications for BOINCing?
GG+OPNG at <cpu_usage>1.0</cpu_usage> & <gpu_usage>0.5</gpu_usage> works fine.
GG+DaggerHashimoto crashes GG instantly.
I hope to try 2xGG today.
ID: 57114 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Retvari Zoltan
Avatar

Send message
Joined: 20 Jan 09
Posts: 2380
Credit: 16,897,957,044
RAC: 1
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 57115 - Posted: 4 Jul 2021, 15:06:42 UTC - in response to Message 57114.  

Does using half of CUDA cores have implications for BOINCing?
GG+OPNG at <cpu_usage>1.0</cpu_usage> & <gpu_usage>0.5</gpu_usage> works fine.
GG+DaggerHashimoto crashes GG instantly.
I hope to try 2xGG today.
You can't utilize the "extra" CUDA cores by running a second task (regardless of the project).
The 30xx series improved gaming experience much more, than the crunching performance.
ID: 57115 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,535,595
RAC: 4,302,611
Level
Trp
Scientific publications
wat
Message 57116 - Posted: 4 Jul 2021, 15:10:22 UTC - in response to Message 57114.  

So the number of the usable CUDA cores in the 30xx series are half of the advertised number (just as I expected), as 10240/2=5120, 5120/4352=1.1765 (so the 3080Ti has 17.65% more CUDA cores than the 2080Ti has), the CUDA cores of the 3080Ti are 1.4% faster than of the 2080Ti.


Does using half of CUDA cores have implications for BOINCing?
GG+OPNG at <cpu_usage>1.0</cpu_usage> & <gpu_usage>0.5</gpu_usage> works fine.
GG+DaggerHashimoto crashes GG instantly.
I hope to try 2xGG today.


I think you misunderstand what's happening.

running 2x GPUGRID tasks concurrently wont make it "use more". it'll just slow both down, probably slower than half speed due to the constant resource fighting.

if GPUGRID isn't seeing the effective 2x benefit of Turing vs Ampere, that tells me one of two things (or maybe some combination of both):
1. that app isn't as FP32 heavy as some have implied, and maybe has a decent amount of INT32 instructions. the INT32 setup of Ampere is the same as Turing
2. there is some additional optimization that needs to be applied to the ACEMD3 app to better take advantage of the extra FP32 cores on Ampere.
ID: 57116 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
WMD

Send message
Joined: 21 May 21
Posts: 1
Credit: 12,242,500
RAC: 0
Level
Pro
Scientific publications
wat
Message 57118 - Posted: 4 Jul 2021, 15:46:13 UTC - in response to Message 57116.  

if GPUGRID isn't seeing the effective 2x benefit of Turing vs Ampere, that tells me one of two things (or maybe some combination of both):
1. that app isn't as FP32 heavy as some have implied, and maybe has a decent amount of INT32 instructions. the INT32 setup of Ampere is the same as Turing
2. there is some additional optimization that needs to be applied to the ACEMD3 app to better take advantage of the extra FP32 cores on Ampere.

The way Ampere works is that half the cores are FP32, and the other half are either FP32 or INT32 depending on need. On Turing (and older), the INT32 half was always INT32. So you're probably right - either GPUGRID has some INT32 load that is using the cores instead, or some kind of application change is required to get it to use the other half.
ID: 57118 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,535,595
RAC: 4,302,611
Level
Trp
Scientific publications
wat
Message 57119 - Posted: 4 Jul 2021, 16:30:28 UTC - in response to Message 57118.  
Last modified: 4 Jul 2021, 16:32:20 UTC

I'm not convinced that the extra cores "aren't being used" at all, ie, the cores are sitting idle 100% of the time as a direct result of the architecture or something like that. I think both the application and the hardware are fully aware of the available cores/SMs. just that the application is coded in such a way that it can't take advantage of the extra resources, either in optimization or in the number of INT instructions required.

nvidia's press notes do seem to show a 1.5x improvement in molecular modeling load for A100 vs V100, so maybe the amount of INT calls is inherent to this kind of load anyway. (granted the A100 is based on the GA100 core, which is a different architecture without the shared FP/INT cores for the doubling of FP cores like on GA102)

but in the case of GPUGRID, i think it's just their application. on folding Ampere performs much closer to the claims. a 3070 being only a bit slower than a 2080ti, which is what I would expect.
ID: 57119 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Aurum
Avatar

Send message
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Level
Trp
Scientific publications
watwatwat
Message 57120 - Posted: 4 Jul 2021, 16:33:22 UTC - in response to Message 57115.  
Last modified: 4 Jul 2021, 16:57:55 UTC

The 30xx series improved gaming experience much more, than the crunching performance.

I'm thoroughly unimpressed by my 3080. Its performance does not scale with price making it much more expensive for doing calculations. I'll probably test it for a few more days and then sell it.

I like to use some metric that's proportional to calculations and optimize calcs/Watt. In the past my experience has been reducing max power improves performance. But since Nvidia eliminated the nvidia-settings options -a [gpu:0]/GPUGraphicsClockOffset & -a [gpu:0]/GPUMemoryTransferRateOffset that I used I haven't found a good way to do it using Linux. nvidia-settings -q all

It seems Nvidia chooses a performance level but I can't see how to force it to a desired level:
sudo DISPLAY=:0 XAUTHORITY=/var/run/lightdm/root/:0 nvidia-settings -q '[gpu:0]/GPUPerfModes'
3080: 0, 1, 2, 3 & 4
Attribute 'GPUPerfModes' (Rig-05:0[gpu:0]):
perf=0, nvclock=210, nvclockmin=210, nvclockmax=420, nvclockeditable=1, memclock=405, memclockmin=405, memclockmax=405, memclockeditable=1, memTransferRate=810, memTransferRatemin=810, memTransferRatemax=810, memTransferRateeditable=1 ;
perf=1, nvclock=210, nvclockmin=210, nvclockmax=2100, nvclockeditable=1, memclock=810, memclockmin=810, memclockmax=810, memclockeditable=1, memTransferRate=1620, memTransferRatemin=1620, memTransferRatemax=1620, memTransferRateeditable=1 ;
perf=2, nvclock=240, nvclockmin=240, nvclockmax=2130, nvclockeditable=1, memclock=5001, memclockmin=5001, memclockmax=5001, memclockeditable=1, memTransferRate=10002, memTransferRatemin=10002, memTransferRatemax=10002, memTransferRateeditable=1 ;
perf=3, nvclock=240, nvclockmin=240, nvclockmax=2130, nvclockeditable=1, memclock=9251, memclockmin=9251, memclockmax=9251, memclockeditable=1, memTransferRate=18502, memTransferRatemin=18502, memTransferRatemax=18502, memTransferRateeditable=1 ;
perf=4, nvclock=240, nvclockmin=240, nvclockmax=2130, nvclockeditable=1, memclock=9501, memclockmin=9501, memclockmax=9501, memclockeditable=1, memTransferRate=19002, memTransferRatemin=19002, memTransferRatemax=19002, memTransferRateeditable=1

Nvidia has said, "The -a and -g arguments are now deprecated in favor of -q and -i, respectively. However, the old arguments still work for this release." Sounds like they're planning to reduce or eliminate customers ability to control the products they buy.

Nvidia also eliminated GPULogoBrightness so the baby-blinkie lights never turn off.
ID: 57120 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Aurum
Avatar

Send message
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 2
Level
Trp
Scientific publications
watwatwat
Message 57121 - Posted: 4 Jul 2021, 16:49:39 UTC - in response to Message 57116.  

running 2x GPUGRID tasks concurrently wont make it "use more". it'll just slow both down, probably slower than half speed due to the constant resource fighting.

if GPUGRID isn't seeing the effective 2x benefit of Turing vs Ampere, that tells me one of two things (or maybe some combination of both):
1. that app isn't as FP32 heavy as some have implied, and maybe has a decent amount of INT32 instructions. the INT32 setup of Ampere is the same as Turing
2. there is some additional optimization that needs to be applied to the ACEMD3 app to better take advantage of the extra FP32 cores on Ampere.

At less than 5% complete with two WUs running simultaneously and having started within minutes of each other:
WU1: 4840 sec at 4.7% implies 102978 sec total
WU2: 5409 sec at 4.6% implies 117587 sec total
From yesterday's singleton: 2 x 58014 sec = 116028 sec total if independent.
ID: 57121 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,535,595
RAC: 4,302,611
Level
Trp
Scientific publications
wat
Message 57122 - Posted: 4 Jul 2021, 17:54:31 UTC - in response to Message 57121.  

running 2x GPUGRID tasks concurrently wont make it "use more". it'll just slow both down, probably slower than half speed due to the constant resource fighting.

if GPUGRID isn't seeing the effective 2x benefit of Turing vs Ampere, that tells me one of two things (or maybe some combination of both):
1. that app isn't as FP32 heavy as some have implied, and maybe has a decent amount of INT32 instructions. the INT32 setup of Ampere is the same as Turing
2. there is some additional optimization that needs to be applied to the ACEMD3 app to better take advantage of the extra FP32 cores on Ampere.

At less than 5% complete with two WUs running simultaneously and having started within minutes of each other:
WU1: 4840 sec at 4.7% implies 102978 sec total
WU2: 5409 sec at 4.6% implies 117587 sec total
From yesterday's singleton: 2 x 58014 sec = 116028 sec total if independent.


my point exactly. showing roughly half speed, with no real benefit to running multiples. pushing your completion time to 32hours will only reduce your credit reward since you'll be bumped out of the +50% bonus for returning in 24hrs.
ID: 57122 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Previous · 1 · 2 · 3 · 4 · 5 · 6 · 7 . . . 9 · Next

Message boards : News : Update acemd3 app

©2025 Universitat Pompeu Fabra