Update acemd3 app

Ian&Steve C.

Message 57062 - Posted: 2 Jul 2021, 12:54:25 UTC - in response to Message 57060.  
Last modified: 2 Jul 2021, 13:09:11 UTC

Toni, I think the first thing that needs to be fixed is that the Boost 1.74 library is not included in the app distribution. The app fails right away because the library isn't there. You need to either distribute the .so file or statically link it into the acemd3 app so that it isn't needed separately.

Manually installing it seems to work around the problem, but that's a tall order to ask of every Linux user.
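
One way to confirm which shared libraries the app cannot find is to filter the output of ldd for unresolved entries. This is only a sketch: the path to the acemd3 binary is a placeholder (not the real location inside the BOINC project directory), and missing_libs is a helper name made up for illustration.

```shell
#!/bin/sh
# Print the names of shared libraries that the dynamic linker cannot resolve.
# Reads `ldd` output on stdin, so the filter can be tried without the app itself.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Placeholder path (assumption): point APP at the acemd3 binary on your host.
APP="${APP:-./acemd3}"

if [ -x "$APP" ]; then
    ldd "$APP" | missing_libs
fi
```

If nothing is printed for an existing binary, every library it links against was resolved on that host.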
Ian&Steve C.

Message 57063 - Posted: 2 Jul 2021, 14:49:02 UTC

After manually installing the required Boost library to get past that error, I now get this error on my 3080 Ti system:

09:55:10 (4806): wrapper (7.7.26016): starting
09:55:10 (4806): wrapper (7.7.26016): starting
09:55:10 (4806): wrapper: running acemd3 (--boinc input --device 0)
ACEMD failed:
Error launching CUDA compiler: 32512
sh: 1: : Permission denied


09:55:11 (4806): acemd3 exited; CPU time 0.479062
09:55:11 (4806): app exit status: 0x1
09:55:11 (4806): called boinc_finish(195)


Task: https://www.gpugrid.net/result.php?resultid=32632410

I tried purging and reinstalling the NVIDIA drivers, but nothing changed.

It looks like this same error popped up when you first released acemd3 two years ago: http://www.gpugrid.net/forum_thread.php?id=4935#51970

biodoc wrote:
Multiple failures of this task on both windows and linux

http://www.gpugrid.net/workunit.php?wuid=16517304

<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
15:19:27 (30109): wrapper (7.7.26016): starting
15:19:27 (30109): wrapper (7.7.26016): starting
15:19:27 (30109): wrapper: running acemd3 (--boinc input --device 0)
# Engine failed: Error launching CUDA compiler: 32512
sh: 1: : Permission denied

15:19:28 (30109): acemd3 exited; CPU time 0.186092
15:19:28 (30109): app exit status: 0x1
15:19:28 (30109): called boinc_finish(195)

</stderr_txt>


Why is the app launching CUDA compiler?


You then updated the app, which fixed the problem at that time, but you didn't post exactly what was changed: http://www.gpugrid.net/forum_thread.php?id=4935&nowrap=true#52022

Toni wrote:
It was a cryptic bug in the order loading shared libraries, or something like that. Otherwise unexplainably system-dependent.

I see VERY few failures now. The new app will be a huge step forward on several aspects, not least maintainability. We'll be transitioning gradually.


So whatever change you made between v2.02 and v2.03 seems to be what needs fixing again.
ServicEnginIC

Message 57064 - Posted: 2 Jul 2021, 15:26:32 UTC

I deployed the new app, which now requires cuda 11.2 and hopefully support all the latest cards. Touching the cuda versions is always a nightmare in boinc scheduler so expect problems.

Thank you so much.
Those efforts are for noble reasons.

Regarding persistent errors:
I also tried manually installing Boost on one of my Ubuntu 20.04 hosts, using the following commands:

sudo add-apt-repository ppa:mhier/libboost-latest
sudo apt-get update
sudo apt-get install boost1.74
reboot

But a new task downloaded after that still failed:
e3s644_e1s419p0f770-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND9285_4
Then I reset the GPUGrid project, and that seems to have done the trick.
A new task is now running on this host instead of failing after a few seconds:
e4s126_e3s248p0f238-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-0-2-RND6347_7
49 minutes and 1.919% progress so far.
Ian&Steve C.

Message 57065 - Posted: 2 Jul 2021, 15:33:05 UTC - in response to Message 57064.  

Thanks, I'll try a project reset, though I had already done one after the new app was announced. I guess it can't hurt.

Ian&Steve C.

Message 57066 - Posted: 2 Jul 2021, 15:45:20 UTC - in response to Message 57065.  

Nope. Even after the project reset, I still get the same error:

process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
11:42:55 (5665): wrapper (7.7.26016): starting
11:42:55 (5665): wrapper (7.7.26016): starting
11:42:55 (5665): wrapper: running acemd3 (--boinc input --device 0)
ACEMD failed:
Error launching CUDA compiler: 32512
sh: 1: : Permission denied

11:42:56 (5665): acemd3 exited; CPU time 0.429069
11:42:56 (5665): app exit status: 0x1
11:42:56 (5665): called boinc_finish(195)


https://www.gpugrid.net/result.php?resultid=32632487
Ian&Steve C.

Message 57067 - Posted: 2 Jul 2021, 16:00:16 UTC - in response to Message 57064.  

sudo add-apt-repository ppa:mhier/libboost-latest
sudo apt-get update
sudo apt-get install libboost1.74
reboot


Small correction here: it's "libboost1.74", not just "boost1.74".
ServicEnginIC

Message 57068 - Posted: 2 Jul 2021, 16:27:57 UTC - in response to Message 57066.  

Maybe your problem is Ampere-specific (?).

I've caught a new task on another of my hosts after applying the same remedy, and it is also running as expected.
e3s263_e1s419p0f938-ADRIA_New_KIXcMyb_HIP_AdaptiveBandit-1-2-RND6959_2
25 minutes and 0.560% progress so far for this second task.
Turing GPUs and Nvidia drivers 465.31 on both hosts.
Installing libboost1.74 by itself didn't work for me.
Resetting the project by itself didn't work for me.
Installing libboost1.74 and resetting the project afterwards did work on both of my hosts.
I've double-checked, and the commands I used were the ones previously published in message #57064.
According to Synaptic, this led to libboost1.74 and libboost1.74-dev being correctly installed.
Ian&Steve C.

Message 57069 - Posted: 2 Jul 2021, 16:37:10 UTC - in response to Message 57068.  
Last modified: 2 Jul 2021, 16:41:30 UTC

I had the same thought. I put my old 2080 Ti into the problem host and will see whether it starts processing or whether it's really a problem with the host-specific configuration. This isn't the first time this has happened, though, and Toni previously fixed it with an app update, so it looks like that will be needed again even if it's Ampere-specific.

I think the difference in install commands comes down to the use of apt vs. apt-get. Although apt-get still works, transitioning to plain apt will be better in the long term (see "Difference between apt and apt-get").
Ian&Steve C.

Message 57070 - Posted: 2 Jul 2021, 17:15:35 UTC - in response to Message 57069.  
Last modified: 2 Jul 2021, 17:22:47 UTC

Well, it seems it's not Ampere-specific. It failed in the same way on my 2080 Ti here: https://www.gpugrid.net/result.php?resultid=32632521

It's still the CUDA compiler error.

Unfortunately, I can't easily move the 3080 Ti to another system, since it's a water-cooled model that requires a custom water loop.
Keith Myers

Message 57071 - Posted: 2 Jul 2021, 18:08:24 UTC

I just used the PPA method on my other two hosts, but I did not reboot.
One host picked up another task, and it is running.
The other host has no work yet and is still waiting on the luck of the draw.
Ian&Steve C.

Message 57072 - Posted: 2 Jul 2021, 18:19:53 UTC - in response to Message 57070.  


I think I finally solved the issue! It's finally running on the 3080 Ti!

First, I removed the manual installation of Boost and installed the PPA version. I don't think this was the issue, though.

While poking around in my OS installs, I discovered that I had the CUDA 11.1 toolkit installed (likely from my previous attempts at building some apps to run on Ampere). I removed this old toolkit, cleaned up any leftover files, rebooted, reset the project, and waited for a task to show up.
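
A leftover toolkit like this can be spotted by listing the CUDA directories under the install prefix. The sketch below assumes the toolkit lives under /usr/local, which is the usual default for NVIDIA's own installers but may differ on a given system (distribution packages can use other locations). The helper name list_cuda_toolkits is hypothetical.

```shell
#!/bin/sh
# List directories named cuda* under a prefix (default: /usr/local).
list_cuda_toolkits() {
    prefix="${1:-/usr/local}"
    for d in "$prefix"/cuda*; do
        # When nothing matches, the pattern stays unexpanded and is skipped here.
        [ -d "$d" ] && basename "$d"
    done
    return 0
}

list_cuda_toolkits

# Show which nvcc, if any, is first on PATH.
command -v nvcc || echo "nvcc not on PATH"
```

Whether an old toolkit actually interferes with the app is the poster's observation on one host, not something this check can prove.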

So now it's finally running. Now to see how long it'll take on a 3080 Ti ;). It has over 10,000 CUDA cores, so I'm hoping for a fast time. The 2080 Ti runs a task in about 12 hours, so it'll be interesting to see how fast I can knock it out. It's using about 310 W right now, with the caveat that ever since I've had this card, I've noticed some weird power-limiting behavior. I'm waiting on an RMA for a new card, and I'm hoping it can really stretch its legs, though I still plan to power limit it to about 320 W.

ServicEnginIC

Message 57073 - Posted: 2 Jul 2021, 18:35:05 UTC - in response to Message 57072.  
Last modified: 2 Jul 2021, 18:36:13 UTC

Congratulations!
Good news... Anxious to see the performance on a 3080 Ti.
Vismed

Message 57074 - Posted: 2 Jul 2021, 18:54:38 UTC - in response to Message 57041.  

Well, it will be your problem, not mine. Even with decent hardware and software, I am pretty astonished at how folks like cosmology and the like seemingly do not understand how VMs and such work. I am pretty pissed off as an amateur, though...
Richard Haselgrove

Message 57075 - Posted: 2 Jul 2021, 18:59:40 UTC

Just seen my first failures with libboost errors on Linux Mint 20.1, driver 460.80, GTX 1660 super.

Applied the PPA and reset the project - waiting on the next tasks now.
Ian&Steve C.

Message 57076 - Posted: 2 Jul 2021, 19:02:55 UTC - in response to Message 57074.  

What problem are you having, specifically?

This project has nothing to do with cosmology, and it does not use VMs.
Retvari Zoltan

Message 57077 - Posted: 2 Jul 2021, 22:47:18 UTC - in response to Message 57072.  

This is the moment of truth we're all waiting for.
My bet is 9h 15m.
Ian&Steve C.

Message 57078 - Posted: 3 Jul 2021, 1:22:16 UTC - in response to Message 57077.  

I’m not sure it’ll be so simple.

When I checked earlier, the 3080 Ti was tracking a 12.5-hour completion time, but the 2080 Ti was tracking a 14.5-hour completion time.

Either the new run of tasks is longer, or the CUDA 11.2 app is slower. We’ll have to see.
Keith Myers

Message 57079 - Posted: 3 Jul 2021, 2:11:48 UTC - in response to Message 57078.  

I'm curious how you have a real estimated time remaining calculated for a brand new application.

AFAIK, you JUST got the application working, and I don't believe you have validated ten tasks yet, which is what it takes to get an accurate APR and, from it, accurate estimated-time-remaining numbers.

All my tasks are in EDF mode with multi-day estimates, simply because I have returned exactly one valid task so far: a short Cryptic-Scout task.
Ian&Steve C.

Message 57080 - Posted: 3 Jul 2021, 3:18:39 UTC - in response to Message 57079.  

I didn’t use the time remaining estimate from BOINC. I estimated it myself based on % complete and elapsed time, assuming a linear completion rate.
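
The linear estimate described here is just elapsed time divided by the fraction complete. A minimal sketch, using a made-up sample reading rather than figures from this thread; estimate_total_seconds is a hypothetical helper name:

```shell
#!/bin/sh
# Linear estimate of total run time: elapsed time divided by the fraction done.
# Arguments: elapsed seconds, percent complete (greater than 0, at most 100).
# Prints the estimated total run time in whole seconds.
estimate_total_seconds() {
    awk -v t="$1" -v p="$2" 'BEGIN {
        if (p <= 0 || p > 100) exit 1   # reject readings that cannot be extrapolated
        printf "%.0f\n", t * 100 / p
    }'
}

# Sample reading (assumption): 1 hour elapsed at 8% complete.
estimate_total_seconds 3600 8   # prints 45000, i.e. 12.5 hours
```

The estimate is only as good as the assumption that progress is linear over the whole task.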
Retvari Zoltan

Message 57081 - Posted: 3 Jul 2021, 8:08:18 UTC - in response to Message 57078.  
Last modified: 3 Jul 2021, 8:15:46 UTC

If the new tasks are longer, the awarded credit should be higher. The present ADRIA_New_KIXcMyb_HIP_AdaptiveBandit workunits are "worth" 675,000 credits, while the previous ADRIA_D3RBandit_batch_nmax5000 workunits were "worth" 523,125 credits, so the present ones are longer.
My estimate was 12h / 1.3 ≈ 9h 15m (based on my optimistic expectation of a 30% performance improvement).
Nevertheless, we can use the completion times to estimate the actual performance improvement (3080 Ti vs. 2080 Ti). The 3080 Ti completed the task in 44368 s (12h 19m 28s) and the 2080 Ti completed it in 52642 s (14h 37m 22s), so the 3080 Ti is "only" 18.65% faster. This suggests that the number of usable CUDA cores in the 30xx series is half of the advertised number (just as I expected): 10240 / 2 = 5120 and 5120 / 4352 = 1.1765, so the 3080 Ti has 17.65% more CUDA cores than the 2080 Ti. That makes the CUDA cores of the 3080 Ti about 0.85% faster than those of the 2080 Ti (1.1865 / 1.1765 ≈ 1.0085).
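
The arithmetic in this post can be rechecked with a few lines of awk. This sketch only recomputes the ratios from the run times and core counts quoted above; halving the advertised 10240 cores is the poster's assumption, and per_core_ratio is a hypothetical helper name.

```shell
#!/bin/sh
# Arguments: slower run time (s), faster run time (s),
#            core count of the faster card, core count of the slower card.
per_core_ratio() {
    awk -v t_slow="$1" -v t_fast="$2" -v c_fast="$3" -v c_slow="$4" 'BEGIN {
        speedup = t_slow / t_fast        # how much faster the whole card is
        core_ratio = c_fast / c_slow     # how many more cores it has
        printf "speedup %.4f\n", speedup
        printf "core ratio %.4f\n", core_ratio
        printf "per-core ratio %.4f\n", speedup / core_ratio
    }'
}

# 2080 Ti: 52642 s, 4352 cores. 3080 Ti: 44368 s, 10240 / 2 = 5120 cores (assumed usable).
per_core_ratio 52642 44368 5120 4352
```

With these inputs the three ratios come out at 1.1865, 1.1765 and 1.0085, so under the halved-core assumption the per-core gain is a little under 1%.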

©2025 Universitat Pompeu Fabra