ACEMD updated app

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 59768 - Posted: 19 Jan 2023, 16:10:32 UTC - in response to Message 59767.

Just had one and it failed after 26 seconds on my 4090


are the Python tasks working on your 4090? or were those run on a different GPU?

Ryan Munro
Joined: 6 Mar 18
Posts: 38
Credit: 1,340,042,080
RAC: 25,456
Message 59769 - Posted: 19 Jan 2023, 16:57:17 UTC - in response to Message 59768.

The Python tasks run fine on my 4090, though they don't do much on the GPU at all; all the work seems to be on the CPU.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 59770 - Posted: 19 Jan 2023, 17:19:30 UTC - in response to Message 59769.

The Python tasks run fine on my 4090, though they don't do much on the GPU at all; all the work seems to be on the CPU.


Thanks.

Could you please report your failed task? Click Update on BOINC for GPUGRID to send back the result. I'd like to see the nature of the failure, and whether the architecture check is what caused it.

Ryan Munro
Joined: 6 Mar 18
Posts: 38
Credit: 1,340,042,080
RAC: 25,456
Message 59772 - Posted: 19 Jan 2023, 18:16:15 UTC - in response to Message 59770.

Done :)

Keith Myers
Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 678,713
Message 59774 - Posted: 19 Jan 2023, 19:37:49 UTC

Looks like the application does not understand the 4090 architecture. Needs to be recompiled with the gencodes that Ian pointed out.

ACEMD failed:
Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)
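
For context, this is the error one would expect when the NVRTC runtime compiler bundled with an app is older than the GPU: Ada cards such as the 4090 report compute capability 8.9, which the CUDA compilers only accept from release 11.8 onward. A minimal Python sketch of that check is below. The table and the function name are illustrative assumptions, not part of ACEMD or BOINC, and the table is deliberately incomplete.

```python
# Minimum CUDA toolkit release whose compilers accept each compute capability.
# Small hand-written table for illustration only; it is not exhaustive.
MIN_CUDA_FOR_CC = {
    (7, 5): (10, 0),   # Turing
    (8, 0): (11, 0),   # Ampere GA100
    (8, 6): (11, 1),   # Ampere GA10x (RTX 30 series)
    (8, 9): (11, 8),   # Ada (RTX 40 series)
}


def toolkit_supports(cuda_version, compute_capability):
    """Return True if a toolkit of cuda_version knows the given architecture."""
    needed = MIN_CUDA_FOR_CC.get(compute_capability)
    if needed is None:
        raise KeyError(f"compute capability {compute_capability} not in table")
    # Tuples compare element by element, so (11, 2) < (11, 8).
    return cuda_version >= needed


if __name__ == "__main__":
    for cc in [(8, 6), (8, 9)]:
        verdict = "ok" if toolkit_supports((11, 2), cc) else "unsupported"
        print(f"CUDA 11.2 vs sm_{cc[0]}{cc[1]}: {verdict}")
```

With a CUDA 11.2 toolkit the sketch reports sm_86 as fine and sm_89 as unsupported, which matches the `--gpu-architecture` complaint above.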

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 59776 - Posted: 20 Jan 2023, 1:13:30 UTC - in response to Message 59766.
Last modified: 20 Jan 2023, 1:22:56 UTC


it could very well be that the CPU is too old. it does not support AVX extensions for example, and if the application is built with this requirement then that could be a reason.

perhaps one of the GPUGRID people could tell me if this is the case?


Maybe you can tell (if you can run an ACEMD3 app on another host that is AVX-enabled) by setting the AVX offset in the BIOS of a capable host and then checking whether the processor speed drops by that offset while the wrapper is running (with no other WU).
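
A more direct check, on Linux hosts at least, is to look for the `avx` flag in `/proc/cpuinfo`. The sketch below is a minimal illustration under that assumption; the function names are made up for this example, and it will not work as-is on a Windows host.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text):
    """True only if the exact flag 'avx' is present (not just 'avx2')."""
    return "avx" in cpu_flags(cpuinfo_text)


def host_has_avx(path="/proc/cpuinfo"):
    """Linux-only convenience wrapper around has_avx()."""
    with open(path) as fh:
        return has_avx(fh.read())
```

Calling `host_has_avx()` on the machine in question returns `True` or `False`; a Core2-era CPU would be expected to return `False`.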

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 59777 - Posted: 20 Jan 2023, 1:48:29 UTC - in response to Message 59760.
Last modified: 20 Jan 2023, 1:59:17 UTC

...
But the reason is probably that 4GB of RAM is too little for executing these tasks.

that's what I am guessing, too.
However, I was closely watching the RAM usage (via MemInfo) when the tasks started: at the moment the task crashed, about 2 GB were still free.
Further, for the tasks running on the other hosts mentioned above, the Windows Task Manager shows a RAM usage between 60MB and 400MB per task.
Maybe the CPU Intel Core2 Duo E7400 @ 2.80GHz is too old for these tasks?
(However, some other GPU projects like Einstein, WCG and Primegrid are running well).


interesting, larrywhitehead's 1060 3GB also does not seem to want to do these tasks

https://www.gpugrid.net/results.php?hostid=493191

only a vague stderr message:

(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
23:38:59 (9616): wrapper (7.9.26016): starting
23:38:59 (9616): wrapper: running bin/acemd3.exe (--boinc --device 0)
23:39:01 (9616): bin/acemd3.exe exited; CPU time 0.000000
23:39:01 (9616): app exit status: 0xc0000135
23:39:01 (9616): called boinc_finish(195)


Yet I only observe a little over 2GB graphics memory being utilized max so far on my hosts.
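
The "app exit status" in that log is a Windows NTSTATUS value, and 0xc0000135 is STATUS_DLL_NOT_FOUND: the process could not load a DLL it needs, which fits the 0.000000 CPU time. That would point away from RAM, and an unsupported CPU instruction would more typically show up as STATUS_ILLEGAL_INSTRUCTION, although the log alone does not say which DLL is missing. A small Python sketch for decoding such values, with a hand-picked and deliberately incomplete table:

```python
# A few well-known Windows NTSTATUS codes (hand-picked, not exhaustive).
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}


def describe_exit_status(status):
    """Map an 'app exit status' value from the wrapper log to a name."""
    if isinstance(status, str):
        status = int(status, 16)  # accepts strings such as "0xc0000135"
    return NTSTATUS_NAMES.get(status, f"unknown (0x{status:08x})")


print(describe_exit_status("0xc0000135"))  # STATUS_DLL_NOT_FOUND
```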

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 59778 - Posted: 20 Jan 2023, 5:26:53 UTC - in response to Message 59774.

Looks like the application does not understand the 4090 architecture. Needs to be recompiled with the gencodes that Ian pointed out.

ACEMD failed:
Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

That’s exactly what I thought would happen. I had the same experience with some other people trying to run the Einstein CUDA BRP7 app. Didn’t work on 11.7 but did work once I compiled it for 11.8 with gencode defined for CC 8.9
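
To illustrate what "gencode defined for CC 8.9" looks like on the nvcc command line, here is a small Python sketch that builds the flags. The helper name and its parameters are hypothetical; only the `-gencode arch=compute_XY,code=sm_XY` syntax is nvcc's own. Note that an app which compiles kernels at run time through NVRTC also needs an NVRTC library new enough to know the architecture, so this is one piece of the fix rather than all of it.

```python
def gencode_flags(compute_capabilities, embed_ptx_for_newest=True):
    """Build nvcc -gencode flags for a list of (major, minor) capabilities."""
    flags = []
    for major, minor in sorted(compute_capabilities):
        cc = f"{major}{minor}"
        # Native machine code (SASS) for this exact architecture.
        flags.append(f"-gencode arch=compute_{cc},code=sm_{cc}")
    if embed_ptx_for_newest and flags:
        major, minor = max(compute_capabilities)
        cc = f"{major}{minor}"
        # PTX for the newest target, so later GPUs can JIT-compile it.
        flags.append(f"-gencode arch=compute_{cc},code=compute_{cc}")
    return flags


if __name__ == "__main__":
    print(" ".join(gencode_flags([(8, 6), (8, 9)])))
```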

catavalon21
Joined: 1 Feb 09
Posts: 4
Credit: 427,753,832
RAC: 1,686
Message 60026 - Posted: 6 Mar 2023, 21:57:50 UTC

Is ACEMD3 not yet supporting the NV 4k architecture on W10? This is a 4070 Ti with the CUDA 1121 app.

ACEMD failed:
Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1114
Credit: 40,838,348,595
RAC: 4,765,598
Message 60027 - Posted: 6 Mar 2023, 22:15:24 UTC - in response to Message 60026.

Is ACEMD3 not yet supporting the NV 4k architecture on W10? This is a 4070 Ti with the CUDA 1121 app.

ACEMD failed:
Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)


That’s correct. The current CUDA 11.2.1 (cuda1121) app does not support the Ada 4000 series.

catavalon21
Joined: 1 Feb 09
Posts: 4
Credit: 427,753,832
RAC: 1,686
Message 60033 - Posted: 8 Mar 2023, 1:15:52 UTC - in response to Message 60027.

Is ACEMD3 not yet supporting the NV 4k architecture on W10? This is a 4070 Ti with the CUDA 1121 app.

ACEMD failed:
Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)


That’s correct. The current CUDA 11.2.1 (cuda1121) app does not support the Ada 4000 series.


Thanks for confirming.

oemuser
Joined: 18 Sep 16
Posts: 10
Credit: 1,291,979
RAC: 0
Message 60109 - Posted: 17 Mar 2023, 14:20:18 UTC

I got an ACEMD 3 task for my GTX 1080 Ti on Windows (2oiq-ADRIA_KDMD_1k_test_3809-0-1-RND9959).
The GPU stays at a very low clock speed of 750 MHz, with VRAM at 800 MHz. I expected about 2x the GPU clock and 7x the VRAM clock. Or would higher clocks not bring any advantage?
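
One way to see whether the card ever leaves its idle performance state while the task runs is to poll `nvidia-smi`. The Python sketch below assumes `nvidia-smi` is on the PATH and uses its `pstate`, `clocks.sm` and `clocks.mem` query fields; the function names and the sample values in the usage note are made up for illustration.

```python
import subprocess

QUERY = "pstate,clocks.sm,clocks.mem"


def parse_clock_line(line):
    """Parse one CSV line such as 'P8, 750, 800' into a dict."""
    pstate, sm_mhz, mem_mhz = [field.strip() for field in line.split(",")]
    return {"pstate": pstate, "sm_mhz": int(sm_mhz), "mem_mhz": int(mem_mhz)}


def read_gpu_clocks():
    """Return one dict per GPU with its current state and clocks in MHz."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_clock_line(line) for line in out.splitlines() if line.strip()]
```

For a hypothetical line `P8, 750, 800`, `parse_clock_line` gives `{"pstate": "P8", "sm_mhz": 750, "mem_mhz": 800}`; a card that stays in a high-numbered P-state under load has not clocked up.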

Keith Myers
Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 678,713
Message 60898 - Posted: 20 Dec 2023, 21:25:27 UTC
Last modified: 20 Dec 2023, 21:40:12 UTC

I see that a new acemd3 app was published yesterday for the Linux hosts in an attempt to fix the expired Acellera licensing issue.

Unfortunately, the app is still not working and any new work is still failing, this time with more information: a problem with the Python packaging of the job files.

https://www.gpugrid.net/result.php?resultid=33722983

Looks like they've moved away from a standalone acemd3 binary which is what was used in the past work.

Looks like they tried to just use the Windows code, which of course failed when it tried to use the Windows-only msvcrt Python module.

ServicEnginIC
Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 1,102,898
Message 60904 - Posted: 26 Dec 2023, 10:22:18 UTC - in response to Message 60898.
Last modified: 26 Dec 2023, 10:26:09 UTC

Looks like they tried to just use the Windows code, which of course failed when it tried to use the Windows-only msvcrt Python module.

It seems that you're right.
And the problem is still unresolved for Linux hosts:

Name: 0_0-CRYPTICSCOUT_pocket_discovery_c82914d2_15b4_4300_b4db_cb72998e09bf-6-7-RND0445_6
Workunit: 27641639
Created: 26 Dec 2023 | 9:50:25 UTC
Sent: 26 Dec 2023 | 9:50:26 UTC
Received: 26 Dec 2023 | 9:57:14 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 195 (0xc3) EXIT_CHILD_FAILED
Computer ID: 186626
Report deadline: 31 Dec 2023 | 9:50:26 UTC
Run time: 23.07
CPU time: 0.00
Validate state: Invalid
Credit: 0.00
Application version: ACEMD 3: molecular dynamics simulations for GPUs v2.21 (cuda1121)

Stderr output

<core_client_version>7.20.5</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
09:55:04 (339849): wrapper (7.7.26016): starting
09:55:25 (339849): wrapper (7.7.26016): starting
09:55:25 (339849): wrapper: running bin/acemd (--boinc --device 0)
Traceback (most recent call last):
File "/usr/lib/python3.10/subprocess.py", line 69, in <module>
import msvcrt
ModuleNotFoundError: No module named 'msvcrt'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "runtime.py", line 8, in init runtime
File "/usr/lib/python3.10/platform.py", line 119, in <module>
import subprocess
File "/usr/lib/python3.10/subprocess.py", line 74, in <module>
import _posixsubprocess
ModuleNotFoundError: No module named '_posixsubprocess'
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python3.10/subprocess.py", line 69, in <module>
import msvcrt
ModuleNotFoundError: No module named 'msvcrt'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 72, in apport_excepthook
from apport.fileutils import likely_packaged, get_recent_crashes
File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
from apport.report import Report
File "/usr/lib/python3/dist-packages/apport/report.py", line 12, in <module>
import subprocess, tempfile, os.path, re, pwd, grp, os, io
File "/usr/lib/python3.10/subprocess.py", line 74, in <module>
import _posixsubprocess
ModuleNotFoundError: No module named '_posixsubprocess'

Original exception was:
Traceback (most recent call last):
File "/usr/lib/python3.10/subprocess.py", line 69, in <module>
import msvcrt
ModuleNotFoundError: No module named 'msvcrt'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "runtime.py", line 8, in init runtime
File "/usr/lib/python3.10/platform.py", line 119, in <module>
import subprocess
File "/usr/lib/python3.10/subprocess.py", line 74, in <module>
import _posixsubprocess
ModuleNotFoundError: No module named '_posixsubprocess'
Python error
09:55:26 (339849): bin/acemd exited; CPU time 0.032149
09:55:26 (339849): app exit status: 0x1
09:55:26 (339849): called boinc_finish(195)

</stderr_txt>
]]>

No hope for a solution in the short term, since universities usually shut down over Christmas...
Merry Xmas
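
A note on reading the traceback above: CPython 3.10's `subprocess.py` always tries `import msvcrt` first and, on any non-Windows host, falls back to `import _posixsubprocess`. The `msvcrt` miss is therefore normal on Linux; the failure that actually stops the app is the second one, which suggests the packaged runtime cannot find the `_posixsubprocess` extension module. The Python sketch below imitates that pattern. The function names and the fake importer are made up for illustration and are not part of the acemd package.

```python
import importlib


def pick_process_backend(import_module=importlib.import_module):
    """Imitate the platform probe at the top of CPython 3.10's subprocess.py."""
    try:
        import_module("msvcrt")            # present only on Windows
        return "windows"
    except ModuleNotFoundError:
        # Normal path on Linux: fall back to the POSIX helper module.
        import_module("_posixsubprocess")  # C extension shipped with CPython
        return "posix"


def broken_bundle_import(name):
    """Hypothetical importer for a bundle that ships neither module."""
    raise ModuleNotFoundError(f"No module named {name!r}", name=name)


if __name__ == "__main__":
    try:
        pick_process_backend(broken_bundle_import)
    except ModuleNotFoundError as exc:
        print("decisive error:", exc.name)
        print("expected probe miss:", exc.__context__.name)
```

Because the second import fails inside the `except` block, Python chains the two exceptions, which is why the log shows "During handling of the above exception, another exception occurred".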

Keith Myers
Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 678,713
Message 60905 - Posted: 27 Dec 2023, 0:53:51 UTC

I'm waiting till after New Years before bugging Gianni again with the request to fix the acemd3 app properly.

Erich56
Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 960
Message 60922 - Posted: 3 Jan 2024, 17:01:02 UTC - in response to Message 60905.

I'm waiting till after New Years before bugging Gianni again with the request to fix the acemd3 app properly.

my Windows10 PCs were successfully crunching ACEMD 3 until this morning.

Within the past hour, some more ACEMD 3 tasks were downloaded and failed after about 1 minute.
See here: http://www.gpugrid.net/result.php?resultid=33725238

Keith Myers
Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 678,713
Message 60923 - Posted: 3 Jan 2024, 17:50:23 UTC
Last modified: 3 Jan 2024, 17:51:50 UTC

I'm shocked to discover that this morning I have an acemd3 task that has been running for 50 minutes so far.

All previous tasks insta-failed, first on the missing license issue and then, after the app got updated in December, on a missing Windows file.

All my hosts are Linux based and no Windows has ever been installed.

The slot that has the running task in it has all the normal and usual files in it along with checkpoint files that made running acemd3 tasks so wonderful because they could be stopped and started without failing.

Wish the other tasks at GPUGrid had that same capability.

I must assume that the app got updated again and now works. And after looking at the apps list, I see that that is the case. New app released today for acemd3.

Thank you Gianni!!

Keith Myers
Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 678,713
Message 60924 - Posted: 3 Jan 2024, 17:54:54 UTC

But that is only one task out of about 20 so far today that is being successfully run. All the rest are ATMbeta and have failed due to bad configuration file inputs.

Keith Myers
Joined: 13 Dec 17
Posts: 1416
Credit: 9,119,446,190
RAC: 678,713
Message 60925 - Posted: 3 Jan 2024, 18:21:35 UTC
Last modified: 3 Jan 2024, 18:22:00 UTC

New Linux acemd3 app has an expiration date 3649 days into the future. Should not be an issue for years now.

#
# ACEMD version 3.7.3
#
# Copyright (C) 2017-2024 Acellera (www.acellera.com)
#
# By using ACEMD, you accept the terms and conditions of the ACEMD licence
# Check the licence by running "acemd --licence"
# More details: https://software.acellera.com/acemd/licence.html
#
# When publishing, please cite:
# ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale
# M. J. Harvey, G. Giupponi and G. De Fabritiis,
# J Chem. Theory. Comput. 2009 5(6), pp1632-1639
# DOI: 10.1021/ct9000685
#
# Arguments:
# input: input
# platform:
# device: 2
# ncpus:
# precision: mixed
#
# ACEMD is running in Boinc mode!
#
# WARNING: This ACEMD version expires in 3649 days!
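
As a rough check of what 3649 days means, the snippet below adds that countdown to the posting date. It assumes the warning was printed on 3 Jan 2024; the real expiry date built into the binary may differ by a day or so.

```python
from datetime import date, timedelta


def expiry_date(observed_on, days_left):
    """Date on which a 'expires in N days' countdown reaches zero."""
    return observed_on + timedelta(days=days_left)


print(expiry_date(date(2024, 1, 3), 3649))  # 2033-12-30
```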

Erich56
Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 960
Message 60926 - Posted: 3 Jan 2024, 21:04:36 UTC - in response to Message 60925.

New Linux acemd3 app has an expiration date 3649 days into the future. Should not be an issue for years now.

good news for the Linux crunchers.
However, it would be great if they did the same for the Windows version; until that is done, they should stop sending out Windows tasks, which keep failing within a minute.

©2025 Universitat Pompeu Fabra