ACEMD 4

Message boards : News : ACEMD 4

zombie67 [MM]
Joined: 16 Jul 07
Posts: 209
Credit: 5,496,860,456
RAC: 12,111
Level: Tyr
Message 58418 - Posted: 4 Mar 2022, 1:53:27 UTC

My task took 9.5 hours to download, and ran for 8 minutes. Of those 8 minutes, the first several minutes showed zero load on the GPU. I assume it was unpacking the task then. So it really ran for only about 5 minutes on the GPU.
Reno, NV
Team: SETI.USA

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level: His
Message 58419 - Posted: 4 Mar 2022, 2:17:12 UTC - in response to Message 58418.  
Last modified: 4 Mar 2022, 2:17:38 UTC

> My task took 9.5 hours to download, and ran for 8 minutes. Of those 8 minutes, the first several minutes showed zero load on the GPU. I assume it was unpacking the task then. So it really ran for only about 5 minutes on the GPU.

That is pretty much what I got, though I did not pay attention to the load. It was much ado about nothing. When they get more and longer ones, I will try again.

Azmodes
Joined: 7 Jan 17
Posts: 34
Credit: 1,371,429,518
RAC: 0
Level: Met
Message 58420 - Posted: 4 Mar 2022, 14:15:49 UTC
Last modified: 4 Mar 2022, 14:16:37 UTC

My hosts received WUs, but they error out after 5 minutes:

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
14:39:40 (150677): wrapper (7.7.26016): starting
14:39:40 (150677): wrapper (7.7.26016): starting
14:39:40 (150677): wrapper: running /bin/tar (xf x86_64-pc-linux-gnu__cuda1121.tar.bz2)
14:44:45 (150677): /bin/tar exited; CPU time 299.093084
14:44:45 (150677): wrapper: running bin/acemd (--boinc --device 0)
ERROR: /home/user/conda/conda-bld/acemd_1646158992086/work/src/mdio/amberparm.cpp line 76: Unsupported PRMTOP version!
14:44:46 (150677): bin/acemd exited; CPU time 0.205850
14:44:46 (150677): app exit status: 0x9e
14:44:46 (150677): called boinc_finish(195)

</stderr_txt>
]]>
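A side note on reading this log: the line "process exited with code 195 (0xc3, -61)" gives one value three ways, in decimal, in hexadecimal and (consistent with the numbers shown) as the low byte read as a signed 8-bit integer. A minimal Python sketch reproducing that rendering, assuming only that interpretation (describe_exit_code is a hypothetical helper, not a BOINC function):

```python
def describe_exit_code(code: int) -> str:
    """Render an exit code as decimal, hex and signed 8-bit, like the log line above."""
    low_byte = code & 0xFF
    # Reinterpret the low byte as a two's-complement signed value.
    signed = low_byte - 256 if low_byte >= 128 else low_byte
    return f"{code} ({hex(code)}, {signed})"


print(describe_exit_code(195))  # 195 (0xc3, -61)
print(int("0x9e", 16))          # 158, the "app exit status" reported for acemd
```

Note the order in the log: acemd exits with status 0x9e, and only then does the wrapper call boinc_finish(195), so the 195 the client reports appears to be the wrapper's own code rather than acemd's.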


I noticed that the only ones failing are v1.02. v2.19 ones validate (well, one did so far, the rest are still running). I'm a bit confused, so is v1 ACEMD 3 and v2 ACEMD 4? Or are both different versions of ACEMD 4?

Aurum
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 0
Level: Trp
Message 58421 - Posted: 4 Mar 2022, 14:24:38 UTC - in response to Message 58420.  

> I noticed that the only ones failing are v1.02. v2.19 ones validate (well, one did so far, the rest are still running). I'm a bit confused, so is v1 ACEMD 3 and v2 ACEMD 4? Or are both different versions of ACEMD 4?


It's confusing the way they switched to a long-winded name and stopped labeling them acemd3 (v2.19) and acemd4 (v1.02).

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Level: Trp
Message 58422 - Posted: 4 Mar 2022, 14:49:58 UTC - in response to Message 58420.  
Last modified: 4 Mar 2022, 15:01:39 UTC

You've also had errors like one I've just encountered: EXIT_DISK_LIMIT_EXCEEDED

Like you, my first failure was "Unsupported PRMTOP version!" (task 32749480). That was followed by a disk-limit failure on this workunit:

<workunit>
<name>T1_NNPMM_1ajv_07-RAIMIS_NNPMM-0-2-RND3217</name>
<app_name>acemd4</app_name>
<version_num>102</version_num>
<rsc_fpops_est>5000000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>250000000000000000.000000</rsc_fpops_bound>
<rsc_memory_bound>4000000000.000000</rsc_memory_bound>
<rsc_disk_bound>10000000000.000000</rsc_disk_bound>

(task 32749572)

It's possible that the first error didn't clean up properly after itself, and caused the combined project total to exceed that 10,000,000,000 byte limit - though that seems unlikely.
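The bounds in this excerpt are plain byte and FLOP counts, so an rsc_disk_bound of 10000000000 is a 10 GB (decimal) ceiling for the task. A small Python sketch, assuming only the element names shown above. The closing </workunit> tag is added here solely so the fragment parses on its own, and the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET

# The workunit excerpt quoted above, closed so it parses as a standalone document.
WORKUNIT_XML = """<workunit>
<name>T1_NNPMM_1ajv_07-RAIMIS_NNPMM-0-2-RND3217</name>
<app_name>acemd4</app_name>
<version_num>102</version_num>
<rsc_fpops_est>5000000000000000.000000</rsc_fpops_est>
<rsc_fpops_bound>250000000000000000.000000</rsc_fpops_bound>
<rsc_memory_bound>4000000000.000000</rsc_memory_bound>
<rsc_disk_bound>10000000000.000000</rsc_disk_bound>
</workunit>"""


def disk_bound_bytes(xml_text: str) -> float:
    """Read the per-workunit disk limit, in bytes."""
    return float(ET.fromstring(xml_text).findtext("rsc_disk_bound"))


def exceeds_disk_bound(used_bytes: float, xml_text: str) -> bool:
    """True if the given usage is over the workunit's rsc_disk_bound."""
    return used_bytes > disk_bound_bytes(xml_text)


# A 10.3 GB slot directory, reading "GB" as 10**9 bytes.
print(exceeds_disk_bound(10.3e9, WORKUNIT_XML))  # True
```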

All hell is breaking loose at GPUGrid today, with ADRIA acemd3 tasks completing in under an hour - we'll just have to wait while things get sorted out one by one, and the dust settles!

Edit - the slot directory where my next task is running contains 17,298 items, totalling 10.3 GB (10.3 GB on disk) - above the limit, although BOINC hasn't noticed yet.

Edit 2 - it has now. Task 32750112
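The item count and total size of a slot directory can be checked with a file manager or a shell. Here is a small Python equivalent, sketched under the assumption that you just want to total the regular files under one slot directory and compare them with the 10,000,000,000-byte rsc_disk_bound from the workunit above. The example path comes from the run.log later in this thread (slot numbers differ per host and task), and this is not necessarily how the BOINC client itself measures usage:

```python
import os

DISK_BOUND = 10_000_000_000  # rsc_disk_bound from the workunit above, in bytes


def slot_usage(slot_dir: str) -> tuple[int, int]:
    """Return (number of files and subdirectories, total bytes of regular files)."""
    items = 0
    total = 0
    for root, dirs, files in os.walk(slot_dir):
        items += len(dirs) + len(files)
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # don't count symlink targets twice
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # a running task may delete files while we walk
    return items, total


if __name__ == "__main__":
    # Example path only; adjust to the slot your task is running in.
    count, size = slot_usage("/var/lib/boinc-client/slots/3")
    print(f"{count} items, {size / 1e9:.1f} GB, over limit: {size > DISK_BOUND}")
```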

Aurum
Joined: 12 Jul 17
Posts: 404
Credit: 17,408,899,587
RAC: 0
Level: Trp
Message 58423 - Posted: 4 Mar 2022, 15:10:45 UTC

Could it have anything to do with your running Borg BOINC? Resistance is futile :-)

Azmodes
Joined: 7 Jan 17
Posts: 34
Credit: 1,371,429,518
RAC: 0
Level: Met
Message 58424 - Posted: 4 Mar 2022, 15:21:40 UTC

ah, so the failing tasks are actually acemd3?

I'm not sure what that disk limit error is about. All my relevant hosts are set to allow between 30 and 75 GB of space for BOINC. The event logs confirm this setting.

Also, unrelated and nothing new, but this site is such a supreme pain in the butt to navigate if you are running multiple hosts from the same external IP... Ridiculous.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Level: Trp
Message 58425 - Posted: 4 Mar 2022, 15:36:46 UTC

We currently have:

Advanced molecular dynamics simulations for GPUs v2.19 - that's acemd3
Advanced molecular dynamics simulations for GPUs v1.02 - that's acemd4

Clear as mud ??!!
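The two version strings map onto BOINC's integer version numbers, which is why the workunit excerpt earlier in the thread shows <version_num>102</version_num> for acemd4. A sketch, assuming the usual BOINC convention that version_num is 100 * major + minor. The dictionary simply restates the list above, and format_version is a hypothetical helper:

```python
# Restates the list above: BOINC version_num -> internal app name.
APP_BY_VERSION = {219: "acemd3", 102: "acemd4"}


def format_version(version_num: int) -> str:
    """Format a BOINC version_num such as 102 as 'v1.02'."""
    major, minor = divmod(version_num, 100)
    return f"v{major}.{minor:02d}"


for num, app in APP_BY_VERSION.items():
    print(f"{format_version(num)} -> {app}")
# v2.19 -> acemd3
# v1.02 -> acemd4
```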

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Level: Trp
Message 58426 - Posted: 4 Mar 2022, 15:45:33 UTC - in response to Message 58424.  

> I'm not sure what that disk limit error is about.

The limit in question is set at the workunit level (rsc_disk_bound), and applies to the amount copied to the working ('slot') directory, plus any data generated during the run. The BOINC disk limits you set in your preferences are applied to the sum total of all files, in all directories, under the control of BOINC.

To prove the point, I caught task 32750377 and suspended it before it started running. Then I shut down the BOINC client and edited client_state.xml to double the workunit's disk limit. It ran to completion and was validated.

I'll do another one to check, but I can't be sitting here manually editing BOINC files every five minutes - this needs catching at source, and quickly.
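For anyone tempted to repeat the manual workaround while waiting for a fix at source, the edit amounts to multiplying one workunit's rsc_disk_bound. This is a hypothetical Python sketch, assuming client_state.xml holds <workunit> elements with <name> and <rsc_disk_bound> children as in the excerpt earlier in the thread. The function name and the sample document are illustrative. Only try anything like it with the BOINC client shut down and a backup of the file, and note that ElementTree does not preserve the file's original formatting:

```python
import xml.etree.ElementTree as ET


def scale_disk_bound(state_xml: str, wu_name: str, factor: float = 2.0) -> str:
    """Return the XML with one workunit's rsc_disk_bound multiplied by factor."""
    root = ET.fromstring(state_xml)
    found = False
    for wu in root.iter("workunit"):
        if wu.findtext("name") != wu_name:
            continue
        bound = wu.find("rsc_disk_bound")
        if bound is None or bound.text is None:
            raise ValueError(f"{wu_name} has no rsc_disk_bound")
        bound.text = f"{float(bound.text) * factor:.6f}"
        found = True
    if not found:
        raise KeyError(wu_name)
    return ET.tostring(root, encoding="unicode")


# Illustrative stand-in for client_state.xml with a single workunit.
SAMPLE = """<client_state>
<workunit>
<name>T1_NNPMM_1ajv_07-RAIMIS_NNPMM-0-2-RND3217</name>
<rsc_disk_bound>10000000000.000000</rsc_disk_bound>
</workunit>
</client_state>"""

print(scale_disk_bound(SAMPLE, "T1_NNPMM_1ajv_07-RAIMIS_NNPMM-0-2-RND3217"))
```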

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Level: Trp
Message 58427 - Posted: 4 Mar 2022, 16:10:53 UTC

And the next one - workunit 27113095, created 15:43:59 UTC - already has the fix in place. Kudos to whoever was watching our conversation here.

It's also running a lot slower - nearly two minutes for each 2.5% step - so hopefully we're starting to get some real science done.
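The "two minutes per 2.5% step" figure gives a rough total runtime by straight-line extrapolation. A small sketch, assuming progress advances at a constant rate (projected_total is a hypothetical helper):

```python
def projected_total(elapsed_min: float, fraction_done: float) -> float:
    """Linearly extrapolate the total duration, in minutes, from progress so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_min / fraction_done


# ~2 minutes per 2.5% step means 40 steps, so a bit under 80 minutes in total.
print(projected_total(2, 0.025))
# 66% of a download after 3 h 45 min (225 min) projects to about 341 minutes.
print(round(projected_total(225, 0.66)))  # 341
```

Applied to mmonnin's download later in the thread, this projects about 5 hr 41 min, while the download actually took 5 hr 20 min, so treat these as rough estimates only.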

zombie67 [MM]
Joined: 16 Jul 07
Posts: 209
Credit: 5,496,860,456
RAC: 12,111
Level: Tyr
Message 58428 - Posted: 4 Mar 2022, 16:17:26 UTC - in response to Message 58427.  

> And the next one - workunit 27113095, created 15:43:59 UTC - already has the fix in place. Kudos to whoever was watching our conversation here.
>
> It's also running a lot slower - nearly two minutes for each 2.5% step - so hopefully we're starting to get some real science done.


Did the fix require the whole app to be re-downloaded? Please say no...
Reno, NV
Team: SETI.USA

Azmodes
Joined: 7 Jan 17
Posts: 34
Credit: 1,371,429,518
RAC: 0
Level: Met
Message 58429 - Posted: 4 Mar 2022, 16:39:30 UTC
Last modified: 4 Mar 2022, 16:52:03 UTC

Thanks for the explanation, RH. :)

Getting validating acemd4 tasks now.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Level: Trp
Message 58430 - Posted: 4 Mar 2022, 17:01:24 UTC - in response to Message 58428.  

> Did the fix require the whole app to be re-downloaded? Please say no...

No!

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Level: Trp
Message 58431 - Posted: 4 Mar 2022, 17:08:20 UTC

I just downloaded the 2.8GB tar package and it only took a few minutes to download at the roughly 15Mbps transfer rate.

The ~500MB model file, however, is going quite slowly at ~200Kbps and is dragging along.
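Download-time arithmetic depends heavily on whether "Mbps" means megabits or megabytes per second, and the post doesn't say which. A quick sketch, assuming decimal units (1 GB = 10^9 bytes) and a constant transfer rate (transfer_minutes is a hypothetical helper):

```python
def transfer_minutes(size_bytes: float, rate_bits_per_s: float) -> float:
    """Minutes needed to move size_bytes at a constant rate given in bits per second."""
    return size_bytes * 8 / rate_bits_per_s / 60


GB, MB = 1e9, 1e6
print(round(transfer_minutes(2.8 * GB, 15e6), 1))        # 24.9  (15 megabits/s)
print(round(transfer_minutes(2.8 * GB, 15 * MB * 8), 1))  # 3.1   (15 megabytes/s)
print(round(transfer_minutes(500 * MB, 200e3)))           # 333   (200 kilobits/s)
```

Read literally as 15 megabits per second, 2.8 GB needs roughly 25 minutes, so "a few minutes" fits a rate nearer 15 megabytes per second. At ~200 kilobits per second the ~500 MB model file would take about 5.5 hours.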

mmonnin
Joined: 2 Jul 16
Posts: 338
Credit: 7,987,341,558
RAC: 259
Level: Tyr
Message 58432 - Posted: 4 Mar 2022, 21:06:38 UTC

Up to 3GB file. 3hr45min to get to 66% download.

Greger
Joined: 6 Jan 15
Posts: 76
Credit: 25,499,534,331
RAC: 0
Level: Trp
Message 58433 - Posted: 4 Mar 2022, 21:11:39 UTC
Last modified: 4 Mar 2022, 21:15:27 UTC

The 3GB file downloaded in a few minutes, and the unit looks to be working fine.

Task: http://www.gpugrid.net/result.php?resultid=32751730

Stderr output
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<stderr_txt>
17:49:50 (2974532): wrapper (7.7.26016): starting
17:49:50 (2974532): wrapper (7.7.26016): starting
17:49:50 (2974532): wrapper: running /bin/tar (xf x86_64-pc-linux-gnu__cuda1121.tar.bz2)
17:58:25 (2974532): /bin/tar exited; CPU time 509.278471
17:58:25 (2974532): wrapper: running bin/acemd (--boinc --device 0)
21:34:24 (2974532): bin/acemd exited; CPU time 12867.200624
21:34:24 (2974532): called boinc_finish(0)

</stderr_txt>
]]>
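The wrapper timestamps show how the task time splits between unpacking and simulation. A minimal Python sketch, assuming the "HH:MM:SS (pid): ..." line format shown here. The regular expression and the phase_durations helper are hypothetical, not part of BOINC:

```python
import re
from datetime import datetime, timedelta

STDERR = """\
17:49:50 (2974532): wrapper: running /bin/tar (xf x86_64-pc-linux-gnu__cuda1121.tar.bz2)
17:58:25 (2974532): /bin/tar exited; CPU time 509.278471
17:58:25 (2974532): wrapper: running bin/acemd (--boinc --device 0)
21:34:24 (2974532): bin/acemd exited; CPU time 12867.200624
"""

# Matches "wrapper: running PROG" (group 2) or "PROG exited" (group 3).
LINE = re.compile(r"^(\d\d:\d\d:\d\d) \(\d+\): (?:wrapper: running (\S+)|(\S+) exited)")


def phase_durations(log: str) -> dict[str, timedelta]:
    """Pair each 'wrapper: running X' line with the matching 'X exited' line."""
    started: dict[str, datetime] = {}
    durations: dict[str, timedelta] = {}
    for line in log.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        if m.group(2):
            started[m.group(2)] = t
        elif m.group(3) in started:
            elapsed = t - started.pop(m.group(3))
            # Times of day only: if a phase crossed midnight, wrap around one day.
            if elapsed < timedelta(0):
                elapsed += timedelta(days=1)
            durations[m.group(3)] = elapsed
    return durations


for prog, dt in phase_durations(STDERR).items():
    print(prog, dt)
# /bin/tar 0:08:35
# bin/acemd 3:35:59
```

Here the tar step took 8 min 35 s of wall time (509 s of CPU time) before acemd ran for about 3 hr 36 min. In Azmodes' failed task earlier in the thread the same tar step took about 5 minutes, which fits zombie67's observation of several minutes of zero GPU load at the start.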


run.log

#
# ACEMD version 4.0.0rc6
#
# Copyright (C) 2017-2022 Acellera (www.acellera.com)
#
# When publishing, please cite:
#   ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale
#   M. J. Harvey, G. Giupponi and G. De Fabritiis,
#   J Chem. Theory. Comput. 2009 5(6), pp1632-1639
#   DOI: 10.1021/ct9000685
#
# Arguments:
#   input: input
#   platform: 
#   device: 0
#   ncpus: 
#   precision: mixed
#
# ACEMD is running in Boinc mode!
#
# Read input file: input
# Parse input file
$
$# Forcefield configuration
$
$             parmFile structure.prmtop
$              nnpFile model.json
$
$# Initial State
$
$          coordinates structure.pdb
$       binCoordinates input.coor
$        binVelocities input.vel
$       extendedSystem input.xsc
$#          temperature 298.15 # Explicit velocity field provided
$
$# Output
$
$     trajectoryPeriod 25000
$       trajectoryFile output.xtc
$
$# Electrostatics 
$
$                  PME on
$               cutoff 9.00 # A
$            switching on
$       switchDistance 7.50 # A
$      implicitSolvent off
$
$# Temperature Control 
$
$           thermostat on
$ thermostatTemperature 310.00 # K
$    thermostatDamping 0.10 # /ps
$
$# Pressure Control 
$
$             barostat off
$     barostatPressure 1.0000 # bar
$  barostatAnisotropic off
$   barostatConstRatio off
$      barostatConstXY off
$
$# Integration
$
$             timeStep 2.00 # fs
$           slowPeriod 1
$
$# External forces
$
$
$# Restraints
$
$
$# Run Configuration
$
$              restart off
$                  run 500000
# Parse force field and topology files
#   Force field: AMBER
#   PRMTOP file: structure.prmtop
#
# Force field parameters
#   Number of atom parameters: 12
#   Number of bond parameters: 14
#   Number of angle parameters: 22
#   Number of dihedral parameters: 20
#   Number of improper parameters: 0
#   Number of CMAP parameters: 0
#
# System topology
#   Number of atoms: 5058
#   Number of bonds: 5062
#   Number of angles: 136
#   Number of dihedrals: 240
#   Number of impropers: 0
#   Number of CMAPs: 0
#
# Initializing engine
#   Version: 7.7
#   Plugin directory: /var/lib/boinc-client/slots/3/lib/acemd3
#   Loaded plugins
#     CPU
#     PME
#     CUDA
#     CudaCompiler
# WARNING: there is no library for "OpenCL" plugin
#     PlumedCUDA
# WARNING: there is no library for "PlumedOpenCL" plugin
#     PlumedReference
#     TorchReference
#     TorchCUDA
# WARNING: there is no library for "TorchOpenCL" plugin
#   Available platforms
#     CPU
#     CUDA
#
# Bonded interactions
#   Harmonic bond interactions
#     Number of terms: 5062
#   Harmonic angle interactions
#     Number of terms: 136
#   Urey-Bradley interactions
#     Number of terms: 0
#     Number of skipped terms (zero force constant): 136
#     NOTE: Urey-Bradley interations skipped
#   Proper dihedral interations
#     Number of terms: 224
#     Number of skipped terms (zero force constants): 16
#   Improper dihedral interations
#     Number of terms: 0
#     NOTE: improper dihedral interations skipped
#   CMAP interactions
#     Number of terms: 0
#     NOTE: CMAP interations skipped
#
# Non-bonded interactions
#   Number of exclusions: 5391
#   Lennard-Jones terms
#     Cutoff distance: 9.000 A
#     Switching distance: 7.500 A
#   Coulombic (PME) term
#     Ewald tolerance: 0.000500
#   No NBFIX
#   No implicit solvent
#
# NNP
#   Configuration file: model.json
#   Model type: torch
#   Model file: model.nnp
#   Number of atoms: 75
#
# Constraining hydrogen (X-H) bonds
#   Number of constrained bonds: 3356
#   Making water molecules rigid
#     Number of water molecules: 1661
# Number of constraints: 5017
#
# Reading box sizes from input.xsc
#
# Creating simulation system
#   Number of particles: 5058
#   Number of degrees of freedom 10154
#   Periodic box size: 37.314 37.226 37.280 A
#
# Integrator
#   Type: velocity Verlet
#   Step size: 2.00 fs
#   Constraint tolerance: 1.0e-06
#
# Thermostat
#   Type: Langevin
#   Target temperature: 310.00 K
#   Friction coefficient: 0.10 ps^-1
#
# Setting up platform: CUDA
# Interactions: 1 2 4 7 14 12
# Platform properties:
#   DeviceIndex: 0
#   DeviceName: NVIDIA GeForce RTX 3070
#   UseBlockingSync: false
#   Precision: mixed
#   UseCpuPme: false
#   CudaCompiler: /usr/local/cuda/bin/nvcc
#   TempDirectory: /tmp
#   CudaHostCompiler: 
#   DisablePmeStream: false
#   DeterministicForces: false
#
# Set initial positions from an input file
#
# Initial velocities
#   File: input.vel
#
# Optimize platform for MD
#   Number of constraints: 5017
#   Harmonic bond interations
#     Initial number of terms: 5062
#     Optimized number of terms: 45
#   Remaining interactions: 2 4 7 14 12 1
#
# Running simulation
#   Current step: 0
#   Number of steps: 500000
#
# Trajectory output
#   Positions: output.xtc
#   Period: 25000
#   Wrapping: off
#
# Log, trajectory, and restart files are written every 50.000 ps (25000 steps)
# Step       Time         Bond         Angle        Urey-Bradley Dihedral     Improper     CMAP         Non-bonded   Implicit     External     Potential    Kinetic      Total        Temperature  Volume      
#            [ps]         [kcal/mol]   [kcal/mol]   [kcal/mol]   [kcal/mol]   [kcal/mol]   [kcal/mol]   [kcal/mol]   [kcal/mol]   [kcal/mol]   [kcal/mol]   [kcal/mol]   [kcal/mol]   [K]          [A^3]       
       25000        50.00       7.8698      22.5692       0.0000      62.2167       0.0000       0.0000  -15734.1899       0.0000 -1379454.7644 -1395096.2985    3140.7230 -1391955.5755      311.303     51783.76
# Speed: average    6.63 ns/day, current    6.63 ns/day
# Progress: 5.0, remaining time: 3:26:16, ETA: Fri Mar  4 21:35:50 2022
       50000       100.00       6.7940      26.1562       0.0000      56.6657       0.0000       0.0000  -15592.6529       0.0000 -1379450.5644 -1394953.6014    3101.0705 -1391852.5309      307.372     51783.76
# Speed: average    6.66 ns/day, current    6.68 ns/day
# Progress: 10.0, remaining time: 3:14:43, ETA: Fri Mar  4 21:35:03 2022
       75000       150.00       4.6298      22.0773       0.0000      58.5973       0.0000       0.0000  -15798.4139       0.0000 -1379459.1078 -1395172.2173    3143.0430 -1392029.1743      311.533     51783.76
# Speed: average    6.66 ns/day, current    6.68 ns/day
# Progress: 15.0, remaining time: 3:03:42, ETA: Fri Mar  4 21:34:50 2022
      100000       200.00       7.8170      26.8665       0.0000      59.5203       0.0000       0.0000  -15618.1979       0.0000 -1379453.2405 -1394977.2346    3138.9295 -1391838.3051      311.125     51783.76
# Speed: average    6.67 ns/day, current    6.68 ns/day
# Progress: 20.0, remaining time: 2:52:48, ETA: Fri Mar  4 21:34:42 2022
      125000       250.00       8.7645      24.8827       0.0000      59.5112       0.0000       0.0000  -15731.4732       0.0000 -1379450.9954 -1395089.3103    3081.5431 -1392007.7672      305.437     51783.76
# Speed: average    6.67 ns/day, current    6.68 ns/day
# Progress: 25.0, remaining time: 2:41:56, ETA: Fri Mar  4 21:34:37 2022
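Several of the run.log numbers can be cross-checked against each other. This sketch uses values copied from the log. The Boltzmann constant in kcal/(mol K) and the subtraction of 3 centre-of-mass degrees of freedom are assumptions, chosen because they reproduce the reported 10154 degrees of freedom and a temperature of ~311.3 K:

```python
# Values copied from the run.log above.
N_ATOMS = 5058
N_CONSTRAINTS = 5017
TIMESTEP_FS = 2.0
N_STEPS = 500_000
TRAJECTORY_PERIOD = 25_000
SPEED_NS_PER_DAY = 6.63
KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant (assumed value), kcal/(mol*K)


def degrees_of_freedom(n_atoms: int, n_constraints: int) -> int:
    """3 per atom, minus one per constraint, minus 3 for centre-of-mass motion (assumed)."""
    return 3 * n_atoms - n_constraints - 3


def temperature_k(kinetic_kcal: float, dof: int) -> float:
    """Equipartition: KE = dof * kB * T / 2, solved for T."""
    return 2.0 * kinetic_kcal / (dof * KB_KCAL_PER_MOL_K)


sim_ns = N_STEPS * TIMESTEP_FS / 1e6         # fs -> ns
frames = N_STEPS // TRAJECTORY_PERIOD        # trajectory frames written
wall_hours = sim_ns / SPEED_NS_PER_DAY * 24  # expected GPU time at the logged speed
dof = degrees_of_freedom(N_ATOMS, N_CONSTRAINTS)

print(sim_ns, frames, round(wall_hours, 2))     # 1.0 20 3.62
print(dof)                                      # 10154
print(round(temperature_k(3140.7230, dof), 1))  # 311.3 (log: 311.303)

# Step 25000 row: Potential should be the sum of the energy terms before it,
# and Total should be Potential + Kinetic.
terms = [7.8698, 22.5692, 0.0, 62.2167, 0.0, 0.0, -15734.1899, 0.0, -1379454.7644]
potential = sum(terms)
total = potential + 3140.7230
# These differ from the logged values only in the fourth decimal place (rounding).
print(f"{potential:.4f} {total:.4f}")
```

At 6.63 ns/day, the 1 ns run works out to about 3.6 hours, in line with the 3 hr 36 min the stderr shows acemd running.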

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Level: Trp
Message 58434 - Posted: 4 Mar 2022, 21:28:32 UTC

Is it really necessary to spend 4-5 minutes on every task extracting the same 3GB file? It seems unnecessary. If it's not downloading a new file every time, then why extract the same file over and over? Why not just leave it extracted?
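The suggestion here amounts to caching the extracted package. Below is a hypothetical Python sketch of that idea. The names extract_once and ".extracted-from" are illustrative. This is not how the GPUGRID wrapper currently behaves, and in practice the cache would have to live outside the per-task slot directory, which the BOINC client cleans up after each task:

```python
import tarfile
from pathlib import Path


def extract_once(archive: str, dest: str) -> bool:
    """Extract archive into dest unless a stamp file shows this archive is already there.

    Returns True if extraction ran, False if it was skipped.
    """
    src = Path(archive)
    out = Path(dest)
    stamp = out / ".extracted-from"
    st = src.stat()
    # Identify the archive by name, size and modification time (a cheap check).
    key = f"{src.name}:{st.st_size}:{st.st_mtime_ns}"
    if stamp.exists() and stamp.read_text() == key:
        return False
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(src, "r:*") as tar:  # "r:*" detects compression, e.g. .tar.bz2
        if hasattr(tarfile, "data_filter"):
            tar.extractall(out, filter="data")  # safer member filtering where supported
        else:
            tar.extractall(out)
    stamp.write_text(key)  # written last, so an interrupted extraction is redone
    return True
```

Because the stamp is written only after extraction finishes, an interrupted unpack is repeated on the next run rather than trusted.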

mmonnin
Joined: 2 Jul 16
Posts: 338
Credit: 7,987,341,558
RAC: 259
Level: Tyr
Message 58436 - Posted: 4 Mar 2022, 22:53:25 UTC - in response to Message 58432.  

> Up to 3GB file. 3hr45min to get to 66% download.


5hr20min to download
10+ min to start processing and already at 50% complete
Task completed in just under 12 min. Less than 2 minutes of processing on a 3070Ti at around 55% load.

zombie67 [MM]
Joined: 16 Jul 07
Posts: 209
Credit: 5,496,860,456
RAC: 12,111
Level: Tyr
Message 58437 - Posted: 5 Mar 2022, 1:52:50 UTC

Since the original task, I have received several more, with no giant file to download again. Then I got a task for the same machine, and it is downloading another beast of a file, veeeery slowly. 4.5 hours so far, and only 29% complete. BoincTasks says it's running at 9.71KBps. Ouch.

Reno, NV
Team: SETI.USA

zombie67 [MM]
Joined: 16 Jul 07
Posts: 209
Credit: 5,496,860,456
RAC: 12,111
Level: Tyr
Message 58438 - Posted: 5 Mar 2022, 4:26:09 UTC - in response to Message 58437.  

> Since the original task, I have received several more, with no giant file to download again. Then I got a task for the same machine, and it is downloading another beast of a file, veeeery slowly. 4.5 hours so far, and only 29% complete. BOINCtasks says it 9.71KBps. Ouch.


A follow-up: the d/l seems to stall eventually. But going into BOINC and turning networking off and on again seems to restart the d/l at a reasonable pace, and it finishes in a matter of minutes instead of hours.
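The off/on trick can be scripted with boinccmd, which ships with the BOINC client. This is a minimal Python sketch. It assumes boinccmd is on PATH and can reach the local client, which may mean running it from the BOINC data directory or supplying the GUI RPC password. The helper names are hypothetical, and "auto" should be replaced by whatever network mode you normally use:

```python
import subprocess
import time

VALID_RESTORE_MODES = {"always", "auto"}


def toggle_network_commands(restore_mode: str = "auto") -> list[list[str]]:
    """boinccmd invocations that switch network activity off and then back on."""
    if restore_mode not in VALID_RESTORE_MODES:
        raise ValueError(f"restore_mode must be one of {sorted(VALID_RESTORE_MODES)}")
    return [
        ["boinccmd", "--set_network_mode", "never"],
        ["boinccmd", "--set_network_mode", restore_mode],
    ]


def kick_stalled_transfers(restore_mode: str = "auto", pause_s: float = 5.0) -> None:
    """Run the off/on toggle, pausing briefly in between."""
    off, on = toggle_network_commands(restore_mode)
    subprocess.run(off, check=True)
    time.sleep(pause_s)  # give the client a moment to drop the stalled connection
    subprocess.run(on, check=True)
```

boinccmd also documents a per-file "--file_transfer URL filename retry" operation, which may be a narrower way to restart a single stalled download.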

This project needs to get its networking in order.

Reno, NV
Team: SETI.USA
ID: 58438 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Previous · 1 · 2 · 3 · 4 · 5 . . . 7 · Next

Message boards : News : ACEMD 4

©2025 Universitat Pompeu Fabra