Message boards :
News :
ACEMD 4
| Author | Message |
|---|---|
|
Joined: 16 Jul 07 Posts: 209 Credit: 5,496,860,456 RAC: 12,111
|
My task took 9.5 hours to download, and ran for 8 minutes. Of those 8 minutes, the first several minutes showed zero load on the GPU. I assume it was unpacking the task then. So it really ran for only about 5 minutes on the GPU. Reno, NV Team: SETI.USA |
|
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0
|
"My task took 9.5 hours to download, and ran for 8 minutes. Of those 8 minutes, the first several minutes showed zero load on the GPU. I assume it was unpacking the task then. So it really ran for only about 5 minutes on the GPU." That is pretty much what I got, though I did not pay attention to the load. It was much ado about nothing. When they get more and longer ones, I will try again. |
|
Joined: 7 Jan 17 Posts: 34 Credit: 1,371,429,518 RAC: 0
|
My hosts received WUs, but they error out after 5 minutes: <core_client_version>7.16.6</core_client_version> I noticed that the only ones failing are v1.02. v2.19 ones validate (well, one did so far, the rest are still running). I'm a bit confused, so is v1 ACEMD 3 and v2 ACEMD 4? Or are both different versions of ACEMD 4? |
|
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 0
|
"I noticed that the only ones failing are v1.02. v2.19 ones validate (well, one did so far, the rest are still running). I'm a bit confused, so is v1 ACEMD 3 and v2 ACEMD 4? Or are both different versions of ACEMD 4?" It's confusing the way they switched to a long-winded name and stopped labeling them acemd3 (v2.19) and acemd4 (v 1.0). |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
You've also had errors like one I've just encountered: EXIT_DISK_LIMIT_EXCEEDED. Like you, my first failure was with "Unsupported PRMTOP version!" (task 32749480), followed by <workunit> (task 32749572). It's possible that the first error didn't clean up properly behind itself, and caused the combined project total to exceed that 10,000,000,000 byte limit - though that seems unlikely. All hell is breaking out at GPUGrid today, with ADRIA acemd3 tasks completing in under an hour - we'll just have to wait while things get sorted out one by one, and the dust settles! Edit - the slot directory where my next task is running contains 17,298 items, totalling 10.3 GB (10.3 GB on disk) - above the limit, although BOINC hasn't noticed yet. Edit 2 - it has now. Task 32750112 |
|
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 0
|
Could it have anything to do with your running Borg BOINC? Resistance is futile :-) |
|
Joined: 7 Jan 17 Posts: 34 Credit: 1,371,429,518 RAC: 0
|
ah, so the failing tasks are actually acemd3? I'm not sure what that disk limit error is about. All my relevant hosts are set to allow between 30-75 GB of space for BOINC. The event logs confirm this setting. Also, unrelated and nothing new, but this site is such a supreme pain in the butt to navigate if you are running multiple hosts from the same external IP... Ridiculous. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
We currently have: Advanced molecular dynamics simulations for GPUs v2.19 (that's acemd3), and Advanced molecular dynamics simulations for GPUs v1.02 (that's acemd4). Clear as mud ??!! |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
"I'm not sure what that disk limit error is about." The limit in question is set at the workunit level, and applies to the amount copied to the working ('slot') directory, plus any data generated during the run. The BOINC limits are applied to the sum total of all files, in all directories, under the control of BOINC. To prove the point, I caught task 32750377 and suspended it before it started running. Then, I shut down the BOINC client, and edited client_state.xml to double the workunit limit. It ran to completion, and was validated. I'll do another one to check, but I can't be sitting here manually editing BOINC files every five minutes - this needs catching at source, and quickly. |
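The distinction above, between the per-workunit limit on a slot directory and BOINC's overall disk allowance, can be illustrated with a small sketch. It assumes the 10,000,000,000-byte workunit limit quoted earlier in the thread; the function names are hypothetical, and this is only an illustration, not how the BOINC client itself performs the check.

```python
import os

# Per-workunit figure quoted in this thread (bytes); an assumption for this sketch.
WORKUNIT_DISK_LIMIT = 10_000_000_000


def directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so a linked file is not counted as slot data.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def exceeds_workunit_limit(slot_dir, limit=WORKUNIT_DISK_LIMIT):
    """True if the slot directory alone is larger than the workunit limit."""
    return directory_size(slot_dir) > limit
```

Pointed at a single slot directory, `exceeds_workunit_limit` looks only at that one task's files. A slot holding 10.3 GB would fail this check even on a host where BOINC as a whole is allowed 30-75 GB, which is consistent with the behaviour reported above.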
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
And the next one - workunit 27113095, created 15:43:59 UTC - already has the fix in place. Kudos to whoever was watching our conversation here. It's also running a lot slower - nearly two minutes for each 2.5% step - so hopefully we're starting to get some real science done. |
|
Joined: 16 Jul 07 Posts: 209 Credit: 5,496,860,456 RAC: 12,111
|
"And the next one - workunit 27113095, created 15:43:59 UTC - already has the fix in place. Kudos to whoever was watching our conversation here." Did the fix require the whole app to be re-downloaded? Please say no... Reno, NV Team: SETI.USA |
|
Joined: 7 Jan 17 Posts: 34 Credit: 1,371,429,518 RAC: 0
|
Thanks for the explanation, RH. :) Getting validating acemd4 tasks now. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
"Did the fix require the whole app to be re-downloaded? Please say no..." No! |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
i just downloaded the 2.8GB tar package and it took only a few minutes at roughly 15Mbps. the ~500MB model file, however, is going quite slowly at ~200Kbps and is dragging along
|
|
Joined: 2 Jul 16 Posts: 338 Credit: 7,987,341,558 RAC: 259
|
Up to 3GB file. 3hr45min to get to 66% download. |
|
Joined: 6 Jan 15 Posts: 76 Credit: 25,499,534,331 RAC: 0
|
The 3 GB file downloaded in a few minutes and the unit looks to be working fine. Task: http://www.gpugrid.net/result.php?resultid=32751730
stderr out
Stderr output
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<stderr_txt>
17:49:50 (2974532): wrapper (7.7.26016): starting
17:49:50 (2974532): wrapper (7.7.26016): starting
17:49:50 (2974532): wrapper: running /bin/tar (xf x86_64-pc-linux-gnu__cuda1121.tar.bz2)
17:58:25 (2974532): /bin/tar exited; CPU time 509.278471
17:58:25 (2974532): wrapper: running bin/acemd (--boinc --device 0)
21:34:24 (2974532): bin/acemd exited; CPU time 12867.200624
21:34:24 (2974532): called boinc_finish(0)
</stderr_txt>
]]>
run.log
#
# ACEMD version 4.0.0rc6
#
# Copyright (C) 2017-2022 Acellera (www.acellera.com)
#
# When publishing, please cite:
# ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale
# M. J. Harvey, G. Giupponi and G. De Fabritiis,
# J Chem. Theory. Comput. 2009 5(6), pp1632-1639
# DOI: 10.1021/ct9000685
#
# Arguments:
# input: input
# platform:
# device: 0
# ncpus:
# precision: mixed
#
# ACEMD is running in Boinc mode!
#
# Read input file: input
# Parse input file
$
$# Forcefield configuration
$
$ parmFile structure.prmtop
$ nnpFile model.json
$
$# Initial State
$
$ coordinates structure.pdb
$ binCoordinates input.coor
$ binVelocities input.vel
$ extendedSystem input.xsc
$# temperature 298.15 # Explicit velocity field provided
$
$# Output
$
$ trajectoryPeriod 25000
$ trajectoryFile output.xtc
$
$# Electrostatics
$
$ PME on
$ cutoff 9.00 # A
$ switching on
$ switchDistance 7.50 # A
$ implicitSolvent off
$
$# Temperature Control
$
$ thermostat on
$ thermostatTemperature 310.00 # K
$ thermostatDamping 0.10 # /ps
$
$# Pressure Control
$
$ barostat off
$ barostatPressure 1.0000 # bar
$ barostatAnisotropic off
$ barostatConstRatio off
$ barostatConstXY off
$
$# Integration
$
$ timeStep 2.00 # fs
$ slowPeriod 1
$
$# External forces
$
$
$# Restraints
$
$
$# Run Configuration
$
$ restart off
$ run 500000
# Parse force field and topology files
# Force field: AMBER
# PRMTOP file: structure.prmtop
#
# Force field parameters
# Number of atom parameters: 12
# Number of bond parameters: 14
# Number of angle parameters: 22
# Number of dihedral parameters: 20
# Number of improper parameters: 0
# Number of CMAP parameters: 0
#
# System topology
# Number of atoms: 5058
# Number of bonds: 5062
# Number of angles: 136
# Number of dihedrals: 240
# Number of impropers: 0
# Number of CMAPs: 0
#
# Initializing engine
# Version: 7.7
# Plugin directory: /var/lib/boinc-client/slots/3/lib/acemd3
# Loaded plugins
# CPU
# PME
# CUDA
# CudaCompiler
# WARNING: there is no library for "OpenCL" plugin
# PlumedCUDA
# WARNING: there is no library for "PlumedOpenCL" plugin
# PlumedReference
# TorchReference
# TorchCUDA
# WARNING: there is no library for "TorchOpenCL" plugin
# Available platforms
# CPU
# CUDA
#
# Bonded interactions
# Harmonic bond interactions
# Number of terms: 5062
# Harmonic angle interactions
# Number of terms: 136
# Urey-Bradley interactions
# Number of terms: 0
# Number of skipped terms (zero force constant): 136
# NOTE: Urey-Bradley interations skipped
# Proper dihedral interations
# Number of terms: 224
# Number of skipped terms (zero force constants): 16
# Improper dihedral interations
# Number of terms: 0
# NOTE: improper dihedral interations skipped
# CMAP interactions
# Number of terms: 0
# NOTE: CMAP interations skipped
#
# Non-bonded interactions
# Number of exclusions: 5391
# Lennard-Jones terms
# Cutoff distance: 9.000 A
# Switching distance: 7.500 A
# Coulombic (PME) term
# Ewald tolerance: 0.000500
# No NBFIX
# No implicit solvent
#
# NNP
# Configuration file: model.json
# Model type: torch
# Model file: model.nnp
# Number of atoms: 75
#
# Constraining hydrogen (X-H) bonds
# Number of constrained bonds: 3356
# Making water molecules rigid
# Number of water molecules: 1661
# Number of constraints: 5017
#
# Reading box sizes from input.xsc
#
# Creating simulation system
# Number of particles: 5058
# Number of degrees of freedom 10154
# Periodic box size: 37.314 37.226 37.280 A
#
# Integrator
# Type: velocity Verlet
# Step size: 2.00 fs
# Constraint tolerance: 1.0e-06
#
# Thermostat
# Type: Langevin
# Target temperature: 310.00 K
# Friction coefficient: 0.10 ps^-1
#
# Setting up platform: CUDA
# Interactions: 1 2 4 7 14 12
# Platform properties:
# DeviceIndex: 0
# DeviceName: NVIDIA GeForce RTX 3070
# UseBlockingSync: false
# Precision: mixed
# UseCpuPme: false
# CudaCompiler: /usr/local/cuda/bin/nvcc
# TempDirectory: /tmp
# CudaHostCompiler:
# DisablePmeStream: false
# DeterministicForces: false
#
# Set initial positions from an input file
#
# Initial velocities
# File: input.vel
#
# Optimize platform for MD
# Number of constraints: 5017
# Harmonic bond interations
# Initial number of terms: 5062
# Optimized number of terms: 45
# Remaining interactions: 2 4 7 14 12 1
#
# Running simulation
# Current step: 0
# Number of steps: 500000
#
# Trajectory output
# Positions: output.xtc
# Period: 25000
# Wrapping: off
#
# Log, trajectory, and restart files are written every 50.000 ps (25000 steps)
# Step Time Bond Angle Urey-Bradley Dihedral Improper CMAP Non-bonded Implicit External Potential Kinetic Total Temperature Volume
# [ps] [kcal/mol] [kcal/mol] [kcal/mol] [kcal/mol] [kcal/mol] [kcal/mol] [kcal/mol] [kcal/mol] [kcal/mol] [kcal/mol] [kcal/mol] [kcal/mol] [K] [A^3]
25000 50.00 7.8698 22.5692 0.0000 62.2167 0.0000 0.0000 -15734.1899 0.0000 -1379454.7644 -1395096.2985 3140.7230 -1391955.5755 311.303 51783.76
# Speed: average 6.63 ns/day, current 6.63 ns/day
# Progress: 5.0, remaining time: 3:26:16, ETA: Fri Mar 4 21:35:50 2022
50000 100.00 6.7940 26.1562 0.0000 56.6657 0.0000 0.0000 -15592.6529 0.0000 -1379450.5644 -1394953.6014 3101.0705 -1391852.5309 307.372 51783.76
# Speed: average 6.66 ns/day, current 6.68 ns/day
# Progress: 10.0, remaining time: 3:14:43, ETA: Fri Mar 4 21:35:03 2022
75000 150.00 4.6298 22.0773 0.0000 58.5973 0.0000 0.0000 -15798.4139 0.0000 -1379459.1078 -1395172.2173 3143.0430 -1392029.1743 311.533 51783.76
# Speed: average 6.66 ns/day, current 6.68 ns/day
# Progress: 15.0, remaining time: 3:03:42, ETA: Fri Mar 4 21:34:50 2022
100000 200.00 7.8170 26.8665 0.0000 59.5203 0.0000 0.0000 -15618.1979 0.0000 -1379453.2405 -1394977.2346 3138.9295 -1391838.3051 311.125 51783.76
# Speed: average 6.67 ns/day, current 6.68 ns/day
# Progress: 20.0, remaining time: 2:52:48, ETA: Fri Mar 4 21:34:42 2022
125000 250.00 8.7645 24.8827 0.0000 59.5112 0.0000 0.0000 -15731.4732 0.0000 -1379450.9954 -1395089.3103 3081.5431 -1392007.7672 305.437 51783.76
# Speed: average 6.67 ns/day, current 6.68 ns/day
# Progress: 25.0, remaining time: 2:41:56, ETA: Fri Mar 4 21:34:37 2022 |
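As a rough cross-check of the log above, the run length follows from the step count, the time step, and the reported speed. The sketch below uses values copied from the log (500000 steps, 2.00 fs, about 6.67 ns/day); the function name is made up for illustration.

```python
def expected_runtime_hours(steps, timestep_fs, speed_ns_per_day):
    """Estimate wall-clock hours for an MD run from its reported throughput."""
    simulated_ns = steps * timestep_fs * 1e-6  # 1 fs = 1e-6 ns
    return simulated_ns / speed_ns_per_day * 24.0


# Values taken from the run.log in this post.
hours = expected_runtime_hours(steps=500_000, timestep_fs=2.0, speed_ns_per_day=6.67)
print(f"{hours:.2f} h")
```

The run simulates 1 ns in total, so at 6.67 ns/day it needs about 3.6 hours. That is in line with the stderr timestamps, where bin/acemd starts at 17:58:25 and exits at 21:34:24.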
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
is it really necessary to spend 4-5 mins every task to extract the same 3GB file? seems unnecessary. if it's not downloading a new file every time, then why extract the same file over and over? why not just leave it extracted?
|
|
Joined: 2 Jul 16 Posts: 338 Credit: 7,987,341,558 RAC: 259
|
"Up to 3GB file. 3hr45min to get to 66% download." 5hr20min to download. 10+ min to start processing, and already at 50% complete. Task completed in just under 12 min. Less than 2 minutes of processing on a 3070Ti at around 55% load. |
|
Joined: 16 Jul 07 Posts: 209 Credit: 5,496,860,456 RAC: 12,111
|
Since the original task, I have received several more, with no giant file to download again. Then I got a task for the same machine, and it is downloading another beast of a file, veeeery slowly. 4.5 hours so far, and only 29% complete. BOINCtasks says it is running at 9.71 KBps. Ouch. Reno, NV Team: SETI.USA |
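To put the figures above in perspective (29% complete after 4.5 hours), here is a minimal sketch that extrapolates the remaining download time. It assumes the average rate so far simply continues, which a stalling transfer may not do; the function name is hypothetical.

```python
def remaining_hours(elapsed_hours, fraction_done):
    """Extrapolate remaining time, assuming the average rate so far continues."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total = elapsed_hours / fraction_done
    return total - elapsed_hours


# 29% complete after 4.5 hours, as reported in the post above.
print(round(remaining_hours(4.5, 0.29), 1))
```

At that average pace the download would need roughly 11 more hours, or about 15.5 hours in total.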
|
Joined: 16 Jul 07 Posts: 209 Credit: 5,496,860,456 RAC: 12,111
|
"Since the original task, I have received several more, with no giant file to download again. Then I got a task for the same machine, and it is downloading another beast of a file, veeeery slowly. 4.5 hours so far, and only 29% complete. BOINCtasks says it is running at 9.71 KBps. Ouch." A follow-up: the d/l seems to stall eventually. But going into BOINC and turning networking off and on seems to restart the d/l at a reasonable pace, and it is done in a matter of minutes instead of hours. This project needs to get its networking in order. Reno, NV Team: SETI.USA |
©2025 Universitat Pompeu Fabra