ATM

Message boards : News : ATM

Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Level
Tyr
Message 60894 - Posted: 20 Dec 2023, 16:54:54 UTC - in response to Message 60880.  

nice, a PR should at least get someone's attention lol


And so it finally did. :-)

Pull request was accepted and merged into the original AToM-OpenMM/Master repo. All that's left now is for it to be merged into the proper repo that is retrieved at the execution of any WU, and progress % will be fixed.
Which will obviously only be useful if ATMbeta task generation starts up again...


Please provide the direct link to the PR and the repo for the devs to incorporate your fix.
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Level
Trp
Message 60895 - Posted: 20 Dec 2023, 17:33:19 UTC - in response to Message 60894.  
Last modified: 20 Dec 2023, 17:58:14 UTC

i don't know why the devs need the users to spoon-feed them their own code and repos. they accepted the PR and merged it already. it almost seems like there's little to no inter-team communication about what is going on.

it's all here: https://github.com/Gallicchio-Lab/AToM-OpenMM/pull/56/

the PR was merged into this master on November 16th. but the tasks being distributed to users must be pulling from some other repo tag as the changes have not yet been reflected on subsequent tasks that we have received since then

i don't have any tasks for ATM so i don't remember off hand what tag it was pulling. probably not master or the latest v8.1.0 since those have the fix. probably pulling the v8.1.0beta tag from October.
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 351
Level
Trp
Message 60896 - Posted: 20 Dec 2023, 17:47:28 UTC - in response to Message 60895.  

... the PR was merged into this master on November 16th. but the tasks being distributed to users must be pulling from some other repo tag as the changes have not yet been reflected on subsequent tasks that we have received since then

Like all BOINC projects, GPUGrid has an applications page - it's part of the standard BOINC toolkit.

That shows that the active ATM Beta code was installed for distribution on 27 Mar 2023 for Linux, and the following day for Windows. Now that the source code has been updated, it will need to be re-compiled into binary form and re-deployed. That's the current stumbling block.
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Level
Trp
Message 60897 - Posted: 20 Dec 2023, 18:05:38 UTC - in response to Message 60896.  
Last modified: 20 Dec 2023, 18:08:01 UTC

no that's not correct. you're not understanding how this application works. it's not the normal setup most boinc projects use.

this "app" is NOT a compiled binary! it's just a bunch of python scripts. just watch how these tasks run and you will see. start from the wrapper and look at what's actually happening. what gets distributed to users as the "app" is a baseline zip archive package that contains the conda python environment and some prepackaged libraries, etc. when BOINC runs, it uses the wrapper and the associated job.xml file to start execution of the scripts. somewhere along the way in the long chain of script execution, it reaches out to github to download the necessary files, including the one in question.

wrapper -> unzip archive -> run script -> download stuff from github -> run more scripts

that's why these tasks fail if you try to run them offline or without an internet connection.
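A minimal Python sketch of that chain, for readers who want to picture it. The three commands are copied from the stderr log posted later in this thread (bin/python bin/conda-unpack, then bin/tar xjvf input.tar.bz2, then bin/bash run.sh); treating them as a strict stop-at-first-failure sequence is an assumption about the wrapper, and run_chain is a hypothetical helper, not part of BOINC. It only illustrates why going offline breaks the task: the GitHub download happens inside run.sh, the last step.

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple

# Commands as they appear in an ATMbeta stderr log (Linux, wrapper 7.7.26016).
# Running them strictly in order and stopping at the first failure is an assumption.
STEPS: List[Tuple[str, List[str]]] = [
    ("unpack the conda environment", ["bin/python", "bin/conda-unpack"]),
    ("extract the WU input", ["bin/tar", "xjvf", "input.tar.bz2"]),
    ("run.sh: pip-installs AToM from GitHub, then runs it", ["bin/bash", "run.sh"]),
]


def default_runner(argv: Sequence[str]) -> int:
    """Run one command in the current directory and return its exit code."""
    return subprocess.run(list(argv)).returncode


def run_chain(
    steps: Sequence[Tuple[str, List[str]]],
    runner: Callable[[Sequence[str]], int] = default_runner,
) -> Optional[int]:
    """Run the steps in order; return the index of the first failing step, or None."""
    for index, (label, argv) in enumerate(steps):
        code = runner(argv)
        if code != 0:
            print(f"step {index} ({label}) failed with exit code {code}")
            return index
    return None
```

For example, with a fake runner where only the run.sh step fails (as the GitHub download would offline), run_chain(STEPS, lambda argv: 1 if argv[-1] == "run.sh" else 0) reports step 2, after the unpacking steps have already succeeded.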
[BAT] Svennemans

Joined: 27 May 21
Posts: 54
Credit: 1,004,151,720
RAC: 0
Level
Met
Message 60899 - Posted: 21 Dec 2023, 20:44:22 UTC - in response to Message 60897.  
Last modified: 21 Dec 2023, 20:46:30 UTC

Correct, each WU downloads the ATM code on the fly. And the repo it is being pulled from is the "HEAD" of this repo: https://github.com/raimis/AToM-OpenMM.
However, my pull request has been merged into https://github.com/Gallicchio-Lab/AToM-OpenMM (which is the ATM master repo), but NOT into the raimis one. The raimis one is still '17 commits behind' Gallicchio-Lab, and my fix is one of those 'todo' commits - check here:
https://github.com/raimis/AToM-OpenMM/compare/master...Gallicchio-Lab%3AAToM-OpenMM%3Amaster

So no compile is needed. 2 things would potentially work:
1) Raimis merges the '17 commits' into his repo. That would then probably become the new 'HEAD' and WUs would automatically pull this (I hope), or the devs would potentially need to change the SHA code in run.bat/run.sh
2) The devs adapt the run.bat/run.sh to pull not from raimis 'HEAD', but from Gallicchio-Lab - but that might obviously have other side effects, I have no idea if they use raimis for a reason...

Relevant code in run.bat/sh:
@echo Install AToM
set REPO_URL=git+https://github.com/raimis/AToM-OpenMM.git@d7931b9a6217232d481731f7589d64b100a514ac
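To make the point about the pin concrete: in pip's VCS URL syntax (git+<repo URL>@<revision>), the part after the last '@' is the revision to check out, and here it is a full 40-character commit SHA rather than a branch name. The stderr log later in the thread shows pip doing exactly that (git checkout -q d7931b9a...), so a merge into raimis' master does not change what a WU installs until that SHA is edited. A small Python sketch that splits such a string (split_vcs_url and is_commit_sha are hypothetical helpers for illustration, not pip's own parser):

```python
import re
from typing import Optional, Tuple


def split_vcs_url(requirement: str) -> Tuple[str, Optional[str]]:
    """Split a pip 'git+<url>@<rev>' string into (repo_url, rev)."""
    prefix = "git+"
    if not requirement.startswith(prefix):
        raise ValueError("expected a pip 'git+' VCS URL")
    url = requirement[len(prefix):]
    head, sep, rev = url.rpartition("@")
    # Treat '@' as the revision separator only if nothing after it contains '/',
    # so "ssh://git@host/repo.git" is not mistaken for a pinned revision.
    if sep and rev and "/" not in rev:
        return head, rev
    return url, None


def is_commit_sha(rev: Optional[str]) -> bool:
    """True for a full 40-hex-digit commit SHA (a fixed commit, not a moving branch)."""
    return rev is not None and re.fullmatch(r"[0-9a-f]{40}", rev) is not None


repo, rev = split_vcs_url(
    "git+https://github.com/raimis/AToM-OpenMM.git@d7931b9a6217232d481731f7589d64b100a514ac"
)
print(repo, rev, is_commit_sha(rev))
# https://github.com/raimis/AToM-OpenMM.git d7931b9a6217232d481731f7589d64b100a514ac True
```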


And for readers wondering where the hell that run.bat/sh comes from: it's part of each WU's "xxxxxx-input" file - which is just a bzipped tar file.
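If you want to check which commit a given WU will install without running it, a minimal Python sketch using the standard tarfile module can read the REPO_URL line straight out of that archive. Assumptions: the archive is a bzip2-compressed tar as described above, run.sh (Linux) or run.bat (Windows) sits at the top level of it (the stderr log later in the thread runs bin/bash run.sh right after extracting input.tar.bz2 in the slot directory), and read_repo_url is a hypothetical helper. Point it at whatever "-input" file your client actually downloaded.

```python
import tarfile
from typing import Optional


def read_repo_url(archive_path: str, member: str = "run.sh") -> Optional[str]:
    """Return the REPO_URL value from run.sh/run.bat inside a bzip2 tar archive.

    Raises KeyError if the archive has no member with that name.
    """
    with tarfile.open(archive_path, "r:bz2") as tar:
        handle = tar.extractfile(member)
        if handle is None:  # member exists but is not a regular file
            return None
        text = handle.read().decode("utf-8", errors="replace")
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.lower().startswith("set "):  # run.bat style: set REPO_URL=...
            line = line[4:].lstrip()
        if line.startswith("REPO_URL="):  # run.sh style: REPO_URL=...
            return line.split("=", 1)[1].strip()
    return None
```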
Bill F
Joined: 21 Nov 16
Posts: 36
Credit: 164,429,114
RAC: 15
Level
Ile
Message 60900 - Posted: 22 Dec 2023, 2:35:40 UTC - in response to Message 60899.  

So we all "Hope" that either option #1 or #2 happens....

Thank you for your efforts.

Bill F
[BAT] Svennemans

Joined: 27 May 21
Posts: 54
Credit: 1,004,151,720
RAC: 0
Level
Met
Message 60901 - Posted: 23 Dec 2023, 0:28:50 UTC - in response to Message 60900.  


UPDATE: Seems that Raimis is reading the forum, or a little bird told him, because he has merged all commits. That's the good news.

Now some dev still needs to update the commit SHA code in run.bat/run.sh to the new HEAD version 1aa4eb9c39de5e269da430949da2ef377b3d9ca2

New code needed in run.bat/sh:
@echo Install AToM
set REPO_URL=git+https://github.com/raimis/AToM-OpenMM.git@1aa4eb9c39de5e269da430949da2ef377b3d9ca2
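For whoever picks this up, the change is a one-line text substitution. A minimal Python sketch, assuming the scripts contain the REPO_URL line exactly as quoted above (repin_atom is a hypothetical helper, not part of any GPUGRID tooling); it refuses anything other than a full 40-character SHA so the pin stays a fixed commit:

```python
import re

# Matches the pinned AToM line in run.sh ("REPO_URL=...") and run.bat ("set REPO_URL=...").
_PINNED = re.compile(
    r"(REPO_URL=git\+https://github\.com/raimis/AToM-OpenMM\.git@)([0-9a-f]{40})"
)


def repin_atom(script_text: str, new_sha: str) -> str:
    """Return script_text with the pinned AToM commit replaced by new_sha."""
    if re.fullmatch(r"[0-9a-f]{40}", new_sha) is None:
        raise ValueError("expected a full 40-character lowercase commit SHA")
    # A function replacement avoids any backslash/group-reference surprises.
    updated, count = _PINNED.subn(lambda m: m.group(1) + new_sha, script_text)
    if count == 0:
        raise ValueError("no pinned raimis/AToM-OpenMM REPO_URL found")
    return updated


old = "set REPO_URL=git+https://github.com/raimis/AToM-OpenMM.git@d7931b9a6217232d481731f7589d64b100a514ac"
print(repin_atom(old, "1aa4eb9c39de5e269da430949da2ef377b3d9ca2"))
# set REPO_URL=git+https://github.com/raimis/AToM-OpenMM.git@1aa4eb9c39de5e269da430949da2ef377b3d9ca2
```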


Once that is fixed, the progress issue should be over and done with!
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Level
Tyr
Message 60902 - Posted: 23 Dec 2023, 8:25:57 UTC - in response to Message 60901.  

Good news indeed.
Bill F
Joined: 21 Nov 16
Posts: 36
Credit: 164,429,114
RAC: 15
Level
Ile
Message 60903 - Posted: 24 Dec 2023, 21:51:50 UTC - in response to Message 60901.  


Are there multiple people who do dev work for the project, and is there a way to get a "little bird" to talk to them?

Bill F

Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Level
Trp
Message 60913 - Posted: 3 Jan 2024, 9:46:25 UTC

This morning, 2 of my Windows 10 machines received ATMbeta tasks, and all of them failed after around 51 seconds (RTX3070) and around 81 seconds (Quadro P5000).
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Level
Trp
Message 60914 - Posted: 3 Jan 2024, 10:16:35 UTC - in response to Message 60913.  

Still, these faulty ATM tasks are being sent out and fail after a short time.
I have now removed them from my download choices in the web settings.

Are these tasks failing only on my systems, or are other crunchers experiencing the same problem?
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Level
Trp
Message 60915 - Posted: 3 Jan 2024, 10:28:58 UTC - in response to Message 60914.  

What kind of junk is this now? Even after I set ATMbeta to "no", they still come in and fail :-((((

So something seems to be wrong with the GPUGRID web settings :-(((
roundup

Joined: 11 May 10
Posts: 68
Credit: 12,293,491,875
RAC: 2,606
Level
Trp
Message 60916 - Posted: 3 Jan 2024, 11:20:49 UTC

I got 6 ATMbeta tasks so far today. All of them error out with:

openmm.OpenMMException: Unknown property 'version' in node 'IntegratorParameters'

ATMbetas have worked well on this machine before.
Bedrich Hajek

Joined: 28 Mar 09
Posts: 490
Credit: 11,731,645,728
RAC: 57
Level
Trp
Message 60917 - Posted: 3 Jan 2024, 11:33:40 UTC

Same here with this:

Stderr output

<core_client_version>7.20.5</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
06:08:31 (183097): wrapper (7.7.26016): starting
06:08:48 (183097): wrapper (7.7.26016): starting
06:08:48 (183097): wrapper: running bin/python (bin/conda-unpack)
06:08:49 (183097): bin/python exited; CPU time 0.184758
06:08:49 (183097): wrapper: running bin/tar (xjvf input.tar.bz2)
06:08:50 (183097): bin/tar exited; CPU time 0.557920
06:08:50 (183097): wrapper: running bin/bash (run.sh)
+ echo 'Setup environment'
+ source bin/activate
++ _conda_pack_activate
++ local _CONDA_SHELL_FLAVOR
++ '[' -n x ']'
++ _CONDA_SHELL_FLAVOR=bash
++ local script_dir
++ case "$_CONDA_SHELL_FLAVOR" in
+++ dirname bin/activate
++ script_dir=bin
+++ cd bin
+++ pwd
++ local full_path_script_dir=/var/lib/boinc-client/slots/1/bin
+++ dirname /var/lib/boinc-client/slots/1/bin
++ local full_path_env=/var/lib/boinc-client/slots/1
+++ basename /var/lib/boinc-client/slots/1
++ local env_name=1
++ '[' -n '' ']'
++ export CONDA_PREFIX=/var/lib/boinc-client/slots/1
++ CONDA_PREFIX=/var/lib/boinc-client/slots/1
++ export _CONDA_PACK_OLD_PS1=
++ _CONDA_PACK_OLD_PS1=
++ PATH=/var/lib/boinc-client/slots/1/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
++ PS1='(1) '
++ case "$_CONDA_SHELL_FLAVOR" in
++ hash -r
++ local _script_dir=/var/lib/boinc-client/slots/1/etc/conda/activate.d
++ '[' -d /var/lib/boinc-client/slots/1/etc/conda/activate.d ']'
+++ ls -A /var/lib/boinc-client/slots/1/etc/conda/activate.d
++ '[' -n ocl-icd_activate.sh ']'
++ local _path
++ for _path in "$_script_dir"/*.sh
++ . /var/lib/boinc-client/slots/1/etc/conda/activate.d/ocl-icd_activate.sh
+++ conda_ocl_icd_activate
++++ ls /var/lib/boinc-client/slots/1/etc/OpenCL/vendors/
+++ [[ -z ocl-icd-system ]]
+ export PATH=/var/lib/boinc-client/slots/1:/var/lib/boinc-client/slots/1/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
+ PATH=/var/lib/boinc-client/slots/1:/var/lib/boinc-client/slots/1/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
+ echo 'Create a temporary directory'
+ export TMP=/var/lib/boinc-client/slots/1/tmp
+ TMP=/var/lib/boinc-client/slots/1/tmp
+ mkdir -p /var/lib/boinc-client/slots/1/tmp
+ echo 'Install AToM'
+ REPO_URL=git+https://github.com/raimis/AToM-OpenMM.git@d7931b9a6217232d481731f7589d64b100a514ac
+ python -m pip install git+https://github.com/raimis/AToM-OpenMM.git@d7931b9a6217232d481731f7589d64b100a514ac
Running command git clone --filter=blob:none --quiet https://github.com/raimis/AToM-OpenMM.git /var/lib/boinc-client/slots/1/tmp/pip-req-build-x7scn3kc
Running command git rev-parse -q --verify 'sha^d7931b9a6217232d481731f7589d64b100a514ac'
Running command git fetch -q https://github.com/raimis/AToM-OpenMM.git d7931b9a6217232d481731f7589d64b100a514ac
Running command git checkout -q d7931b9a6217232d481731f7589d64b100a514ac
+ python -m pip list
+ echo 'Configure AToM'
+ echo localhost,0:0,1,CUDA,,/var/lib/boinc-client/slots/1/tmp
+ echo 'Extract restart'
+ tar xjvf restart.tar.bz2
+ echo 'Run AToM'
+ CONFIG_FILE=JAK2_m02_m04_asyncre.cntl
+ python bin/rbfe_explicit_sync.py JAK2_m02_m04_asyncre.cntl
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
Traceback (most recent call last):
  File "/var/lib/boinc-client/slots/1/bin/rbfe_explicit_sync.py", line 8, in <module>
    rx.setupJob()
  File "/var/lib/boinc-client/slots/1/lib/python3.9/site-packages/sync/atm.py", line 82, in setupJob
    self.worker = OMMWorkerATM(ommsystem, self.config, self.logger)
  File "/var/lib/boinc-client/slots/1/lib/python3.9/site-packages/sync/worker.py", line 44, in __init__
    self.simulation.loadState(basename + "_0.xml")
  File "/var/lib/boinc-client/slots/1/lib/python3.9/site-packages/openmm/app/simulation.py", line 344, in loadState
    self.context.setState(mm.XmlSerializer.deserialize(xml))
  File "/var/lib/boinc-client/slots/1/lib/python3.9/site-packages/openmm/openmm.py", line 9742, in setState
    return _openmm.Context_setState(self, state)
openmm.OpenMMException: Unknown property 'version' in node 'IntegratorParameters'
06:09:26 (183097): bin/bash exited; CPU time 33.678225
06:09:26 (183097): app exit status: 0x1
06:09:26 (183097): called boinc_finish(195)

</stderr_txt>
]]>

Erich56, regarding your preferences, did you answer yes to these questions:

1. Run test applications?
2. If no work for selected applications is available, accept work from other applications?

If so, change them to no.

What is the location setting for your computers? Default, Home, Work or School.
Make sure you modify the correct location setting preferences.

That's all that I can think of.
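To make the interplay of those settings explicit, here is a purely hypothetical Python model of the check described above. It is NOT the actual BOINC or GPUGRID scheduler code; in particular, the rule that a test application can still be sent while "Run test applications?" is yes is an assumption, chosen because it would match what Erich56 saw. Prefs and may_receive are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Prefs:
    selected_apps: Set[str] = field(default_factory=set)
    run_test_apps: bool = False      # "Run test applications?"
    accept_other_apps: bool = False  # "If no work for selected applications is available, accept work from other applications?"


def may_receive(app: str, is_test_app: bool, prefs: Prefs, selected_have_work: bool) -> bool:
    """HYPOTHETICAL model of whether a host could be sent a task of `app`."""
    if app in prefs.selected_apps:
        return True
    if prefs.accept_other_apps and not selected_have_work:
        return True
    # Assumption: test/beta apps may bypass the per-app selection when test apps are allowed.
    return is_test_app and prefs.run_test_apps
```

Under this model, deselecting ATMbeta while leaving "Run test applications?" on still lets ATMbeta through, and answering no to both questions, as suggested, blocks it.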

Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Level
Trp
Message 60918 - Posted: 3 Jan 2024, 12:47:28 UTC - in response to Message 60917.  

Thank you, Bedrich, for your hints.
Indeed, I deselected "ATMbeta", but I forgot to deselect "Run test applications".
So now I have corrected this.

Still, I would expect that once "ATMbeta" is deselected, no ATMbeta tasks are downloaded, regardless of whether "Run test applications" is selected or not.
What happened is rather illogical :-(

roundup

Joined: 11 May 10
Posts: 68
Credit: 12,293,491,875
RAC: 2,606
Level
Trp
Message 60919 - Posted: 3 Jan 2024, 15:30:58 UTC - in response to Message 60918.  

11 WUs on 3 different computers (1 Win11, 2 Linux) with 3 different GPUs failed today.
ATMbeta v1.09 has worked fine on these machines before (was the app changed while keeping the version number 1.09?).

The error messages are the same as in the previous posts.
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Level
Trp
Message 60920 - Posted: 3 Jan 2024, 16:53:21 UTC - in response to Message 60919.  
Last modified: 3 Jan 2024, 16:57:58 UTC

From what I remember reading somewhere here recently, there was an issue with an expired license, which they tried to fix, but the fix failed. That, though, applied to the Linux version. So it looks like the same problem might apply to Windows as well :-(

However, I wonder why no one on the team notices that all the tasks they send out are failing, so that they would stop the distribution.

P.S.
I just noticed that one of my PCs received several ACEMD 3 tasks within the past hour, and they also failed after about a minute.
See here: http://www.gpugrid.net/result.php?resultid=33725238
Until this morning, they could be crunched successfully.
So there seems to be a major problem with GPUGRID at this time :-(
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 5,269
Level
Trp
Message 60921 - Posted: 3 Jan 2024, 16:56:52 UTC - in response to Message 60920.  

You’re confusing two different apps.

acemd3 had the expired license issue, which they tried to fix, but they ended up replacing the Linux version with the Windows version, which remains broken (because a Windows app can’t run on Linux).

This thread is about the ATM app, which is not subject to the same licensing issues.
ServicEnginIC
Joined: 24 Sep 10
Posts: 592
Credit: 11,972,186,510
RAC: 1,187
Level
Trp
Message 60933 - Posted: 7 Jan 2024, 21:10:13 UTC

From January 5th onwards, my Linux hosts have received a limited number of new ATMbeta tasks.
All of them now show linear % progress during execution.
As an example, I reproduce the "Properties" of the first one I received, taken at two different moments:

Application: ATMbeta: Free energy calculations of protein-ligand binding 1.09 (cuda1121)
Name: CDK2_m01_m10_1-QUICO_ATM_TEST_NEW-3-5-RND8234
State: Running
Received: Fri 05 Jan 2024 11:52:17 WET
Report deadline: Wed 10 Jan 2024 11:52:16 WET
Resources: 0.955 CPUs + 1 NVIDIA GPU
Estimated computation size: 1,000,000,000 GFLOPs
CPU time: 00:55:25
CPU time since checkpoint: 00:55:22
Elapsed time: 00:57:41
Estimated time remaining: 12d 22:00:55
Fraction done: 7.328%
Virtual memory size: 11.94 GB
Working set size: 1.48 GB
Directory: slots/3
Process ID: 14905
Progress rate: 7.560% per hour
Executable: wrapper_26198_x86_64-pc-linux-gnu

----------------------------------------------------------------------

Application: ATMbeta: Free energy calculations of protein-ligand binding 1.09 (cuda1121)
Name: CDK2_m01_m10_1-QUICO_ATM_TEST_NEW-3-5-RND8234
State: Running
Received: Fri 05 Jan 2024 11:52:17 WET
Report deadline: Wed 10 Jan 2024 11:52:16 WET
Resources: 0.955 CPUs + 1 NVIDIA GPU
Estimated computation size: 1,000,000,000 GFLOPs
CPU time: 07:00:18
CPU time since checkpoint: 07:00:16
Elapsed time: 07:11:01
Estimated time remaining: 6d 18:09:42
Fraction done: 51.526%
Virtual memory size: 11.94 GB
Working set size: 1.48 GB
Directory: slots/3
Process ID: 14905
Progress rate: 7.200% per hour
Executable: wrapper_26198_x86_64-pc-linux-gnu

This is an interesting correction compared to previous ATMbeta tasks!
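A quick way to confirm that progress is now close to linear is to divide "Fraction done" by "Elapsed time" for the two snapshots above. This minimal Python sketch does that rough check; it is plain arithmetic, not the formula the BOINC client uses for its own "Progress rate" field, so the numbers differ a little from that field.

```python
def hms_to_hours(hms: str) -> float:
    """Convert an 'HH:MM:SS' string from the task properties to hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


# (Elapsed time, Fraction done in %) from the two snapshots of CDK2_m01_m10_1-QUICO_ATM_TEST_NEW-3-5-RND8234
snapshots = [("00:57:41", 7.328), ("07:11:01", 51.526)]
(t1, f1), (t2, f2) = [(hms_to_hours(t), f) for t, f in snapshots]

rate_1 = f1 / t1                      # average %/h up to the first snapshot
rate_2 = f2 / t2                      # average %/h up to the second snapshot
rate_between = (f2 - f1) / (t2 - t1)  # %/h between the two snapshots
total_hours = 100 / rate_2            # projected elapsed time at 100% if the rate holds

print(f"{rate_1:.2f} {rate_2:.2f} {rate_between:.2f} {total_hours:.2f}")
# 7.62 7.17 7.10 13.94
```

So the rate drifts down only slightly (about 7.6 to 7.1 %/h), and at that pace the whole task would take roughly 14 hours of elapsed time, far less than the 12 days BOINC was still estimating after the first hour.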

On the other hand, the values for "CPU time since checkpoint:" (nearly equal to the total CPU time) make me think that the "no checkpointing" issue is still pending a fix.
This forces the tasks to be executed without interruption from beginning to end...
Also, they still seem to fail on Windows hosts.
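The reasoning above can be written as a simple rule of thumb: if "CPU time since checkpoint" is within a small margin of the total "CPU time", the task has apparently not checkpointed since it started. A minimal Python sketch, with an arbitrary 60-second margin chosen purely for illustration (looks_uncheckpointed is a hypothetical helper):

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string from the task properties to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def looks_uncheckpointed(cpu_time: str, cpu_since_checkpoint: str, margin_s: int = 60) -> bool:
    """True if (nearly) all CPU time has accumulated since the last checkpoint."""
    gap = hms_to_seconds(cpu_time) - hms_to_seconds(cpu_since_checkpoint)
    return gap <= margin_s


# The two snapshots above: gaps of 3 s and 2 s after about 55 minutes and 7 hours.
for cpu, since in [("00:55:25", "00:55:22"), ("07:00:18", "07:00:16")]:
    print(cpu, since, looks_uncheckpointed(cpu, since))
```

A single snapshot early in a run proves little; it is the second one, still showing a 2-second gap after seven hours, that makes the "no checkpointing" reading convincing.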
Keith Myers
Joined: 13 Dec 17
Posts: 1419
Credit: 9,119,446,190
RAC: 731
Level
Tyr
Message 60934 - Posted: 8 Jan 2024, 1:04:31 UTC

They fixed the percentage completion issue.

But I don't think anyone has tried the stop-restart experiment yet to determine whether the tasks can be stopped and restarted without failing.

The tasks are still too rare to throw any away on the experiment.

©2025 Universitat Pompeu Fabra