ATM

Emilio Gallicchio

Message 60171 - Posted: 25 Mar 2023, 2:28:51 UTC - in response to Message 60163.  


I can't answer immediately on the termination question, but it's all open-source and I can look through it. In this case, it's more complicated, because BOINC will talk to the wrapper, and the wrapper will talk to the science app.

But the basic idea is that BOINC will send a request to terminate over the API, and wait for the application to close itself down as it sees fit. Actual signals will only be used to force termination in the case of an unconditional quit, such as an operating system closedown.


Right, probably the wrapper should send a termination signal to AToM.

We have of course access to AToM's sources https://github.com/Gallicchio-Lab/AToM-OpenMM and we can make sure that it checkpoints appropriately when it receives the signal.

However, I do not have access to the wrapper. Quico: please advise.
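As an illustration of the application side, here is a minimal Python sketch of checkpointing on a termination request. It is not AToM's actual code: `CheckpointOnSignal`, `save_checkpoint` and `do_sample` are hypothetical names, and it assumes the wrapper delivers SIGTERM to the science app, which has not been confirmed in this thread.

```python
import signal


class CheckpointOnSignal:
    """Run a sampling loop that stops cleanly when a signal arrives."""

    def __init__(self, save_checkpoint):
        # save_checkpoint: hypothetical callable taking the number of
        # completed samples and writing whatever state is needed to resume.
        self.save_checkpoint = save_checkpoint
        self.stop_requested = False

    def install(self, signum=signal.SIGTERM):
        # Must be called from the main thread of the main interpreter.
        signal.signal(signum, self._handler)

    def _handler(self, signum, frame):
        # Only set a flag here; the real work happens between samples.
        self.stop_requested = True

    def run(self, n_samples, do_sample):
        completed = 0
        for i in range(n_samples):
            if self.stop_requested:
                break
            do_sample(i)
            completed += 1
        # Checkpoint once on the way out, whether we finished or were asked to stop.
        self.save_checkpoint(completed)
        return completed
```

The handler only flips a flag, so the checkpoint is always written at a sample boundary rather than in the middle of a step.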
Landjunge

Message 60172 - Posted: 25 Mar 2023, 9:32:49 UTC
Last modified: 25 Mar 2023, 9:33:49 UTC

Hi, I have some "new_2" ATMs that have been running for 14h+ so far. Should I abort them?
I'm running Linux with RTX 3070 cards.
Richard Haselgrove

Message 60173 - Posted: 25 Mar 2023, 9:38:33 UTC - in response to Message 60171.  
Last modified: 25 Mar 2023, 9:50:33 UTC

The wrapper you're using at the moment is called "wrapper_26198_x86_64-pc-linux-gnu" (I haven't tried ATM under Windows yet, but can and will do so when I get a moment).

That wrapper name looks as if it was prepared from BOINC code dating to around February 2017. At that time, BOINC was working on versions of the wrapper specifically intended for use with VirtualBox.

BOINC makes pre-compiled versions of the wrapper available for projects to use "as is", but some projects customise the source code to suit their own needs. I don't know which path GPUGrid has taken.

Edit - I just looked at the file name the first time. In stderr.txt, I see

20:37:54 (115491): wrapper (7.7.26016): starting

That would put the date back to around November 2015, but I guess someone has made some local modifications.
Richard Haselgrove

Message 60174 - Posted: 25 Mar 2023, 9:45:14 UTC - in response to Message 60172.  

Hi, I have some "new_2" ATMs that have been running for 14h+ so far. Should I abort them?

I have one at the moment which has been running for 17.5 hours. The same machine completed one yesterday (task 33374928) which ran for 19 hours. I wouldn't abort it just yet.

Landjunge

Message 60175 - Posted: 25 Mar 2023, 9:46:50 UTC - in response to Message 60174.  

Hi, I have some "new_2" ATMs that have been running for 14h+ so far. Should I abort them?

I have one at the moment which has been running for 17.5 hours. The same machine completed one yesterday (task 33374928) which ran for 19 hours. I wouldn't abort it just yet.



Thank you. I will let them run =)
Richard Haselgrove

Message 60176 - Posted: 25 Mar 2023, 11:32:54 UTC - in response to Message 60175.  

Aurum
Message 60177 - Posted: 25 Mar 2023, 13:06:20 UTC - in response to Message 60165.  
Last modified: 25 Mar 2023, 13:53:41 UTC

Seriously? Only 14 tasks a day?

The quota adjusts dynamically - it goes up if you report successful tasks, and goes down if you report errors.

Quico, this behavior is intended to block misconfigured computers. In this case it's your Windows version that fails within seconds and is resent until it reaches a Linux computer or fails 7 times. My Windows computer was locked out of GG early yesterday, but all my Linux computers kept donating until the WUs ran out.
In this example the first 4 failures all went to Win7 & 11 computers, and then a Linux computer completed it successfully:
https://www.gpugrid.net/workunit.php?wuid=27438768

And the Win WUs are failing in seconds again with today's tranche.
Aurum
Message 60183 - Posted: 25 Mar 2023, 14:27:30 UTC

WUs failing on Linux computers:
+ python -m pip install git+https://github.com/raimis/AToM-OpenMM.git@172e6db924567cd0af1312d33f05b156b53e3d1c
  Running command git clone --filter=blob:none --quiet https://github.com/raimis/AToM-OpenMM.git /var/lib/boinc-client/slots/36/tmp/pip-req-build-jsq34xa4
  fatal: unable to access '/home/conda/feedstock_root/build_artifacts/git_1679396317102/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/etc/gitconfig': Permission denied
  error: subprocess-exited-with-error
  
  × git clone --filter=blob:none --quiet https://github.com/raimis/AToM-OpenMM.git /var/lib/boinc-client/slots/36/tmp/pip-req-build-jsq34xa4 did not run successfully.
  │ exit code: 128
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

https://www.gpugrid.net/result.php?resultid=33379917
Landjunge

Message 60184 - Posted: 25 Mar 2023, 14:30:06 UTC

Any ideas why WUs are failing on an Ubuntu Linux machine with a GTX 1070?

<core_client_version>7.20.5</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
14:01:49 (3551): wrapper (7.7.26016): starting
14:02:12 (3551): wrapper (7.7.26016): starting
14:02:12 (3551): wrapper: running bin/python (bin/conda-unpack)
14:02:13 (3551): bin/python exited; CPU time 0.280413
14:02:13 (3551): wrapper: running bin/tar (xjvf input.tar.bz2)
14:02:14 (3551): bin/tar exited; CPU time 0.840912
14:02:14 (3551): wrapper: running bin/bash (run.sh)
+ echo 'Setup environment'
+ source bin/activate
++ _conda_pack_activate
++ local _CONDA_SHELL_FLAVOR
++ '[' -n x ']'
++ _CONDA_SHELL_FLAVOR=bash
++ local script_dir
++ case "$_CONDA_SHELL_FLAVOR" in
+++ dirname bin/activate
++ script_dir=bin
+++ cd bin
+++ pwd
++ local full_path_script_dir=/var/lib/boinc-client/slots/7/bin
+++ dirname /var/lib/boinc-client/slots/7/bin
++ local full_path_env=/var/lib/boinc-client/slots/7
+++ basename /var/lib/boinc-client/slots/7
++ local env_name=7
++ '[' -n '' ']'
++ export CONDA_PREFIX=/var/lib/boinc-client/slots/7
++ CONDA_PREFIX=/var/lib/boinc-client/slots/7
++ export _CONDA_PACK_OLD_PS1=
++ _CONDA_PACK_OLD_PS1=
++ PATH=/var/lib/boinc-client/slots/7/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
++ PS1='(7) '
++ case "$_CONDA_SHELL_FLAVOR" in
++ hash -r
++ local _script_dir=/var/lib/boinc-client/slots/7/etc/conda/activate.d
++ '[' -d /var/lib/boinc-client/slots/7/etc/conda/activate.d ']'
+++ ls -A /var/lib/boinc-client/slots/7/etc/conda/activate.d
++ '[' -n ocl-icd_activate.sh ']'
++ local _path
++ for _path in "$_script_dir"/*.sh
++ . /var/lib/boinc-client/slots/7/etc/conda/activate.d/ocl-icd_activate.sh
+++ conda_ocl_icd_activate
++++ ls /var/lib/boinc-client/slots/7/etc/OpenCL/vendors/
+++ [[ -z ocl-icd-system ]]
+ export PATH=/var/lib/boinc-client/slots/7:/var/lib/boinc-client/slots/7/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
+ PATH=/var/lib/boinc-client/slots/7:/var/lib/boinc-client/slots/7/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:.
+ echo 'Create a temporary directory'
+ export TMP=/var/lib/boinc-client/slots/7/tmp
+ TMP=/var/lib/boinc-client/slots/7/tmp
+ mkdir -p /var/lib/boinc-client/slots/7/tmp
+ echo 'Install AToM'
+ REPO_URL=git+https://github.com/raimis/AToM-OpenMM.git@172e6db924567cd0af1312d33f05b156b53e3d1c
+ python -m pip install git+https://github.com/raimis/AToM-OpenMM.git@172e6db924567cd0af1312d33f05b156b53e3d1c
Running command git clone --filter=blob:none --quiet https://github.com/raimis/AToM-OpenMM.git /var/lib/boinc-client/slots/7/tmp/pip-req-build-0qwsbkqo
Running command git rev-parse -q --verify 'sha^172e6db924567cd0af1312d33f05b156b53e3d1c'
Running command git fetch -q https://github.com/raimis/AToM-OpenMM.git 172e6db924567cd0af1312d33f05b156b53e3d1c
Running command git checkout -q 172e6db924567cd0af1312d33f05b156b53e3d1c
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: -4
╰─> [0 lines of output]
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
14:02:22 (3551): bin/bash exited; CPU time 2.696428
14:02:22 (3551): app exit status: 0x1
14:02:22 (3551): called boinc_finish(195)

</stderr_txt>
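
One possible reading of the `exit code: -4` line in the log above, sketched in Python. pip launches its build steps as child processes, and Python's `subprocess` module reports a negative return code -N on POSIX when the child was killed by signal N; signal 4 on Linux is SIGILL (illegal instruction). Whether that is what actually happened on this host is an assumption, and `describe_returncode` is a hypothetical helper, not part of pip or BOINC.

```python
import signal


def describe_returncode(rc):
    """Turn a subprocess-style return code into a readable description."""
    if rc < 0:
        signum = -rc
        try:
            # Look up the symbolic name on the current platform, if it has one.
            name = signal.Signals(signum).name
            return f"killed by signal {signum} ({name})"
        except ValueError:
            return f"killed by signal {signum}"
    return f"exited with status {rc}"


print(describe_returncode(-4))
print(describe_returncode(1))
```

If the child really did die with an illegal instruction, that would point at the CPU or a binary built for a different instruction set rather than at pip itself, but the log alone does not prove it.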
Richard Haselgrove

Message 60185 - Posted: 25 Mar 2023, 16:27:51 UTC - in response to Message 60173.  

(I haven't tried ATM under Windows yet, but can and will do so when I get a moment).

Just downloaded a BACE task for Windows. There may be trouble ahead...

The job.xml file reads:

<job_desc>
    <unzip_input>
       <zipfilename>windows_x86_64__cuda1121.zip</zipfilename>
    </unzip_input>
    <task>
        <application>python.exe</application>
        <command_line>bin/conda-unpack</command_line>
        <weight>1</weight>
    </task>
    <task>
        <application>Library/usr/bin/tar.exe</application>
        <command_line>xjvf input.tar.bz2</command_line>
        <setenv>PATH=$PWD/Library/usr/bin</setenv>
        <weight>1</weight>
    </task>
    <task>
        <application>C:/Windows/system32/cmd.exe</application>
        <command_line>/c call run.bat</command_line>
        <setenv>CUDA_DEVICE=$GPU_DEVICE_NUM</setenv>
        <stdout_filename>run.log</stdout_filename>
        <weight>1000</weight>
        <fraction_done_filename>progress</fraction_done_filename>
    </task>
</job_desc>


1) We had problems with python.exe triggering a missing DLL error. I'll run Dependency Walker over this one, to see what the problem is.

2) It runs a private version of tar.exe: Microsoft included tar as a system utility from Windows 10 onwards - but I'm running Windows 7. The MS utility wouldn't run for me - I'll try this one.

3) I'm not totally convinced of the cmd.exe syntax either, but we'll cross that bridge when we get to it.
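
To see at a glance what the wrapper is being asked to run, the job.xml above can be listed with Python's standard library. This is a rough sketch: `list_tasks` and `weight_shares` are hypothetical helpers, the XML is a trimmed copy of the file quoted above (the `unzip_input`, `setenv` and filename elements are left out), and the share calculation assumes the wrapper apportions progress in proportion to `<weight>`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the job.xml quoted above.
JOB_XML = """<job_desc>
    <task>
        <application>python.exe</application>
        <command_line>bin/conda-unpack</command_line>
        <weight>1</weight>
    </task>
    <task>
        <application>Library/usr/bin/tar.exe</application>
        <command_line>xjvf input.tar.bz2</command_line>
        <weight>1</weight>
    </task>
    <task>
        <application>C:/Windows/system32/cmd.exe</application>
        <command_line>/c call run.bat</command_line>
        <weight>1000</weight>
    </task>
</job_desc>"""


def list_tasks(xml_text):
    """Return (application, command_line, weight) for each <task>."""
    root = ET.fromstring(xml_text)
    tasks = []
    for task in root.findall("task"):
        app = task.findtext("application", default="")
        cmd = task.findtext("command_line", default="")
        weight = float(task.findtext("weight", default="1"))
        tasks.append((app, cmd, weight))
    return tasks


def weight_shares(tasks):
    """Fraction of the total weight carried by each task."""
    total = sum(weight for _, _, weight in tasks)
    return [weight / total for _, _, weight in tasks]


tasks = list_tasks(JOB_XML)
for (app, cmd, _), share in zip(tasks, weight_shares(tasks)):
    print(f"{app} {cmd}  ({share:.1%} of total weight)")
```

With weights of 1, 1 and 1000, almost all of the total weight sits on the run.bat step; the two unpacking steps barely register.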
Richard Haselgrove

Message 60186 - Posted: 25 Mar 2023, 17:16:04 UTC - in response to Message 60185.  
Last modified: 25 Mar 2023, 17:42:46 UTC

First reports from Dependency Walker:

"Error opening file: The system cannot find the file specified" for
API-MS-WIN-CORE-PATH-L1-1-0.DLL
API-MS-WIN-CORE-WINRT-ERROR-L1-1-0.DLL
API-MS-WIN-CORE-WINRT-L1-1-0.DLL
API-MS-WIN-CORE-WINRT-ROBUFFER-L1-1-0.DLL
API-MS-WIN-CORE-WINRT-STRING-L1-1-0.DLL
DCOMP.DLL
IESHIMS.DLL

The API-MS-WIN group and IESHIMS.DLL usually resolve when delay-load files are loaded during the run. But I can't find DCOMP.DLL in either the unpacked libraries, or the Windows system disk.

DCOMP.DLL seems to be called from MSHTML.DLL, which is a Windows system file. But I still can't find it from there.

Enough for now - my head is spinning!

Edit - DCOMP.DLL is present on my Windows 10 - now Windows 11 - laptop. Another fine example of Microsoft version control.
Richard Haselgrove

Message 60188 - Posted: 26 Mar 2023, 8:24:32 UTC
Last modified: 26 Mar 2023, 9:21:02 UTC

Just a note of warning: one of my machines is running a JNK1 task - been running for 13 hours.

It's running fine - the run log has reached sample 287, and progress has reached 1.2654867256637168

But that's over 100%, and the BOINC display has reached (and is pegged at) 100% - probably has been for several hours. Ignore it.

Edit: It's reached sample 298. And I've found a [task name].cntl file, which contains the line

MAX_SAMPLES = 341

One reason why this needs fixing: I have my BOINC client set up in such a way that it normally fetches the next task around an hour before the current one is expected to finish. Because this one was (apparently) running so fast, it reached that point over five hours ago - and it's still waiting. Sorry Abouh - your next result will be late!
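
The numbers above hint at where the overshoot may come from. A small Python sketch: the reported progress value matches 286/226 to within floating-point precision, so one possibility (an assumption, not something confirmed from the app's code) is that the progress file is computed against a sample count of 226 rather than the MAX_SAMPLES = 341 found in the .cntl file. `fraction_done` is a hypothetical helper that divides by the configured maximum and clamps the result to the 0..1 range.

```python
def fraction_done(completed_samples, max_samples):
    """Progress as a fraction in [0, 1], based on the configured maximum."""
    if max_samples <= 0:
        raise ValueError("max_samples must be positive")
    return min(1.0, max(0.0, completed_samples / max_samples))


reported = 1.2654867256637168  # value seen in the progress file

# The reported value is consistent with 286 samples out of an assumed 226.
print(abs(reported - 286 / 226) < 1e-12)

# The same 286 samples measured against MAX_SAMPLES = 341 stay below 100%.
print(round(fraction_done(286, 341), 4))
```

Clamping alone would only hide the symptom; using the right denominator is what would keep the BOINC estimate, and therefore work fetch, in step with the real remaining time.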
Freewill

Message 60189 - Posted: 26 Mar 2023, 11:48:39 UTC

I also noticed this latest round of BACE tasks have become much longer to run on my GPUs. Some are hitting > 24 hrs. I am going to stop taking new ones unless the # samples/task is trimmed down.
[SG] Felix

Message 60190 - Posted: 26 Mar 2023, 12:16:20 UTC

I had this one running for about 8 hours, but then I had to shut down my computer.
Unfortunately, it couldn't restart from the app checkpoint, and since there is no BOINC checkpoint, it crashed and reported no run time.

KAMasud

Message 60191 - Posted: 26 Mar 2023, 12:31:35 UTC

Forget about a re-start, these WUs cannot even take a suspension. I suspended my computer and this WU collapsed.
task 27438865
[SG] Felix

Message 60192 - Posted: 26 Mar 2023, 13:10:54 UTC

I'm a bit surprised right now. I looked at the resend, and it was completed successfully in just over 2 minutes. How come? That computer has more WUs that were completed successfully in such a short time. Am I doing something wrong?
Ian&Steve C.

Message 60193 - Posted: 26 Mar 2023, 13:44:47 UTC - in response to Message 60189.  

I also noticed this latest round of BACE tasks have become much longer to run on my GPUs. Some are hitting > 24 hrs. I am going to stop taking new ones unless the # samples/task is trimmed down.


I agree, the 4-6hr runs are much better.
Bedrich Hajek

Message 60194 - Posted: 26 Mar 2023, 23:44:00 UTC

I have a task that reached 100% an hour ago, which means it is supposed to be finished, but it's still running...

https://www.gpugrid.net/workunit.php?wuid=27439822

I don't want to abort it, but this is annoying...

What would be a reasonable amount of time to let it run?

The runtime at posting time is 7 hours and 30 minutes.



Keith Myers
Message 60195 - Posted: 27 Mar 2023, 1:25:12 UTC - in response to Message 60194.  

My last ATM tasks spent at least a couple of hours at the 100% completion point.

Just let them run and eventually they will turn themselves in for validation.
Bedrich Hajek

Message 60196 - Posted: 27 Mar 2023, 1:39:49 UTC - in response to Message 60195.  

That's a moot point now. It errored out.

https://www.gpugrid.net/result.php?resultid=33381994

I guess this goes with the territory.





©2025 Universitat Pompeu Fabra