ACEMD updated app

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Send message
Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Level
Gly
Message 59700 - Posted: 10 Jan 2023, 9:53:01 UTC

As I said, we are currently compiling the Windows version.

GDF
ID: 59700 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1109
Credit: 40,469,283,595
RAC: 3,993,807
Level
Trp
Message 59708 - Posted: 10 Jan 2023, 15:40:06 UTC - in response to Message 59700.  

might as well compile it for CUDA 11.8 to bring Ada (40-series) support.
ID: 59708 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
HZL

Send message
Joined: 23 Nov 08
Posts: 1
Credit: 612,500
RAC: 0
Level
Gly
Message 59720 - Posted: 15 Jan 2023, 10:42:31 UTC - in response to Message 59700.  

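Hello everyone! I am in Shanghai, China. How can I get the GPU to run at 100% utilization? I have noticed that while a task is running, the GPU stays at around 30%.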
ID: 59720 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1109
Credit: 40,469,283,595
RAC: 3,993,807
Level
Trp
Message 59722 - Posted: 15 Jan 2023, 17:31:49 UTC - in response to Message 59720.  

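Hello everyone! I am in Shanghai, China. How can I get the GPU to run at 100% utilization? I have noticed that while a task is running, the GPU stays at around 30%.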


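This is normal for the Python app. The Python app relies more on the CPU than on the GPU, so GPU utilization is limited by the CPU. Running two tasks at the same time can raise GPU utilization, but you will not be able to get the GPU to 100% while running the Python app.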
ID: 59722 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
guoyeah

Send message
Joined: 17 Mar 10
Posts: 1
Credit: 5,362,500
RAC: 0
Level
Ser
Message 59725 - Posted: 17 Jan 2023, 3:35:22 UTC - in response to Message 59720.  

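My Nvidia GPU reaches 80%. I am also running other CPU (20%) and Intel GPU (97%) projects at the same time. After setting the power plan to best performance, the CPU reaches 50%. Intel i7, 12th generation.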
ID: 59725 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Pop Piasa
Avatar

Send message
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Level
Gln
Message 59734 - Posted: 18 Jan 2023, 15:34:32 UTC
Last modified: 18 Jan 2023, 15:52:09 UTC

Looking around I see the present batch of protein ligand sims are crashing... DARNIT!


process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
22:58:08 (3209098): wrapper (7.7.26016): starting
22:58:25 (3209098): wrapper (7.7.26016): starting
22:58:25 (3209098): wrapper: running /bin/bash (run.sh)
/bin/bash: run.sh: No such file or directory
22:58:26 (3209098): /bin/bash exited; CPU time 0.001795
22:58:26 (3209098): app exit status: 0x7f
22:58:26 (3209098): called boinc_finish(195)

anything else found?
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation
ID: 59734 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1109
Credit: 40,469,283,595
RAC: 3,993,807
Level
Trp
Message 59735 - Posted: 18 Jan 2023, 16:20:48 UTC - in response to Message 59734.  

Looking around I see the present batch of protein ligand sims are crashing... DARNIT!


process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
22:58:08 (3209098): wrapper (7.7.26016): starting
22:58:25 (3209098): wrapper (7.7.26016): starting
22:58:25 (3209098): wrapper: running /bin/bash (run.sh)
/bin/bash: run.sh: No such file or directory
22:58:26 (3209098): /bin/bash exited; CPU time 0.001795
22:58:26 (3209098): app exit status: 0x7f
22:58:26 (3209098): called boinc_finish(195)

anything else found?


if someone can preserve the data files and slot directory before the result gets uploaded and subsequently wiped from your system, it should be easy to figure out what's wrong.

my guess is they didn't name that run.sh file properly (via open_name probably), or didn't add a task to extract the file in the wrapper config file (job.xml), or something along those lines.
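
A minimal sketch of one way to preserve a slot directory before the client cleans it up, assuming Python is available on the host. The function name `snapshot_dir` and the paths in the usage comment are hypothetical; the BOINC data directory differs per installation. Files in a slot are often small link files that point into the project directory, so copying the project directory the same way may also be needed.

```python
import shutil
import time
from pathlib import Path


def snapshot_dir(src, backup_root):
    """Copy src (e.g. a BOINC slot directory) into a timestamped folder under backup_root."""
    src = Path(src)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"
    # symlinks=True copies symbolic links as links instead of following them
    shutil.copytree(src, dest, symlinks=True)
    return dest


# Example call (hypothetical paths, adjust to your own installation):
# snapshot_dir("/var/lib/boinc-client/slots/0", Path.home() / "boinc-slot-backups")
```

The copy has to happen while the task is still in the slot, so suspending the task or the client first gives more time.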
ID: 59735 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1109
Credit: 40,469,283,595
RAC: 3,993,807
Level
Trp
Message 59736 - Posted: 18 Jan 2023, 16:42:18 UTC - in response to Message 59735.  

actually I have some on my system so i took a look.

there appear to be many things wrong.

the job.xml file calls just tar, with no path telling the wrapper where tar lives. this should probably be /bin/tar to use the system tar.

the extracted run.sh script looks woefully lacking in detail. i can see it trying to call python and conda from 'bin/' but that is not included in the input package and will fail. the input tarball only includes some text/config files and not the whole python package.
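
The kind of check described above can be sketched in Python. This assumes the wrapper job file has the usual `<job_desc>` / `<task>` / `<application>` layout; `SAMPLE_JOB` is a made-up example, not the project's actual file, and `check_applications` is a hypothetical helper. It treats a bare name such as `tar` as resolvable only if a file with that name sits in the slot directory, which mirrors the concern in this post; whether the wrapper also searches PATH is not established in this thread.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical job file for illustration only
SAMPLE_JOB = """
<job_desc>
    <task>
        <application>tar</application>
        <command_line>xf input.tar.gz</command_line>
    </task>
    <task>
        <application>/bin/bash</application>
        <command_line>run.sh</command_line>
    </task>
</job_desc>
"""


def check_applications(job_xml_text, slot_dir):
    """Map each task's <application> to True if it can be located, else False."""
    root = ET.fromstring(job_xml_text)
    results = {}
    for task in root.iter("task"):
        app = (task.findtext("application") or "").strip()
        if not app:
            continue
        path = Path(app)
        if path.is_absolute():
            found = path.exists()                    # e.g. /bin/tar
        else:
            found = (Path(slot_dir) / path).exists() # file shipped with the task
        results[app] = found
    return results


# Example call: check_applications(SAMPLE_JOB, "/path/to/slot")
```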
ID: 59736 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Send message
Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Level
Gly
Message 59738 - Posted: 18 Jan 2023, 17:04:57 UTC - in response to Message 59736.  

What app exactly?
ID: 59738 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1109
Credit: 40,469,283,595
RAC: 3,993,807
Level
Trp
Message 59739 - Posted: 18 Jan 2023, 17:08:38 UTC - in response to Message 59738.  
Last modified: 18 Jan 2023, 17:09:18 UTC

What app exactly?


the new free energy one ('ATM' moniker). using the wrapper to call the run.sh script.

also it would be a good idea to add a checkbox for this app in project preferences. this app showed up with no warning and no announcement from the project, and with no way to opt out of it, it seems. I'm not sure if it's marked as beta or not.
ID: 59739 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Send message
Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Level
Gly
Message 59740 - Posted: 18 Jan 2023, 17:11:01 UTC - in response to Message 59739.  

Yes, we should have made a beta, but this app is not related to this thread.
ID: 59740 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1109
Credit: 40,469,283,595
RAC: 3,993,807
Level
Trp
Message 59741 - Posted: 18 Jan 2023, 17:12:16 UTC - in response to Message 59740.  

Yes, we should have made a beta, but this app is not related to this thread.


you're right, but there is no announcement thread for this app, so there is nowhere else appropriate in the News section to get your attention about it.
ID: 59741 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Send message
Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Level
Gly
Message 59746 - Posted: 18 Jan 2023, 17:22:45 UTC - in response to Message 59741.  

Soon we will announce it. This is just testing to see if it works, which should have been done on a beta app.

I expect tons of workunits using this app. Soon I will introduce a new postdoc running the simulations.

g
ID: 59746 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1109
Credit: 40,469,283,595
RAC: 3,993,807
Level
Trp
Message 59749 - Posted: 18 Jan 2023, 17:53:44 UTC - in response to Message 59746.  
Last modified: 18 Jan 2023, 17:54:51 UTC

interesting to see that Ada "should" run on the Ampere cubins. I know the app has an architecture compatibility check, and it may fail there even if it could otherwise work.

you could also consider compiling your apps with the PTX version for forward compatibility

like this:
-gencode=arch=compute_86,code=sm_86
-gencode=arch=compute_86,code=compute_86

and the user can set the environment variable as needed. or you could set it in the wrapper config file
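
A small Python sketch of how such a flag list could be generated in a build script. The helper name `gencode_flags` is hypothetical. `code=sm_XX` requests native code for that architecture, while `code=compute_XX` embeds PTX that the driver can JIT-compile for newer GPUs. The post does not name the environment variable; CUDA documents `CUDA_FORCE_PTX_JIT`, which makes the driver ignore embedded binary code and JIT-compile the PTX instead, and that may be the one meant (an assumption).

```python
def gencode_flags(native_archs, ptx_arch):
    """Build nvcc -gencode flags: native code per listed arch, plus PTX for ptx_arch."""
    flags = [f"-gencode=arch=compute_{a},code=sm_{a}" for a in native_archs]
    # code=compute_XX embeds PTX for forward compatibility with later architectures
    flags.append(f"-gencode=arch=compute_{ptx_arch},code=compute_{ptx_arch}")
    return flags


# Reproduces the two flags quoted above
print(" ".join(gencode_flags(["86"], "86")))
```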
ID: 59749 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Erich56

Send message
Joined: 1 Jan 15
Posts: 1162
Credit: 12,205,098,501
RAC: 9,135,494
Level
Trp
Message 59758 - Posted: 19 Jan 2023, 6:02:06 UTC
Last modified: 19 Jan 2023, 6:33:00 UTC

I am successfully running the current ACEMD_3 tasks on a GTX980ti, on a Quadro P5000, and on two RTX3070.
However, they fail on a GTX1650 after a few seconds:

https://www.gpugrid.net/result.php?resultid=33263379
https://www.gpugrid.net/result.php?resultid=33263343

can anyone tell me what might be the reason?
ID: 59758 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile ServicEnginIC
Avatar

Send message
Joined: 24 Sep 10
Posts: 588
Credit: 11,396,036,510
RAC: 11,719,261
Level
Trp
Message 59759 - Posted: 19 Jan 2023, 6:32:46 UTC - in response to Message 59758.  

As a first step, you can try resetting the GPUGRID project on the failing host.
But the reason is probably that 4 GB of RAM is too little for executing these tasks.
ID: 59759 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Erich56

Send message
Joined: 1 Jan 15
Posts: 1162
Credit: 12,205,098,501
RAC: 9,135,494
Level
Trp
Message 59760 - Posted: 19 Jan 2023, 6:40:38 UTC - in response to Message 59759.  
Last modified: 19 Jan 2023, 6:41:41 UTC

...
But the reason is probably that 4 GB of RAM is too little for executing these tasks.

that's what I am guessing, too.
However, I was closely watching the RAM usage (via MemInfo) when the tasks started: at the moment the task crashed, about 2 GB were still free.
Further, for the tasks running on the other hosts mentioned above, the Windows Task Manager shows RAM usage between 60 MB and 400 MB per task.
Maybe the CPU Intel Core2 Duo E7400 @ 2.80GHz is too old for these tasks?
(However, some other GPU projects like Einstein, WCG and Primegrid are running well).
ID: 59760 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C.

Send message
Joined: 21 Feb 20
Posts: 1109
Credit: 40,469,283,595
RAC: 3,993,807
Level
Trp
Message 59763 - Posted: 19 Jan 2023, 13:15:42 UTC - in response to Message 59760.  

...
But the reason is probably that 4 GB of RAM is too little for executing these tasks.

that's what I am guessing, too.
However, I was closely watching the RAM usage (via MemInfo) when the tasks started: at the moment the task crashed, about 2 GB were still free.
Further, for the tasks running on the other hosts mentioned above, the Windows Task Manager shows RAM usage between 60 MB and 400 MB per task.
Maybe the CPU Intel Core2 Duo E7400 @ 2.80GHz is too old for these tasks?
(However, some other GPU projects like Einstein, WCG and Primegrid are running well).


it could very well be that the CPU is too old. it does not support AVX extensions, for example, and if the application is built with this requirement then that could be the reason.
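
For reference, a sketch of how AVX support can be checked from Python on a Linux host by reading the `flags` line of `/proc/cpuinfo`. This does not apply directly to the Windows host discussed here, where a tool such as CPU-Z would show the same information, and the helper names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supports(cpuinfo_text, feature):
    """True if the named feature (e.g. 'avx') is listed as a whole flag."""
    return feature.lower() in cpu_flags(cpuinfo_text)


# Example call on Linux:
# from pathlib import Path
# supports(Path("/proc/cpuinfo").read_text(), "avx")
```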
ID: 59763 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Erich56

Send message
Joined: 1 Jan 15
Posts: 1162
Credit: 12,205,098,501
RAC: 9,135,494
Level
Trp
Message 59766 - Posted: 19 Jan 2023, 15:27:18 UTC - in response to Message 59763.  


it could very well be that the CPU is too old. it does not support AVX extensions for example, and if the application is built with this requirement then that could be a reason.

perhaps one of the GPUGRID people could tell me if this is the case?
ID: 59766 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ryan Munro

Send message
Joined: 6 Mar 18
Posts: 38
Credit: 1,323,842,080
RAC: 328,325
Level
Met
Message 59767 - Posted: 19 Jan 2023, 15:58:18 UTC

Just had one and it failed after 26 seconds on my 4090
ID: 59767 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote

©2025 Universitat Pompeu Fabra