More Acemd3 tests

Message boards : News : More Acemd3 tests
Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Message 52582 - Posted: 6 Sep 2019, 12:23:10 UTC

We've uploaded Windows and Linux apps named "acemd3". If things go as expected, they should become the new simulation engine. They should be an improvement in many respects, especially maintainability and compatibility with RTX cards.

There were a few short test workunits (TONI_TEST). Larger ones should come soon. Please be patient as we iron out the details.
Toni
Message 52583 - Posted: 6 Sep 2019, 13:05:32 UTC - in response to Message 52582.  

By the way: things we'd need a comment on:

1. do PCs with multiple GPUs work as expected?
2. does suspend/restart work as expected?
eXaPower

Message 52584 - Posted: 6 Sep 2019, 13:21:41 UTC - in response to Message 52582.  

Toni wrote: "By the way: things we'd need a comment on:

1. do PCs with multiple GPUs work as expected?
2. does suspend/restart work as expected?"


1. No. The app does not allow 2/3/4/5 GPUs to run concurrently: only 1 GPU runs at a time, while the other Turing cards error out.

http://www.gpugrid.net/results.php?hostid=208061
http://www.gpugrid.net/workunit.php?wuid=16748681
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
08:39:48 (1632): wrapper (7.9.26016): starting
08:39:48 (1632): wrapper: running acemd3.exe (--boinc input --device 1)
# Engine failed: Illegal value for DeviceIndex: 1
08:39:49 (1632): acemd3.exe exited; CPU time 0.000000
08:39:49 (1632): app exit status: 0x1
08:39:49 (1632): called boinc_finish(195)

2. Yes and no. Suspend/restart itself worked, but the WU errored out once it restarted.

http://www.gpugrid.net/result.php?resultid=21350515

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
08:55:41 (4032): wrapper (7.9.26016): starting
08:55:41 (4032): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1845} normal block at 0x0000005BE25C15C0, 8 bytes long.
Data: < M [ > 00 00 4D E2 5B 00 00 00
..\lib\diagnostics_win.cpp(417) : {203} normal block at 0x0000005BE25C43B0, 1080 bytes long.
Data: < > 04 0C 00 00 CD CD CD CD EC 00 00 00 00 00 00 00
Object dump complete.
09:09:55 (3728): wrapper (7.9.26016): starting
09:09:55 (3728): wrapper: running acemd3.exe (--boinc input --device 0)
# Engine failed: The periodic box size has decreased to less than twice the nonbonded cutoff.
09:09:58 (3728): acemd3.exe exited; CPU time 0.000000
09:09:58 (3728): app exit status: 0x1
09:09:58 (3728): called boinc_finish(195)
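
When comparing many failed results, it can help to pull the key fields out of stderr output like the excerpts above. The following is only a rough Python sketch: the `summarize` helper and its regular expressions are hypothetical, written against the few line shapes visible in this post, and other wrapper or app versions may log differently.

```python
import re

# Excerpt copied from the first stderr output quoted above.
STDERR = """\
08:39:48 (1632): wrapper (7.9.26016): starting
08:39:48 (1632): wrapper: running acemd3.exe (--boinc input --device 1)
# Engine failed: Illegal value for DeviceIndex: 1
08:39:49 (1632): acemd3.exe exited; CPU time 0.000000
08:39:49 (1632): app exit status: 0x1
08:39:49 (1632): called boinc_finish(195)
"""

# Patterns are assumptions based only on the lines shown in this thread.
DEVICE_RE = re.compile(r"--device (\d+)")
ENGINE_RE = re.compile(r"^# Engine failed: (.*)$")
FINISH_RE = re.compile(r"called boinc_finish\((\d+)\)")


def summarize(stderr_text):
    """Collect device indices, engine failure messages, and boinc_finish codes."""
    summary = {"devices": [], "engine_errors": [], "finish_codes": []}
    for line in stderr_text.splitlines():
        device = DEVICE_RE.search(line)
        if device:
            summary["devices"].append(int(device.group(1)))
        engine = ENGINE_RE.match(line)
        if engine:
            summary["engine_errors"].append(engine.group(1))
        finish = FINISH_RE.search(line)
        if finish:
            summary["finish_codes"].append(int(finish.group(1)))
    return summary


if __name__ == "__main__":
    print(summarize(STDERR))
```

Running the same helper over the second excerpt would list device 0 twice (once per wrapper start) and the periodic box size message as the engine error.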



Frank [NT]

Message 52585 - Posted: 6 Sep 2019, 13:32:14 UTC - in response to Message 52583.  

Hi Toni,
I got 2 of them.
The first was at 32% when I suspended it, and the second started.
I restarted the first WU, and when I suspended the second WU to continue the first, it exited with an error.
Then the second (still at 0.0%) started by itself.

At 37% I suspended and restarted it; it also exited with an error.

You can find my WUs here

GTX 1660 Ti and Windows 10

I hope it helps you to improve the app.
PappaLitto

Message 52586 - Posted: 6 Sep 2019, 15:10:05 UTC

Hey Toni,

The one I received errored out on Windows with a single 2080 Ti and no other cards in the system:

http://www.gpugrid.net/result.php?resultid=21344094
STARBASEn
Message 52587 - Posted: 6 Sep 2019, 18:04:26 UTC
Last modified: 6 Sep 2019, 18:05:52 UTC

http://www.gpugrid.net/workunit.php?wuid=16749264
This WU ran fine concurrently with E@H (2x GTX 1060). I suspended it once with "leave WU in memory" set, and it restarted fine from where it left off. Same result with another ACEMD 2.06 WU on another machine with only one GTX 1060.
Bedrich Hajek

Message 52589 - Posted: 6 Sep 2019, 23:25:22 UTC - in response to Message 52583.  

Toni wrote: "By the way: things we'd need a comment on:

1. do PCs with multiple GPUs work as expected?
2. does suspend/restart work as expected?"



I managed to get 1 of these units on my Windows 7 computer, which has 1 RTX 2080 Ti card. It took nearly a minute from the time it started running for the "elapsed" time to start moving, and about another minute for the "progress" % to start moving. I let it run for about 5 minutes before suspending it (it was about 20% complete). It stopped within a couple of seconds. I waited about 30 seconds before resuming it, and it crashed within a few seconds. During its run time, the GPU usage was low (under 65%), and on all 6 of the CPU cores, usage was jumping up and down between 0 and 100%, according to Afterburner. I have never seen that before.

I didn't get a chance to run it on a multi-GPU computer, but send out more units and I will let you know what happens.

See link:

http://www.gpugrid.net/result.php?resultid=21352529
Bedrich Hajek

Message 52592 - Posted: 7 Sep 2019, 2:46:54 UTC

I ran 2 of the units on my Windows 10 machine. This machine has a GTX 980 Ti, which was running a long unit, while the RTX 2080 Ti was running the new version of the ACEMD v2.06 (cuda100) unit. When I let the test unit run from start to finish without interruption, it finishes successfully, but when I suspend and then resume it, it crashes within a few seconds. GPU usage on this machine was 80% at most, compared to 90% for the long run, which was running on the 980 Ti.

http://www.gpugrid.net/results.php?hostid=263612&offset=0&show_names=1&state=0&appid=32
clemmo

Message 52596 - Posted: 7 Sep 2019, 12:39:31 UTC
Last modified: 7 Sep 2019, 12:48:20 UTC

I've also had the test app error out when suspended and then resumed. I currently have one running, and I'll let it go to see whether it runs to completion.

The workunit seems to be using 1 full CPU core and 92% GPU Load.
My CPU is i7-4790 and GPU is GTX 1660.
kksplace

Message 52598 - Posted: 7 Sep 2019, 13:32:54 UTC - in response to Message 52583.  

This test WU was suspended twice (once using Suspend, once using Suspend GPU in BOINC Manager) and successfully restarted and completed.

http://www.gpugrid.net/result.php?resultid=21354832
Bedrich Hajek

Message 52599 - Posted: 7 Sep 2019, 14:19:53 UTC - in response to Message 52598.  

kksplace wrote: "This test WU was suspended twice (once using Suspend, once using Suspend GPU in BOINC Manager) and successfully restarted and completed.

http://www.gpugrid.net/result.php?resultid=21354832"


You're running Linux with a GTX 1080 card, while I am running Windows with an RTX card. This is either an OS problem or a card-type problem. To determine which, we need to run these WUs on a non-RTX card with Windows and/or an RTX card on Linux.





Bedrich Hajek

Message 52600 - Posted: 7 Sep 2019, 14:43:50 UTC - in response to Message 52599.  

kksplace wrote: "This test WU was suspended twice (once using Suspend, once using Suspend GPU in BOINC Manager) and successfully restarted and completed.

http://www.gpugrid.net/result.php?resultid=21354832"

Bedrich Hajek wrote: "You're running Linux with a GTX 1080 card, while I am running Windows with an RTX card. This is either an OS problem or a card-type problem. To determine which, we need to run these WUs on a non-RTX card with Windows and/or an RTX card on Linux."

It looks like it is a Windows problem. I ran this unit on a GTX 980 Ti on Windows 10. I suspended and resumed it. It crashed a few seconds after resuming.



http://www.gpugrid.net/result.php?resultid=21355024
Billy Ewell 1931

Message 52601 - Posted: 7 Sep 2019, 15:14:24 UTC

9/7/2019 9:44:40 AM | GPUGRID | task a70-TONI_TESTDHFR206b-9-30-RND0994_0

This task was assigned to an i7 Windows 10 machine with an RTX 2080. It processed for about 1 minute, was suspended for 30 seconds, then resumed, and the task immediately failed.
mmonnin

Message 52602 - Posted: 7 Sep 2019, 16:37:08 UTC
Last modified: 7 Sep 2019, 16:37:22 UTC

Running OK on 2x GPU system:
https://www.gpugrid.net/results.php?hostid=475308

Results show a -device 0 or -device 1.
clemmo

Message 52603 - Posted: 8 Sep 2019, 0:15:42 UTC

Just received another test task. Decided to check the suspend/resume. Computation error on resume still. GTX1660
Keith Myers

Message 52604 - Posted: 8 Sep 2019, 0:34:18 UTC

I continue to have no luck getting any of these new test tasks.
Billy Ewell 1931

Message 52605 - Posted: 9 Sep 2019, 16:55:58 UTC - in response to Message 52583.  
Last modified: 9 Sep 2019, 16:59:14 UTC

TONI:

The GPUGrid configuration (below) is set specifically to accommodate my i7, Windows 10 machine with an RTX 2080. I momentarily selected both short and long run ACEMD tasks, and two failed immediately in sequence.
Do you wish us to continue the pause-then-resume testing on the ACEMD3 and other special test tasks for the RTX cards?

My three other machines, with Windows and GTX 750 Ti and 1060 cards, sit idle as far as GPUGrid is concerned.

ACEMD short runs (2-3 hours on fastest card): no
ACEMD long runs (8-12 hours on fastest GPU): no
ACEMD3: yes
Quantum Chemistry (CPU): no
Quantum Chemistry (CPU, beta): no
Python Runtime: no
If no work for selected applications is available, accept work from other applications? no
Retvari Zoltan
Message 52606 - Posted: 9 Sep 2019, 22:45:03 UTC - in response to Message 52605.  

Billy Ewell 1931 wrote: "The GPUGrid configuration (below) is set specifically to accommodate my i7, Windows 10 machine with an RTX 2080. I momentarily selected both short and long run ACEMD tasks, and two failed immediately in sequence."

These were downloaded from the "long" queue, which has only the old client, which is not compatible with Turing (RTX + GTX 1660, 1650) cards. For now, you should select only the ACEMD3 queue for Turing cards.

Billy Ewell 1931 wrote: "My three other machines, with Windows and GTX 750 Ti and 1060 cards, sit idle as far as GPUGrid is concerned."

You should set up two different venues (one with ACEMD3 only, for Turing cards; one with short+long, for older cards), and assign your hosts to these venues according to their GPUs.
Keith Myers

Message 52607 - Posted: 10 Sep 2019, 0:48:49 UTC

Since I have been unable to get any of these new acemd3 tasks, is it valid to say that only the Windows hosts are having issues? And that the Linux hosts continue to not have any issues with the new app or tasks? I've only seen one post from a Linux user saying they had no issues.

I was hoping to test the new apps and higher-rewarding tasks for myself. I had no issues with the previous beta and tasks back in July, but so far I have had no luck retrieving either the new apps or the new tasks.
klepel

Message 52608 - Posted: 10 Sep 2019, 1:01:14 UTC - in response to Message 52607.  

Keith Myers wrote: "And that the Linux hosts continue to not have any issues with the new app or tasks? I've only seen one post from a Linux user saying they had no issues."

It seems to me that Linux hosts do not have issues with the new app (Acemd3). My three hosts work just fine when they receive WUs (once a day).

Keith Myers wrote: "Since I have been unable to get any of these new acemd3 tasks, is it valid to say that only the Windows hosts are having issues?"

Only one of my Windows hosts with a Turing card has received WUs. The first finished successfully. The second one I stopped at the one-minute mark; after restart it crashed after 2 seconds: http://www.gpugrid.net/result.php?resultid=21364023

From my small sample size, I would think Linux works fine and we might start regular production (Toni?); Windows does not work yet.
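
The suspend/resume reports posted in this thread so far can be tallied by operating system. The following is only a Python sketch: the `REPORTS` list is transcribed by hand from the posts above and includes only reports where the poster's OS was stated in the thread, and the `tally` helper is a hypothetical name, not part of any project tooling.

```python
from collections import defaultdict

# Hand-transcribed from posts in this thread (an informal summary, not
# project data): (poster, os, gpu, resumed_ok)
REPORTS = [
    ("Frank [NT]", "Windows", "GTX 1660 Ti", False),
    ("Bedrich Hajek", "Windows", "RTX 2080 Ti", False),  # Windows 7 host
    ("Bedrich Hajek", "Windows", "RTX 2080 Ti", False),  # Windows 10 host
    ("Bedrich Hajek", "Windows", "GTX 980 Ti", False),
    ("kksplace", "Linux", "GTX 1080", True),  # OS and GPU per Bedrich Hajek's reply
    ("Billy Ewell 1931", "Windows", "RTX 2080", False),
    ("klepel", "Windows", "Turing card", False),
]


def tally(reports):
    """Count successful and failed suspend/resume reports per OS."""
    counts = defaultdict(lambda: {"ok": 0, "failed": 0})
    for _poster, os_name, _gpu, resumed_ok in reports:
        counts[os_name]["ok" if resumed_ok else "failed"] += 1
    return dict(counts)


if __name__ == "__main__":
    for os_name, result in sorted(tally(REPORTS).items()):
        print(f"{os_name}: {result['ok']} ok, {result['failed']} failed")
```

With so few reports, and only one from Linux, the tally is suggestive rather than conclusive.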
©2025 Universitat Pompeu Fabra