Update acemd3 app
Author | Message |
---|---|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
The task I mentioned has now completed and reported - visible on the link I posted last night. The actual binary executable is still acemd3, dated 28 September 2021 - you can see the name in stderr.txt. So all that has changed is the 'friendly name' stored in the project's database for that application ID. The only other thing I noticed was that the big upload started incredibly slowly - averaging around 30 kilobytes/sec. But it must have sped up to something closer to my raw upload link speed of 16 megabits/sec - the whole thing was finished in about half an hour. I can only ascribe it to roadworks somewhere on the route from the UK to Spain. Probably more due to the current geopolitical situation, and not under the project's control. |
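
To put the two speeds quoted above side by side, here is a rough unit conversion. It is only a sketch: it assumes decimal units (1 kilobyte = 1000 bytes, 1 megabit = 1,000,000 bits), and the upload size itself is not stated in the post, so only an upper bound for half an hour at full link speed is computed.

```python
def mbit_to_bytes_per_sec(mbit_per_sec):
    """Convert megabits/sec to bytes/sec (decimal units, 8 bits per byte)."""
    return mbit_per_sec * 1_000_000 / 8

link_speed = mbit_to_bytes_per_sec(16)   # raw upload link speed from the post
slow_speed = 30 * 1000                   # initial ~30 kilobytes/sec from the post

# How many times slower the initial phase was than the raw link speed.
slowdown = link_speed / slow_speed

# Upper bound on what could be uploaded in 30 minutes at full link speed.
max_bytes_in_half_hour = link_speed * 30 * 60

print(f"link speed: {link_speed / 1e6:.1f} MB/s")
print(f"initial phase was about {slowdown:.0f}x slower")
print(f"at most {max_bytes_in_half_hour / 1e9:.1f} GB in half an hour")
```
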
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 2 |
That's what I see from the client_state file: <app> <name>acemd3</name> <user_friendly_name>Advanced molecular dynamics simulations for GPUs</user_friendly_name> <non_cpu_intensive>0</non_cpu_intensive> </app> |
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 2 |
Then the next 2 WUs show up as 1.0 and: <app> <name>acemd4</name> <user_friendly_name>Advanced molecular dynamics simulations for GPUs</user_friendly_name> <non_cpu_intensive>0</non_cpu_intensive> </app> |
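
For anyone who wants to pull the same fields out without reading the file by eye, the following is a minimal sketch. It parses a sample fragment modeled on the two `<app>` entries quoted above; the enclosing `<client_state>` root element and the helper name `list_apps` are assumptions for illustration, and a real client_state.xml contains many more elements.

```python
import xml.etree.ElementTree as ET

# Sample fragment modeled on the <app> entries quoted in the posts above.
# The <client_state> wrapper element is an assumption for this sketch.
SAMPLE = """<client_state>
<app>
    <name>acemd3</name>
    <user_friendly_name>Advanced molecular dynamics simulations for GPUs</user_friendly_name>
    <non_cpu_intensive>0</non_cpu_intensive>
</app>
<app>
    <name>acemd4</name>
    <user_friendly_name>Advanced molecular dynamics simulations for GPUs</user_friendly_name>
    <non_cpu_intensive>0</non_cpu_intensive>
</app>
</client_state>"""

def list_apps(xml_text):
    """Return (name, user_friendly_name) pairs for every <app> element."""
    root = ET.fromstring(xml_text)
    return [(app.findtext("name"), app.findtext("user_friendly_name"))
            for app in root.iter("app")]

for name, friendly in list_apps(SAMPLE):
    print(f"{name}: {friendly}")
```

This makes the point of the two posts visible at a glance: the internal names differ (acemd3 and acemd4) while the friendly name is identical.
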
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 2 |
That's interesting. You'd think they'd have a link to the Apps page, but no. The first 5 of those new acemd4 WUs failed within a few minutes. Stderr output <core_client_version>7.16.6</core_client_version> <![CDATA[ <message> process exited with code 195 (0xc3, -61)</message> <stderr_txt> 07:45:46 (99083): wrapper (7.7.26016): starting 07:45:46 (99083): wrapper (7.7.26016): starting 07:45:46 (99083): wrapper: running /bin/tar (xf x86_64-pc-linux-gnu__cuda1121.tar.bz2) 07:52:57 (99083): /bin/tar exited; CPU time 424.196280 07:52:57 (99083): wrapper: running bin/python (pre_run.py) File "/var/lib/boinc-client/slots/36/pre_run.py", line 1 <soft_link>../../projects/www.gpugrid.net/T1_3-RAIMIS_TEST-0-pre_run</soft_link> ^ SyntaxError: invalid syntax 07:52:58 (99083): bin/python exited; CPU time 0.137151 07:52:58 (99083): app exit status: 0x1 07:52:58 (99083): called boinc_finish(195) </stderr_txt> ]]> |
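
The stderr above suggests that the file handed to Python as pre_run.py was not a script at all but a one-line `<soft_link>` stub pointing into the project directory, which is why Python reported a syntax error on line 1. A minimal sketch of how such a stub could be recognised follows. The helper name `resolve_soft_link` is hypothetical and this is not the wrapper's own code; it only shows the check on the file's text.

```python
import re

# Matches a file whose entire content is a single <soft_link>...</soft_link> element.
SOFT_LINK_RE = re.compile(r"\s*<soft_link>(.*?)</soft_link>\s*", re.DOTALL)

def resolve_soft_link(text):
    """Return the link target if `text` is a soft-link stub, else None."""
    match = SOFT_LINK_RE.fullmatch(text)
    return match.group(1).strip() if match else None

# The stub content shown in the stderr output above.
stub = "<soft_link>../../projects/www.gpugrid.net/T1_3-RAIMIS_TEST-0-pre_run</soft_link>\n"

print(resolve_soft_link(stub))                 # the path the stub points at
print(resolve_soft_link("print('hello')\n"))   # None: an ordinary script
```
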
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
"Then the next 2 WUs show up as 1.0 and:" Just had one of those run through to completion: T5_5-RAIMIS_TEST-1-3-RND1908_0. I think that's the first I've seen from RAIMIS which both (1) was explicitly designated as a GPU task (cuda 1121) and (2) ran right through to validation. Congratulations! It was a very quick test run - under 7 minutes - but all the moving parts seem to have been assembled into the right order. The actual binary (as listed in stderr_txt) is 'acemd', not acemd4: that might be worth tidying up in the future. |
Joined: 16 Dec 08 Posts: 7 Credit: 1,549,469,403 RAC: 1 |
OK. After some research and googling, I got this app to work on my 3rd machine. The reason the tasks failed seemed to be the vcruntime DLLs. Why was that? I have no idea, since my other two Win10 Pro machines haven't had this problem. At least the task I have now has been running OK for 8 minutes, instead of failing after the usual 15-38 seconds. |
Joined: 28 Mar 09 Posts: 490 Credit: 11,731,645,728 RAC: 47,738 |
I had 2 WUs running today. They both made it up to 66.666% complete, then they stayed there for a few hours doing nothing. There was no CPU or GPU usage, so I aborted both of them. How long was I supposed to keep them "running" like that? https://www.gpugrid.net/result.php?resultid=32880450 https://www.gpugrid.net/result.php?resultid=32880506 That's enough beta testing for a day or so. |
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 |
You should have either exited BOINC and restarted it, or suspended and resumed the tasks to get them moving again. The Python tasks do checkpoint and can be resumed across different cards with no ill effect. |
Joined: 9 May 21 Posts: 16 Credit: 1,435,881,404 RAC: 20 |
ALL my tasks finish with 195 (0xc3) EXIT_CHILD_FAILED. WHY? |
Joined: 21 Feb 20 Posts: 1114 Credit: 40,838,348,595 RAC: 4,765,598 |
"ALL my tasks finish with" You have several more specific errors: "ACEMD failed: Error invoking kernel: CUDA_ERROR_ILLEGAL_ADDRESS (700)", "ACEMD failed: Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719)" and "ACEMD failed: Particle coordinate is nan". The CUDA errors are possibly a driver issue. Use a program called DDU (Display Driver Uninstaller) to totally wipe out the drivers, then re-install fresh from the Nvidia package. In my opinion, the 470-series drivers are the most stable for crunching. The newer drivers will also get you the slightly different CUDA 11.21 app. "Particle coordinate is nan" (nan = not a number) is usually an overclocking issue, or a bad WU. |
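
The triage described in this reply can be written down as a small lookup over a task's stderr text. This is only a sketch: the error strings are the ones quoted in the post, the hints restate the poster's opinion rather than any official diagnosis, and the names `HINTS` and `diagnose` are made up for illustration.

```python
# Error substrings quoted in the post, paired with the poster's suggested cause.
HINTS = [
    ("CUDA_ERROR_ILLEGAL_ADDRESS", "possibly a driver issue: clean reinstall of the driver"),
    ("CUDA_ERROR_LAUNCH_FAILED", "possibly a driver issue: clean reinstall of the driver"),
    ("Particle coordinate is nan", "usually an overclocking issue, or a bad WU"),
]

def diagnose(stderr_text):
    """Return (error, hint) pairs for every known error found in the text."""
    return [(error, hint) for error, hint in HINTS if error in stderr_text]

sample = "ACEMD failed: Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719)"
for error, hint in diagnose(sample):
    print(f"{error}: {hint}")
```

Exit code 195 (EXIT_CHILD_FAILED) on its own only says that the wrapped application failed, so the specific message inside stderr is what narrows the cause down.
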
Joined: 23 Feb 11 Posts: 101 Credit: 1,589,743,957 RAC: 302,797 |
Good afternoon everyone, WUs have not been available in the acemd3 environment for a long time. While waiting to migrate to acemd4, can you tell me if there will be work for the Windows environment soon? Thanks in advance. Dreams do not always come true. But not because they are too big or impossible. Why did we stop believing. (Martin Luther King) |
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 869 |
"Good afternoon everyone," I would be surprised if you receive a reply :-( |
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 869 |
"Good afternoon everyone," So, did I promise too much? |
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 |
The preponderance of work lately has been 99:1 for the Python tasks. I was very surprised to get an acemd3 task a couple of days ago. I haven't seen any acemd4 tasks since their initial beta run. I have been doing Python tasks almost exclusively since Abouh opened the taps for them. My GPUs are constantly busy with Python tasks and haven't had a break in months. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
I've had eight ACEMD 3 tasks since Friday - six 'ADRIA' (the long-running ones), and two 'CRYPTICSCOUT' (significantly shorter). One oddity is that the credit for 'ADRIA' tasks has been substantially reduced, but the credit for 'CRYPTICSCOUT' hasn't. Was that deliberate, I wonder? |
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 |
Richard, maybe you can answer this puzzle. I have reduced the resource share for GPUGrid on all my hosts. I have observed no change in the frequency of Python tasks running. They run non-stop: one finishes and reports, then the next one downloads and runs with no interruption. However, the acemd3 task sat for two days before it finally started running. I know the REC balancing mechanism came into play when I reduced the resource share among projects. Does the REC mechanism somehow take account of the different APR ratings for separate applications? I would have thought its lowest granularity would be at the simple project level. But it seems to have been applied at the application level. The APR for the acemd3 tasks has been developed over many years along with the tally of credits for that application. The Python tasks, however, are relatively new and haven't produced as much credit so far compared to the total project credit for acemd3. Was this the case for the REC mechanism? That the Python tasks are still in need of balancing against the acemd3 credit history? And that is why the acemd3 task was not in need of immediate running compared to the Python tasks? Both types have the same 5 day deadlines. But a Python task is still serviced immediately and has not yet given my other projects' GPU applications a chance to run. |
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 295,172 |
The difference in behaviour will be down to the client scheduler (not a separate program - an integral part of the client code). On the Linux machines, where I run Python, deadline will be exceeded by so much that the client actively throws other tasks off the card so that they start immediately. On my Windows machines, where I run ACEMD 3, I think deadline pressure is much lower, so the client waits for a convenient moment to switch over (*). The trouble is: I'm running 2 x Einstein tasks when there's no ACEMD, and they don't necessarily finish at the same moment. So the client scheduler only sees 0.5 GPUs free, and (on my settings) an ACEMD task won't fit in half a card. So the client starts another Einstein half-task instead, and the cycle starts again. And can continue until the deadline pressure gets really serious. The client knows about deadlines, and will honour them as best it can: but it doesn't know about 50% and 25% bonuses, so it takes no notice of them. * unverified: I'll take a proper look tomorrow. I've been out all day. |
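
The half-card cycle described above can be shown with a toy model. This is a sketch under heavy assumptions and is not the BOINC client's scheduling code: it models one GPU, Einstein tasks that each take 0.5 of the card, an ACEMD task that needs the whole card, and it ignores deadlines entirely. The runtime constant and the function names are invented for illustration.

```python
EINSTEIN_RUNTIME = 4  # arbitrary time units; a made-up value for this sketch

def pick_next(free_gpu, queue):
    """Return the first queued (name, share) whose GPU share fits, else None."""
    for name, share in queue:
        if share <= free_gpu:
            return name, share
    return None

def simulate(initial_remaining, steps=20):
    """Return the step at which ACEMD starts, or None if it never fits."""
    # ACEMD is listed first, so it is preferred whenever a full card is free.
    queue = [("ACEMD", 1.0), ("Einstein", 0.5)]
    running = [["Einstein", 0.5, r] for r in initial_remaining]
    for step in range(1, steps + 1):
        for task in running:
            task[2] -= 1                      # one time unit of progress
        running = [t for t in running if t[2] > 0]
        free = 1.0 - sum(t[1] for t in running)
        while free > 0:
            choice = pick_next(free, queue)
            if choice is None:
                break
            name, share = choice
            if name == "ACEMD":
                return step
            running.append([name, share, EINSTEIN_RUNTIME])
            free -= share
    return None

print(simulate([2, 4]))  # staggered finishes: only half a card is ever free
print(simulate([4, 4]))  # both finish together: the whole card frees up
```

With staggered finish times the model never sees more than 0.5 of the card free, so it keeps backfilling with another half-card task. In the real client, deadline pressure eventually breaks that cycle, as the post notes.
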
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 |
OK, thanks for the comment. As usual, I overthought the problem. It is simply a matter of the estimated time to completion in the rr_simulation and CPU scheduling code pushing the Python tasks to the forefront because of their outlandish estimated runtimes. The acemd3 task, having realistic estimates, was not in any hurry to be started, which allowed my other GPU tasks a chance to run normally. I did lose out on any bonus points, which was the only downside to the late running: it got less credit than the Python tasks. But as I mentioned, the acemd3 tasks lately have been a rarity here. I have restricted the GPUGrid tasks to run only on the slower GPUs to allow the most powerful GPUs to run the Einstein and MW tasks where their production is most noticeable. If I need to, I can set NNT (No New Tasks) for GPUGrid to get more production out of my hosts, allowing all the GPUs to be used for all my GPU projects. |
Joined: 27 Aug 21 Posts: 38 Credit: 7,254,068,306 RAC: 0 |
I am not sure if I am wording this question properly, but does acemd3 use single or double point precision? This is more just out of curiosity versus anything else. Also, is there a way to tell what a program/app is using by looking within the OS that is running it? Even though this is an acemd3 thread, what about the python tasks? Thanks for any insights. |
Joined: 13 Dec 17 Posts: 1416 Credit: 9,119,446,190 RAC: 614,515 |
Yes, the running tasks can be identified by their science applications in the running processes on a host. For acemd3, logically it is acemd3 along with the BOINC wrapper application. For Python on GPU tasks, it is 32 python processes along with the BOINC wrapper application. Depending on the OS, you can see the running processes in something called Task Manager, System Monitor or Process Explorer. |
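
On Linux the same check can be done from a process listing instead of a GUI tool. The sketch below counts process names in "PID COMMAND" lines of the kind printed by `ps -eo pid,comm`. The keyword list and the sample process names (in particular the wrapper's name) are assumptions for illustration, since the actual executable names depend on the project's app versions.

```python
# Substrings to look for in process names; an assumption, adjust as needed.
KEYWORDS = ("acemd", "python", "wrapper")

def count_science_processes(ps_lines):
    """Count process names containing each keyword.

    `ps_lines` is an iterable of "PID COMMAND" lines, for example the
    output of `ps -eo pid,comm` split into lines.
    """
    counts = {key: 0 for key in KEYWORDS}
    for line in ps_lines:
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip the header row and malformed lines
        command = parts[1].strip().lower()
        for key in KEYWORDS:
            if key in command:
                counts[key] += 1
    return counts

# Hypothetical listing; the process names here are invented for the example.
SAMPLE = [
    "  PID COMMAND",
    " 1201 boinc",
    " 1350 wrapper_example",
    " 1351 acemd3",
    " 1400 python",
    " 1401 python",
]

print(count_science_processes(SAMPLE))
```

Note that this only identifies which applications are running. It does not answer the precision question, which cannot be read off a process list.
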
©2025 Universitat Pompeu Fabra