New version of ACEMD 2.17 on multi GPU hosts
| Author | Message |
|---|---|
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 3
|
I have two ADRIA tasks running now on host 132158 - Linux Mint, driver v460. htop shows that they have different command lines, ending in '--device 0' and '--device 1'. nvidia-smi shows an acemd3 app running on GPU 0, and another running on GPU 1. All is looking good so far. The only strange thing is that one is running app version 101, and the other is running version 1121. Two identical cards, so we'll see who wins! |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,876,970,595 RAC: 423,674
|
That's the best test we can hope for, the most apples-to-apples. I'd certainly be interested to know if one is significantly faster than the other.
|
|
Joined: 13 Dec 17 Posts: 1423 Credit: 9,187,696,190 RAC: 1,276,885
|
I got two new 2.18 tasks, one each on two hosts. Both CUDA_101 though. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 3
|
Here's a taster, after 4 hours elapsed: v1121 at 12.727% (device 1, in 4x PCIe slot); v101 at 11.368% (device 0, in 16x PCIe slot, driving monitor). |
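For anyone who wants to turn those percentages into an estimated finish time, here is a minimal sketch. It assumes progress is linear in elapsed time, which the thread does not confirm; the function name `projected_runtime_hours` is made up for illustration.

```python
def projected_runtime_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linear extrapolation: total time = elapsed time / fraction completed."""
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    return elapsed_hours / (percent_done / 100.0)


# Progress figures quoted in the post above, both after 4 hours elapsed.
progress = {"v1121": 12.727, "v101": 11.368}

projections = {
    app: projected_runtime_hours(4.0, pct) for app, pct in progress.items()
}

for app, hours in projections.items():
    print(f"{app}: about {hours:.1f} h projected total runtime")
```

This is only a rough estimate; real tasks may speed up or slow down as they run.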
|
Joined: 28 Jan 21 Posts: 6 Credit: 106,022,917 RAC: 0
|
I received 4 GPUGrid WUs on my dual GPU system - RTX3070 and RTX2070. It was happily crunching 1 unit on each of the GPUs until BOINC downloaded and ran a WCG unit. The GPUGrid unit then failed with this message:
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message> process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
15:50:51 (128895): wrapper (7.7.26016): starting
15:50:51 (128895): wrapper (7.7.26016): starting
15:50:51 (128895): wrapper: running /bin/tar (xf conda-pack.tar.bz2)
15:52:07 (128895): /bin/tar exited; CPU time 75.576773
15:52:07 (128895): wrapper: running bin/acemd3 (--boinc --device 0)
19:27:16 (136305): wrapper (7.7.26016): starting
19:27:16 (136305): wrapper (7.7.26016): starting
19:27:16 (136305): wrapper: running bin/acemd3 (--boinc --device 1)
ERROR: /home/user/conda/conda-bld/acemd3_1618916459379/work/src/mdsim/context.cpp line 318: Cannot use a restart file on a different device!
19:27:20 (136305): bin/acemd3 exited; CPU time 3.452513
19:27:20 (136305): app exit status: 0x9e
19:27:20 (136305): called boinc_finish(195)
19:27:16 is exactly the timestamp at which the WCG process started. It looks like it won't play happily with different projects. Has anyone else seen this? I've suspended WCG for the moment. |
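If you want to check your own failed tasks for the same pattern, a small sketch like the one below could scan the stderr text for the `--device N` argument on each wrapper launch. The regular expression is written against the log lines quoted above only; the helper names `devices_used` and `restarted_on_other_device` are hypothetical, and other wrapper versions may format the line differently.

```python
import re
from typing import List

# Matches wrapper launch lines such as:
#   15:52:07 (128895): wrapper: running bin/acemd3 (--boinc --device 0)
DEVICE_RE = re.compile(r"wrapper: running bin/acemd3 \(.*--device (\d+)\)")


def devices_used(stderr_text: str) -> List[int]:
    """Return the device index of every acemd3 launch, in log order."""
    return [int(m.group(1)) for m in DEVICE_RE.finditer(stderr_text)]


def restarted_on_other_device(stderr_text: str) -> bool:
    """True if the task was launched on more than one distinct device."""
    return len(set(devices_used(stderr_text))) > 1


# Two launch lines copied from the failed task's log.
sample = (
    "15:52:07 (128895): wrapper: running bin/acemd3 (--boinc --device 0)\n"
    "19:27:16 (136305): wrapper: running bin/acemd3 (--boinc --device 1)\n"
)

print(devices_used(sample))
print(restarted_on_other_device(sample))
```

A task that reports more than one device index has been restarted on a different GPU, which matches the "Cannot use a restart file on a different device!" error.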
ServicEnginIC Joined: 24 Sep 10 Posts: 593 Credit: 12,146,186,510 RAC: 4,349,800
|
"Has anyone else seen this?" It is an old, known problem. Please take a look at Toni's message #52865, dated Oct 17 2019, especially the question "Can I use it on multi-GPU systems?" Your failed task started on device 0, then it restarted on device 1... |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,876,970,595 RAC: 423,674
|
"I received 4 GPUGrid WUs on my dual GPU system - RTX3070 and RTX2070..." You need to extend the time period for task switching in compute preferences. These GPUGRID tasks can take 12-24+ hrs depending on GPU power, so you might need to set this to a very high value. I have it set to 24 hrs (1440 minutes) on my hosts. If you're running GPUGRID, it might be a better option to set other projects to a resource share of 0 so that they only ask for work when no GPUGRID work is present. FYI, this issue will also happen if you simply stop BOINC and/or reboot your system, so you'll need to be fine with potentially leaving your system on for days at a time.
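For those who prefer a local file over the web preferences, here is a rough sketch that builds the XML for a local preferences override. It assumes the BOINC client reads a `global_prefs_override.xml` file from its data directory and that the task-switch interval is stored in a `cpu_sched_period_minutes` element; both names are from memory and not confirmed anywhere in this thread, so please check them against the BOINC documentation before relying on this. The function name `build_override` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_override(switch_minutes: int) -> str:
    """Build override XML that sets the task-switch interval in minutes.

    Assumption: the element name cpu_sched_period_minutes and the
    global_preferences root are what the BOINC client expects.
    """
    if switch_minutes <= 0:
        raise ValueError("switch_minutes must be positive")
    root = ET.Element("global_preferences")
    ET.SubElement(root, "cpu_sched_period_minutes").text = str(switch_minutes)
    return ET.tostring(root, encoding="unicode")


# 1440 minutes = 24 hours, the value mentioned in the post above.
override_xml = build_override(1440)
print(override_xml)
```

The script only prints the XML. Saving it into the BOINC data directory and telling the client to re-read local preferences is left to you, since paths differ between installs. The resource share of 0 for other projects is a per-project setting and is not covered by this file.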
|
|
Joined: 28 Jan 21 Posts: 6 Credit: 106,022,917 RAC: 0
|
Thanks for the quick reply clarifying the problem. But 2 years and no fix to what seems like quite a basic problem...... |
|
Joined: 13 Dec 17 Posts: 1423 Credit: 9,187,696,190 RAC: 1,276,885
|
Wait a minute . . . . . I thought I read in this thread on the previous beta releases that the restarting on a different device issue was solved??? |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,876,970,595 RAC: 423,674
|
"Wait a minute . . . . . I thought I read in this thread on the previous beta releases that the restarting on a different device issue was solved???" That wasn't the problem seen in previous app versions. We were seeing all tasks running on the same GPU.
|
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 3
|
"I'd certainly be interested to know if one is significantly faster than the other." The head-to-head speed comparison results are in. Both tasks completed and validated, and both were given the same credit score. Cards are GTX 1660 SUPER (ASUS TUF, if it matters). Runtime: v1121 113,110.14 sec; v101 126,707.98 sec (12% longer). Speed: v1121 3.18% / hour (12% faster); v101 2.84% / hour. |
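The percentages above can be re-derived from the two runtimes. A quick sketch, using only the figures quoted in the post (the variable names are mine):

```python
# Runtimes in seconds, as reported for the two completed tasks.
runtime_sec = {"v1121": 113_110.14, "v101": 126_707.98}


def percent_per_hour(seconds: float) -> float:
    """Average progress rate for a task that took `seconds` to reach 100%."""
    return 100.0 / (seconds / 3600.0)


speed = {app: percent_per_hour(sec) for app, sec in runtime_sec.items()}

# How much longer v101 ran relative to v1121, in percent.
extra_runtime_pct = (runtime_sec["v101"] / runtime_sec["v1121"] - 1.0) * 100.0

for app in runtime_sec:
    print(f"{app}: {runtime_sec[app]:,.2f} sec, {speed[app]:.2f}% / hour")
print(f"v101 ran {extra_runtime_pct:.1f}% longer than v1121")
```

Because speed is the inverse of runtime here, the same ratio gives both the "12% longer" and the "12% faster" figures.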
|
Joined: 13 Dec 17 Posts: 1423 Credit: 9,187,696,190 RAC: 1,276,885
|
If they keep both apps active, then the BOINC mechanism for choosing the most efficient application should become active once 10 valid tasks are completed on both apps. The 1121 app's APR should prevail. |
|
Joined: 13 Dec 17 Posts: 1423 Credit: 9,187,696,190 RAC: 1,276,885
|
Hmmmm . . . . not enough tasks to draw a concrete conclusion, but on my daily driver with three identical RTX 2080 cards, the CUDA101 app was 2000 seconds faster than the CUDA1121 app. https://www.gpugrid.net/results.php?userid=516740&offset=0&show_names=0&state=3&appid= Though that might be attributed to restarting on a different device, albeit the same type of card. All cards are hybrids, have temps well under control and boost the same. |
|
Joined: 13 Dec 17 Posts: 1423 Credit: 9,187,696,190 RAC: 1,276,885
|
Anybody seen any sign of your credits exported to 3rd party aggregation websites yet? Finished work over a day ago and still no stats from GPUGrid. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 3
|
"Anybody seen any sign of your credits exported to 3rd party aggregation websites yet?" No. https://www.gpugrid.net/stats/ is accessible, but the files in it are dated September 16. Somebody needs to restart a script. |
ServicEnginIC Joined: 24 Sep 10 Posts: 593 Credit: 12,146,186,510 RAC: 4,349,800
|
"Anybody seen any sign of your credits exported to 3rd party aggregation websites yet?" Good observation. My statistics for GPUGRID at BOINC STATS have also been blank since the new app v2.18 ADRIA tasks came out. |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
Looking into this |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
fixed |
|
Joined: 13 Dec 17 Posts: 1423 Credit: 9,187,696,190 RAC: 1,276,885
|
Thanks, Gianni. |
ServicEnginIC Joined: 24 Sep 10 Posts: 593 Credit: 12,146,186,510 RAC: 4,349,800
|
"fixed" Working again, thanks. |
©2026 Universitat Pompeu Fabra