New version of ACEMD requires libboost v1.74
| Author | Message |
|---|---|
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
So, late on a Saturday evening, I get sent WU 27077711. Initial replication 2, quorum 1. Mine was _6, sent after six previous failures. I could have handled that, because I've installed the required library. But it was 'cancelled by server', because nobody else has. Everyone, please refer to Message 57067. It only takes seconds, and you don't even have to reboot. |
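A quick way to see whether a host already has the library is to ask the dynamic loader to open it. This is only a minimal sketch: the exact file name `libboost_filesystem.so.1.74.0` is an assumption (the thread only says "libboost v1.74"), so substitute whatever name appears in your own task's error output.

```python
import ctypes

# Assumed library name; the thread itself only says "libboost v1.74".
ASSUMED_SONAME = "libboost_filesystem.so.1.74.0"


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can find and open `soname`."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the file is missing or cannot be loaded.
        return False


if __name__ == "__main__":
    status = "found" if can_load(ASSUMED_SONAME) else "missing"
    print(f"{ASSUMED_SONAME}: {status}")
```

If it reports "missing", the task would most likely fail at start-up on that host in the same way as the earlier replicas.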
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
I managed to pick up 3 cryptic scout resends today and successfully crunched and validated them. |
|
Joined: 28 Jan 21 Posts: 6 Credit: 106,022,917 RAC: 0
|
I think the solution is for the app to be updated - which should have happened already, not for thousands of users to try and install something extra. I've installed libboost v1.74 and am getting a different error. I shouldn't have to diagnose, alter and tweak my system when the issue is with the app itself. Hopefully the developers will fix this soon. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
"I think the solution is for the app to be updated - which should have happened already, not for thousands of users to try and install something extra." I definitely agree. The app should be updated to include the necessary package.
|
ServicEnginIC Joined: 24 Sep 10 Posts: 592 Credit: 11,972,186,510 RAC: 1,447
|
"I think the solution is for the app to be updated - which should have happened already, not for thousands of users to try and install something extra." +1. On top of the missing libboost v1.74 library, there is a second problem: many WUs in the current batch are wrongly constructed. Even with the libboost workaround applied locally, these tasks fail anyway with this other error: EXCEPTIONAL CONDITION: /home/user/conda/conda-bld/acemd3_1618916459379/work/src/mdio/bincoord.c, line 193: "nelems != 1" ...and they die out once the "max # of errors" limit is reached. Since the number of tasks in progress is down to 3 at this time, this is consistent with the project flushing the task buffer before a new app version is launched. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
"I think the solution is for the app to be updated - which should have happened already, not for thousands of users to try and install something extra." Yeah, I've seen that on all new tasks that have come through recently. They seem to be malformed by the project. Not a problem with missing packages here, just a problem with the WU itself.
|
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
I've just got a new ADRIA task: e1s5_I6-ADRIA_test_acemd3_update_KIXCMYB-1-2-RND1396 'test_acemd3_update' sounds hopeful, but: 1) Although created today, it's still being sent out with the ACEMD v2.12 (cuda1121) application. 2) The first user spat their copy out with the familiar libboost error. I got the second copy: I've patched my own machine, and it's running normally. So, the test doesn't seem to be telling us anything we didn't know already. |
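For anyone following the replica counts in this thread: BOINC task names are the work unit name plus a trailing `_N`, where N counts copies from 0, so the "second copy" is `_1` and a task ending in `_6` follows six earlier ones. A small sketch of pulling that index out of a name; the helper name `split_task_name` is made up for illustration, and the `_1` suffix in the usage note is assumed from "I got the second copy" rather than copied from the task page.

```python
from typing import Tuple


def split_task_name(task_name: str) -> Tuple[str, int]:
    """Split a BOINC task name into (work unit name, 0-based replica index)."""
    wu_name, sep, index = task_name.rpartition("_")
    if not sep or not index.isdigit():
        raise ValueError(f"no numeric replica suffix in {task_name!r}")
    return wu_name, int(index)


if __name__ == "__main__":
    # The "_1" suffix is an assumption based on "I got the second copy".
    name = "e1s5_I6-ADRIA_test_acemd3_update_KIXCMYB-1-2-RND1396_1"
    wu, replica = split_task_name(name)
    print(f"work unit {wu}, copy number {replica + 1}")
```

Calling it on the bare work unit name raises `ValueError`, because the last underscore-separated piece is not a number.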
trigggl Joined: 6 Mar 09 Posts: 25 Credit: 102,324,681 RAC: 0
|
I'm done until it gets fixed on the server side. My only boost option is 1.76.0 in gentoo. 1.74.0 isn't available. |
ServicEnginIC Joined: 24 Sep 10 Posts: 592 Credit: 11,972,186,510 RAC: 1,447
|
"I'm done until it gets fixed on the server side. My only boost option is 1.76.0 in gentoo." This problem seems to have been corrected in the new ACEMD 2.17 tasks. I've seen several computers that were previously failing for lack of libboost v1.74 now succeeding with v2.17. Also, the problem of tasks restarting on a different device in mixed multi-GPU systems is avoided (rather than truly corrected) in the new ACEMD 2.17. But the way this known problem is avoided leads to a potential waste of performance on this kind of system: when "N" tasks are received simultaneously, every one of them is effectively executed on device #0 only, multiplying execution times by "N" while "N-1" GPUs stay idle... |
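To put rough numbers on that slowdown, here is a back-of-the-envelope sketch. It assumes ideal time-slicing (k tasks sharing one GPU each take k times as long, with no extra overhead) and identical GPUs; the 4 tasks and 2.0 hours are hypothetical figures, not measurements from any host in this thread.

```python
import math


def per_task_hours(n_tasks: int, n_gpus_used: int, hours_alone: float) -> float:
    """Estimated wall-clock hours per task on the busiest GPU.

    Assumes tasks are spread as evenly as possible over `n_gpus_used`
    identical GPUs and that sharing a GPU slows each task proportionally.
    """
    tasks_on_busiest_gpu = math.ceil(n_tasks / n_gpus_used)
    return tasks_on_busiest_gpu * hours_alone


if __name__ == "__main__":
    n_tasks, hours_alone = 4, 2.0  # hypothetical example values
    print("all on device #0:", per_task_hours(n_tasks, 1, hours_alone), "h")
    print("one task per GPU:", per_task_hours(n_tasks, n_tasks, hours_alone), "h")
```

Under those assumptions, piling every task on one device costs N times the per-task run time, while spreading them out keeps it at the single-task figure.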
|
Joined: 13 Dec 17 Posts: 1419 Credit: 9,119,446,190 RAC: 891
|
That's interesting. I'm betting the fix for the errors when restarting on a different device is what is causing all tasks to start on device #0. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
It must be hard-coded to gpu0, or whatever checks or communication it has with BOINC to see which GPU is available aren't working properly, so it somehow always thinks gpu0 is available. So every task runs there.
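To illustrate that guess: as far as I know, the BOINC client tells a GPU app which device to use (for example through a `--device N` command-line argument or the device number in the init data). The sketch below is a hypothetical wrapper, not ACEMD's actual code, and `pick_device` is a made-up name. It shows how a silent default of 0 would put every task on gpu0 whenever that hint is missing or ignored.

```python
import argparse
from typing import List


def pick_device(argv: List[str], default: int = 0) -> int:
    """Return the GPU index requested via `--device N`, else `default`."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--device", type=int, default=None)
    # Ignore any other arguments the client may pass along.
    args, _unknown = parser.parse_known_args(argv)
    if args.device is None:
        # Silent fallback: every task that hits this branch lands on gpu0.
        return default
    return args.device


if __name__ == "__main__":
    print(pick_device(["--device", "2"]))  # device hint honoured
    print(pick_device([]))                 # no hint, falls back to 0
```

If the real app behaves like the second call for every task, that would match what ServicEnginIC is seeing on multi-GPU hosts.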
|
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
Thanks for reporting. |
©2025 Universitat Pompeu Fabra