Message boards : Number crunching : New version of ACEMD requires libboost v1.74
Author | Message |
---|---|
So, late on a Saturday evening, I get sent WU 27077711. Initial replication 2, quorum 1. Mine was _6, sent after six previous failures. | |
ID: 57206 | Rating: 0 | |
I managed to pick up 3 cryptic scout resends today and successfully crunched and validated them. | |
ID: 57209 | Rating: 0 | |
I think the solution is for the app to be updated - which should have happened already, not for thousands of users to try and install something extra. | |
ID: 57213 | Rating: 0 | |
"I think the solution is for the app to be updated - which should have happened already, not for thousands of users to try and install something extra." I definitely agree. The app should be updated to include the necessary package. | |
ID: 57214 | Rating: 0 | |
"I think the solution is for the app to be updated - which should have happened already, not for thousands of users to try and install something extra." +1. On top of the missing libboost v1.74 library, many WUs in the current batch appear to be wrongly constructed. Even with the libboost workaround applied locally, these tasks fail anyway with a different error: EXCEPTIONAL CONDITION: /home/user/conda/conda-bld/acemd3_1618916459379/work/src/mdio/bincoord.c, line 193: "nelems != 1" ...and they eventually die off once "max # of errors" is reached. Since the number of tasks in progress is down to 3 at this time, this is consistent with the project flushing the task buffer before a new app version is launched. | |
ID: 57215 | Rating: 0 | |
"I think the solution is for the app to be updated - which should have happened already, not for thousands of users to try and install something extra." Yeah, I've seen that on all new tasks that have come through recently. They seem to be malformed from the project. Not a problem with missing packages here, just a problem with the WU itself. | |
ID: 57219 | Rating: 0 | |
I've just got a new ADRIA task: | |
ID: 57224 | Rating: 0 | |
| |
ID: 57248 | Rating: 0 | |
"I'm done until it gets fixed on the server side. My only boost option is 1.76.0 in Gentoo." This problem seems to have been corrected in tasks for the new ACEMD version 2.17. I've seen several computers that previously failed for lack of libboost v1.74 now succeeding with v2.17. Also, the problem caused by tasks restarting on a different device in mixed multi-GPU systems is avoided, rather than corrected, in the current ACEMD version 2.17. But the way this known problem is avoided leads to a potential waste of performance on such systems: when "N" tasks are received simultaneously, every one of them effectively runs on device #0 only, multiplying execution times by "N" while "N-1" GPUs stay idle... | |
ID: 57268 | Rating: 0 | |
That's interesting. I'm betting the fix for the errors on restarting on a different device is what is causing all tasks to start on device #0. | |
ID: 57269 | Rating: 0 | |
It must be hard-coded to gpu0, or whatever check or communication it has with BOINC to see which GPU is available isn't working properly, so it somehow always thinks gpu0 is available. So every task runs there. | |
ID: 57270 | Rating: 0 | |
Thanks for reporting. | |
ID: 57271 | Rating: 0 | |
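
To make the device #0 discussion above concrete, here is a rough Python sketch of the two behaviours being guessed at. It is not ACEMD's code and nobody in the thread has seen the real cause. It assumes the BOINC client tells a GPU app which card to use with a `--device N` command-line argument. The names `device_from_argv`, `honoring_app`, `pinned_app` and `estimated_hours` are made up for illustration, and the slowdown model (each task slows down by the number of tasks sharing its GPU) is just the "multiplied by N" estimate from the post above.

```python
import argparse
from collections import Counter


def device_from_argv(argv):
    """Return the GPU index given as `--device N`, or None if it is absent."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--device", type=int, default=None)
    # Ignore any other arguments the client might pass along.
    args, _ignored = parser.parse_known_args(argv)
    return args.device


def honoring_app(argv):
    # Uses the device the client assigned; refuses to guess if none was given.
    device = device_from_argv(argv)
    if device is None:
        raise ValueError("no --device argument supplied")
    return device


def pinned_app(argv):
    # Hypothetical faulty behaviour: the argument is ignored, device 0 always wins.
    return 0


def estimated_hours(devices, hours_alone):
    """Crude model: tasks sharing one GPU each slow down by the number of sharers."""
    load = Counter(devices)
    return [hours_alone * load[d] for d in devices]


if __name__ == "__main__":
    # Three tasks launched at once on a three-GPU host, one device each.
    launches = [["--device", str(n)] for n in range(3)]
    for app in (honoring_app, pinned_app):
        devices = [app(argv) for argv in launches]
        print(app.__name__, devices, estimated_hours(devices, 2.0))
```

With three tasks that would each take 2 hours alone, the first app spreads them over devices 0, 1 and 2 at 2 hours each, while the pinned one stacks all three on device 0 at about 6 hours each and leaves two cards idle.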