Message boards :
Number crunching :
New app update (acemd3)
| Author | Message |
|---|---|
|
Joined: 3 Sep 13 Posts: 53 Credit: 1,533,531,731 RAC: 0
|
name a11-TONI_TEST3-0-1-RND0663
https://www.gpugrid.net/workunit.php?wuid=16517242
Failure on all machines. My result here: https://www.gpugrid.net/result.php?resultid=20976177
My log:
50 GPUGRID 6/4/2019 2:44:30 AM Started download of acemd3.119.exe
51 GPUGRID 6/4/2019 2:44:30 AM Started download of boost_filesystem-vc140-mt-1_65_1.119.dll
52 GPUGRID 6/4/2019 2:44:32 AM Finished download of acemd3.119.exe
53 GPUGRID 6/4/2019 2:44:32 AM Started download of boost_system-vc140-mt-1_65_1.119.dll
54 GPUGRID 6/4/2019 2:44:33 AM Finished download of boost_filesystem-vc140-mt-1_65_1.119.dll
55 GPUGRID 6/4/2019 2:44:33 AM Finished download of boost_system-vc140-mt-1_65_1.119.dll
56 GPUGRID 6/4/2019 2:44:33 AM Started download of cufft64_80.119.dll
57 GPUGRID 6/4/2019 2:44:33 AM Started download of msvcp140.119.dll
58 GPUGRID 6/4/2019 2:44:38 AM Finished download of msvcp140.119.dll
59 GPUGRID 6/4/2019 2:44:38 AM Started download of nvrtc64_80.119.dll
60 GPUGRID 6/4/2019 2:45:06 AM Finished download of nvrtc64_80.119.dll
61 GPUGRID 6/4/2019 2:45:06 AM Started download of nvrtc-builtins64_80.119.dll
62 GPUGRID 6/4/2019 2:45:28 AM Finished download of nvrtc-builtins64_80.119.dll
63 GPUGRID 6/4/2019 2:45:28 AM Started download of OpenMMCPU.119.dll
64 GPUGRID 6/4/2019 2:45:30 AM Finished download of OpenMMCPU.119.dll
65 GPUGRID 6/4/2019 2:45:30 AM Started download of OpenMMCudaCompiler.119.dll
66 GPUGRID 6/4/2019 2:45:32 AM Finished download of OpenMMCudaCompiler.119.dll
67 GPUGRID 6/4/2019 2:45:32 AM Started download of OpenMMCUDA.119.dll
68 GPUGRID 6/4/2019 2:45:39 AM Finished download of OpenMMCUDA.119.dll
69 GPUGRID 6/4/2019 2:45:39 AM Started download of OpenMM.119.dll
70 GPUGRID 6/4/2019 2:45:48 AM Finished download of OpenMM.119.dll
71 GPUGRID 6/4/2019 2:45:48 AM Started download of OpenMMOpenCL.119.dll
72 GPUGRID 6/4/2019 2:45:54 AM Finished download of OpenMMOpenCL.119.dll
73 GPUGRID 6/4/2019 2:45:54 AM Started download of OpenMMPME.119.dll
74 GPUGRID 6/4/2019 2:45:58 AM Finished download of OpenMMPME.119.dll
75 GPUGRID 6/4/2019 2:45:58 AM Started download of psprolib.119.dll
76 GPUGRID 6/4/2019 2:46:00 AM Finished download of psprolib.119.dll
77 GPUGRID 6/4/2019 2:46:00 AM Started download of vcruntime140.119.dll
78 GPUGRID 6/4/2019 2:46:01 AM Finished download of vcruntime140.119.dll
79 GPUGRID 6/4/2019 2:46:01 AM Started download of a11-TONI_TEST3-0-conf_file_enc
80 GPUGRID 6/4/2019 2:46:02 AM Finished download of a11-TONI_TEST3-0-conf_file_enc
81 GPUGRID 6/4/2019 2:46:02 AM Started download of a11-TONI_TEST3-0-coor_file
82 GPUGRID 6/4/2019 2:46:03 AM Finished download of a11-TONI_TEST3-0-coor_file
83 GPUGRID 6/4/2019 2:46:03 AM Started download of a11-TONI_TEST3-0-vel_file
84 GPUGRID 6/4/2019 2:46:04 AM Finished download of a11-TONI_TEST3-0-vel_file
85 GPUGRID 6/4/2019 2:46:04 AM Started download of a11-TONI_TEST3-0-idx_file
86 GPUGRID 6/4/2019 2:46:05 AM Finished download of a11-TONI_TEST3-0-idx_file
87 GPUGRID 6/4/2019 2:46:05 AM Started download of a11-TONI_TEST3-0-xsc_file
88 GPUGRID 6/4/2019 2:46:06 AM Finished download of a11-TONI_TEST3-0-xsc_file
89 GPUGRID 6/4/2019 2:46:06 AM Started download of a11-TONI_TEST3-0-pdb_file
90 GPUGRID 6/4/2019 2:46:11 AM Finished download of a11-TONI_TEST3-0-pdb_file
91 GPUGRID 6/4/2019 2:46:11 AM Started download of a11-TONI_TEST3-0-psf_file
92 GPUGRID 6/4/2019 2:46:24 AM Finished download of a11-TONI_TEST3-0-psf_file
93 GPUGRID 6/4/2019 2:46:24 AM Started download of a11-TONI_TEST3-0-par_file
94 GPUGRID 6/4/2019 2:46:26 AM Finished download of a11-TONI_TEST3-0-par_file
95 GPUGRID 6/4/2019 2:46:26 AM Started download of a11-TONI_TEST3-0-prmtop_file
96 GPUGRID 6/4/2019 2:46:27 AM Finished download of a11-TONI_TEST3-0-prmtop_file
97 GPUGRID 6/4/2019 2:49:48 AM Finished download of cufft64_80.119.dll
98 GPUGRID 6/4/2019 2:49:49 AM Starting task a11-TONI_TEST3-0-1-RND0663_5
99 GPUGRID 6/4/2019 2:49:50 AM Computation for task a11-TONI_TEST3-0-1-RND0663_5 finished
100 GPUGRID 6/4/2019 2:49:50 AM Output file a11-TONI_TEST3-0-1-RND0663_5_0 for task a11-TONI_TEST3-0-1-RND0663_5 absent
101 GPUGRID 6/4/2019 2:49:50 AM Output file a11-TONI_TEST3-0-1-RND0663_5_9 for task a11-TONI_TEST3-0-1-RND0663_5 absent
Team USA forum | Team USA page Join us and #crunchforcures. We are now also folding: join team ID 236370! |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
I think I debugged it (app version 201). 100 new WUs sent. Progress bar should also work (please report if not). |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
There are many more successes now. Edit. The reason for failures is not really clear. Question for anybody who has seen a success: do you have the CUDA Toolkit installed? |
|
Joined: 21 Mar 16 Posts: 513 Credit: 4,673,458,277 RAC: 0
|
There are many more successes now. Hello Toni, I have received many successes and when I typed "nvcc -V" to verify the CUDA Toolkit version, it says "The program 'nvcc' is currently not installed. You can install it by typing: sudo apt install nvidia-cuda-toolkit" My system seems to not have it installed. This is the list of the successful tasks: http://www.gpugrid.net/results.php?userid=306281 |
|
Joined: 1 Jan 15 Posts: 1168 Credit: 12,311,898,501 RAC: 246,185
|
This is the list of the successful tasks: http://www.gpugrid.net/results.php?userid=306281 access denied :-( |
|
Joined: 21 Mar 16 Posts: 513 Credit: 4,673,458,277 RAC: 0
|
This is the list of the successful tasks: http://www.gpugrid.net/results.php?userid=306281 Perhaps you can view a single WU? http://www.gpugrid.net/result.php?resultid=20978809 |
|
Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 0
|
I have 2 machines. Both have Linux Mint 19.1 installed, same nvidia driver (390.116), cuda toolkit release 9.1 (both tested as functional), same boinc version 7.14.2. The hardware is different:
dual GTX 1080's on 2700X: All tasks are failing. http://www.gpugrid.net/results.php?hostid=482792
dual GTX 1080 Ti's on E5-2690 v2: All tasks are completing successfully! http://www.gpugrid.net/results.php?hostid=464987
There must be a clue here. Any ideas? |
|
Joined: 12 Jul 17 Posts: 404 Credit: 17,408,899,587 RAC: 0
|
Question for anybody who has seen a success: do you have the CUDA Toolkit installed? No. I installed the Nvidia 430.14 drivers as Linux metapackages. According to the Synaptic Package Manager I do not have the CUDA Toolkit installed.
|
|
Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 0
|
I have 2 machines. Both have linux mint 19.1 installed, same nvidia driver (390.116), cuda toolkit release 9.1 (both tested as functional), same boinc version 7.14.2. I can't find anything in the logs. I was running Rosetta on the machine that had the failed GPUGrid tasks. There was no other project running on the machine that had the successful GPUGrid tasks. |
|
Joined: 13 Dec 17 Posts: 1423 Credit: 9,189,196,190 RAC: 1,326,743
|
There are many more successes now. Even though nvcc is actually present on my Jetson Nano, nvcc -V yielded program not found. It is located at /usr/local/cuda-10.0/bin/nvcc. I had to export the directory where nvcc was located for it to be found. That enabled a program to find nvcc.
keith@Nano:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sun_Sep_30_21:09:22_CDT_2018
Cuda compilation tools, release 10.0, V10.0.166
But as soon as I rebooted, nvcc could not be found. So I ended up adding the library directory as an export in .bashrc and then I could find nvcc after reboots. |
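For readers following along, the export step described above can be sketched roughly as follows. This is only an illustration: the directory is the one quoted in the post, the helper name `add_to_path` is hypothetical, and note that it is the `bin` directory on `PATH` that makes `nvcc` itself resolvable.

```shell
# Directory quoted in the post above; other CUDA versions install elsewhere.
CUDA_BIN="/usr/local/cuda-10.0/bin"

# Hypothetical helper: prepend a directory to PATH unless it is already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

add_to_path "$CUDA_BIN"

# Report whether nvcc can now be resolved in this shell.
if command -v nvcc >/dev/null 2>&1; then
    nvcc -V
else
    echo "nvcc still not found; check that $CUDA_BIN exists" >&2
fi

# To survive reboots, the same export can be appended to ~/.bashrc, e.g.:
#   echo 'export PATH="/usr/local/cuda-10.0/bin:$PATH"' >> ~/.bashrc
```

The export only affects the current shell and its children, which is why the setting disappeared after a reboot until it was placed in .bashrc.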
|
Joined: 2 Jul 16 Posts: 339 Credit: 7,990,341,558 RAC: 3,287
|
I completed one while 5 others had errors. https://www.gpugrid.net/workunit.php?wuid=16520276
nvcc -V results
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85 |
|
Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 0
|
Another option is to place the cuda library path in a file in /etc/ld.so.conf.d. You could name the file cuda.conf, then run: sudo ldconfig |
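A minimal sketch of the ld.so.conf.d approach described above. The library directory `/usr/local/cuda-10.0/lib64` is an assumption derived from the nvcc location mentioned earlier in the thread, and the helper name `write_cuda_conf` is hypothetical; check where libcudart actually lives on your system first.

```shell
# Assumed CUDA library directory (not confirmed in the thread).
CUDA_LIB_DIR="${CUDA_LIB_DIR:-/usr/local/cuda-10.0/lib64}"

# Hypothetical helper: write a one-line loader config file named cuda.conf
# into the directory given as $1. The real target, /etc/ld.so.conf.d,
# is only writable by root.
write_cuda_conf() {
    printf '%s\n' "$CUDA_LIB_DIR" > "$1/cuda.conf"
}

# Real use from a root shell, followed by a cache rebuild and a check:
#   write_cuda_conf /etc/ld.so.conf.d
#   ldconfig
#   ldconfig -p | grep libcudart
```

Unlike an export in .bashrc, this setting is system-wide, so it also applies to processes that are not started from a login shell, such as a BOINC client running as a service.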
|
Joined: 13 Dec 17 Posts: 1423 Credit: 9,189,196,190 RAC: 1,326,743
|
Correct. That is the other method I researched as a popular solution. So am I correct in understanding now that one has to install the CUDA toolkit to run the new acemd application, and that the wrapper download by itself is insufficient? |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
Hi all, thanks for the reports. The app SHOULD not require the cuda toolkit (which includes nvcc), yet on SOME hosts it is looking for it, and fails (the error message is more or less the same). I still don't understand the conditions under which this occurs. In particular, as biodoc's previous example shows, there is no clear relationship between the card generation, driver, and success/failure. @biodoc, can you see other obvious differences between the two machines? E.g.
- boinc installation method
- presence of the gcc package |
|
Joined: 13 Dec 17 Posts: 1423 Credit: 9,189,196,190 RAC: 1,326,743
|
Well, I see I attempted to run a task that failed on one host. I looked over all the downloaded files and thought to do a sanity check on the executable. This is what ldd showed.
keith@Numbskull:~/Desktop/BOINC/projects/www.gpugrid.net$ ldd '/home/keith/Desktop/BOINC/projects/www.gpugrid.net/acemd.919-80.bin'
linux-vdso.so.1 (0x00007ffdf14d5000)
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007fa630a0c000)
libcudart.so.8.0 => not found
libcufft.so.8.0 => not found
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa630808000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa6305e9000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa630260000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa62fec2000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa62fcaa000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa62f8b9000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa62f6b1000)
libnvidia-fatbinaryloader.so.418.56 => /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.56 (0x00007fa62f463000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa631b63000)
keith@Numbskull:~/Desktop/BOINC/projects/www.gpugrid.net$
So right off the bat, the app had no chance of succeeding when it can't find its own downloaded libcudart.so.8.0 and libcufft.so.8.0 files in the project directory. I don't think it would make any difference if/when all the files and work unit get copied into a slot. |
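A caveat when reading ldd output like the above: ldd reports what the dynamic loader would find through its usual search locations (paths embedded in the binary, LD_LIBRARY_PATH, the ldconfig cache, the default system directories). A library sitting next to the binary is not searched unless one of those points at it, so "not found" in the project directory does not by itself show what happens at run time. A small sketch for comparing the two cases; the function name `unresolved` is hypothetical and the binary name is the one from the post:

```shell
# Hypothetical helper: read ldd output on stdin and print only the names of
# dependencies that the loader reported as "not found".
unresolved() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Usage sketch, run from the project directory:
#   ldd ./acemd.919-80.bin | unresolved
#   LD_LIBRARY_PATH="$PWD" ldd ./acemd.919-80.bin | unresolved
# If the second command prints fewer names, those libraries are present in the
# directory under the names the binary expects and only the search path differs.
```

If the downloaded files carry a different name in the project directory than the one the binary asks for, the second command will still list them as missing.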
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
You don't (shouldn't) need to install any additional software if everything works as intended (neither the wrapper nor the cuda toolkit). You may need to update the drivers, though. |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 2
|
libcudart.so.8.0 => not found If somebody can post or upload the three components of a test workunit specification:
* <app_version>
* <workunit>
* <result>
all from client_state.xml - make sure you get the right (latest) version of <app_version>, there will be several of them - I can proofread that there are no bugs in the BOINC deployment of the app files. This one could be a problem with the version renaming or copying. |
|
Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 0
|
Hi all, thanks for the reports. No, the boinc installation method is the same (repository meta package) and gcc is installed on both machines (build-essential package). I ran ldd on wrapper_26198_x86_64-pc-linux-gnu and acemd3.e72153abf98cb1fcd0f05fc443818dfc on both machines and the output is identical.
Working machine:
mark@x20-linux:/var/lib/boinc/projects/www.gpugrid.net$ ldd ./wrapper_26198_x86_64-pc-linux-gnu
linux-vdso.so.1 (0x00007ffc1bfab000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7ab23ba000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7ab21a2000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7ab1f83000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7ab1b92000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7ab2758000)
mark@x20-linux:/var/lib/boinc/projects/www.gpugrid.net$ ldd ./acemd3.e72153abf98cb1fcd0f05fc443818dfc
linux-vdso.so.1 (0x00007ffda9bfe000)
libOpenMM.so => not found
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ffb4cb37000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ffb4c7ae000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ffb4c410000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ffb4c1f8000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffb4be07000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffb4cd3b000)
Machine with failures:
mark@x16-linux:/var/lib/boinc/projects/www.gpugrid.net$ ldd ./wrapper_26198_x86_64-pc-linux-gnu
linux-vdso.so.1 (0x00007ffd96952000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd300b09000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd3008f1000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd3006d2000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd3002e1000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd300ea7000)
mark@x16-linux:/var/lib/boinc/projects/www.gpugrid.net$ ldd ./acemd3.e72153abf98cb1fcd0f05fc443818dfc
linux-vdso.so.1 (0x00007ffef0097000)
libOpenMM.so => not found
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fabe9b83000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fabe97fa000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fabe945c000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fabe9244000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fabe8e53000)
/lib64/ld-linux-x86-64.so.2 (0x00007fabe9d87000) |
|
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 |
We are distributing the two files with the app. They are copied (via copy_file) into the slot, and the slot is added to LD_LIBRARY_PATH. It works locally and on many machines; I am inclined to think it's not the problem. The "permission denied" bit seems related to a later stage, possibly an attempt to compile the cuda bytecode into the form necessary for the specific graphic card (done via nvrtc). If anybody is able to capture the "progress.log" file before it's deleted, thanks! T |
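For anyone trying to grab progress.log before the slot is cleaned up, a polling copy along these lines might work. This is a sketch under assumptions: the slots directory is taken to be `/var/lib/boinc/slots` (matching the data directory seen in paths elsewhere in the thread, but it may differ on your install), and the names `capture_once`, `SLOTS_DIR` and `DEST_DIR` are hypothetical.

```shell
# Assumed locations; override them in the environment if your layout differs.
SLOTS_DIR="${SLOTS_DIR:-/var/lib/boinc/slots}"
DEST_DIR="${DEST_DIR:-$HOME/acemd3-logs}"

# Hypothetical helper: copy every progress.log currently present in a slot
# directory to DEST_DIR, tagging the copy with the slot number.
capture_once() {
    mkdir -p "$DEST_DIR"
    for f in "$SLOTS_DIR"/*/progress.log; do
        [ -f "$f" ] || continue      # glob matched nothing, or file vanished
        slot=$(basename "$(dirname "$f")")
        cp -p "$f" "$DEST_DIR/progress.slot$slot.log"
    done
}

# Poll once a second while a task is starting; stop with Ctrl-C:
#   while true; do capture_once; sleep 1; done
```

Since failing tasks in this thread end within a second or two, a shorter sleep may be needed, and reading the slot directory may require running as a user with access to the BOINC data directory.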
|
Joined: 26 Aug 08 Posts: 183 Credit: 10,085,929,375 RAC: 0
|
I did find a "messy" install of the nvidia driver on the offending machine. There seem to be remnants of a driver installed via download directly from nvidia. I'll clean that up. 'sudo apt search nvidia' showed significant differences between the 2 machines. |
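One way to make such a comparison between two hosts repeatable is to save the installed NVIDIA/CUDA packages on each machine and diff the two lists. A rough sketch for Debian-style systems such as Mint; the helper name `filter_nvidia` and the output file names are hypothetical, and the host names are the ones visible in the prompts above.

```shell
# Hypothetical helper: keep only NVIDIA/CUDA-related lines from a
# "package version" listing on stdin, sorted for stable diffs.
filter_nvidia() {
    grep -i -E 'nvidia|cuda' | sort
}

# On each machine, save the installed-package view:
#   dpkg-query -W -f='${Package} ${Version}\n' | filter_nvidia > "nvidia-$(hostname).txt"
# Then copy both files to one machine and compare them:
#   diff nvidia-x16-linux.txt nvidia-x20-linux.txt
```

This only covers packages known to dpkg; files left behind by a driver installed from NVIDIA's own download would not show up in the list and need to be checked separately.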