Message boards :
Number crunching :
New app update (acemd3)
Send message · Joined: 9 Dec 08 · Posts: 1006 · Credit: 5,068,599 · RAC: 0
I did find a "messy" install of the nvidia driver on the offending machine. There seem to be remnants of a driver installed via a direct download from nvidia; I'll clean that up. From what I know, "apt search" does not look at the packages installed on your system but at those available online. So the difference may be in the repository configurations.
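A minimal sketch of that distinction, assuming a Debian/Ubuntu system (the package names shown are just whatever your machine happens to have installed):

```shell
# `apt search` queries the package index fetched from the configured
# repositories (/etc/apt/sources.list and sources.list.d), so its output
# can differ between machines with different repo configs.
# To see what is actually installed, ask dpkg instead:
dpkg -l | grep -i nvidia

# Or restrict apt itself to installed packages:
apt list --installed 2>/dev/null | grep -i nvidia
```

Either of the last two commands should make a leftover run-file driver stand out as packages that no repository knows about.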
Send message · Joined: 26 Aug 08 · Posts: 183 · Credit: 10,085,929,375 · RAC: 0
Yeah, dpkg -l | grep -i nvidia is the right command. I went ahead and purged everything nvidia and reinstalled the nvidia driver. I didn't install the cuda toolkit, though. UPDATE: tasks are still failing on this machine.
Send message · Joined: 4 Jun 19 · Posts: 3 · Credit: 11,999,700 · RAC: 0
Toni, host info:

CUDA: NVIDIA GPU 0: GeForce GTX 1080 (driver version 418.56, CUDA version 10.1, compute capability 6.1, 4096MB, 3968MB available, 9718 GFLOPS peak)
OpenCL: NVIDIA GPU 0: GeForce GTX 1080 (driver version 418.56, device version OpenCL 1.2 CUDA, 8112MB, 3968MB available, 9718 GFLOPS peak)

progress.log from a valid task — slot folder 40 zip: https://filebin.net/jfv8ec4c6q8uszuw/Slot_40.zip?t=tvn13kdj

On the failed host the slot folders are empty: either BOINC wipes them at the crash, or the application never adds files to the slot folder. I could not grab progress.log — a task that fails after 1 second is impossible to catch, and since it is not stored for upload it gets wiped out. I get the error below on an older OS (16.04) with a GTX 970, driver 418.56. The same driver hands out valid tasks on a later system (18.10), so it looks to be the system, not the driver version. This compile issue still exists in the latest application but only affects old systems.

<core_client_version>7.6.31</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
14:22:07 (102554): wrapper (7.7.26016): starting
14:22:07 (102554): wrapper (7.7.26016): starting
14:22:07 (102554): wrapper: running acemd3 (--boinc input --device 0)
# Engine failed: Error launching CUDA compiler: 32512
sh: 1: : Permission denied
14:22:08 (102554): acemd3 exited; CPU time 0.132000
14:22:08 (102554): app exit status: 0x1
14:22:08 (102554): called boinc_finish(195)
</stderr_txt>
]]>
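A side note on that stderr: the 32512 in "Error launching CUDA compiler" looks like a raw wait()-style status, in which the child's exit code sits in the high byte. Decoding it:

```shell
# 32512 = 127 * 256: the high byte of a wait() status is the exit code,
# and 127 is the shell's "command not found" status. Together with the
# empty command in the "sh: 1: : Permission denied" line, this suggests
# the compiler command the app tried to launch was blank on the failing
# systems (an assumption worth checking, not a confirmed diagnosis).
echo $((32512 / 256))   # prints 127
```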
Send message · Joined: 9 Dec 08 · Posts: 1006 · Credit: 5,068,599 · RAC: 0
Aehm, to clarify: I see the progress.log file of successful tasks only.
Send message · Joined: 9 Dec 08 · Posts: 1006 · Credit: 5,068,599 · RAC: 0
If anybody is so inclined, can they try to run the boinc client manually with the --exit_after_finish flag, so the slot directory is preserved on failure? Thanks.
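For anyone trying this, a minimal sketch of a manual run (the data-directory path is the Ubuntu default and is an assumption — adjust for your install):

```shell
# Stop the regular client service first -- only one client may use the
# data directory at a time:
#   sudo systemctl stop boinc-client
#
# Then run the client in the foreground from the BOINC data directory.
# --exit_after_finish makes the client exit as soon as a task finishes,
# which should leave the slot directory contents in place for inspection.
if [ -d /var/lib/boinc-client ]; then
    cd /var/lib/boinc-client
    boinc --exit_after_finish
else
    echo "BOINC data directory not found; adjust the path for your install"
fi
```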
Send message · Joined: 26 Aug 08 · Posts: 183 · Credit: 10,085,929,375 · RAC: 0
There is information in <app_version> but nothing for <workunit> or <result>:

<app_version>
<app_name>acemd3</app_name>
<version_num>202</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<avg_ncpus>0.987442</avg_ncpus>
<flops>28742507251613.187500</flops>
<plan_class>cuda80</plan_class>
<api_version>7.7.0</api_version>
<file_ref>
<file_name>wrapper_26198_x86_64-pc-linux-gnu</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>acemd3.e72153abf98cb1fcd0f05fc443818dfc</file_name>
<open_name>acemd3</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>job.xml.1245cc127550a015dcc9b3e1c2c84e13</file_name>
<open_name>job.xml</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libOpenMMOpenCL.so.6a31fa1ff5ae3a26ea64f2abfb5a66cc</file_name>
<open_name>libOpenMMOpenCL.so</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libOpenCL.so.1.0.0.43d4300566ce59d77e0fa316f8ee5b02</file_name>
<open_name>libOpenCL.so.1</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libgomp.so.1.0.0.efdf718669edc7fff00e0c5f7f0b8791</file_name>
<open_name>libgomp.so.1.0.0</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libOpenMM.so.5406dfd716045d08ad6369e2399a98e2</file_name>
<open_name>libOpenMM.so</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libOpenMMCUDA.so.8867021fdc0daf2e39f1b7228ece45af</file_name>
<open_name>libOpenMMCUDA.so</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libcudart.so.8.0.61.af43be839e6366e731accc514633bd1f</file_name>
<open_name>libcudart.so.8.0</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libfftw3f_threads.so.3.4.4.dd0c6fcfa550371acf730db2d9d5a270</file_name>
<open_name>libfftw3f_threads.so.3</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libgcc_s.so.1.d7f787a9bf6c3633eaebb9015c6d9044</file_name>
<open_name>libgcc_s.so.1</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libnvrtc-builtins.so.8.0.61.684f2f1d9f0934bcce91e77b69e17ec7</file_name>
<open_name>libnvrtc-builtins.so</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libOpenMMCudaCompiler.so.aaed781fe4caa9d1099312d458a9b902</file_name>
<open_name>libOpenMMCudaCompiler.so</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libfftw3f.so.3.4.4.a4580ddf9efebaad56fab49847a8c899</file_name>
<open_name>libfftw3f.so.3</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libOpenMMPME.so.3208e45e71567824e8390ab1c79c6a66</file_name>
<open_name>libOpenMMPME.so</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libnvrtc.so.8.0.61.ea3bff3d91151ddf671a0a1491635b57</file_name>
<open_name>libnvrtc.so.8.0</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libOpenMMCPU.so.19849b4ff1cf4d33f75d9433b4d5c6bb</file_name>
<open_name>libOpenMMCPU.so</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libcufft.so.8.0.61.889be25939bec6f9a2abec790772d28f</file_name>
<open_name>libcufft.so.8.0</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libstdc++.so.6.0.25.e344f48acfbd4f5abbf99b2c75cc5e50</file_name>
<open_name>libstdc++.so.6</open_name>
<copy_file/>
</file_ref>
<coproc>
<type>NVIDIA</type>
<count>1.000000</count>
</coproc>
<gpu_ram>512.000000</gpu_ram>
<dont_throttle/>
</app_version>
Send message · Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 2
Thanks. The context was

libcudart.so.8.0 => not found

Both files will be copied with the correct names into the slot directory, although they are downloaded under a different (versioned) name. So a static test outside the running BOINC environment will fail to find them, but a dynamic test during a run should be OK. I don't think this one will take us much further.
Send message · Joined: 13 Dec 17 · Posts: 1423 · Credit: 9,189,196,190 · RAC: 1,326,743
> If anybody is so inclined, can they try to run the boinc client manually with the --exit_after_finish flag, so the slot directory is preserved on failure?

I just tried the manual run of the client with the suggested --exit_after_finish parameter, but it did not preserve the slot contents.

05-Jun-2019 10:38:59 [GPUGRID] Starting task a3-TONI_TEST9-2-3-RND2847_2
05-Jun-2019 10:39:03 [GPUGRID] [sched_op] Deferring communication for 00:06:31
05-Jun-2019 10:39:03 [GPUGRID] [sched_op] Reason: Unrecoverable error for task a3-TONI_TEST9-2-3-RND2847_2
mv: cannot stat 'slots/8/output.coor': No such file or directory
mv: cannot stat 'slots/8/output.vel': No such file or directory
mv: cannot stat 'slots/8/output.idx': No such file or directory
mv: cannot stat 'slots/8/output.dcd': No such file or directory
mv: cannot stat 'slots/8/COLVAR': No such file or directory
mv: cannot stat 'slots/8/log.file': No such file or directory
mv: cannot stat 'slots/8/HILLS': No such file or directory
mv: cannot stat 'slots/8/output.vel.dcd': No such file or directory
mv: cannot stat 'slots/8/output.xtc': No such file or directory
mv: cannot stat 'slots/8/output.xsc': No such file or directory
mv: cannot stat 'slots/8/output.xstfile': No such file or directory
05-Jun-2019 10:39:03 [GPUGRID] Computation for task a3-TONI_TEST9-2-3-RND2847_2 finished
05-Jun-2019 10:39:03 [GPUGRID] Output file a3-TONI_TEST9-2-3-RND2847_2_9 for task a3-TONI_TEST9-2-3-RND2847_2 absent
05-Jun-2019 10:39:05 [GPUGRID] Started upload of a3-TONI_TEST9-2-3-RND2847_2_0
05-Jun-2019 10:39:07 [GPUGRID] Finished upload of a3-TONI_TEST9-2-3-RND2847_2_0
^C05-Jun-2019 10:39:11 [---] Received signal 2
05-Jun-2019 10:39:11 [---] Exiting
keith@Darksider:~/Desktop/BOINC$
Send message · Joined: 13 Dec 17 · Posts: 1423 · Credit: 9,189,196,190 · RAC: 1,326,743
I thought that all the tasks I had downloaded had failed, but I see I have one host that has been successfully processing the acemd3 tasks. Unfortunately, I just aborted the cache thinking all the hosts were unsuccessful. Oops. Now to try and compare what is different about that machine compared to the rest. I believe the difference is that at one time I had installed the cuda toolkit on that host and then removed it, long in the past.
Send message · Joined: 13 Dec 17 · Posts: 1423 · Credit: 9,189,196,190 · RAC: 1,326,743
Anybody successfully run the new acemd3 app on a Turing card yet? I just realized that I still had a gpu_exclude for the Turing card on the host that had been successfully processing tasks. I somehow skipped removing the exclusion from that machine while I had done so on all the other hosts with Turing cards. Could this be the reason the app fails?
Send message · Joined: 28 Jul 12 · Posts: 819 · Credit: 1,591,285,971 · RAC: 0
I see that there is a new version 2.02, which I just tried on my GTX 1070 (Ubuntu 16.04.6). I just use the Ubuntu repository driver, which is 396.54 (proprietary), without any toolkit that I know of. It failed immediately.

GPUGRID 2.02 New version of ACEMD (cuda80) a67-TONI_TEST8-2-3-RND3156_0 00:00:03 (-) 0.00 100.000 - 6/10/2019 2:42:16 PM 0.985C + 1NV Computation error 0.00 MB i7-4790-G

http://www.gpugrid.net/results.php?hostid=482386

Explain to me (simply) what I should check, and I will do it.
Send message · Joined: 26 Aug 08 · Posts: 183 · Credit: 10,085,929,375 · RAC: 0
> Anybody successfully run the new acemd3 app on a Turing card yet? I just realized that I still had a gpu_exclude for my Turing card on the host that had been successfully processing tasks.

I think the plan is to get a stable acemd3 app running on legacy hardware and then release a beta for Turing cards.

@jim1348, I get the same error on one of my machines with dual GTX 1080 cards.
Send message · Joined: 11 Feb 18 · Posts: 1 · Credit: 104,599,162 · RAC: 0
I am also seeing failures due to the acemd binary not finding some libs:
[root@node02 www.gpugrid.net]# ldd acemd.919-80.bin
linux-vdso.so.1 (0x00007fff6a317000)
libcuda.so.1 => /usr/lib/libcuda.so.1 (0x00007f740db2b000)
libcudart.so.8.0 => not found
libcufft.so.8.0 => not found
libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f740db26000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f740db05000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f740d975000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007f740d82d000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f740d813000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f740d64e000)
librt.so.1 => /usr/lib/librt.so.1 (0x00007f740d644000)
libnvidia-fatbinaryloader.so.430.14 => /usr/lib/libnvidia-fatbinaryloader.so.430.14 (0x00007f740d3f6000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f740eca8000)
This is despite the libs being right there in the directory with the binary:

[root@node02 www.gpugrid.net]# ls -l libcu*
-rwxr-xr-x 1 boinc boinc    394472 May 17 18:34 libcudart.so.8.0
-rwxr-xr-x 1 boinc boinc    426680 Jun  4 18:26 libcudart.so.8.0.61.af43be839e6366e731accc514633bd1f
-rwxr-xr-x 1 boinc boinc 146745600 May 17 18:35 libcufft.so.8.0
-rwxr-xr-x 1 boinc boinc 146772424 Jun  4 18:28 libcufft.so.8.0.61.889be25939bec6f9a2abec790772d28f

This machine is running Arch Linux. BOINC was compiled locally from the GitHub source. The NVIDIA drivers are from Arch, with no modifications.

[root@node02 www.gpugrid.net]# pacman -Ss nvidia | grep installed
extra/nvidia 430.14-6 [installed]
extra/nvidia-utils 430.14-1 [installed]
extra/opencl-nvidia 430.14-1 [installed]

This machine is currently successfully crunching GPGPU WUs for PrimeGrid and Einstein@Home, so its configuration is known good.
Send message · Joined: 13 Dec 17 · Posts: 1423 · Credit: 9,189,196,190 · RAC: 1,326,743
> Anybody successfully run the new acemd3 app on a Turing card yet? I just realized that I still had a gpu_exclude for my Turing card on the host that had been successfully processing tasks.

OK, that is a very different understanding from the one I had of the wrapper app. I thought it was to allow use of the Turing cards. I guess I should put the gpu_exclude back in play for the hosts that failed the tasks.
Send message · Joined: 26 Aug 08 · Posts: 183 · Credit: 10,085,929,375 · RAC: 0
> Anybody successfully run the new acemd3 app on a Turing card yet?

See this post: http://www.gpugrid.net/forum_thread.php?id=4927&nowrap=true#51934
Send message · Joined: 13 Dec 17 · Posts: 1423 · Credit: 9,189,196,190 · RAC: 1,326,743
Thanks for the edification.

[Edit] This is the error when trying to run on a Turing card:

<core_client_version>7.15.0</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
12:04:16 (22587): wrapper (7.7.26016): starting
12:04:16 (22587): wrapper (7.7.26016): starting
12:04:16 (22587): wrapper: running acemd3 (--boinc input --device 0)
# Engine failed: Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)
12:04:17 (22587): acemd3 exited; CPU time 0.164594
12:04:17 (22587): app exit status: 0x1
12:04:17 (22587): called boinc_finish(195)
</stderr_txt>
]]>
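That nvrtc error is consistent with the cuda80 plan class: CUDA 8.0's runtime compiler predates Turing, so it rejects the Turing architecture flag outright. A rough sketch of the relationship (the version pairs are taken from NVIDIA's toolkit release history and should be treated as assumptions to double-check against the official docs):

```shell
# Minimum CUDA toolkit able to generate code for a given compute capability.
min_cuda_for_cc() {
  case "$1" in
    6.0|6.1|6.2) echo "8.0"  ;;  # Pascal (e.g. GTX 1070/1080) -- within reach of the cuda80 app
    7.0|7.2)     echo "9.0"  ;;  # Volta
    7.5)         echo "10.0" ;;  # Turing (e.g. RTX 2080) -- beyond what CUDA 8.0 knows
    *)           echo "unknown" ;;
  esac
}

min_cuda_for_cc 7.5   # prints 10.0
```

So a Turing host would need a CUDA 10.x build of the app, matching Toni's "no Turing support yet".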
Send message · Joined: 2 Jul 16 · Posts: 339 · Credit: 7,990,341,558 · RAC: 3,287
I completed one while 5 others had errors. Same result on another PC, but all tasks error on a 1080 Ti: https://www.gpugrid.net/show_host_detail.php?hostid=477247

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
Send message · Joined: 13 Dec 17 · Posts: 1423 · Credit: 9,189,196,190 · RAC: 1,326,743
I think I should take one of the hosts that fails the app, install the cuda toolkit, and see if it changes anything. I know Toni said the toolkit is supposedly unnecessary, but it might show something.
Send message · Joined: 2 Jul 16 · Posts: 339 · Credit: 7,990,341,558 · RAC: 3,287
It won't hurt. One PC of mine with a 1070/1070 Ti works and another with a 1080 Ti doesn't. Both have the same nvcc -V results.
Send message · Joined: 9 Dec 08 · Posts: 1006 · Credit: 5,068,599 · RAC: 0
Misc answers:

- No Turing support YET. If the app works, there will be many more possibilities.
- I don't think installing the cuda toolkit will change anything, but who knows... please don't break your systems (e.g. by tweaking PATH) to install it.
- I'm fairly positive the library copying/renaming is OK.
- I'll be updating the app soon. It seems to be some system-specific, non-reproducible behavior.
- In any case, updated drivers won't hurt.
©2026 Universitat Pompeu Fabra