195 (0xc3) EXIT_CHILD_FAILED

Author	Message
Richard Haselgrove	Message 57885 - Posted: 27 Nov 2021, 11:17:30 UTC - in response to Message 57884.
Have a look at the job.xml.xxxxxx file. I have one dated 22 September that says
<job_desc>
    <task>
        <application>bin/acemd3.exe</application>
        <command_line>--boinc --device $GPU_DEVICE_NUM</command_line>
        <stdout_filename>progress.log</stdout_filename>
        <checkpoint_filename>restart.chk</checkpoint_filename>
        <fraction_done_filename>progress</fraction_done_filename>
    </task>
    <unzip_input>
        <zipfilename>conda-pack.zip</zipfilename>
    </unzip_input>
</job_desc>
and another dated yesterday that says
<job_desc>
    <task>
        <application>bin/acemd3.exe</application>
        <command_line>--boinc --device $GPU_DEVICE_NUM</command_line>
        <stdout_filename>progress.log</stdout_filename>
        <checkpoint_filename>restart.chk</checkpoint_filename>
        <fraction_done_filename>progress</fraction_done_filename>
    </task>
    <unzip_input>
        <zipfilename>windows_x86_64__cuda101.zip</zipfilename>
    </unzip_input>
</job_desc>
To be certain, you'd need to look at the job specification in client_state.xml, but I think I'd go with the newest. Note that you'd also need to have the matching versions of cudart and cufft for the app you end up using.
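One way to compare two job files like the ones quoted above is to extract the fields that can differ. The following is a minimal Python sketch, assuming the file has exactly the layout shown in the post; `summarize_job` is a hypothetical helper name, not part of BOINC or its wrapper.

```python
import xml.etree.ElementTree as ET


def summarize_job(xml_text):
    """Return the application, command line and zip archives named in a job file.

    Assumes the <job_desc>/<task>/<unzip_input> layout quoted in the post.
    """
    root = ET.fromstring(xml_text)
    task = root.find("task")
    return {
        "application": task.findtext("application"),
        "command_line": task.findtext("command_line"),
        "zipfiles": [z.text for z in root.iter("zipfilename")],
    }


# The newer of the two job files quoted in the post.
NEWER_JOB = """<job_desc>
    <task>
        <application>bin/acemd3.exe</application>
        <command_line>--boinc --device $GPU_DEVICE_NUM</command_line>
        <stdout_filename>progress.log</stdout_filename>
        <checkpoint_filename>restart.chk</checkpoint_filename>
        <fraction_done_filename>progress</fraction_done_filename>
    </task>
    <unzip_input>
        <zipfilename>windows_x86_64__cuda101.zip</zipfilename>
    </unzip_input>
</job_desc>"""

summary = summarize_job(NEWER_JOB)
print(summary["zipfiles"])
```

Running the same helper on the older file would list conda-pack.zip instead, which is the only difference between the two files as quoted.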

Erich56	Message 57887 - Posted: 27 Nov 2021, 13:46:24 UTC - in response to Message 57885.
> Have a look at the job.xml.xxxxxx file. ...
My job.xml.xxxxxx files look exactly like yours, including the dates.
To me, this shows that the new tasks no longer use the former <zipfilename>conda-pack.zip</zipfilename> but rather the new <zipfilename>windows_x86_64__cuda101.zip</zipfilename>.
And since no "...cuda1121.zip" was downloaded into the GPUGRID folder, I suppose that the new WUs are running cuda101 only, which further means that these new WUs will not work with Ampere cards :-(
Looks as simple as that, most sadly :-(
Unless someone here can report about successful completion of the new WUs with an Ampere card.
If possible, some kind of confirmation/statement/explanation or whatever from the team would also help a lot.

PDW	Message 57888 - Posted: 27 Nov 2021, 14:23:34 UTC - in response to Message 57887.
> Unless someone here can report about successful completion of the new WUs with an Ampere card.
I have Ampere cards completing 101 and 1121 tasks from the latest batch just fine.
All tasks on all cards have worked; I am going to try some slower cards, given that the tasks are smaller.
I have never renamed any of the project files.

Erich56	Message 57889 - Posted: 27 Nov 2021, 14:49:32 UTC - in response to Message 57888.
> I have Ampere cards completing 101 and 1121 tasks from the latest batch just fine.
Thanks for the information, sounds interesting.
Could you please let me/us know whether your www.gpugrid.net folder (in BOINC > projects) contains any conda-pack.zip files (if yes, which ones?), and whether, besides the "windows_x86_64_cuda101.zip.c0d...b21", it contains such a file with "...cuda1121" (instead of cuda101)?
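The folder check asked for above can also be scripted. This is a minimal Python sketch, assuming the archives keep a "cudaNNN.zip" fragment in their names as in the job files quoted earlier in the thread; `group_by_cuda_tag` and `scan` are hypothetical helper names, and the project directory path has to be supplied by the caller because it differs per installation.

```python
import re
from pathlib import Path


def group_by_cuda_tag(filenames):
    """Map a CUDA tag such as 'cuda101' to the archive names that carry it."""
    groups = {}
    for name in filenames:
        match = re.search(r"(cuda\d+)\.zip", name)
        if match:
            groups.setdefault(match.group(1), []).append(name)
    return groups


def scan(project_dir):
    """Group the regular files found in project_dir (path chosen by the caller)."""
    return group_by_cuda_tag(
        p.name for p in Path(project_dir).iterdir() if p.is_file()
    )


# File names taken from the job files and posts in this thread.
names = [
    "conda-pack.zip",
    "windows_x86_64__cuda101.zip",
    "windows_x86_64__cuda1121.zip",
]
groups = group_by_cuda_tag(names)
print(groups)
```

A host that only ever received cuda101 work would show a single "cuda101" key; conda-pack.zip carries no CUDA tag and is ignored by this grouping.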

Richard Haselgrove	Message 57890 - Posted: 27 Nov 2021, 15:31:47 UTC - in response to Message 57889.
I have completed a Windows x64 cuda1121 task, and I have a windows_x86_64__cuda1121.zip file on that machine. You can download a copy from my Google drive.

ServicEnginIC	Message 57891 - Posted: 27 Nov 2021, 15:39:37 UTC - in response to Message 57888.
> I have Ampere cards completing 101 and 1121 tasks from the latest batch just fine.
This lastly commented problem is only impacting Windows hosts.
If your hosts are running under any kind of Linux distribution, it is normal that they aren't being affected.

Retvari Zoltan	Message 57892 - Posted: 27 Nov 2021, 15:42:46 UTC
What's the point in keeping the CUDA10 app alive? The CUDA11 app works on older cards as well.

Ian&Steve C.	Message 57893 - Posted: 27 Nov 2021, 15:44:37 UTC - in response to Message 57892.
> What's the point in keeping the CUDA10 app alive? The CUDA11 app works on older cards as well.
I agree, and I've said this a few times as well. There's no point in keeping the CUDA101 app when there's the 1121 app.

Erich56	Message 57894 - Posted: 27 Nov 2021, 15:52:30 UTC - in response to Message 57891.
>> I have Ampere cards completing 101 and 1121 tasks from the latest batch just fine.
> This lastly commented problem is only impacting Windows hosts.
> If your hosts are running under any kind of Linux distribution, it is normal that they aren't being affected.
too bad that the user PDW has hidden his computers in the profile. So no idea what OS is being used ... unless he tells us.
> What's the point in keeping the CUDA10 app alive? The CUDA11 app works on older cards as well.
good question

Werinbert	Message 57896 - Posted: 27 Nov 2021, 21:18:57 UTC - in response to Message 57876.
> I'm also estimating that this batch is considerably slighter than precedent ones, and my GTX 1660 Ti will be hitting full bonus with its current task.
ServicEnginIC, I noticed that your task completed in under 64,000 sec. My 1660 Ti is looking to finish in just under 88,000 sec. I am wondering what could be causing such a big difference. My task is a cuda101 one running under Win 7 and yours is a cuda1121 one running under Linux. Is either of these the culprit?
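For scale, the two run times quoted above differ by a factor of roughly 1.4. A small Python check of the arithmetic follows; it treats the quoted figures (both stated as "under" a round number) as approximate upper bounds, and the variable names are only illustrative.

```python
# Run times as quoted in the post, in seconds (both are "under" these values).
linux_cuda1121_s = 64_000
win7_cuda101_s = 88_000

ratio = win7_cuda101_s / linux_cuda1121_s
extra_percent = (ratio - 1) * 100

print(f"ratio: {ratio:.3f}, i.e. about {extra_percent:.1f}% longer")
print(f"{linux_cuda1121_s / 3600:.1f} h vs {win7_cuda101_s / 3600:.1f} h")
```

The comparison cannot separate the effect of the operating system from that of the CUDA version, since both differ between the two hosts.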

ServicEnginIC	Message 57898 - Posted: 27 Nov 2021, 22:44:29 UTC - in response to Message 57896.
Working under Linux helps to squeeze out maximum performance. Some optimized settings in BOINC Manager and moderate overclocking do the rest.
In the "Managing non-high-end hosts" thread I try to share everything I know about it.

PDW	Message 57905 - Posted: 28 Nov 2021, 10:05:52 UTC - in response to Message 57894.
>>> I have Ampere cards completing 101 and 1121 tasks from the latest batch just fine.
>> This lastly commented problem is only impacting Windows hosts.
>> If your hosts are running under any kind of Linux distribution, it is normal that they aren't being affected.
> too bad that the user PDW has hidden his computers in the profile. So no idea what OS is being used ... unless he tells us.
You asked:
> Unless someone here can report about successful completion of the new WUs with an Ampere card.
As I posted previously I am using linux.

Erich56	Message 57906 - Posted: 28 Nov 2021, 10:53:41 UTC - in response to Message 57905.
>> Unless someone here can report about successful completion of the new WUs with an Ampere card.
> As I posted previously I am using linux.
oh okay, thanks for the information (which explains why it works well on your system).

Retvari Zoltan	Message 57911 - Posted: 28 Nov 2021, 13:55:24 UTC - in response to Message 57906. Last modified: 28 Nov 2021, 13:56:49 UTC
>>>>> I have Ampere cards completing 101 and 1121 tasks from the latest batch just fine.
>>>> This lastly commented problem is only impacting Windows hosts.
>>>> If your hosts are running under any kind of Linux distribution, it is normal that they aren't being affected.
>>> too bad that the user PDW has hidden his computers in the profile. So no idea what OS is being used ... unless he tells us.
>> You asked:
>>> Unless someone here can report about successful completion of the new WUs with an Ampere card.
>> As I posted previously I am using linux.
> oh okay, thanks for the information (which explains why it works well on your system).
No, it does not explain it.
I've tried to run a CUDA 101 task on my Ubuntu 18.04.6 host on an RTX 3080 Ti (driver: 495.44), and it failed after a few minutes.
<core_client_version>7.16.17</core_client_version>
<![CDATA[
<message> process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
14:33:16 (1675): wrapper (7.7.26016): starting
14:33:23 (1675): wrapper (7.7.26016): starting
14:33:23 (1675): wrapper: running bin/acemd3 (--boinc --device 0)
ACEMD failed: Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)
14:35:30 (1675): bin/acemd3 exited; CPU time 127.166324
14:35:30 (1675): app exit status: 0x1
14:35:30 (1675): called boinc_finish(195)
</stderr_txt>
]]>
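The three numbers in "195 (0xc3, -61)" are the same exit code written in decimal, in hexadecimal, and as a signed 8-bit value. The log also shows two different codes: the application itself exited with 0x1, and the wrapper then called boinc_finish(195). A minimal Python sketch of both points, assuming the log lines have the form quoted above; `as_signed_byte` and `parse_wrapper_log` are hypothetical helper names.

```python
import re


def as_signed_byte(code):
    """Reinterpret an exit code in the range 0..255 as a signed 8-bit value."""
    return code - 256 if code >= 128 else code


def parse_wrapper_log(text):
    """Return (app exit status, boinc_finish code) found in a wrapper stderr log."""
    app = re.search(r"app exit status: (0x[0-9a-fA-F]+)", text)
    fin = re.search(r"called boinc_finish\((\d+)\)", text)
    return (
        int(app.group(1), 16) if app else None,
        int(fin.group(1)) if fin else None,
    )


# Tail of the stderr output quoted in the post.
STDERR = """14:35:30 (1675): bin/acemd3 exited; CPU time 127.166324
14:35:30 (1675): app exit status: 0x1
14:35:30 (1675): called boinc_finish(195)"""

app_status, finish_code = parse_wrapper_log(STDERR)
print(app_status, finish_code, hex(finish_code), as_signed_byte(finish_code))
```

Read this way, 195 is the wrapper's report that its child process failed, while the underlying cause is the nvrtc error printed by acemd3 just before it exited.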

Retvari Zoltan	Message 57918 - Posted: 28 Nov 2021, 14:26:19 UTC - in response to Message 57911. Last modified: 28 Nov 2021, 14:39:37 UTC
Another example: http://www.gpugrid.net/result.php?resultid=32706825
EDIT: 3rd attempt (failed as well): http://www.gpugrid.net/result.php?resultid=32706943

Ian&Steve C.	Message 57959 - Posted: 29 Nov 2021, 14:28:37 UTC
After yesterday's snafu, I picked up two cuda101 tasks this morning on my Linux Ubuntu 20.04 3080Ti system. They are currently running OK: they have been running for about 20 minutes now and are utilizing the GPU at 99%, so it's definitely working.
I basically executed a project reset yesterday on this host, so I don't think my previous modifications to swap out the 101 app for 1121 carried over.

Billy Ewell 1931	Message 57966 - Posted: 29 Nov 2021, 18:07:32 UTC - in response to Message 57880.
> Your tasks are failing with 'app exit status: 0xc0000135' - in all likelihood, you are missing a Microsoft runtime DLL file. Please refer to message 57353.
Richard: Thank you kindly for solving the problem. I installed both the x86 and x64 updating applications, and now both machines are processing GPUGRID tasks without fault.
Billy Ewell 1931; I celebrated my 90th birthday a few days ago, am physically in good shape and still mentally quite capable.
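The status 0xc0000135 quoted above is the Windows NTSTATUS value STATUS_DLL_NOT_FOUND, which fits the missing-runtime-DLL diagnosis. A minimal lookup sketch in Python follows; the table is a small illustrative subset chosen here, not a complete list, and `describe_exit_status` is a hypothetical helper name.

```python
# A few well-known NTSTATUS values; deliberately incomplete.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC000013A: "STATUS_CONTROL_C_EXIT",
}


def describe_exit_status(hex_text):
    """Translate a hex exit status string such as '0xc0000135' to a symbolic name."""
    code = int(hex_text, 16)
    return KNOWN_NTSTATUS.get(code, "unknown status")


name = describe_exit_status("0xc0000135")
print(name)
```

Unlike the code 195 in the thread title, which comes from the BOINC wrapper, this value comes from Windows itself when the application cannot be loaded.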

Keith Myers	Message 57967 - Posted: 29 Nov 2021, 18:09:21 UTC
I missed out on all the new work because I had to get new master lists on all the hosts when their 24 hour timeouts finally expired.

Richard Haselgrove	Message 57969 - Posted: 29 Nov 2021, 18:24:03 UTC - in response to Message 57967.
> I missed out on all the new work because I had to get new master lists on all the hosts when their 24 hour timeouts finally expired.
I think the 24 hour (master file fetch) backoff is set by the client, rather than the server, so it can be overridden by a manual update. That's unlike the 'daily quota exceeded' and the 'last request too recent' backoffs, which are enforced by the server and can't be bypassed.
I might use one of these boring lockdown days to force a client into 'master file fetch' mode, so I can see how it's recorded in client_state.xml, and hence how to remove it again, whenever and wherever that knowledge might be useful in the future.

Keith Myers	Message 57971 - Posted: 29 Nov 2021, 19:54:05 UTC
Manual updates did nothing but keep resetting the 24 hour timer backoff. Same with an update script running every 15 minutes. Backoff never got below 23 hours before resetting back to 24.
