Message boards : Number crunching : 195 (0xc3) EXIT_CHILD_FAILED
Joined: 11 Jul 09 | Posts: 1639 | Credit: 10,159,968,649 | RAC: 428
Have a look at the job.xml.xxxxxx file. I have one dated 22 September that says:

<job_desc>
    <task>
        <application>bin/acemd3.exe</application>
        <command_line>--boinc --device $GPU_DEVICE_NUM</command_line>
        <stdout_filename>progress.log</stdout_filename>
        <checkpoint_filename>restart.chk</checkpoint_filename>
        <fraction_done_filename>progress</fraction_done_filename>
    </task>
    <unzip_input>
        <zipfilename>conda-pack.zip</zipfilename>
    </unzip_input>
</job_desc>

and another dated yesterday that says:

<job_desc>
    <task>
        <application>bin/acemd3.exe</application>
        <command_line>--boinc --device $GPU_DEVICE_NUM</command_line>
        <stdout_filename>progress.log</stdout_filename>
        <checkpoint_filename>restart.chk</checkpoint_filename>
        <fraction_done_filename>progress</fraction_done_filename>
    </task>
    <unzip_input>
        <zipfilename>windows_x86_64__cuda101.zip</zipfilename>
    </unzip_input>
</job_desc>

To be certain, you'd need to look at the job specification in client_state.xml, but I think I'd go with the newest. Note that you'd also need to have the matching versions of cudart and cufft for the app you end up using.
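The check described above can be sketched in a few lines: parse the job description and pull out the fields that identify which app package the task will unpack. This is a minimal sketch, not GPUGRID tooling; the embedded XML mirrors the snippet quoted above, and in practice you would read your own job.xml.xxxxxx (or the matching job specification inside client_state.xml) instead.

```python
# Sketch: extract the application, command line, and input archive from a
# GPUGRID job.xml-style description, to see which CUDA package a task uses.
# The XML literal below mirrors the file quoted above; point fromstring()
# at your own job.xml.xxxxxx contents instead.
import xml.etree.ElementTree as ET

JOB_XML = """<job_desc>
    <task>
        <application>bin/acemd3.exe</application>
        <command_line>--boinc --device $GPU_DEVICE_NUM</command_line>
    </task>
    <unzip_input>
        <zipfilename>windows_x86_64__cuda101.zip</zipfilename>
    </unzip_input>
</job_desc>"""

def describe_job(xml_text: str) -> dict:
    """Return the fields that identify which app package a job uses."""
    root = ET.fromstring(xml_text)
    return {
        "application": root.findtext("task/application"),
        "command_line": root.findtext("task/command_line"),
        "zipfilename": root.findtext("unzip_input/zipfilename"),
    }

print(describe_job(JOB_XML))
```

The zipfilename entry is the quickest tell: a conda-pack.zip job is from the older batch, a windows_x86_64__cuda101.zip or ...cuda1121.zip job from the newer ones.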
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 1
> Have a look at the job.xml.xxxxxx file. ...

My job.xml.xxxxxx files look exactly like yours, also date-wise. To me, this shows that the new tasks no longer use the former <zipfilename>conda-pack.zip</zipfilename> but rather the new <zipfilename>windows_x86_64__cuda101.zip</zipfilename>. And since no "...cuda1121.zip" was downloaded into the GPUGRID folder, I suppose that the new WUs are running cuda101 only, which further means that these new WUs will not work with Ampere cards :-( Looks as simple as that, most sadly :-( Unless someone here can report successful completion of the new WUs with an Ampere card. If possible, some kind of confirmation/statement/explanation or whatever from the team would also help a lot.
PDW | Joined: 7 Mar 14 | Posts: 18 | Credit: 6,575,125,525 | RAC: 2
> Unless someone here can report about successful completion of the new WUs with an Ampere card.

I have Ampere cards completing 101 and 1121 tasks from the latest batch just fine. All tasks on all cards have worked; I am going to try some slower cards, given that the tasks are smaller. I have never renamed any of the project files.
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 1
> I have Ampere cards completing 101 and 1121 tasks from the latest batch just fine.

Thanks for the information, sounds interesting. Could you please let me/us know whether your www.gpugrid.net folder (in BOINC > projects) contains any conda-pack.zip files (if yes, which ones?), and whether besides the "windows_x86_64__cuda101.zip.c0d...b21" it contains such a file with "...cuda1121" instead of "...cuda101"?
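The folder check asked about above is easy to script: list the archive-like files in the project directory and see which package names turn up. A minimal sketch, assuming the usual BOINC data layout (the folder path and the name patterns are my assumptions; BOINC may append a hash suffix after ".zip", as in the "c0d...b21" example above):

```python
# Sketch: list app-package archives in the GPUGRID project folder.
# The patterns ("conda-pack", "cuda") match the archives discussed in
# this thread; adjust the folder path to your own BOINC data directory.
from pathlib import Path

def find_package_files(folder):
    """Return file names in `folder` that look like app-package archives."""
    names = (p.name for p in Path(folder).iterdir() if p.is_file())
    return sorted(n for n in names
                  if "conda-pack" in n or "cuda" in n.lower())

# Typical location on Windows (assumption -- check your own setup):
# print(find_package_files(r"C:\ProgramData\BOINC\projects\www.gpugrid.net"))
```

If the list contains a "...cuda1121..." archive, the host has received the CUDA 11.2.1 package; if it only shows "...cuda101..." (or conda-pack.zip), it hasn't.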
Joined: 11 Jul 09 | Posts: 1639 | Credit: 10,159,968,649 | RAC: 428
I have completed a Windows x64 cuda1121 task, and I have a windows_x86_64__cuda1121.zip file on that machine. You can download a copy from my Google drive. |
ServicEnginIC | Joined: 24 Sep 10 | Posts: 592 | Credit: 11,972,186,510 | RAC: 1,447
> I have Ampere cards completing 101 and 1121 tasks from the latest batch just fine.

The problem discussed most recently only impacts Windows hosts. If your hosts are running under any kind of Linux distribution, it is normal that they aren't affected.
Retvari Zoltan | Joined: 20 Jan 09 | Posts: 2380 | Credit: 16,897,957,044 | RAC: 0
What's the point in keeping the CUDA10 app alive? The CUDA11 app works on older cards as well. |
Joined: 21 Feb 20 | Posts: 1116 | Credit: 40,839,470,595 | RAC: 6,423
> What's the point in keeping the CUDA10 app alive? The CUDA11 app works on older cards as well.

I agree, and I've said this a few times as well: there is no point in keeping the cuda101 app when there's the 1121 app.
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 1
> I have Ampere cards completing 101 and 1121 tasks from the latest batch just fine.

Too bad that the user PDW has hidden his computers in his profile, so we have no idea what OS is being used ... unless he tells us.

> What's the point in keeping the CUDA10 app alive? The CUDA11 app works on older cards as well.

Good question.
Joined: 12 May 13 | Posts: 5 | Credit: 100,032,540 | RAC: 0
> I'm also estimating that this batch is considerably smaller than previous ones, and my GTX 1660 Ti will be hitting full bonus with its current task.

ServicEnginIC, I noticed that your task completed in under 64,000 sec. My 1660 Ti is looking to finish in just under 88,000 sec, and I am wondering what could be causing such a big difference. As for the tasks: mine is a cuda101 running under Win 7 and yours is a cuda1121 running under Linux. Is either of these the culprit?
ServicEnginIC | Joined: 24 Sep 10 | Posts: 592 | Credit: 11,972,186,510 | RAC: 1,447
Working under Linux helps to squeeze out maximum performance. Some optimized settings in BOINC Manager and a moderate overclock do the rest. In the Managing non-high-end hosts thread I try to share everything I know about it.
PDW | Joined: 7 Mar 14 | Posts: 18 | Credit: 6,575,125,525 | RAC: 2
You asked:

> Unless someone here can report about successful completion of the new WUs with an Ampere card.

As I posted previously, I am using Linux.
Joined: 1 Jan 15 | Posts: 1166 | Credit: 12,260,898,501 | RAC: 1
> As I posted previously, I am using Linux.

Oh okay, thanks for the information (which explains why it works well on your system).
Retvari Zoltan | Joined: 20 Jan 09 | Posts: 2380 | Credit: 16,897,957,044 | RAC: 0
> I have Ampere cards completing 101 and 1121 tasks from the latest batch just fine.

No, it does not explain it. I've tried to run a CUDA 101 task on my Ubuntu 18.04.6 host on an RTX 3080 Ti (driver: 495.44), and it failed after a few minutes:

<core_client_version>7.16.17</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
14:33:16 (1675): wrapper (7.7.26016): starting
14:33:23 (1675): wrapper (7.7.26016): starting
14:33:23 (1675): wrapper: running bin/acemd3 (--boinc --device 0)
ACEMD failed:
Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)
14:35:30 (1675): bin/acemd3 exited; CPU time 127.166324
14:35:30 (1675): app exit status: 0x1
14:35:30 (1675): called boinc_finish(195)
</stderr_txt>
]]>
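The "invalid value for --gpu-architecture" error above is what nvrtc reports when an app built against an older CUDA toolkit tries to compile kernels for a GPU architecture that toolkit predates: the CUDA 10.1 runtime simply does not know Ampere's sm_86. A small sketch of that relationship; the minimum-toolkit table is drawn from NVIDIA's published compute-capability support and is the assumption here, not anything from the GPUGRID app itself:

```python
# Sketch: why a cuda101 build fails on an RTX 3080 Ti. nvrtc can only
# target architectures its bundled toolkit knows about; the minimum
# toolkit versions below reflect NVIDIA's release notes (assumption).
MIN_TOOLKIT = {
    "sm_70": (9, 0),   # Volta
    "sm_75": (10, 0),  # Turing
    "sm_80": (11, 0),  # Ampere GA100 (A100)
    "sm_86": (11, 1),  # Ampere GA10x (RTX 30x0)
}

def toolkit_supports(toolkit: str, arch: str) -> bool:
    """True if a CUDA toolkit version, e.g. "10.1", can target `arch`."""
    major, minor = (int(x) for x in toolkit.split("."))
    return (major, minor) >= MIN_TOOLKIT[arch]

# The cuda101 app cannot target an RTX 3080 Ti (sm_86)...
print(toolkit_supports("10.1", "sm_86"))  # → False
# ...while the cuda1121 app can:
print(toolkit_supports("11.2", "sm_86"))  # → True
```

This is consistent with the thread so far: cuda101 tasks run fine on pre-Ampere cards but fail on sm_86 hardware, regardless of OS.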
Retvari Zoltan | Joined: 20 Jan 09 | Posts: 2380 | Credit: 16,897,957,044 | RAC: 0
Another example: http://www.gpugrid.net/result.php?resultid=32706825
EDIT: 3rd attempt (failed as well): http://www.gpugrid.net/result.php?resultid=32706943
Joined: 21 Feb 20 | Posts: 1116 | Credit: 40,839,470,595 | RAC: 6,423
After yesterday's snafu, I picked up two cuda101 tasks this morning on my Linux Ubuntu 20.04 3080 Ti system. Currently running OK: they've been going for about 20 minutes now and are utilizing the GPU at 99%, so it's definitely working. I basically executed a project reset yesterday on this host, so I don't think my previous modifications to swap out the 101 app for the 1121 one carried over.
Joined: 22 Oct 10 | Posts: 42 | Credit: 1,752,050,315 | RAC: 57
> Your tasks are failing with 'app exit status: 0xc0000135' - in all likelihood, you are missing a Microsoft runtime DLL file. Please refer to message 57353.

Richard: Thank you kindly for solving the problem. I installed both the x86 and x64 runtime updates and now both machines are processing GPUGRID tasks without fault. Billy Ewell 1931; celebrating the passage of my 90th birthday a few days ago, and I am physically in good shape and still mentally quite capable.
Joined: 13 Dec 17 | Posts: 1419 | Credit: 9,119,446,190 | RAC: 891
I missed out on all the new work because I had to get new master lists on all the hosts when their 24 hour timeouts finally expired. |
Joined: 11 Jul 09 | Posts: 1639 | Credit: 10,159,968,649 | RAC: 428
> I missed out on all the new work because I had to get new master lists on all the hosts when their 24 hour timeouts finally expired.

I think the 24-hour (master file fetch) backoff is set by the client rather than the server, so it can be overridden by a manual update. That's unlike the 'daily quota exceeded' and 'last request too recent' backoffs, which are enforced by the server and can't be bypassed. I might use one of these boring lockdown days to force a client into 'master file fetch' mode, so I can see how it's recorded in client_state.xml, and hence how to remove it again - whenever and wherever that knowledge might be useful in the future.
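A starting point for that client_state.xml inspection can be sketched as below: walk the <project> blocks and pull out the backoff-related fields. The tag names (<min_rpc_time>, <master_url_fetch_pending/>) are my assumption from BOINC's client-state format, and the embedded sample is illustrative rather than a real capture; verify both against your own file.

```python
# Sketch: scan the <project> blocks of a BOINC client_state.xml for
# backoff-related fields. Tag names are assumptions from the BOINC
# client-state format -- check them against your own file. The sample
# below is illustrative, not a real capture.
import xml.etree.ElementTree as ET

SAMPLE = """<client_state>
  <project>
    <master_url>http://www.gpugrid.net/</master_url>
    <min_rpc_time>1638300000.0</min_rpc_time>
    <master_url_fetch_pending/>
  </project>
</client_state>"""

def project_backoffs(xml_text):
    """Map each project's master_url to its recorded backoff fields."""
    out = {}
    for proj in ET.fromstring(xml_text).iter("project"):
        out[proj.findtext("master_url")] = {
            # next allowed scheduler RPC, as a Unix timestamp (assumption)
            "min_rpc_time": proj.findtext("min_rpc_time"),
            # flag element present while a master file fetch is pending
            "master_fetch_pending": proj.find("master_url_fetch_pending") is not None,
        }
    return out

print(project_backoffs(SAMPLE))
```

In practice you would read the real file (with the client stopped before making any edits), e.g. `project_backoffs(open("/var/lib/boinc-client/client_state.xml").read())` on a typical Linux install.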
Joined: 13 Dec 17 | Posts: 1419 | Credit: 9,119,446,190 | RAC: 891
Manual updates did nothing but reset the 24-hour backoff timer. The same with an update script running every 15 minutes: the backoff never got below 23 hours before resetting back to 24.
©2025 Universitat Pompeu Fabra