195 (0xc3) EXIT_CHILD_FAILED

Richard Haselgrove

Message 57885 - Posted: 27 Nov 2021, 11:17:30 UTC - in response to Message 57884.  

Have a look at the job.xml.xxxxxx file.

I have one dated 22 September that says

<job_desc>
    <task>
        <application>bin/acemd3.exe</application>
        <command_line>--boinc --device $GPU_DEVICE_NUM</command_line>
        <stdout_filename>progress.log</stdout_filename>
        <checkpoint_filename>restart.chk</checkpoint_filename>
        <fraction_done_filename>progress</fraction_done_filename>
    </task>
    <unzip_input>
       <zipfilename>conda-pack.zip</zipfilename>
    </unzip_input>
</job_desc>

and another dated yesterday that says

<job_desc>
    <task>
        <application>bin/acemd3.exe</application>
        <command_line>--boinc --device $GPU_DEVICE_NUM</command_line>
        <stdout_filename>progress.log</stdout_filename>
        <checkpoint_filename>restart.chk</checkpoint_filename>
        <fraction_done_filename>progress</fraction_done_filename>
    </task>
    <unzip_input>
       <zipfilename>windows_x86_64__cuda101.zip</zipfilename>
    </unzip_input>
</job_desc>

To be certain, you'd need to look at the job specification in client_state.xml, but I think I'd go with the newest.

Note that you'd also need to have the matching versions of cudart and cufft for the app you end up using.
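
A quick way to compare the two job files is to extract only the fields that matter. The sketch below is an illustration, not project tooling: it assumes the files are well-formed XML shaped like the two listings above, and the helper name `summarize_job` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the older job.xml listing quoted above.
OLD_JOB = """<job_desc>
    <task>
        <application>bin/acemd3.exe</application>
        <command_line>--boinc --device $GPU_DEVICE_NUM</command_line>
    </task>
    <unzip_input>
       <zipfilename>conda-pack.zip</zipfilename>
    </unzip_input>
</job_desc>"""

# The newer listing differs only in the archive name.
NEW_JOB = OLD_JOB.replace("conda-pack.zip", "windows_x86_64__cuda101.zip")


def summarize_job(xml_text):
    """Return (application, [archives to unzip]) for one job.xml document."""
    root = ET.fromstring(xml_text)
    application = root.findtext("task/application")
    archives = [z.text.strip() for z in root.iter("zipfilename") if z.text]
    return application, archives


if __name__ == "__main__":
    for label, text in (("22 September", OLD_JOB), ("yesterday", NEW_JOB)):
        print(label, summarize_job(text))
```

To run it against real files, read each job.xml.xxxxxx file into a string and pass that string to `summarize_job`.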
ID: 57885
Erich56

Message 57887 - Posted: 27 Nov 2021, 13:46:24 UTC - in response to Message 57885.  

Richard Haselgrove wrote: "Have a look at the job.xml.xxxxxx file. ..."


My job.xml.xxxxxx files look exactly like yours, including the dates.

To me, this shows that the new tasks no longer use the former <zipfilename>conda-pack.zip</zipfilename> but rather the new <zipfilename>windows_x86_64__cuda101.zip</zipfilename>.

And since no "...cuda1121.zip" was downloaded into the GPUGRID folder, I suppose that the new WUs are running cuda101 only.
Which further means that these new WUs will not work with Ampere cards :-(

Looks as simple as that, most sadly :-(

Unless someone here can report about successful completion of the new WUs with an Ampere card.

If possible, some kind of confirmation/statement/explanation or whatever from the team would also help a lot.
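
The reasoning above (a cuda101-only app cannot serve Ampere cards) can be checked against the compiler limits of each CUDA toolkit. The sketch below is a rough aid under stated assumptions: it assumes "cuda101" and "cuda1121" in the file names mean CUDA 10.1 and CUDA 11.2.1, the table lists only the toolkit releases relevant here, and `toolkit_can_target` is a name invented for this example. It says nothing about how acemd3 itself chooses its target.

```python
# Highest compute capability each CUDA toolkit release can compile for,
# as listed in NVIDIA's release notes (only the releases relevant here).
MAX_COMPUTE_CAPABILITY = {
    (10, 1): (7, 5),  # up to Turing
    (11, 0): (8, 0),  # adds Ampere GA100
    (11, 1): (8, 6),  # adds Ampere GA10x (RTX 30 series)
    (11, 2): (8, 6),
}

# Compute capabilities of the two cards mentioned in this thread.
GTX_1660_TI = (7, 5)
RTX_3080_TI = (8, 6)


def toolkit_can_target(toolkit, capability):
    """True if the given toolkit (major, minor) can compile for the capability."""
    return capability <= MAX_COMPUTE_CAPABILITY[toolkit]


if __name__ == "__main__":
    for toolkit in ((10, 1), (11, 2)):
        for name, cap in (("GTX 1660 Ti", GTX_1660_TI), ("RTX 3080 Ti", RTX_3080_TI)):
            print(toolkit, name, toolkit_can_target(toolkit, cap))
```

The table only covers the upper limit; it does not model the minimum capability a toolkit still supports.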


ID: 57887
PDW

Message 57888 - Posted: 27 Nov 2021, 14:23:34 UTC - in response to Message 57887.  

Erich56 wrote: "Unless someone here can report about successful completion of the new WUs with an Ampere card."

I have Ampere cards completing 101 and 1121 tasks from the latest batch just fine. All tasks on all cards have worked; I am going to try some slower cards, given that the tasks are smaller.

I have never renamed any of the project files.
ID: 57888
Erich56

Message 57889 - Posted: 27 Nov 2021, 14:49:32 UTC - in response to Message 57888.  

PDW wrote: "I have Ampere cards completing 101 and 1121 tasks from the latest batch just fine."

Thanks for the information, sounds interesting.

Could you please let me/us know whether your www.gpugrid.net folder (in BOINC > projects) contains any conda-pack.zip files (if yes, which ones?), and whether, besides "windows_x86_64__cuda101.zip.c0d...b21", it also contains such a file with "...cuda1121" (instead of cuda101)?
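
The folder check asked for here can be scripted. This is a minimal sketch, assuming the archive names follow the job.xml listings earlier in the thread and may carry an extra suffix after ".zip"; the function name and the sample file names in the usage note are hypothetical.

```python
from fnmatch import fnmatch

# Patterns for the three archives discussed in the thread.
# The trailing "*" allows for a suffix after ".zip".
PATTERNS = {
    "conda-pack": "conda-pack.zip*",
    "cuda101": "windows_x86_64__cuda101.zip*",
    "cuda1121": "windows_x86_64__cuda1121.zip*",
}


def classify_archives(filenames):
    """Group file names by which of the known archive patterns they match."""
    found = {label: [] for label in PATTERNS}
    for name in filenames:
        for label, pattern in PATTERNS.items():
            if fnmatch(name, pattern):
                found[label].append(name)
    return found


if __name__ == "__main__":
    # Hypothetical directory listing; replace with os.listdir(<project folder>).
    sample = ["windows_x86_64__cuda101.zip.abc123", "job.xml.def456"]
    print(classify_archives(sample))
```

For a real check, pass the result of `os.listdir()` on the project folder instead of the sample list.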
ID: 57889
Richard Haselgrove

Message 57890 - Posted: 27 Nov 2021, 15:31:47 UTC - in response to Message 57889.  

I have completed a Windows x64 cuda1121 task, and I have a windows_x86_64__cuda1121.zip file on that machine.

You can download a copy from my Google drive.
ID: 57890
ServicEnginIC

Message 57891 - Posted: 27 Nov 2021, 15:39:37 UTC - in response to Message 57888.  

PDW wrote: "I have Ampere cards completing 101 and 1121 tasks from the latest batch just fine."

This lastly commented problem is only impacting Windows hosts.
If your hosts are running under any kind of Linux distribution, it is normal that they aren't being affected.
ID: 57891
Retvari Zoltan

Message 57892 - Posted: 27 Nov 2021, 15:42:46 UTC

What's the point in keeping the CUDA10 app alive? The CUDA11 app works on older cards as well.
ID: 57892
Ian&Steve C.

Message 57893 - Posted: 27 Nov 2021, 15:44:37 UTC - in response to Message 57892.  

Retvari Zoltan wrote: "What's the point in keeping the CUDA10 app alive? The CUDA11 app works on older cards as well."

I agree, and I've said this a few times as well. There is no point in keeping the CUDA101 app when there's the 1121 app.
ID: 57893
Erich56

Message 57894 - Posted: 27 Nov 2021, 15:52:30 UTC - in response to Message 57891.  

PDW wrote: "I have Ampere cards completing 101 and 1121 tasks from the latest batch just fine."

ServicEnginIC wrote: "This lastly commented problem is only impacting Windows hosts. If your hosts are running under any kind of Linux distribution, it is normal that they aren't being affected."

too bad that the user PDW has hidden his computers in the profile. So no idea what OS is being used ... unless he tells us.

Retvari Zoltan wrote: "What's the point in keeping the CUDA10 app alive? The CUDA11 app works on older cards as well."

good question
ID: 57894
Werinbert

Message 57896 - Posted: 27 Nov 2021, 21:18:57 UTC - in response to Message 57876.  

Quoted from Message 57876: "I'm also estimating that this batch is considerably slighter than precedent ones, and my GTX 1660 Ti will be hitting full bonus with its current task."

ServicEnginIC, I noticed that your task completed in under 64,000 sec. My 1660 Ti is looking to finish in just under 88,000 sec. I am wondering what could be causing such a big difference. My task is a Cuda101 one running under Win 7, and yours is a Cuda1121 one running under Linux. Is either of these the culprit?
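
To put the two runtimes side by side, the arithmetic can be sketched as below. Both figures are the rounded upper bounds given in the post ("under 64,000" and "just under 88,000" seconds), so the results are approximate; the function name is made up for this example.

```python
SECONDS_PER_HOUR = 3600


def compare_runtimes(fast_s, slow_s):
    """Convert two runtimes to hours and express the gap as a percentage."""
    return {
        "fast_hours": fast_s / SECONDS_PER_HOUR,
        "slow_hours": slow_s / SECONDS_PER_HOUR,
        "slowdown_pct": (slow_s / fast_s - 1) * 100,
    }


if __name__ == "__main__":
    result = compare_runtimes(64_000, 88_000)
    for key, value in result.items():
        print(f"{key}: {value:.1f}")
```

With these inputs the slower task takes roughly 37.5% longer, and 88,000 seconds is a little more than 24 hours (86,400 seconds). Whether that matters for the bonus depends on the project's rules, which this thread does not spell out.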
ID: 57896
ServicEnginIC

Message 57898 - Posted: 27 Nov 2021, 22:44:29 UTC - in response to Message 57896.  

Working under Linux helps to squeeze out maximum performance.
Some optimized settings in BOINC Manager and moderate overclocking do the rest.
In the "Managing non-high-end hosts" thread I try to share everything I know about it.
ID: 57898
PDW

Message 57905 - Posted: 28 Nov 2021, 10:05:52 UTC - in response to Message 57894.  

PDW wrote: "I have Ampere cards completing 101 and 1121 tasks from the latest batch just fine."

ServicEnginIC wrote: "This lastly commented problem is only impacting Windows hosts. If your hosts are running under any kind of Linux distribution, it is normal that they aren't being affected."

Erich56 wrote: "too bad that the user PDW has hidden his computers in the profile. So no idea what OS is being used ... unless he tells us."

You asked:
"Unless someone here can report about successful completion of the new WUs with an Ampere card."

As I posted previously I am using linux.
ID: 57905
Erich56

Message 57906 - Posted: 28 Nov 2021, 10:53:41 UTC - in response to Message 57905.  

Erich56 wrote: "Unless someone here can report about successful completion of the new WUs with an Ampere card."

PDW wrote: "As I posted previously I am using linux."

oh okay, thanks for the information (which explains why it works well on your system).
ID: 57906
Retvari Zoltan

Message 57911 - Posted: 28 Nov 2021, 13:55:24 UTC - in response to Message 57906.  
Last modified: 28 Nov 2021, 13:56:49 UTC

Erich56 wrote: "oh okay, thanks for the information (which explains why it works well on your system)."

No, it does not explain it.
I tried to run a CUDA 101 task on an RTX 3080 Ti (driver: 495.44) in my Ubuntu 18.04.6 host, and it failed after a few minutes.
<core_client_version>7.16.17</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
14:33:16 (1675): wrapper (7.7.26016): starting
14:33:23 (1675): wrapper (7.7.26016): starting
14:33:23 (1675): wrapper: running bin/acemd3 (--boinc --device 0)
ACEMD failed:
    Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

14:35:30 (1675): bin/acemd3 exited; CPU time 127.166324
14:35:30 (1675): app exit status: 0x1
14:35:30 (1675): called boinc_finish(195)

</stderr_txt>
]]>
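
The log above ties the thread title to a concrete failure: acemd3 exits with status 0x1, and the wrapper then calls boinc_finish(195), which is the EXIT_CHILD_FAILED code. The sketch below pulls those pieces out of such a log. It is an illustration only: the regular expressions are written against this one excerpt, the function names are invented, and reading -61 as the signed form of the byte 195 is plain arithmetic that happens to match this message, not a statement about how the client formats it.

```python
import re

# Excerpt of the stderr text quoted in the post above.
STDERR_SAMPLE = """\
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
14:33:23 (1675): wrapper: running bin/acemd3 (--boinc --device 0)
ACEMD failed:
    Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

14:35:30 (1675): app exit status: 0x1
14:35:30 (1675): called boinc_finish(195)
"""


def byte_views(code):
    """Show one exit byte as unsigned decimal, hex, and two's-complement signed."""
    unsigned = code & 0xFF
    signed = unsigned - 256 if unsigned >= 128 else unsigned
    return unsigned, hex(unsigned), signed


def summarize_stderr(text):
    """Extract the exit code, the app's own status, and the ACEMD error line."""
    exit_match = re.search(r"process exited with code (\d+)", text)
    app_match = re.search(r"app exit status: (0x[0-9a-fA-F]+)", text)
    lines = text.splitlines()
    error = None
    for i, line in enumerate(lines[:-1]):
        if line.strip() == "ACEMD failed:":
            error = lines[i + 1].strip()  # the message sits on the next line
            break
    return {
        "exit_code": int(exit_match.group(1)) if exit_match else None,
        "app_status": int(app_match.group(1), 16) if app_match else None,
        "acemd_error": error,
    }


if __name__ == "__main__":
    print(byte_views(195))
    print(summarize_stderr(STDERR_SAMPLE))
```

The useful part for diagnosis is the `acemd_error` field: exit code 195 only says that the child process failed, while the line after "ACEMD failed:" says why.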
ID: 57911
Ian&Steve C.

Message 57959 - Posted: 29 Nov 2021, 14:28:37 UTC

After yesterday's snafu, I picked up two cuda101 tasks this morning on my Linux Ubuntu 20.04 3080 Ti system. Currently running OK; it has been running for about 20 minutes now and is utilizing the GPU at 99%, so it's definitely working. I basically executed a project reset yesterday on this host, so I don't think my previous modifications to swap out the 101 app for 1121 carried over.
ID: 57959
Billy Ewell 1931

Message 57966 - Posted: 29 Nov 2021, 18:07:32 UTC - in response to Message 57880.  

Richard Haselgrove wrote: "Your tasks are failing with 'app exit status: 0xc0000135' - in all likelihood, you are missing a Microsoft runtime DLL file. Please refer to message 57353."

Richard: Thank you kindly for solving the problem. I installed both the x86 and x64 update packages, and now both machines are processing GPUGRID tasks without fault.

Billy Ewell 1931; celebrating the passage of my 90th birthday a few days ago and am physically in good shape and still mentally quite capable.
ID: 57966
Keith Myers

Message 57967 - Posted: 29 Nov 2021, 18:09:21 UTC

I missed out on all the new work because I had to get new master lists on all the hosts when their 24 hour timeouts finally expired.
ID: 57967
Richard Haselgrove

Message 57969 - Posted: 29 Nov 2021, 18:24:03 UTC - in response to Message 57967.  

Keith Myers wrote: "I missed out on all the new work because I had to get new master lists on all the hosts when their 24 hour timeouts finally expired."

I think the 24 hour (master file fetch) backoff is set by the client, rather than the server - so it can be overridden by a manual update.

That's unlike the 'daily quota exceeded' and the 'last request too recent' backoffs, which are enforced by the server and can't be bypassed.

I might use one of these boring lockdown days to force a client into 'master file fetch' mode, so I can see how it's recorded in client_state.xml, and hence how to remove it again - whenever and wherever that knowledge might be useful in the future.
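
One way to see how such a state is recorded is to snapshot the project's entry in client_state.xml before and after and compare the two. The sketch below is only a starting point under stated assumptions: it assumes client_state.xml parses as XML and holds one `<project>` block per project with a `<master_url>` child, and every other tag name in the usage example is invented, since the real field is exactly what remains to be found.

```python
import xml.etree.ElementTree as ET


def project_fields(client_state_xml, master_url):
    """Return the simple (leaf) fields of the <project> block with this master_url."""
    root = ET.fromstring(client_state_xml)
    for project in root.iter("project"):
        if (project.findtext("master_url") or "").strip() == master_url:
            return {
                child.tag: (child.text or "").strip()
                for child in project
                if len(child) == 0  # skip nested blocks, keep leaf values and flags
            }
    return {}


def changed_fields(before, after):
    """Map each field that differs to its (before, after) pair; None means absent."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}


if __name__ == "__main__":
    # Synthetic snapshots; the tag names other than master_url are hypothetical.
    url = "https://example.invalid/"
    before = f"<client_state><project><master_url>{url}</master_url><hypothetical_counter>0</hypothetical_counter></project></client_state>"
    after = f"<client_state><project><master_url>{url}</master_url><hypothetical_counter>1</hypothetical_counter><hypothetical_flag/></project></client_state>"
    print(changed_fields(project_fields(before, url), project_fields(after, url)))
```

Comparing snapshots this way avoids guessing the tag name in advance: whatever changes when the client enters the backoff shows up in the diff.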
ID: 57969
Keith Myers

Message 57971 - Posted: 29 Nov 2021, 19:54:05 UTC

Manual updates did nothing but keep resetting the 24-hour backoff timer.
Same with an update script running every 15 minutes. The backoff never got below 23 hours before resetting back to 24.
ID: 57971