195 (0xc3) EXIT_CHILD_FAILED

Profile Michael H.W. Weber

Joined: 9 Feb 16
Posts: 78
Credit: 656,229,684
RAC: 0
Level
Lys
Message 57480 - Posted: 5 Oct 2021, 8:42:33 UTC
Last modified: 5 Oct 2021, 8:55:54 UTC

My RTX 3080 machine completed a first task successfully. Afterwards, two more tasks crashed with a 195 (0xc3) EXIT_CHILD_FAILED error message and the following log (after only a few seconds of run time):

Name	e2s184_e1s254p0f959-ADRIA_AdB_KIXCMYB_HIP-0-2-RND9959_9
Work unit	27080023
Created	4 Oct 2021 | 9:59:05 UTC
Sent	4 Oct 2021 | 10:48:16 UTC
Received	4 Oct 2021 | 22:07:40 UTC
Server state	Completed
Outcome	Computation error
Client state	Computation error
Exit status	195 (0xc3) EXIT_CHILD_FAILED
Computer ID	584499
Report deadline	9 Oct 2021 | 10:48:16 UTC
Run time	25.51
CPU time	0.00
Validation state	Invalid
Credit	0.00
Application version	New version of ACEMD v2.18 (cuda101)

Stderr output

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
00:05:49 (30732): wrapper (7.9.26016): starting
00:05:49 (30732): wrapper: running bin/acemd3.exe (--boinc --device 0)
ACEMD failed:
    Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

00:05:59 (30732): bin/acemd3.exe exited; CPU time 0.000000
00:05:59 (30732): app exit status: 0x1
00:05:59 (30732): called boinc_finish(195)
0 bytes in 0 Free Blocks.
186 bytes in 4 Normal Blocks.
1144 bytes in 1 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 0 bytes.
Total allocations: 239849 bytes.
Dumping objects ->
{323256} normal block at 0x000001B7D23D3BC0, 85 bytes long.
 Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65 
..\api\boinc_api.cpp(309) : {323253} normal block at 0x000001B7D23D4940, 8 bytes long.
 Data: <  1&#210;&#183;   > 00 00 31 D2 B7 01 00 00 
{322608} normal block at 0x000001B7D23D3C60, 85 bytes long.
 Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65 
{321994} normal block at 0x000001B7D23D46C0, 8 bytes long.
 Data: <@&#202;?&#210;&#183;   > 40 CA 3F D2 B7 01 00 00 
..\zip\boinc_zip.cpp(122) : {146} normal block at 0x000001B7D23D3090, 260 bytes long.
 Data: <                > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
{133} normal block at 0x000001B7D23D48A0, 16 bytes long.
 Data: < &#248;<&#210;&#183;           > 10 F8 3C D2 B7 01 00 00 00 00 00 00 00 00 00 00 
{132} normal block at 0x000001B7D23CF810, 40 bytes long.
 Data: <&#160;H=&#210;&#183;   conda-pa> A0 48 3D D2 B7 01 00 00 63 6F 6E 64 61 2D 70 61 
{125} normal block at 0x000001B7D23CF340, 48 bytes long.
 Data: <--boinc --device> 2D 2D 62 6F 69 6E 63 20 2D 2D 64 65 76 69 63 65 
{124} normal block at 0x000001B7D23D49E0, 16 bytes long.
 Data: <XN=&#210;&#183;           > 58 4E 3D D2 B7 01 00 00 00 00 00 00 00 00 00 00 
{123} normal block at 0x000001B7D23D4C60, 16 bytes long.
 Data: <0N=&#210;&#183;           > 30 4E 3D D2 B7 01 00 00 00 00 00 00 00 00 00 00 
{122} normal block at 0x000001B7D23D4850, 16 bytes long.
 Data: < N=&#210;&#183;           > 08 4E 3D D2 B7 01 00 00 00 00 00 00 00 00 00 00 
{121} normal block at 0x000001B7D23D3DB0, 16 bytes long.
 Data: <&#224;M=&#210;&#183;           > E0 4D 3D D2 B7 01 00 00 00 00 00 00 00 00 00 00 
{120} normal block at 0x000001B7D23D4030, 16 bytes long.
 Data: <&#184;M=&#210;&#183;           > B8 4D 3D D2 B7 01 00 00 00 00 00 00 00 00 00 00 
{119} normal block at 0x000001B7D23D4080, 16 bytes long.
 Data: < M=&#210;&#183;           > 90 4D 3D D2 B7 01 00 00 00 00 00 00 00 00 00 00 
{118} normal block at 0x000001B7D23D4120, 16 bytes long.
 Data: <pM=&#210;&#183;           > 70 4D 3D D2 B7 01 00 00 00 00 00 00 00 00 00 00 
{117} normal block at 0x000001B7D23D4990, 16 bytes long.
 Data: <HM=&#210;&#183;           > 48 4D 3D D2 B7 01 00 00 00 00 00 00 00 00 00 00 
{116} normal block at 0x000001B7D23D42B0, 16 bytes long.
 Data: < M=&#210;&#183;           > 20 4D 3D D2 B7 01 00 00 00 00 00 00 00 00 00 00 
{115} normal block at 0x000001B7D23D4D20, 496 bytes long.
 Data: <&#176;B=&#210;&#183;   bin/acem> B0 42 3D D2 B7 01 00 00 62 69 6E 2F 61 63 65 6D 
{65} normal block at 0x000001B7D23C2D80, 16 bytes long.
 Data: < &#234;&#151;{&#247;           > 80 EA 97 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{64} normal block at 0x000001B7D23C2B50, 16 bytes long.
 Data: <@&#233;&#151;{&#247;           > 40 E9 97 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{63} normal block at 0x000001B7D23C2B00, 16 bytes long.
 Data: <&#248;W&#148;{&#247;           > F8 57 94 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{62} normal block at 0x000001B7D23C2AB0, 16 bytes long.
 Data: <&#216;W&#148;{&#247;           > D8 57 94 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{61} normal block at 0x000001B7D23C3370, 16 bytes long.
 Data: <P &#148;{&#247;           > 50 04 94 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{60} normal block at 0x000001B7D23C2A60, 16 bytes long.
 Data: <0 &#148;{&#247;           > 30 04 94 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{59} normal block at 0x000001B7D23C3500, 16 bytes long.
 Data: <&#224; &#148;{&#247;           > E0 02 94 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{58} normal block at 0x000001B7D23C3640, 16 bytes long.
 Data: <  &#148;{&#247;           > 10 04 94 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{57} normal block at 0x000001B7D23C2A10, 16 bytes long.
 Data: <p &#148;{&#247;           > 70 04 94 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{56} normal block at 0x000001B7D23C3870, 16 bytes long.
 Data: < &#192;&#146;{&#247;           > 18 C0 92 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
Object dump complete.

</stderr_txt>
]]>

Name	e4s109_e1s39p0f745-ADRIA_AdB_KIXCMYB_HIP-1-2-RND2493_0
Work unit	27081645
Created	4 Oct 2021 | 22:12:32 UTC
Sent	4 Oct 2021 | 22:14:12 UTC
Received	4 Oct 2021 | 22:16:12 UTC
Server state	Completed
Outcome	Computation error
Client state	Computation error
Exit status	195 (0xc3) EXIT_CHILD_FAILED
Computer ID	584499
Report deadline	9 Oct 2021 | 22:14:12 UTC
Run time	7.26
CPU time	0.00
Validation state	Invalid
Credit	0.00
Application version	New version of ACEMD v2.18 (cuda101)
Stderr output

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
00:14:24 (14320): wrapper (7.9.26016): starting
00:14:24 (14320): wrapper: running bin/acemd3.exe (--boinc --device 0)
ACEMD failed:
    Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

00:14:26 (14320): bin/acemd3.exe exited; CPU time 0.000000
00:14:26 (14320): app exit status: 0x1
00:14:26 (14320): called boinc_finish(195)
0 bytes in 0 Free Blocks.
186 bytes in 4 Normal Blocks.
1144 bytes in 1 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 0 bytes.
Total allocations: 241603 bytes.
Dumping objects ->
{323256} normal block at 0x000002061D1C3BC0, 85 bytes long.
 Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65 
..\api\boinc_api.cpp(309) : {323253} normal block at 0x000002061D1C43F0, 8 bytes long.
 Data: <        > 00 00 02 1D 06 02 00 00 
{322608} normal block at 0x000002061D1C3C60, 85 bytes long.
 Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65 
{321994} normal block at 0x000002061D1C42B0, 8 bytes long.
 Data: <@&#202;      > 40 CA 1E 1D 06 02 00 00 
..\zip\boinc_zip.cpp(122) : {146} normal block at 0x000002061D1C3090, 260 bytes long.
 Data: <                > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
{133} normal block at 0x000002061D1C3EF0, 16 bytes long.
 Data: <&#208;&#242;              > D0 F2 1B 1D 06 02 00 00 00 00 00 00 00 00 00 00 
{132} normal block at 0x000002061D1BF2D0, 40 bytes long.
 Data: <&#240;>      conda-pa> F0 3E 1C 1D 06 02 00 00 63 6F 6E 64 61 2D 70 61 
{125} normal block at 0x000002061D1BF180, 48 bytes long.
 Data: <--boinc --device> 2D 2D 62 6F 69 6E 63 20 2D 2D 64 65 76 69 63 65 
{124} normal block at 0x000002061D1C4940, 16 bytes long.
 Data: <XN              > 58 4E 1C 1D 06 02 00 00 00 00 00 00 00 00 00 00 
{123} normal block at 0x000002061D1C4490, 16 bytes long.
 Data: <0N              > 30 4E 1C 1D 06 02 00 00 00 00 00 00 00 00 00 00 
{122} normal block at 0x000002061D1C4800, 16 bytes long.
 Data: < N              > 08 4E 1C 1D 06 02 00 00 00 00 00 00 00 00 00 00 
{121} normal block at 0x000002061D1C47B0, 16 bytes long.
 Data: <&#224;M              > E0 4D 1C 1D 06 02 00 00 00 00 00 00 00 00 00 00 
{120} normal block at 0x000002061D1C48A0, 16 bytes long.
 Data: <&#184;M              > B8 4D 1C 1D 06 02 00 00 00 00 00 00 00 00 00 00 
{119} normal block at 0x000002061D1C4710, 16 bytes long.
 Data: < M              > 90 4D 1C 1D 06 02 00 00 00 00 00 00 00 00 00 00 
{118} normal block at 0x000002061D1C48F0, 16 bytes long.
 Data: <pM              > 70 4D 1C 1D 06 02 00 00 00 00 00 00 00 00 00 00 
{117} normal block at 0x000002061D1C4990, 16 bytes long.
 Data: <HM              > 48 4D 1C 1D 06 02 00 00 00 00 00 00 00 00 00 00 
{116} normal block at 0x000002061D1C4A80, 16 bytes long.
 Data: < M              > 20 4D 1C 1D 06 02 00 00 00 00 00 00 00 00 00 00 
{115} normal block at 0x000002061D1C4D20, 496 bytes long.
 Data: < J      bin/acem> 80 4A 1C 1D 06 02 00 00 62 69 6E 2F 61 63 65 6D 
{65} normal block at 0x000002061D1B36E0, 16 bytes long.
 Data: < &#234;&#151;{&#247;           > 80 EA 97 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{64} normal block at 0x000002061D1B3410, 16 bytes long.
 Data: <@&#233;&#151;{&#247;           > 40 E9 97 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{63} normal block at 0x000002061D1B3820, 16 bytes long.
 Data: <&#248;W&#148;{&#247;           > F8 57 94 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{62} normal block at 0x000002061D1B33C0, 16 bytes long.
 Data: <&#216;W&#148;{&#247;           > D8 57 94 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{61} normal block at 0x000002061D1B3190, 16 bytes long.
 Data: <P &#148;{&#247;           > 50 04 94 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{60} normal block at 0x000002061D1B3000, 16 bytes long.
 Data: <0 &#148;{&#247;           > 30 04 94 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{59} normal block at 0x000002061D1B2FB0, 16 bytes long.
 Data: <&#224; &#148;{&#247;           > E0 02 94 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{58} normal block at 0x000002061D1B3320, 16 bytes long.
 Data: <  &#148;{&#247;           > 10 04 94 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{57} normal block at 0x000002061D1B2F60, 16 bytes long.
 Data: <p &#148;{&#247;           > 70 04 94 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
{56} normal block at 0x000002061D1B3140, 16 bytes long.
 Data: < &#192;&#146;{&#247;           > 18 C0 92 7B F7 7F 00 00 00 00 00 00 00 00 00 00 
Object dump complete.

</stderr_txt>
]]>

Any idea what is going on?

What is particularly annoying is that after these two consecutive crashes, it took the GPUGRID server 4 hours to send out a new task (which is now in progress), leaving my machine uselessly idle for hours.

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
ID: 57480
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Level
Trp
Message 57481 - Posted: 5 Oct 2021, 8:59:03 UTC - in response to Message 57480.  

Your computers are hidden, so I can't be certain, but your problem seems to be

Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

There are two versions of the new GPUGrid application: cuda1121 and cuda101. You will be able to see in your task list which worked, and which didn't work.

Despite some posts to the contrary, the general consensus is that cuda1121 works on an RTX 3080, and cuda101 doesn't. And despite an assurance from the project that they have prevented the cuda101 application being sent to RTX cards, clearly they haven't.

There's nothing you can do to prevent the wrong application being sent to your card: just take comfort from the fact that cuda101 tasks will fail very quickly, and you won't waste computing power on the tasks. The only hit you and the project are taking is the waste of bandwidth downloading the inappropriate tasks.
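
For anyone wondering where that nvrtc line comes from: the app compiles its GPU kernels at run time with NVRTC, and an older NVRTC simply doesn't recognise the Ampere architecture. A rough sketch of my own (not the project's code) that reproduces the same class of failure, assuming the CUDA 10.1 nvrtc library is the one that gets loaded:

/* Illustration only: ask NVRTC to compile a trivial kernel for compute_86
 * (RTX 30xx). NVRTC from CUDA 11.1 onwards accepts the option; the CUDA 10.1
 * library rejects it, which is the kind of error seen in the stderr above.
 * Build with: nvcc nvrtc_check.c -lnvrtc -o nvrtc_check */
#include <stdio.h>
#include <nvrtc.h>

int main(void)
{
    const char *src = "extern \"C\" __global__ void noop(void) {}";
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, src, "noop.cu", 0, NULL, NULL);

    /* compute_86 is the Ampere / RTX 30xx architecture */
    const char *opts[] = { "--gpu-architecture=compute_86" };
    nvrtcResult rc = nvrtcCompileProgram(prog, 1, opts);
    printf("nvrtc: %s\n", nvrtcGetErrorString(rc));

    nvrtcDestroyProgram(&prog);
    return rc == NVRTC_SUCCESS ? 0 : 1;
}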
ID: 57481
Profile Michael H.W. Weber

Joined: 9 Feb 16
Posts: 78
Credit: 656,229,684
RAC: 0
Level
Lys
Message 57482 - Posted: 5 Oct 2021, 9:13:09 UTC

Thank you Richard - only cuda1121 works for me.

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
ID: 57482
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Level
Trp
Message 57484 - Posted: 5 Oct 2021, 11:43:49 UTC - in response to Message 57481.  

And despite an assurance from the project that they have prevented the cuda101 application being sent to RTX cards, clearly they haven't.

I endorse this statement. I have been sent cuda101 tasks for my two RTX 3070s, the latest one this morning.

There's nothing you can do to prevent the wrong application being sent to your card: just take comfort from the fact that cuda101 tasks will fail very quickly, and you won't waste computing power on the tasks. The only hit you and the project are taking is the waste of bandwidth downloading the inappropriate tasks.

However, there is more to it: if one deletes erroneously downloaded cuda101 tasks from the BOINC task list too often, one will not receive any further tasks for the next 24 hours.

Hence, this problem should be solved by the project team ASAP!
ID: 57484
bozz4science

Joined: 22 May 20
Posts: 110
Credit: 115,525,136
RAC: 0
Level
Cys
Message 57486 - Posted: 5 Oct 2021, 13:02:45 UTC

I also don't quite understand what information determines the app version to be sent out. This task, for example, was sent out 6 times before my host caught it. Once it went out as the 1121 app version; all the others were sent as the 101 app version. It failed on all previous hosts and went through 3 Ampere cards (3060 Ti, 3070 & 3090). This seems to be quite an annoyance for anyone with the latest cards. And older cards take some serious chewing on the new tasks; mine takes a little over 31 hrs. This project could be working much more efficiently if it were able to fully capture the potential of these RTX 3000 series cards.

ID: 57486
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Level
Trp
Message 57487 - Posted: 5 Oct 2021, 13:30:47 UTC - in response to Message 57486.  

But there are two different failure modes - three of each: three missing DLLs (probably vcruntime140_1), and three with the wrong architecture (cuda101 on RTX).

You need all three to align - the right version, on the right architecture, with the right software support - before it'll run. One out of eight is about the right probability.
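(Roughly speaking: if each of those three conditions is treated as an even coin flip, the chance of all of them lining up is 0.5 × 0.5 × 0.5 = 0.125, i.e. about one attempt in eight.)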
ID: 57487
bozz4science

Joined: 22 May 20
Posts: 110
Credit: 115,525,136
RAC: 0
Level
Cys
Message 57488 - Posted: 5 Oct 2021, 13:36:57 UTC

That sounds about right. I only meant to highlight that the Ampere cards all failed, obviously because the wrong version was sent to those hosts, while somehow older-generation cards get the 1121 app version instead on some occasions.

ID: 57488
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Level
Trp
Message 57489 - Posted: 5 Oct 2021, 13:41:17 UTC

All the more reason to just retire the cuda101 app, and force everyone to update their drivers to use the cuda1121 app
ID: 57489
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Level
Trp
Message 57490 - Posted: 5 Oct 2021, 13:45:41 UTC - in response to Message 57489.  

All the more reason to just retire the cuda101 app, and force everyone to update their drivers to use the cuda1121 app

I disagree. People should be allowed their own choice of driver (you don't know why they've kept an older one), but the project should manage the minimum limits better.
ID: 57490
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Level
Trp
Message 57491 - Posted: 5 Oct 2021, 13:56:55 UTC - in response to Message 57490.  
Last modified: 5 Oct 2021, 14:32:33 UTC

IMO, the "choice" of driver in the ranges of CUDA101 and CUDA1121 compatibility will be arbitrary. the list of supported products is exactly the same so it's not like some older GPU wont be supported anymore with the newer driver. Nvidia drivers are very stable and it's pretty rare that a new driver fully breaks something. CUDA101 requires driver 418.xx, CUDA1121 requires driver 461.xx, there's not a huge difference here. but even still there's a large range of "choice" between the minimum driver required for cuda1121 and what is the current latest driver release. they don't need to be bleeding edge. CUDA 11.2 was introduced almost a year ago, and it's currently up to CUDA 11.4 Update 2.

If you have some software issue that actively prevents installing a new driver, then fix your software issues.

There's really no good reason not to update if you're already on hardware and drivers new enough to support the CUDA101 app. It's not a huge change to get more recent drivers, and the observed negative impacts from the project maintaining two app versions far outweigh the impact of requiring a user to update their drivers.
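
If you want to check what your installed driver actually supports versus what an app was built against, here's a quick sketch of my own using the standard CUDA runtime API (nvidia-smi also reports the driver's supported CUDA version in its header):

/* Prints the highest CUDA version the installed driver supports and the CUDA
 * runtime version this program was built with. For a cuda1121 app, the driver
 * number needs to come out as 11.2 or higher (461.xx on Windows); the cuda101
 * app only needs 10.1 (418.xx). Build with: nvcc cuda_check.c -o cuda_check */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int driver = 0, runtime = 0;
    cudaDriverGetVersion(&driver);    /* e.g. 11020 means CUDA 11.2 */
    cudaRuntimeGetVersion(&runtime);
    printf("driver supports CUDA %d.%d, built against CUDA %d.%d\n",
           driver / 1000, (driver % 1000) / 10,
           runtime / 1000, (runtime % 1000) / 10);
    return 0;
}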
ID: 57491
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Level
Trp
Message 57492 - Posted: 5 Oct 2021, 13:59:51 UTC - in response to Message 57491.  

it's not a huge change to get more recent drivers, and the observed negative impacts from the project maintaining two app versions far outweigh the impact of requiring a user to update their drivers.

It can be if your computer is managed by your employer's domain controller, and group policy prevents you updating it yourself. Just an example.
ID: 57492
Ian&Steve C.

Joined: 21 Feb 20
Posts: 1116
Credit: 40,839,470,595
RAC: 6,423
Level
Trp
Message 57493 - Posted: 5 Oct 2021, 14:04:46 UTC - in response to Message 57492.  

it's not a huge change to get more recent drivers, and the observed negative impacts from the project maintaining two app versions far outweigh the impact of requiring a user to update their drivers.

It can be if your computer is managed by your employer's domain controller, and group policy prevents you updating it yourself. Just an example.


In this case, it's MORE likely that these systems will (*should*) be updated to recent drivers, as any competent sysadmin will (*should*) be keeping everything current in terms of security patches, and there has been a stronger push from Nvidia in this regard lately.
ID: 57493
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Level
Trp
Message 57494 - Posted: 5 Oct 2021, 14:15:13 UTC

I think we're far enough off topic. Let's leave it there.
ID: 57494
Profile Michael H.W. Weber

Joined: 9 Feb 16
Posts: 78
Credit: 656,229,684
RAC: 0
Level
Lys
Message 57544 - Posted: 8 Oct 2021, 12:13:48 UTC

Well, this project's inability to deliver the proper GPU app/plan class to the corresponding GPU systems simply results in a massive loss of overall project performance: due to the repeated "compute errors", the clients do not receive further tasks for a while and idle around for hours. I figure that this way, instead of two tasks per day, I can deliver only one.
Well, not my problem. A second project is occupying the idle time now.

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
ID: 57544
Profile bcavnaugh

Joined: 8 Nov 13
Posts: 56
Credit: 1,002,640,163
RAC: 0
Level
Met
Message 57572 - Posted: 11 Oct 2021, 2:26:22 UTC
Last modified: 11 Oct 2021, 2:59:44 UTC

Glad at least one of my hosts is running, but all the others are NOT!
A 2080 on driver 441.20 is running; a 1080 on 431.86 is not running, and another 2080 on 431.86 is not running either.
What NVIDIA driver must I have to run GPUGRID?
As you can see, even with the new or current runtimes it still fails: 14.29.30135.0 is the current VS 2022 version, while the version shipped with the tasks is 14.28.29325.2.
https://live.staticflickr.com/65535/51574059037_5ae789d24d_b.jpg

Update:
For me, driver 441.20 seems to work on all my hosts. Yahoo!
ID: 57572
jjch

Joined: 10 Nov 13
Posts: 101
Credit: 15,773,211,122
RAC: 0
Level
Trp
Message 57575 - Posted: 11 Oct 2021, 4:26:33 UTC

Nvidia driver version 441.20 is a bit old. I am currently running version 471.11 on my Windows hosts.
ID: 57575
Profile Michael H.W. Weber

Joined: 9 Feb 16
Posts: 78
Credit: 656,229,684
RAC: 0
Level
Lys
Message 57598 - Posted: 13 Oct 2021, 10:29:25 UTC - in response to Message 57481.  
Last modified: 13 Oct 2021, 10:31:09 UTC


Despite some posts to the contrary, the general consensus is that cuda1121 works on an RTX 3080, and cuda101 doesn't. And despite an assurance from the project that they have prevented the cuda101 application being sent to RTX cards, clearly they haven't.

I second that.

There's nothing you can do to prevent the wrong application being sent to your card: just take comfort from the fact that cuda101 tasks will fail very quickly, and you won't waste computing power on the tasks.

Unfortunately, that is exactly NOT the case.
Here is an example of a task which ran for almost 15 hours before failing with an error:

Task: https://www.gpugrid.net/result.php?resultid=32653715

Name	e7s106_e5s196p1f1036-ADRIA_AdB_KIXCMYB_HIP-1-2-RND0214_4
Work unit	27082868
Created	11 Oct 2021 | 6:23:21 UTC
Sent	11 Oct 2021 | 6:24:56 UTC
Received	12 Oct 2021 | 9:32:05 UTC
Server state	Completed
Outcome	Computation error
Client state	Computation error
Exit status	195 (0xc3) EXIT_CHILD_FAILED
Computer ID	588794
Report deadline	16 Oct 2021 | 6:24:56 UTC
Run time	53,608.48
CPU time	53,473.36
Validation state	Invalid
Credit	0.00
Application version	New version of ACEMD v2.18 (cuda101)
Stderr output

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
08:25:11 (11620): wrapper (7.9.26016): starting
08:25:11 (11620): wrapper: running bin/acemd3.exe (--boinc --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {323250} normal block at 0x000002C574996E70, 8 bytes long.
 Data: <  $v    > 00 00 24 76 C5 02 00 00 
..\lib\diagnostics_win.cpp(417) : {321999} normal block at 0x000002C576431310, 1080 bytes long.
 Data: <                > FC 08 00 00 CD CD CD CD 0C 01 00 00 00 00 00 00 
..\zip\boinc_zip.cpp(122) : {149} normal block at 0x000002C57499EBA0, 260 bytes long.
 Data: <                > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
Object dump complete.
10:58:41 (4808): wrapper (7.9.26016): starting
10:58:41 (4808): wrapper: running bin/acemd3.exe (--boinc --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {323286} normal block at 0x000001DA265261F0, 8 bytes long.
 Data: <  J&    > 00 00 4A 26 DA 01 00 00 
..\lib\diagnostics_win.cpp(417) : {322035} normal block at 0x000001DA26591B80, 1080 bytes long.
 Data: <h               > 68 1A 00 00 CD CD CD CD 20 01 00 00 00 00 00 00 
..\zip\boinc_zip.cpp(122) : {149} normal block at 0x000001DA2652EB00, 260 bytes long.
 Data: <                > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
Object dump complete.
11:30:47 (13592): wrapper (7.9.26016): starting
11:30:47 (13592): wrapper: running bin/acemd3.exe (--boinc --device 0)
ACEMD failed:
    Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

11:30:49 (13592): bin/acemd3.exe exited; CPU time 0.000000
11:30:49 (13592): app exit status: 0x1
11:30:49 (13592): called boinc_finish(195)
0 bytes in 0 Free Blocks.
298 bytes in 4 Normal Blocks.
1144 bytes in 1 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 0 bytes.
Total allocations: 130740 bytes.
Dumping objects ->
{323289} normal block at 0x0000014269701A70, 141 bytes long.
 Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65 
..\api\boinc_api.cpp(309) : {323286} normal block at 0x00000142696C62F0, 8 bytes long.
 Data: <  eiB   > 00 00 65 69 42 01 00 00 
{322649} normal block at 0x00000142697020F0, 141 bytes long.
 Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65 
{322036} normal block at 0x00000142696C6890, 8 bytes long.
 Data: <p siB   > 70 1B 73 69 42 01 00 00 
..\zip\boinc_zip.cpp(122) : {149} normal block at 0x00000142696CE940, 260 bytes long.
 Data: <                > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
{136} normal block at 0x00000142696C7060, 16 bytes long.
 Data: <p&#171;liB           > 70 AB 6C 69 42 01 00 00 00 00 00 00 00 00 00 00 
{135} normal block at 0x00000142696CAB70, 40 bytes long.
 Data: <`pliB   conda-pa> 60 70 6C 69 42 01 00 00 63 6F 6E 64 61 2D 70 61 
{128} normal block at 0x00000142696CAB00, 48 bytes long.
 Data: <--boinc --device> 2D 2D 62 6F 69 6E 63 20 2D 2D 64 65 76 69 63 65 
{127} normal block at 0x00000142696C6FC0, 16 bytes long.
 Data: <8&#248;liB           > 38 F8 6C 69 42 01 00 00 00 00 00 00 00 00 00 00 
{126} normal block at 0x00000142696C6AC0, 16 bytes long.
 Data: < &#248;liB           > 10 F8 6C 69 42 01 00 00 00 00 00 00 00 00 00 00 
{125} normal block at 0x00000142696C6A70, 16 bytes long.
 Data: <&#232;&#247;liB           > E8 F7 6C 69 42 01 00 00 00 00 00 00 00 00 00 00 
{124} normal block at 0x00000142696C6A20, 16 bytes long.
 Data: <&#192;&#247;liB           > C0 F7 6C 69 42 01 00 00 00 00 00 00 00 00 00 00 
{123} normal block at 0x00000142696C6C00, 16 bytes long.
 Data: < &#247;liB           > 98 F7 6C 69 42 01 00 00 00 00 00 00 00 00 00 00 
{122} normal block at 0x00000142696C6980, 16 bytes long.
 Data: <p&#247;liB           > 70 F7 6C 69 42 01 00 00 00 00 00 00 00 00 00 00 
{121} normal block at 0x00000142696C70B0, 16 bytes long.
 Data: <P&#247;liB           > 50 F7 6C 69 42 01 00 00 00 00 00 00 00 00 00 00 
{120} normal block at 0x00000142696C6930, 16 bytes long.
 Data: <(&#247;liB           > 28 F7 6C 69 42 01 00 00 00 00 00 00 00 00 00 00 
{119} normal block at 0x00000142696C6570, 16 bytes long.
 Data: < &#247;liB           > 00 F7 6C 69 42 01 00 00 00 00 00 00 00 00 00 00 
{118} normal block at 0x00000142696CF700, 496 bytes long.
 Data: <peliB   bin/acem> 70 65 6C 69 42 01 00 00 62 69 6E 2F 61 63 65 6D 
{68} normal block at 0x00000142696C62A0, 16 bytes long.
 Data: < &#234;&#188; &#246;           > 80 EA BC 1A F6 7F 00 00 00 00 00 00 00 00 00 00 
{67} normal block at 0x00000142696C6CF0, 16 bytes long.
 Data: <@&#233;&#188; &#246;           > 40 E9 BC 1A F6 7F 00 00 00 00 00 00 00 00 00 00 
{66} normal block at 0x00000142696C6480, 16 bytes long.
 Data: <&#248;W&#185; &#246;           > F8 57 B9 1A F6 7F 00 00 00 00 00 00 00 00 00 00 
{65} normal block at 0x00000142696C6520, 16 bytes long.
 Data: <&#216;W&#185; &#246;           > D8 57 B9 1A F6 7F 00 00 00 00 00 00 00 00 00 00 
{64} normal block at 0x00000142696C6840, 16 bytes long.
 Data: <P &#185; &#246;           > 50 04 B9 1A F6 7F 00 00 00 00 00 00 00 00 00 00 
{63} normal block at 0x00000142696C6B60, 16 bytes long.
 Data: <0 &#185; &#246;           > 30 04 B9 1A F6 7F 00 00 00 00 00 00 00 00 00 00 
{62} normal block at 0x00000142696C6390, 16 bytes long.
 Data: <&#224; &#185; &#246;           > E0 02 B9 1A F6 7F 00 00 00 00 00 00 00 00 00 00 
{61} normal block at 0x00000142696C6250, 16 bytes long.
 Data: <  &#185; &#246;           > 10 04 B9 1A F6 7F 00 00 00 00 00 00 00 00 00 00 
{60} normal block at 0x00000142696C66B0, 16 bytes long.
 Data: <p &#185; &#246;           > 70 04 B9 1A F6 7F 00 00 00 00 00 00 00 00 00 00 
{59} normal block at 0x00000142696C67F0, 16 bytes long.
 Data: < &#192;&#183; &#246;           > 18 C0 B7 1A F6 7F 00 00 00 00 00 00 00 00 00 00 
Object dump complete.

</stderr_txt>
]]>

Currently, I have another cuda101 task (wrongly selected by the server for this client) which has now been running for several hours.

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
ID: 57598
Profile Michael H.W. Weber

Joined: 9 Feb 16
Posts: 78
Credit: 656,229,684
RAC: 0
Level
Lys
Message 57605 - Posted: 14 Oct 2021, 12:35:20 UTC - in response to Message 57598.  


Currently, I have another cuda101 task (wrongly selected by the server for this client) which has now been running for several hours.

...it actually caused my machine to crash and was restarting again after the reboot.
So I aborted it.

Michael.
President of Rechenkraft.net - Germany's first and largest distributed computing organization.
ID: 57605
Erich56

Joined: 1 Jan 15
Posts: 1166
Credit: 12,260,898,501
RAC: 1
Level
Trp
Message 57642 - Posted: 24 Oct 2021, 15:16:50 UTC

I had an "195 (0xc3) EXIT_CHILD_FAILED" case this afternoon, a few seconds after start:

https://www.gpugrid.net/result.php?resultid=32657585

Does anyone have any idea what the reason might have been?
ID: 57642
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 428
Level
Trp
Message 57643 - Posted: 24 Oct 2021, 15:52:09 UTC - in response to Message 57642.  

Look at your own link:

EXCEPTIONAL CONDITION: src\mdio\bincoord.c, line 193: "nelems != 1"

Faulty data - a bad task. Not your fault.
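
I don't have the ACEMD source to hand, but that message has the shape of a failed fread() on the input coordinate file: fread() reports how many elements it actually read, so anything other than 1 means the file is truncated or corrupt. A sketch of my own along those lines (the real bincoord.c and its file layout are assumptions here):

/* Hypothetical illustration of an "nelems != 1" check on a binary input file. */
#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file.coor\n", argv[0]); return 2; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 2; }

    int32_t natoms = 0;
    size_t nelems = fread(&natoms, sizeof natoms, 1, f);  /* expect exactly 1 */
    if (nelems != 1) {
        fprintf(stderr, "EXCEPTIONAL CONDITION: \"nelems != 1\" (bad input file)\n");
        fclose(f);
        return 1;
    }
    printf("header says %d atoms\n", (int)natoms);
    fclose(f);
    return 0;
}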
ID: 57643