195 (0xc3) EXIT_CHILD_FAILED

Message boards : Number crunching : 195 (0xc3) EXIT_CHILD_FAILED

Richard Haselgrove · Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 428
Message 57973 - Posted: 29 Nov 2021, 21:04:16 UTC - in response to Message 57971.  

That's because another failure doesn't reset the failure count. We need to find out where that's stored, and reduce it to less than 10.
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
Message 57975 - Posted: 29 Nov 2021, 21:37:29 UTC - in response to Message 57959.  

> after yesterday's snafu, I picked up two cuda101 tasks this morning on my Linux Ubuntu 20.04 3080Ti system. currently running ok. been running about 20 mins now, and is utilizing the GPU @99% so it's definitely working. I basically executed a project reset yesterday on this host, so I don't think my previous modifications to swap out the 101 app for 1121 carried over.
That's easy to check: the CUDA1121 app is 990 MB, while the CUDA101 app is 491 MB (503,406 KB).
I think it's impossible to run the CUDA101 app on the RTX 3000 series; that incompatibility was the main reason for demanding a CUDA 11 app not so long ago.

ID: 57975 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Ian&Steve C. · Joined: 21 Feb 20 · Posts: 1116 · Credit: 40,839,470,595 · RAC: 6,423
Message 57976 - Posted: 30 Nov 2021, 2:50:49 UTC - in response to Message 57975.  

>> after yesterday's snafu, I picked up two cuda101 tasks this morning on my Linux Ubuntu 20.04 3080Ti system. currently running ok. been running about 20 mins now, and is utilizing the GPU @99% so it's definitely working. I basically executed a project reset yesterday on this host, so I don't think my previous modifications to swap out the 101 app for 1121 carried over.
> That's easy to check: the CUDA1121 is 990MB while the CUDA101 is 491MB (503406KB).
> I think it's impossible to run the CUDA101 on RTX3000 series, as that was the main reason demanding a CUDA11 client not so long ago.

my gpugrid project folder contains two compressed files for acemd3.

x86_64-pc-linux-gnu__cuda101.zip.<alphanumeric> (515.5 MB)
x86_64-pc-linux-gnu__cuda1121.zip.<alphanumeric> (1.0 GB)

so it seems it did indeed use the cuda101 code on my 3080Ti and both tasks succeeded.

https://www.gpugrid.net/result.php?resultid=32707549
https://www.gpugrid.net/result.php?resultid=32701203

Since both apps use the same filename, just 'acemd3', it's possible some bug is causing the wrong (or is it right? lol) one to be used, or something to that effect.
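The size comparison suggested above can be sketched in code. This is purely illustrative: the path and the 700 MB threshold are assumptions based on the sizes quoted in this thread (~491-515 MB for the cuda101 app vs. ~1.0 GB for cuda1121), not project-documented values.

```python
import os

# Assumed location of the unpacked binary; adjust for your BOINC data directory.
ACEMD3_PATH = "/var/lib/boinc-client/projects/www.gpugrid.net/acemd3"

def guess_acemd3_build(path, threshold=700_000_000):
    """Guess which CUDA build a binary is, from its size in bytes.

    The thread quotes ~491-515 MB for the cuda101 app and ~1.0 GB for
    cuda1121, so anything above ~700 MB is likely the cuda1121 build.
    """
    size = os.path.getsize(path)
    return "cuda1121" if size > threshold else "cuda101"
```

Since both builds unpack to the same 'acemd3' filename, size (or a checksum against the downloaded archives) is about the only quick way to tell them apart from the outside.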
PDW · Joined: 7 Mar 14 · Posts: 18 · Credit: 6,575,125,525 · RAC: 2
Message 57978 - Posted: 30 Nov 2021, 8:08:50 UTC - in response to Message 57976.  

> since both apps use the same filename of just 'acemd3', it's possible some bug is causing the wrong (or is it right? lol) one to be used or something to that effect.

Don't forget it could be this...
http://www.gpugrid.net/forum_thread.php?id=5256&nowrap=true#57473

Have completed 5 of the recent cuda101 tasks on Ampere hosts now; a sixth is running and a seventh is lined up. No failures seen as yet.
Retvari Zoltan · Joined: 20 Jan 09 · Posts: 2380 · Credit: 16,897,957,044 · RAC: 0
Message 57985 - Posted: 1 Dec 2021, 9:43:31 UTC - in response to Message 57978.  

>> since both apps use the same filename of just 'acemd3', it's possible some bug is causing the wrong (or is it right? lol) one to be used or something to that effect.
>
> Don't forget it could be this...
> http://www.gpugrid.net/forum_thread.php?id=5256&nowrap=true#57473
>
> Have completed 5 of the recent cuda101 tasks on Ampere hosts now, a sixth is running and a seventh lined up. Have seen no failures as yet.
I guess that you still use the "special" BOINC manager (compiled for SETI@home), which handles apps in a different way. That would explain this anomaly.
PDW · Joined: 7 Mar 14 · Posts: 18 · Credit: 6,575,125,525 · RAC: 2
Message 57987 - Posted: 1 Dec 2021, 9:59:34 UTC - in response to Message 57985.  

> I guess that you still use the "special" BOINC manager (compiled for SETI@home), and that handles apps in a different way. That would explain this anomaly.

No.
No modified manager or client here, just the bog standard BOINC 7.16.6
Richard Haselgrove · Joined: 11 Jul 09 · Posts: 1639 · Credit: 10,159,968,649 · RAC: 428
Message 57988 - Posted: 1 Dec 2021, 10:21:40 UTC - in response to Message 57987.  
Last modified: 1 Dec 2021, 10:24:55 UTC

> ... just the bog standard BOINC 7.16.6

You are recommended to upgrade to v7.16.20 - it's pretty good code, and - importantly - it has updated SSL security certificates needed by some BOINC projects.

(Edit - the above advice applies only to Windows machines. If you're running Linux, you can ignore it. Your computers are hidden, so I don't know which applies)
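Incidentally, version strings like these compare numerically per component, not lexicographically, which is why 7.16.20 is newer than 7.16.6 despite "6" sorting after "2" as text. A small illustration (plain Python, no BOINC APIs assumed):

```python
def version_tuple(v):
    """Split a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in v.split("."))

# 7.16.6 really is older than 7.16.20, even though a plain string
# comparison claims the opposite ("6" > "2" character-by-character).
assert version_tuple("7.16.6") < version_tuple("7.16.20")
assert "7.16.6" > "7.16.20"  # the naive string compare gets it backwards
```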
Erich56 · Joined: 1 Jan 15 · Posts: 1166 · Credit: 12,260,898,501 · RAC: 1
Message 58071 - Posted: 11 Dec 2021, 15:16:28 UTC

a few hours ago, I had another task which failed after a few seconds with

195 (0xc3) EXIT_CHILD_FAILED

ACEMD failed:
Particle coordinate is nan


https://www.gpugrid.net/workunit.php?wuid=27099407

As can be seen, the task failed on a total of 8 different hosts.
I question the rationale behind sending out a faulty task 8 times :-(((

©2025 Universitat Pompeu Fabra