195 (0xc3) EXIT_CHILD_FAILED
| Author | Message |
|---|---|
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
That's because another failure doesn't reset the failure count. We need to find out where that's stored, and reduce it to less than 10. |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"after yesterday's snafu, I picked up two cuda101 tasks this morning on my Linux Ubuntu 20.04 3080Ti system. currently running ok. been running about 20 mins now, and is utilizing the GPU @99% so it's definitely working. I basically executed a project reset yesterday on this host, so I don't think my previous modifications to swap out the 101 app for 1121 carried over." That's easy to check: the CUDA1121 archive is 990MB while the CUDA101 archive is 491MB (503406KB). I think it's impossible to run the CUDA101 app on the RTX3000 series, since that was the main reason a CUDA11 client was demanded not so long ago. |
|
Joined: 21 Feb 20 Posts: 1116 Credit: 40,839,470,595 RAC: 6,423
|
"after yesterday's snafu, I picked up two cuda101 tasks this morning on my Linux Ubuntu 20.04 3080Ti system. currently running ok. been running about 20 mins now, and is utilizing the GPU @99% so it's definitely working. I basically executed a project reset yesterday on this host, so I don't think my previous modifications to swap out the 101 app for 1121 carried over." "That's easy to check: the CUDA1121 is 990MB while the CUDA101 is 491MB (503406KB)." my gpugrid project folder contains two compressed files for acemd3: x86_64-pc-linux-gnu__cuda101.zip.<alphanumeric> (515.5 MB) and x86_64-pc-linux-gnu__cuda1121.zip.<alphanumeric> (1.0 GB). so it seems it did indeed use the cuda101 code on my 3080Ti, and both tasks succeeded: https://www.gpugrid.net/result.php?resultid=32707549 https://www.gpugrid.net/result.php?resultid=32701203 since both apps use the same filename of just 'acemd3', it's possible some bug is causing the wrong (or is it right? lol) one to be used or something to that effect.
|
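The size check discussed in the two posts above can be scripted. What follows is only a minimal sketch: the function name `list_cuda_archives` is made up for illustration, the file-name pattern is taken from the two archive names quoted in the post, and the project directory you pass in is an assumption that depends on your own BOINC installation. It reports sizes in both MB and MiB, since the two posts appear to use different units: if the quoted 503406KB are 1024-byte units, that is 515,487,744 bytes, which is about 515.5 MB or about 491.6 MiB, so the two figures would describe the same file.

```python
import re
from pathlib import Path

# Pattern inferred from the archive names quoted in the thread, e.g.
# "x86_64-pc-linux-gnu__cuda101.zip.<suffix>". The suffix is not matched.
CUDA_ARCHIVE = re.compile(r"__cuda(\d+)\.zip")


def list_cuda_archives(project_dir):
    """Return (cuda_version, size_in_bytes, file_name) for each app archive.

    project_dir is whatever folder holds the project's downloaded files;
    a missing folder simply yields an empty list.
    """
    root = Path(project_dir)
    if not root.is_dir():
        return []
    found = []
    for path in sorted(root.iterdir()):
        match = CUDA_ARCHIVE.search(path.name)
        if match and path.is_file():
            found.append((match.group(1), path.stat().st_size, path.name))
    return found


def describe(size_bytes):
    """Format a byte count in decimal MB and binary MiB side by side."""
    return f"{size_bytes / 1_000_000:.1f} MB / {size_bytes / 1_048_576:.1f} MiB"
```

To use it, call `list_cuda_archives()` with the path to your own gpugrid project folder and print `describe(size)` for each entry. This only shows which archives were downloaded. It says nothing about which of the two 'acemd3' binaries a given task actually ran.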
PDW Joined: 7 Mar 14 Posts: 18 Credit: 6,575,125,525 RAC: 2
|
"since both apps use the same filename of just 'acemd3', it's possible some bug is causing the wrong (or is it right? lol) one to be used or something to that effect." Don't forget it could be this... http://www.gpugrid.net/forum_thread.php?id=5256&nowrap=true#57473 Have completed 5 of the recent cuda101 tasks on Ampere hosts now, a sixth is running and a seventh is lined up. Have seen no failures as yet. |
Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0
|
"since both apps use the same filename of just 'acemd3', it's possible some bug is causing the wrong (or is it right? lol) one to be used or something to that effect." I guess that you still use the "special" BOINC manager (compiled for SETI@home), and that handles apps in a different way. That would explain this anomaly. |
PDW Joined: 7 Mar 14 Posts: 18 Credit: 6,575,125,525 RAC: 2
|
"I guess that you still use the "special" BOINC manager (compiled for SETI@home), and that handles apps in a different way. That would explain this anomaly." No. No modified manager or client here, just the bog standard BOINC 7.16.6 |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 428
|
"... just the bog standard BOINC 7.16.6" You are recommended to upgrade to v7.16.20 - it's pretty good code, and - importantly - it has updated SSL security certificates needed by some BOINC projects. (Edit - the above advice applies only to Windows machines. If you're running Linux, you can ignore it. Your computers are hidden, so I don't know which applies.) |
|
Joined: 1 Jan 15 Posts: 1166 Credit: 12,260,898,501 RAC: 1
|
a few hours ago, I had another task which failed after a few seconds with 195 (0xc3) EXIT_CHILD_FAILED: "ACEMD failed: Particle coordinate is nan" https://www.gpugrid.net/workunit.php?wuid=27099407 As can be seen, the task failed on a total of 8 different hosts. I am questioning the rationale behind sending out a faulty task 8 times :-((( |
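For anyone searching logs for this error: the two numbers in the thread title are the same exit code written in decimal and in hexadecimal. A small sketch showing the conversion follows. The helper name `format_exit_code` is made up for illustration and is not part of BOINC.

```python
def format_exit_code(code, name):
    """Render an exit code as in the thread title: decimal, hex, then its name."""
    return f"{code} ({code:#x}) {name}"


def parse_hex_code(text):
    """Convert a hex string such as '0xc3' back to its decimal value."""
    return int(text, 16)
```

Calling `format_exit_code(195, "EXIT_CHILD_FAILED")` reproduces the string from the title, and `parse_hex_code("0xc3")` gives back 195, so either form can be used when grepping task output.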
©2025 Universitat Pompeu Fabra