All Gerard WUs erroring

Author	Message
Trotador Joined: 25 Mar 12 Posts: 103 Credit: 14,948,929,771 RAC: 0	Message 42537 - Posted: 2 Jan 2016, 20:14:14 UTC Hi, I'm seeing this happen with the most recently downloaded units; wingmen also get the same error: "process exited with code 212 (0xd4, -44)". Not sure, but it could be only for Linux WUs.

Trotador Joined: 25 Mar 12 Posts: 103 Credit: 14,948,929,771 RAC: 0	Message 42538 - Posted: 2 Jan 2016, 23:41:03 UTC - in response to Message 42537. Also for Windows; the error message there is "(unknown error) - exit code -97 (0xffffff9f)".
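
The number pairs in these error messages look like one exit status written in several notations: decimal, hexadecimal, and a signed reading. Below is a minimal Python sketch of that arithmetic. The helper names `as_signed` and `as_unsigned` are made up for illustration, and the sketch only shows how the figures relate to each other; it does not say what BOINC or the GPUGRID app means by any particular code.

```python
def as_unsigned(value: int, bits: int) -> int:
    """Keep only the low `bits` bits, i.e. the unsigned reading of the value."""
    return value & ((1 << bits) - 1)


def as_signed(value: int, bits: int) -> int:
    """Read the low `bits` bits of `value` as a two's-complement signed integer."""
    value = as_unsigned(value, bits)
    sign_bit = 1 << (bits - 1)
    # If the top bit is set, the signed reading is the value minus 2**bits.
    return value - (1 << bits) if value & sign_bit else value


# Linux message: "process exited with code 212 (0xd4, -44)"
print(hex(212), as_signed(212, 8))

# Windows message: "exit code -97 (0xffffff9f)"
print(hex(as_unsigned(-97, 32)))
```

Under this reading, 212 and -44 are the same 8-bit value, and -97 and 0xffffff9f are the same 32-bit value, so each message reports a single status rather than several different ones.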

Bedrich Hajek Joined: 28 Mar 09 Posts: 490 Credit: 11,739,145,728 RAC: 826	Message 42539 - Posted: 3 Jan 2016, 1:23:18 UTC - in response to Message 42537. Last modified: 3 Jan 2016, 1:24:23 UTC Quote: Hi, I'm seeing this happening with the last downloaded units, wingmen also have the same error "process exited with code 212 (0xd4, -44)" Not sure but it could be only for linux WUs. End of quote. Yes, there seems to be a batch of WUs that is failing on previously reliable Linux machines and on some mostly bad Windows hosts, but they are running fine on my Windows computers. One has already completed successfully at this time. See links: https://www.gpugrid.net/workunit.php?wuid=11397999 https://www.gpugrid.net/workunit.php?wuid=11398213 https://www.gpugrid.net/workunit.php?wuid=11398820 https://www.gpugrid.net/workunit.php?wuid=11398294

Max Ringler Joined: 27 Apr 15 Posts: 2 Credit: 147,218,248 RAC: 0	Message 42540 - Posted: 3 Jan 2016, 9:15:30 UTC On my Windows 7 machine (i7-3770, GTX 980) I recently had ~10 GERARD WUs (with more in the queue and still coming in) that were running at less than 1% GPU usage (according to GPU-Z), while the progress in the BOINC manager appeared to be normal or a little slow (~15 hour estimate per WU). All these WUs suddenly disappeared from the BOINC manager without any error message, and without showing up in my results in my GPUGRID stats. Certainly there is something flawed with these WUs!

Max Ringler Joined: 27 Apr 15 Posts: 2 Credit: 147,218,248 RAC: 0	Message 42541 - Posted: 3 Jan 2016, 9:23:45 UTC - in response to Message 42540. I missed the other WUs, but right now this happened to the WU e14s27_e9s23p1f368-GERARD_CXCL12_DIM_HEP_GLYCAM-0-1-RND5008. This WU was running at <1% GPU usage but at close to normal progress speed; however, it was restarting every ~10 hours or so. I have now cancelled this WU, and the next one in my queue seems to work normally again (e13s16_e8s26p11f203-GERARD_CXCL12_DIMPROTO3-0-1-RND2849; estimated time ~12 hours, 82% GPU usage).

Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0	Message 42542 - Posted: 3 Jan 2016, 10:16:21 UTC I have both kinds of these WUs: 1. Erroring on all hosts, including mine: https://www.gpugrid.net/workunit.php?wuid=11396918 https://www.gpugrid.net/workunit.php?wuid=11396911 2. Erroring on all hosts except mine: https://www.gpugrid.net/workunit.php?wuid=11397526 https://www.gpugrid.net/workunit.php?wuid=11398513 https://www.gpugrid.net/workunit.php?wuid=11397102 https://www.gpugrid.net/workunit.php?wuid=11397012 https://www.gpugrid.net/workunit.php?wuid=11398161 https://www.gpugrid.net/workunit.php?wuid=11398515 https://www.gpugrid.net/workunit.php?wuid=11396116 https://www.gpugrid.net/workunit.php?wuid=11398187

Bedrich Hajek Joined: 28 Mar 09 Posts: 490 Credit: 11,739,145,728 RAC: 826	Message 42546 - Posted: 3 Jan 2016, 12:15:47 UTC - in response to Message 42542. Last modified: 3 Jan 2016, 12:25:56 UTC Quote: I have both kind of these WUs: 1. Erroring on all hosts, including mine. https://www.gpugrid.net/workunit.php?wuid=11396918 https://www.gpugrid.net/workunit.php?wuid=11396911 2. Erroring on all hosts, except on mine: https://www.gpugrid.net/workunit.php?wuid=11397526 https://www.gpugrid.net/workunit.php?wuid=11398513 https://www.gpugrid.net/workunit.php?wuid=11397102 https://www.gpugrid.net/workunit.php?wuid=11397012 https://www.gpugrid.net/workunit.php?wuid=11398161 https://www.gpugrid.net/workunit.php?wuid=11398515 https://www.gpugrid.net/workunit.php?wuid=11396116 https://www.gpugrid.net/workunit.php?wuid=11398187 End of quote. So how many errors did you get recently? If it's a small number, you could attribute that to running into the occasional bad WU. If you have a lot more, then it's more than just a Linux problem. For the record, I have had 2 errors since the new year. All WUs on my machines are currently running okay and I hope it stays that way! So I would say that I ran into 2 bad WUs.

ServicEnginIC Joined: 24 Sep 10 Posts: 595 Credit: 12,249,686,510 RAC: 383,773	Message 42547 - Posted: 3 Jan 2016, 12:16:31 UTC I've found the same behavior on my Linux hosts, in WUs received since after midday on Jan-02-2016. Consequently, the statistics are getting worse, possibly due to those failing Linux WUs. This can be seen at the bottom of the "Server status" page: https://www.gpugrid.net/server_status.php On Jan-02-2016 at 22:41 UTC, the average error rate over the 25 kinds of WUs in progress was 20.9952%. This had increased to 25.7552% at 11:44 UTC on Jan-03-2016.

Retvari Zoltan Joined: 20 Jan 09 Posts: 2380 Credit: 16,897,957,044 RAC: 0	Message 42549 - Posted: 3 Jan 2016, 14:07:23 UTC - in response to Message 42546. Quote: So how many errors did you get recently? If it's a small number, you could attribute that to running into the occasional bad WU. If you have a lot more, then it's more than just a Linux problem. End of quote. I have had four errors recently. It's a bit more than usual. The two aborted WUs are my fault.

Jim1348 Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0	Message 42550 - Posted: 3 Jan 2016, 15:00:26 UTC I haven't seen the problem yet on a pair of GTX 960s. https://www.gpugrid.net/results.php?hostid=194224&offset=0&show_names=0&state=0&appid= I had originally boosted the P2 memory clock as per ETA's suggestion (https://einstein.phys.uwm.edu/forum_thread.php?id=11044), but saw a few "simulation unstable" messages, though I don't think they led to actual errors at that point. But that was a little too close to the edge for me, so I removed that boost and the cards are back to factory default, which is not much of an overclock on these MSI 2GD5T OC cards. Maybe that keeps them stable on the most difficult work units.

northcup Joined: 29 Dec 15 Posts: 1 Credit: 135,300 RAC: 0	Message 42551 - Posted: 3 Jan 2016, 16:55:40 UTC
14814161 11399908 286919 3 Jan 2016 | 16:38:22 UTC 3 Jan 2016 | 16:39:05 UTC Error while computing 0.00 0.00 --- Long runs
14814079 11399366 286919 3 Jan 2016 | 16:16:38 UTC 3 Jan 2016 | 16:32:43 UTC Error while computing 0.00 0.00 ---
14813534 11399465 286919 3 Jan 2016 | 13:04:05 UTC 3 Jan 2016 | 13:06:03 UTC Error while computing 0.00 0.00 ---
14801182 11384321 286919 29 Dec 2015 | 20:21:34 UTC 1 Jan 2016 | 9:50:05 UTC Completed and validated 212,450.23 4,110.21 135,300.00 Long runs
Same problem here, with a valid run from December last year. Greets, Klaus

Rion Family Joined: 13 Jan 14 Posts: 21 Credit: 15,415,926,517 RAC: 0	Message 42553 - Posted: 3 Jan 2016, 17:52:02 UTC Last modified: 3 Jan 2016, 17:53:18 UTC I have seen the same thing on my Linux host: all work units since the one below error out the same way.
Stderr output
<core_client_version>7.3.15</core_client_version>
<![CDATA[
<message>
process exited with code 212 (0xd4, -44)
</message>
<stderr_txt>
</stderr_txt>
]]>
14811283 11398815 176528 3 Jan 2016 | 0:28:00 UTC 3 Jan 2016 | 1:08:39 UTC Error while computing 0.00 0.00 --- Long runs (8-12 hours on fastest card) v8.46 (cuda65)

opr Joined: 24 May 11 Posts: 7 Credit: 93,272,937 RAC: 0	Message 42554 - Posted: 3 Jan 2016, 19:04:13 UTC Hello, I'm using Ubuntu 14.04 LTS. Gerard WUs stopped after 1 second and were uploaded, with "Output file was absent" for four files at a time. I did some Collatz Conjecture work earlier today, but I guess that didn't mess up my computer, as others are having problems too. opr

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 42556 - Posted: 3 Jan 2016, 20:26:43 UTC Not sure if it's related, but I too just had an error with a Gerard unit, which is a rare thing to happen for me. http://www.gpugrid.net/workunit.php?wuid=11389493 Exit status 194 (0xc2) EXIT_ABORTED_BY_CLIENT (unknown error) - exit code 194 (0xc2)
Name e3s31_e2s25p1f424-GERARD_CXCL12_CHALC4_DIM1-0-1-RND7047_1
Workunit 11389493
Created 1 Jan 2016 | 22:45:42 UTC
Sent 1 Jan 2016 | 22:45:48 UTC
Received 3 Jan 2016 | 11:06:22 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 194 (0xc2) EXIT_ABORTED_BY_CLIENT
Computer ID 153764
Report deadline 6 Jan 2016 | 22:45:48 UTC
Run time 80,101.12
CPU time 11,903.64
Validate state Invalid
Credit 0.00
Application version Long runs (8-12 hours on fastest card) v8.47 (cuda65)
Stderr output
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 194 (0xc2)
</message>

Stroppy Joined: 10 Feb 09 Posts: 4 Credit: 2,772,597,960 RAC: 5,404	Message 42561 - Posted: 4 Jan 2016, 18:09:44 UTC Since 16:48 UTC on the second of January, my Linux host (206986) has failed all WUs it has received. My 2 Windows hosts are working as usual. A quick look through the task lists for the top 10 users shows the same pattern. Has anyone come up with a theory as to what is happening? In the meantime I have set that host to NNT to avoid causing any congestion on the server side.

Trotador Joined: 25 Mar 12 Posts: 103 Credit: 14,948,929,771 RAC: 0	Message 42562 - Posted: 4 Jan 2016, 18:29:57 UTC This issue is still occurring on all my hosts (Linux). My guess is that the administrators are still on holiday; no complaint, they deserve it.

MJH Joined: 12 Nov 07 Posts: 696 Credit: 27,266,655 RAC: 0	Message 42563 - Posted: 4 Jan 2016, 23:13:51 UTC - in response to Message 42562. The Linux app binary has expired and needs to be updated. I'll get that done tomorrow, hopefully.

Stoneageman Joined: 25 May 09 Posts: 224 Credit: 34,057,374,498 RAC: 0	Message 42565 - Posted: 5 Jan 2016, 10:46:48 UTC Thanks Matt. Hope the update will improve its performance.

God is Love, JC proves it. I t... Joined: 24 Nov 11 Posts: 30 Credit: 201,648,059 RAC: 0	Message 42566 - Posted: 7 Jan 2016, 0:05:06 UTC WU e15s19_e14s24p1f286-GERARD_CXCL12_DIMPROTO3-0-1-RND3500_2 has been stuck at 85% "progress" for some 12 hours now. I only have a 640, so WUs generally take 40-60 hours. This task has already run for 69:58. Is this part of a defective batch? How many more hours should I sacrifice for this WU? I am presuming that if I abort it, there will be zero credit for these 70 hours (even if it is a flawed WU?). I run Win 7 on my HP-1120, i7-2600. (I am NOT going to 'upgrade' to Win 10 for months, until (I hope) MS gets all the garbage in Win8-10 patched up.) Please advise. Meanwhile I have paused it and am putting my GPU to better use. Thanks. I think ∴ I THINK I am My thinking neither is the source of my being NOR proves it to you God Is Love, Jesus proves it! ∴ we are

Jacob Klein Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0	Message 42567 - Posted: 7 Jan 2016, 2:52:09 UTC - in response to Message 42566. I'd suggest restarting the PC. And if the problem still persists, then abort the task.