Message boards : Number crunching : All Gerard WUs erroring
Joined: 25 Mar 12 | Posts: 103 | Credit: 14,948,929,771 | RAC: 0

Hi, I'm seeing this happen with the last downloaded units; wingmen also get the same error: "process exited with code 212 (0xd4, -44)". I'm not sure, but it could affect only Linux WUs.
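As an aside, the two numbers in that message are the same 8-bit exit status read two ways: unsigned (212, 0xd4) and signed (-44). A minimal Python sketch of the conversion, purely to illustrate the arithmetic (this is not GPUGRID or BOINC code):

```python
def decode_exit_status(code: int) -> tuple[int, int]:
    """Interpret the low byte of a process exit status as unsigned and signed."""
    unsigned = code & 0xFF                               # low 8 bits, 0..255
    signed = unsigned - 256 if unsigned > 127 else unsigned  # two's complement
    return unsigned, signed

print(decode_exit_status(212))   # (212, -44): the error reported here
print(decode_exit_status(0x9F))  # (159, -97): matches the Windows -97 report
```

The same byte therefore shows up as 212 on one platform and -44 (or -97 for 0x9F) on another, depending on whether the client prints it signed or unsigned.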
Joined: 25 Mar 12 | Posts: 103 | Credit: 14,948,929,771 | RAC: 0

Also on Windows, error message: "(unknown error) - exit code -97 (0xffffff9f)".
Joined: 28 Mar 09 | Posts: 490 | Credit: 11,739,145,728 | RAC: 116,723

Hi, yes, there seems to be a batch of WUs that are failing on previously reliable Linux machines and on some mostly bad Windows hosts, but they are running fine on my Windows computers. One has already completed successfully at this time. See links:
https://www.gpugrid.net/workunit.php?wuid=11397999
https://www.gpugrid.net/workunit.php?wuid=11398213
https://www.gpugrid.net/workunit.php?wuid=11398820
https://www.gpugrid.net/workunit.php?wuid=11398294
Joined: 27 Apr 15 | Posts: 2 | Credit: 147,218,248 | RAC: 0

On my Windows 7 machine (i7-3770, GTX 980) I just had ~10 GERARD WUs (more in the queue and still coming in) that were running at less than 1% GPU usage (according to GPU-Z), while the progress in the BOINC manager appeared normal or a little slow (~15-hour estimate per WU). All these WUs suddenly disappeared from the BOINC manager without any error message, and also without showing up in my results in my GPUGRID stats. Certainly there is something flawed with these WUs!
Joined: 27 Apr 15 | Posts: 2 | Credit: 147,218,248 | RAC: 0

I missed the other WUs, but right now this happened to the WU e14s27_e9s23p1f368-GERARD_CXCL12_DIM_HEP_GLYCAM-0-1-RND5008. This WU was running at <1% GPU usage but at close to normal progress speed; however, it was restarting every ~10 hours or so. I have now cancelled this WU, and the next one in my queue seems to work normally again (e13s16_e8s26p11f203-GERARD_CXCL12_DIMPROTO3-0-1-RND2849; estimated time ~12 hours, 82% GPU usage).
**Retvari Zoltan** | Joined: 20 Jan 09 | Posts: 2380 | Credit: 16,897,957,044 | RAC: 0

I have both kinds of these WUs:
1. Erroring on all hosts, including mine:
https://www.gpugrid.net/workunit.php?wuid=11396918
https://www.gpugrid.net/workunit.php?wuid=11396911
2. Erroring on all hosts, except on mine:
https://www.gpugrid.net/workunit.php?wuid=11397526
https://www.gpugrid.net/workunit.php?wuid=11398513
https://www.gpugrid.net/workunit.php?wuid=11397102
https://www.gpugrid.net/workunit.php?wuid=11397012
https://www.gpugrid.net/workunit.php?wuid=11398161
https://www.gpugrid.net/workunit.php?wuid=11398515
https://www.gpugrid.net/workunit.php?wuid=11396116
https://www.gpugrid.net/workunit.php?wuid=11398187
Joined: 28 Mar 09 | Posts: 490 | Credit: 11,739,145,728 | RAC: 116,723

> I have both kinds of these WUs:

So how many errors did you get recently? If it's a small number, you could attribute that to running into the occasional bad WU. If you have a lot more, then it's more than just a Linux problem. For the record, I have had 2 errors since the new year. All WUs on my machines are currently running okay, and I hope it stays that way! So I would say that I ran into 2 bad WUs.
**ServicEnginIC** | Joined: 24 Sep 10 | Posts: 593 | Credit: 12,146,936,510 | RAC: 4,406,248

I've found the same behavior on my Linux hosts, in WUs received since midday on Jan-02-2016. Consequently, the statistics are getting worse, possibly due to those failing Linux WUs. This can be seen at the bottom of the "Server status" page: https://www.gpugrid.net/server_status.php
On Jan-02-2016 at 22:41 UTC, the mean error rate over the 25 kinds of WUs in progress was 20.9952%. This had increased to 25.7552% by 11:44 UTC on Jan-03-2016.
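A figure like that mean can be reproduced by averaging the per-batch error rates listed on the server status page. A minimal sketch, assuming an unweighted mean across batches (the page's exact weighting is not documented in this thread) and using made-up sample rates for illustration:

```python
def mean_error_rate(rates_percent: list[float]) -> float:
    """Unweighted mean of per-batch error rates, expressed in percent."""
    return sum(rates_percent) / len(rates_percent)

# Hypothetical per-batch error rates, for illustration only:
sample = [12.5, 30.0, 20.5]
print(round(mean_error_rate(sample), 4))  # 21.0
```

If one platform's batches all start failing at once, the overall mean climbs quickly, which is consistent with the jump observed here over half a day.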
**Retvari Zoltan** | Joined: 20 Jan 09 | Posts: 2380 | Credit: 16,897,957,044 | RAC: 0

> So how many errors did you get recently? If it's a small number, you could attribute that to running into the occasional bad WU. If you have a lot more, then it's more than just a Linux problem.

I have had four errors recently. It's a bit more than usual. The two aborted WUs are my fault.
Joined: 28 Jul 12 | Posts: 819 | Credit: 1,591,285,971 | RAC: 0

I haven't seen the problem yet on a pair of GTX 960s: https://www.gpugrid.net/results.php?hostid=194224&offset=0&show_names=0&state=0&appid=
I had originally boosted the P2 memory clock as per ETA's suggestion (https://einstein.phys.uwm.edu/forum_thread.php?id=11044), but saw a few "simulation unstable" messages, though I don't think they led to actual errors at that point. That was a little too close to the edge for me, so I removed the boost, and the cards are back to factory default, which is not much of an overclock on these MSI 2GD5T OC cards. Maybe that keeps them stable on the most difficult work units.
Joined: 29 Dec 15 | Posts: 1 | Credit: 135,300 | RAC: 0

| Task | Work unit | Computer | Sent | Reported | Status | Run time (s) | CPU time (s) | Credit | Application |
|---|---|---|---|---|---|---|---|---|---|
| 14814161 | 11399908 | 286919 | 3 Jan 2016, 16:38:22 UTC | 3 Jan 2016, 16:39:05 UTC | Error while computing | 0.00 | 0.00 | --- | Long runs |
| 14814079 | 11399366 | 286919 | 3 Jan 2016, 16:16:38 UTC | 3 Jan 2016, 16:32:43 UTC | Error while computing | 0.00 | 0.00 | --- | |
| 14813534 | 11399465 | 286919 | 3 Jan 2016, 13:04:05 UTC | 3 Jan 2016, 13:06:03 UTC | Error while computing | 0.00 | 0.00 | --- | |
| 14801182 | 11384321 | 286919 | 29 Dec 2015, 20:21:34 UTC | 1 Jan 2016, 9:50:05 UTC | Completed and validated | 212,450.23 | 4,110.21 | 135,300.00 | Long runs |

Same problem here, with a valid run from December last year. Greets, Klaus
Joined: 13 Jan 14 | Posts: 21 | Credit: 15,415,926,517 | RAC: 0

I have seen the same thing on my Linux host: all work units since the one below error out the same way.

Stderr output:

```
<core_client_version>7.3.15</core_client_version>
<![CDATA[
<message>
process exited with code 212 (0xd4, -44)
</message>
<stderr_txt>
</stderr_txt>
]]>
```

14811283 | 11398815 | 176528 | 3 Jan 2016, 0:28:00 UTC | 3 Jan 2016, 1:08:39 UTC | Error while computing | 0.00 | 0.00 | --- | Long runs (8-12 hours on fastest card) v8.46 (cuda65)
Joined: 24 May 11 | Posts: 7 | Credit: 93,272,937 | RAC: 0

Hello, I'm using Ubuntu 14.04 LTS. GERARD WUs stopped after 1 second and were uploaded, with "Output file was absent" for four files at a time. I did some Collatz Conjecture earlier today, but I guess that didn't mess up my computer, as others are having problems too. opr
Joined: 11 Oct 08 | Posts: 1127 | Credit: 1,901,927,545 | RAC: 0

Not sure if it's related, but I too just had an error with a GERARD unit, which is a rare thing to happen for me. http://www.gpugrid.net/workunit.php?wuid=11389493
Exit status: 194 (0xc2) EXIT_ABORTED_BY_CLIENT
(unknown error) - exit code 194 (0xc2)
Name: e3s31_e2s25p1f424-GERARD_CXCL12_CHALC4_DIM1-0-1-RND7047_1
Joined: 10 Feb 09 | Posts: 4 | Credit: 2,771,097,960 | RAC: 163,981

Since 16:48 UTC on the second of January, my Linux host (206986) has failed all WUs it has received. My 2 Windows hosts are working as usual. A quick look through the task lists for the top 10 users shows the same pattern. Has anyone come up with a theory as to what is happening? In the meantime I have set that host to NNT to avoid causing any congestion on the server side.
Joined: 25 Mar 12 | Posts: 103 | Credit: 14,948,929,771 | RAC: 0

This issue keeps occurring on all my hosts (Linux). My guess is that the administrators are still on holiday; no complaint, they deserve it.
**MJH** | Joined: 12 Nov 07 | Posts: 696 | Credit: 27,266,655 | RAC: 0

The Linux app binary has expired and needs to be updated. I'll get that done tomorrow, hopefully.
**Stoneageman** | Joined: 25 May 09 | Posts: 224 | Credit: 34,057,374,498 | RAC: 0

Thanks Matt. Hope the update will improve its performance.
**God is Love, JC proves it. I t...** | Joined: 24 Nov 11 | Posts: 30 | Credit: 201,648,059 | RAC: 0

WU e15s19_e14s24p1f286-GERARD_CXCL12_DIMPROTO3-0-1-RND3500_2 has been stuck at 85% "progress" for some 12 hours now. I only have a 640, so WUs generally take 40-60 hours. This task has already run for 69:58. Is this part of a defective batch? How many more hours should I sacrifice for this WU? I presume that if I abort it, there will be zero credit for these 70 hours (even if it is a flawed WU?). I run Win 7 on my HP-1120, i7-2600. (I am NOT going to 'upgrade' to Win 10 for months, until (I hope) MS gets all the garbage in Win 8-10 patched up.) Please advise. Meanwhile I have paused it and am putting my GPU to better use. Thanks.

I think ∴ I THINK I am. My thinking neither is the source of my being NOR proves it to you. God Is Love, Jesus proves it! ∴ we are
Joined: 11 Oct 08 | Posts: 1127 | Credit: 1,901,927,545 | RAC: 0

I'd suggest restarting the PC. And if the problem still persists, then abort the task.
©2026 Universitat Pompeu Fabra