1-GERARD_MO_MOR WUs failing immediately
Beyond, Joined: 23 Nov 08, Posts: 1112, Credit: 6,162,416,256, RAC: 0
A couple of examples:
https://www.gpugrid.net/workunit.php?wuid=11674070
https://www.gpugrid.net/workunit.php?wuid=11674120

The error is always:
ERROR: file force.cpp line 513: TCL evaluation of [calcforces]

However, I do have a 0-GERARD_MO_MOR WU that's still running after 5 hours.
Joined: 28 Mar 09, Posts: 490, Credit: 11,739,145,728, RAC: 116,723
> A couple of examples:

Same thing here!

Name: e1s34_1-GERARD_MO_MOR_1-0-1-RND2358_2
Workunit: 11674055
Created: 18 Jul 2016 | 14:57:53 UTC
Sent: 18 Jul 2016 | 14:57:56 UTC
Received: 18 Jul 2016 | 19:18:31 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -98 (0xffffffffffffff9e) Unknown error number
Computer ID: 263612
Report deadline: 23 Jul 2016 | 14:57:56 UTC
Run time: 3.25
CPU time: 1.22
Validate state: Invalid
Credit: 0.00
Application version: Long runs (8-12 hours on fastest card) v8.48 (cuda65)
Stderr output: <core_client_version>7.6.22</core_client_version> <![CDATA[ <message> (unknown error) - exit code -98 (0xffffff9e) </message> <stderr_txt> # GPU [GeForce GTX 980 Ti] Platform [Windows] Rev [3212] VERSION [65] # SWAN Device 0 : # Name : GeForce GTX 980 Ti # ECC : Disabled # Global mem : 4095MB # Capability : 5.2 # PCI ID : 0000:01:00.0 # Device clock : 1266MHz # Memory clock : 3505MHz # Memory width : 384bit # Driver version : r358_00 : 35906 ERROR: file force.cpp line 513: TCL evaluation of [calcforces] 15:16:11 (6520): called boinc_finish </stderr_txt> ]]>

Name: e1s41_1-GERARD_MO_MOR_2-0-1-RND9697_0
Workunit: 11674112
Created: 18 Jul 2016 | 12:13:48 UTC
Sent: 18 Jul 2016 | 12:40:09 UTC
Received: 18 Jul 2016 | 19:18:31 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -98 (0xffffffffffffff9e) Unknown error number
Computer ID: 263612
Report deadline: 23 Jul 2016 | 12:40:09 UTC
Run time: 3.45
CPU time: 1.17
Validate state: Invalid
Credit: 0.00
Application version: Long runs (8-12 hours on fastest card) v8.48 (cuda65)
Stderr output: <core_client_version>7.6.22</core_client_version> <![CDATA[ <message> (unknown error) - exit code -98 (0xffffff9e) </message> <stderr_txt> # GPU [GeForce GTX 980 Ti] Platform [Windows] Rev [3212] VERSION [65] # SWAN Device 0 : # Name : GeForce GTX 980 Ti # ECC : Disabled # Global mem : 4095MB # Capability : 5.2 # PCI ID : 0000:01:00.0 # Device clock : 1266MHz # Memory clock : 3505MHz # Memory width : 384bit # Driver version : r358_00 : 35906 ERROR: file force.cpp line 513: TCL evaluation of [calcforces] 15:16:07 (2340): called boinc_finish </stderr_txt> ]]>
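The exit status shows up in two forms in these reports: "-98 (0xffffffffffffff9e)" in the task summary and "exit code -98 (0xffffff9e)" in the stderr message. Both are the same negative number, printed as a 64-bit and a 32-bit two's-complement value respectively. A small Python sketch of the conversion in both directions; to_unsigned_hex and to_signed are illustrative helpers, not BOINC functions, and this says nothing about what -98 means to the client:

```python
def to_unsigned_hex(code: int, bits: int) -> str:
    """Render a signed exit code as the unsigned hex value of a `bits`-wide integer."""
    mask = (1 << bits) - 1
    return hex(code & mask)


def to_signed(value: int, bits: int) -> int:
    """Interpret an unsigned `bits`-wide value as a two's-complement signed integer."""
    if value >= 1 << (bits - 1):
        return value - (1 << bits)
    return value


if __name__ == "__main__":
    exit_status = -98  # the exit status reported for these tasks
    print(to_unsigned_hex(exit_status, 64))  # 0xffffffffffffff9e (task summary form)
    print(to_unsigned_hex(exit_status, 32))  # 0xffffff9e (stderr <message> form)
    print(to_signed(0xFFFFFF9E, 32))         # -98
```

Masking with 2^bits - 1 is all it takes to go from the signed value to either printed form; subtracting 2^bits when the top bit is set goes back.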
Joined: 22 Dec 12, Posts: 2, Credit: 272,996,387, RAC: 0
Same here:
Stderr output: <core_client_version>7.6.22</core_client_version> <![CDATA[ <message> (unknown error) - exit code -98 (0xffffff9e) </message> <stderr_txt> # GPU [GeForce GTX 980] Platform [Windows] Rev [3212] VERSION [65] # SWAN Device 0 : # Name : GeForce GTX 980 # ECC : Disabled # Global mem : 4095MB # Capability : 5.2 # PCI ID : 0000:0F:00.0 # Device clock : 1215MHz # Memory clock : 3505MHz # Memory width : 256bit # Driver version : r352_00 : 35362 ERROR: file force.cpp line 513: TCL evaluation of [calcforces] 23:14:00 (5980): called boinc_finish </stderr_txt> ]]>
Joined: 28 Mar 09, Posts: 490, Credit: 11,739,145,728, RAC: 116,723
I had two more of these units error out:

Name: e1s39_1-GERARD_MO_MOR_1-0-1-RND2698_6
Workunit: 11674060
Created: 19 Jul 2016 | 21:42:11 UTC
Sent: 19 Jul 2016 | 21:42:18 UTC
Received: 19 Jul 2016 | 21:44:52 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -98 (0xffffffffffffff9e) Unknown error number
Computer ID: 263612
Report deadline: 24 Jul 2016 | 21:42:18 UTC
Run time: 3.08
CPU time: 1.17
Validate state: Invalid
Credit: 0.00
Application version: Long runs (8-12 hours on fastest card) v8.48 (cuda65)
Stderr output: <core_client_version>7.6.22</core_client_version> <![CDATA[ <message> (unknown error) - exit code -98 (0xffffff9e) </message> <stderr_txt> # GPU [GeForce GTX 980 Ti] Platform [Windows] Rev [3212] VERSION [65] # SWAN Device 1 : # Name : GeForce GTX 980 Ti # ECC : Disabled # Global mem : 4095MB # Capability : 5.2 # PCI ID : 0000:02:00.0 # Device clock : 1190MHz # Memory clock : 3505MHz # Memory width : 384bit # Driver version : r358_00 : 35906 ERROR: file force.cpp line 513: TCL evaluation of [calcforces] 17:47:08 (6440): called boinc_finish </stderr_txt> ]]>

Name: e1s50_1-GERARD_MO_MOR_1-0-1-RND9002_1
Workunit: 11674071
Created: 19 Jul 2016 | 17:21:34 UTC
Sent: 19 Jul 2016 | 17:21:49 UTC
Received: 19 Jul 2016 | 21:37:22 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -98 (0xffffffffffffff9e) Unknown error number
Computer ID: 30790
Report deadline: 24 Jul 2016 | 17:21:49 UTC
Run time: 6.00
CPU time: 2.83
Validate state: Invalid
Credit: 0.00
Application version: Long runs (8-12 hours on fastest card) v8.48 (cuda65)
Stderr output: <core_client_version>7.6.22</core_client_version> <![CDATA[ <message> (unknown error) - exit code -98 (0xffffff9e) </message> <stderr_txt> # GPU [GeForce GTX 980 Ti] Platform [Windows] Rev [3212] VERSION [65] # SWAN Device 0 : # Name : GeForce GTX 980 Ti # ECC : Disabled # Global mem : 4095MB # Capability : 5.2 # PCI ID : 0000:02:00.0 # Device clock : 1190MHz # Memory clock : 3505MHz # Memory width : 384bit # Driver version : r355_00 : 35582 ERROR: file force.cpp line 513: TCL evaluation of [calcforces] 17:39:53 (3556): called boinc_finish </stderr_txt> ]]>

This looks like a bad batch.
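The "bad batch" reading rests on the identical ERROR line turning up regardless of card (GTX 980 and GTX 980 Ti) or driver branch (r352, r355, r358). A rough Python sketch for checking that across pasted stderr dumps, assuming the text looks like the excerpts in this thread; the regexes and the summarize helper are illustrative only and not part of any BOINC or GPUGRID tool:

```python
import re
from typing import Optional, Tuple

# Illustrative patterns for the stderr text as pasted in this thread.
DRIVER_RE = re.compile(r"Driver version\s*:\s*(r\d+_\d+)")
ERROR_RE = re.compile(r"ERROR: file (\S+) line (\d+):")

# Stderr excerpts copied from the reports in this thread.
SAMPLES = [
    "# Driver version : r358_00 : 35906 ERROR: file force.cpp line 513: "
    "TCL evaluation of [calcforces] 15:16:11 (6520): called boinc_finish",
    "# Driver version : r352_00 : 35362 ERROR: file force.cpp line 513: "
    "TCL evaluation of [calcforces] 23:14:00 (5980): called boinc_finish",
    "# Driver version : r355_00 : 35582 ERROR: file force.cpp line 513: "
    "TCL evaluation of [calcforces] 17:39:53 (3556): called boinc_finish",
]


def summarize(stderr: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (driver branch, "file:line" of the first ERROR) for one stderr dump."""
    driver = DRIVER_RE.search(stderr)
    error = ERROR_RE.search(stderr)
    return (
        driver.group(1) if driver else None,
        f"{error.group(1)}:{error.group(2)}" if error else None,
    )


if __name__ == "__main__":
    results = [summarize(s) for s in SAMPLES]
    drivers = sorted({d for d, _ in results if d})
    errors = {e for _, e in results}
    print(drivers)  # ['r352_00', 'r355_00', 'r358_00']
    print(errors)   # {'force.cpp:513'}
```

Three different driver branches collapsing to a single error location is consistent with the work units themselves being at fault rather than a particular host setup, though a handful of samples cannot prove it.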
Joined: 6 Jan 15, Posts: 76, Credit: 25,499,534,331, RAC: 0
Same here: 90.45% of them fail. What can we do about it, and how do the other 9.55% manage to succeed on these WUs? They only ran for 3-4 seconds this time, so little was lost, but I would like to see a lower error rate.
http://www.gpugrid.net/workunit.php?wuid=11674051
http://www.gpugrid.net/workunit.php?wuid=11674098
http://www.gpugrid.net/workunit.php?wuid=11674051 (I got the same WU again, but on another host.)
Joined: 11 Jul 09, Posts: 1639, Credit: 10,159,968,649, RAC: 2
I had e1s27_1-GERARD_MO_MOR_1-0-1-RND0098; it reached the maximum number of errors with no successful returns at all.
skgiven, Joined: 23 Apr 09, Posts: 3968, Credit: 1,995,359,260, RAC: 0
A 90% failure rate says there is a problem with the model, and "No tasks available" confirms it. Why are we not using a beta queue to test such tasks?
Beyond, Joined: 23 Nov 08, Posts: 1112, Credit: 6,162,416,256, RAC: 0
> Same here: 90.45% of them fail. What can we do about it, and how do the other 9.55% manage to succeed on these WUs? They only ran for 3-4 seconds this time, so little was lost, but I would like to see a lower error rate.

Because the 0-GERARD_MO_MOR WUs seem OK while the 1-GERARD_MO_MOR WUs all fail. All the admins are apparently either on vacation or asleep.

> Why are we not using a beta queue to test such tasks?

We're open to theories...
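The observation that 0-GERARD_MO_MOR WUs run while 1-GERARD_MO_MOR WUs fail can be checked by tallying failures per batch label in the task names. A minimal Python sketch, assuming the batch label is the "N-GERARD_MO_MOR" segment after the first underscore; batch_of and failures_by_batch are hypothetical helpers, and the list holds only the failed names quoted in this thread, so on its own it cannot show that the 0- batch succeeds:

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical pattern: pulls the "N-GERARD_MO_MOR" batch label out of a task or WU name.
BATCH_RE = re.compile(r"_(\d+-GERARD_MO_MOR)_")

# Failed task/WU names reported in this thread.
FAILED = [
    "e1s34_1-GERARD_MO_MOR_1-0-1-RND2358_2",
    "e1s41_1-GERARD_MO_MOR_2-0-1-RND9697_0",
    "e1s39_1-GERARD_MO_MOR_1-0-1-RND2698_6",
    "e1s50_1-GERARD_MO_MOR_1-0-1-RND9002_1",
    "e1s27_1-GERARD_MO_MOR_1-0-1-RND0098",
]


def batch_of(name: str) -> str:
    """Return the batch label, or "unknown" if the name doesn't match the pattern."""
    match = BATCH_RE.search(name)
    return match.group(1) if match else "unknown"


def failures_by_batch(names: Iterable[str]) -> Counter:
    """Count failed tasks per batch label."""
    return Counter(batch_of(n) for n in names)


if __name__ == "__main__":
    print(failures_by_batch(FAILED))  # Counter({'1-GERARD_MO_MOR': 5})
```

Feeding in the names of successful tasks as a second list would turn this into an actual per-batch failure rate.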
©2026 Universitat Pompeu Fabra