New batch KKi4
| Author | Message |
|---|---|
|
Joined: 9 Dec 08, Posts: 1006, Credit: 5,068,599, RAC: 0 |
Dear all, this is the continuation of an experiment we'd like to publish soon. The WUs are twice as large as those in the old "CAPBIND*" series. |
|
Joined: 6 Jun 08, Posts: 152, Credit: 328,250,382, RAC: 0
|
Dear Toni, I have already downloaded and processed a few of these WUs. A few were also cancelled within 1 minute. Is this already known? Good luck and have a good weekend, Ton (ftpd) Netherlands |
|
Joined: 9 Dec 08, Posts: 1006, Credit: 5,068,599, RAC: 0 |
There should be nothing new with these WUs (except their length). By "cancelled", do you mean that they failed? |
|
Joined: 11 Jul 09, Posts: 1639, Credit: 10,159,968,649, RAC: 351
|
I had one this morning which has failed on two different machines so far: http://www.gpugrid.net/workunit.php?wuid=1966290 (Edit: but I've had one successful run, on the same machine, and another is currently at about 60%) |
Saenger (Joined: 20 Jul 08, Posts: 134, Credit: 23,657,183, RAC: 0)
|
And a TONI_KK WU is broken as well. The stderr from my Linux host: stderr out <core_client_version>6.10.17</core_client_version> <![CDATA[ <message> process exited with code 98 (0x62, -158) </message> <stderr_txt> # There is 1 device supporting CUDA # Device 0: "GeForce GT 240" # Clock rate: 1.34 GHz # Total amount of global memory: 536150016 bytes # Number of multiprocessors: 12 # Number of cores: 96 MDIO ERROR: read error for file "input.coor", byte number 0: expected to read number of atoms ERROR: file mdioload.cpp line 80: Unable to read bincoordfile 11:16:36 (3686): called boinc_finish </stderr_txt> ]]> and from the other host (Windows): stderr out <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> - exit code 98 (0x62) </message> <stderr_txt> # Using device 0 # There is 1 device supporting CUDA # Device 0: "GeForce GTX 260" # Clock rate: 1.35 GHz # Total amount of global memory: 919994368 bytes # Number of multiprocessors: 27 # Number of cores: 216 MDIO ERROR: read error for file "input.coor", byte number 0: expected to read number of atoms ERROR: file mdioload.cpp line 80: Unable to read bincoordfile called boinc_finish </stderr_txt> ]]> Greetings from Saenger. For questions about BOINC, look in the BOINC-Wiki |
|
Joined: 6 Jun 08, Posts: 152, Credit: 328,250,382, RAC: 0
|
Dear Toni, they failed within 1 minute (after 10-15 seconds of processing). Ton (ftpd) Netherlands |
skgiven (Joined: 23 Apr 09, Posts: 3968, Credit: 1,995,359,260, RAC: 0) |
I have finished 4 and my systems are running at least one. Reasonable performance compared to the other tasks. However I also got one immediate failure: 3105670 1965540 8 Oct 2010 4:12:47 UTC 8 Oct 2010 8:55:18 UTC Error while computing 2.48 0.95 0.00 --- ACEMD2: GPU molecular dynamics v6.11 (cuda31) Name f178r2-TONI_KKi4-0-200-RND1238_2 Workunit 1965540 Created 8 Oct 2010 3:32:44 UTC Sent 8 Oct 2010 4:12:47 UTC Received 8 Oct 2010 8:55:18 UTC Server state Over Outcome Client error Client state Compute error Exit status 98 (0x62) Computer ID 71363 Report deadline 13 Oct 2010 4:12:47 UTC Run time 2.484375 CPU time 0.953125 stderr out <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> - exit code 98 (0x62) </message> <stderr_txt> # Using device 0 # There are 2 devices supporting CUDA # Device 0: "GeForce GTX 470" # Clock rate: 1.43 GHz # Total amount of global memory: 1341849600 bytes # Number of multiprocessors: 14 # Number of cores: 112 # Device 1: "GeForce GTX 470" # Clock rate: 1.43 GHz # Total amount of global memory: 1341718528 bytes # Number of multiprocessors: 14 # Number of cores: 112 SWAN: Using synchronization method 0 MDIO ERROR: read error for file "input.coor", byte number 0: expected to read number of atoms ERROR: file mdioload.cpp line 80: Unable to read bincoordfile called boinc_finish </stderr_txt> ]]> Validate state Invalid Claimed credit 0.00480620718015305 Granted credit 0 application version ACEMD2: GPU molecular dynamics v6.11 (cuda31) |
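The stderr dumps in this thread all share the same signature: exit code 98 plus an `MDIO ERROR` on `input.coor` at byte number 0. Here is a minimal sketch of how one might pick that signature out of a saved stderr text automatically. The function name `summarize_stderr` and the `bad_input` heuristic are hypothetical, and the sketch assumes the raw stderr has one message per line (the pastes above were flattened onto a single line by the forum).

```python
import re

# Matches both "process exited with code 98" (Linux client) and
# "- exit code 98" (Windows client), as seen in the pastes above.
EXIT_RE = re.compile(r"(?:exited with code|exit code)\s+(\d+)")


def summarize_stderr(text):
    """Pull the exit code and error lines out of a task's stderr text."""
    match = EXIT_RE.search(text)
    exit_code = int(match.group(1)) if match else None

    # Keep only lines that look like errors (MDIO ERROR, ERROR, SWAN : FATAL).
    errors = [line.strip() for line in text.splitlines()
              if "ERROR" in line or "FATAL" in line]

    # Hypothetical heuristic: a read error at byte 0 of input.coor points
    # at a bad input file rather than at the host or the card.
    bad_input = any('"input.coor"' in e and "byte number 0" in e
                    for e in errors)
    return {"exit_code": exit_code, "errors": errors, "bad_input": bad_input}


sample = """<message>
 - exit code 98 (0x62)
</message>
<stderr_txt>
# Using device 0
MDIO ERROR: read error for file "input.coor", byte number 0: expected to read number of atoms
ERROR: file mdioload.cpp line 80: Unable to read bincoordfile
called boinc_finish
</stderr_txt>"""

print(summarize_stderr(sample))
```

A task flagged with `bad_input` fails within seconds on any host, which matches what several people report here.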
|
Joined: 9 Dec 08, Posts: 1006, Credit: 5,068,599, RAC: 0 |
Hi, for those getting "byte number 0: expected to read number of atoms": it must have been a glitch in mass WU creation, so just let them die. Richard, I think your other failure was on a mobile card. |
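For anyone curious what "byte number 0: expected to read number of atoms" likely means in practice, here is a rough sketch of a sanity check on a binary coordinate file. It ASSUMES a NAMD-style layout (a 4-byte little-endian integer atom count followed by three 8-byte floats per atom); nothing in this thread confirms that `input.coor` uses this layout, and the helper name `check_bincoor` is made up for illustration.

```python
import os
import struct


def check_bincoor(path):
    """Return (ok, reason) for a binary coordinate file.

    ASSUMPTION: NAMD-style layout, i.e. a 4-byte little-endian integer
    atom count followed by 3 * natoms 8-byte floats. This layout is not
    confirmed by the thread.
    """
    size = os.path.getsize(path)
    if size < 4:
        # An empty or truncated file fails before the atom count is read,
        # which would correspond to a read error at byte number 0.
        return False, f"file has {size} bytes; cannot read atom count"

    with open(path, "rb") as fh:
        (natoms,) = struct.unpack("<i", fh.read(4))

    if natoms <= 0:
        return False, f"implausible atom count {natoms}"

    expected = 4 + 24 * natoms  # header + x, y, z doubles per atom
    if size != expected:
        return False, f"size {size} does not match expected {expected}"

    return True, f"{natoms} atoms"
```

Under that assumption, an empty `input.coor` produced by a glitch at WU creation would fail this check on every host, regardless of card or driver.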
skgiven (Joined: 23 Apr 09, Posts: 3968, Credit: 1,995,359,260, RAC: 0) |
Thank you to everyone who reported this problem, and thank you Toni for letting us know it is just a WU creation glitch. As these errors are immediate, they will have almost no impact on people's RAC. To date I have only had one such error; most KKi4 WUs run well. |
|
Joined: 11 Jul 09, Posts: 1639, Credit: 10,159,968,649, RAC: 351
|
One of my 9800GTs had a go at h230r2-TONI_KKi4-0-200-RND9586, but unfortunately crashed with an assertion failure at the bitter end, after more than 24 hours of work. C'est la vie. |
skgiven (Joined: 23 Apr 09, Posts: 3968, Credit: 1,995,359,260, RAC: 0) |
Richard, you took that blow well. Toni, perhaps Fermi-only long tasks would go down better; a failure after a few hours is no big deal, but after a day it really bites, and not everyone is so understanding. I've now had 5 failures, but all in under 10 sec. 16 other KKi4 tasks ran well. |
Beyond (Joined: 23 Nov 08, Posts: 1112, Credit: 6,162,416,256, RAC: 0)
|
Quoting skgiven: "Toni, perhaps Fermi-only long tasks would go down better; a failure after a few hours is no big deal but after a day it really bites, and not everyone is so understanding." My GTX 260/216 runs the TONI_KKi4 WUs well; in fact, it runs everything well. The problem is with my three GT 240 cards. They won't run the TONI_KKi4 WUs. They don't like the TONI_HERGMETAXDOFE WUs either. They do run KASHIF_HIVPR, TONI_CAPBIND and IBUCH very well though. |
skgiven (Joined: 23 Apr 09, Posts: 3968, Credit: 1,995,359,260, RAC: 0) |
I have had 4 finish on a GT240, and just one that failed after 2.46 sec. Vista x64, all 512MB DDR5 cards. |
Fred J. Verster (Joined: 1 Apr 09, Posts: 58, Credit: 35,833,978, RAC: 0)
|
I found 2 WUs, computed by 4 hosts, all of which failed; 2 still have to report. h176r1-TONI_KKi4-0-200-RND5770 is giving problems as well!? The faults I've seen so far all come from the x999y1-TONI_KKi4-0-200-RND5770 batch. Many others must have noticed this, judging by the number of invalid results. dynamics v6.05 (cuda), dynamics v6.11 (cuda31) and dynamics v6.06 (cuda30) are all involved, each with "process exited with code 1 (0x1, -255)". All kinds of NVIDIA cards are involved: 240, 250, 470, 480. Knight Who Says Ni N! |
|
Joined: 11 Jul 09, Posts: 1639, Credit: 10,159,968,649, RAC: 351
|
Quoting my earlier post: "One of my 9800GTs had a go at h230r2-TONI_KKi4-0-200-RND9586, but unfortunately crashed with an assertion failure at the bitter end, after more than 24 hours of work. C'est la vie." This 9800GT host really doesn't like KKi4. It has now failed g105r2-TONI_KKi4-6-200-RND6062 with the same error message: "SWAN : FATAL : Failure executing kernel sync [frc_sum_kernel] [700] Assertion failed: 0, file swanlib_nv.cpp, line 121". At least it only wasted 22 ksec (about 6 hours) this time. |
skgiven (Joined: 23 Apr 09, Posts: 3968, Credit: 1,995,359,260, RAC: 0) |
This WU might be bad: http://www.gpugrid.net/workunit.php?wuid=2016815 |