GTS 250 65nm G92 Rev A2 - Successes and Failures
| Author | Message |
|---|---|
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Below is a list of the tasks that my GTS 250 completed successfully between 1st Dec 09 and 7th Dec 09:

151-KASHIF_HIVPR_sub_so_ba2-73-100-RND7713_0
39-GIANNI_BIND_2-32-100-RND1522
38-IBUCH_2_reverse_TRYP_0911-11-40-RND1649
D160-TONI_HERGdof5-5-40-RND0496
467-GIANNI_BIND_166_119-32-100-RND4596
92-KASHIF_HIVPR_twomons_ba2-72-100-RND5413
p1515000-IBUCH_3_pYEEI_2011-13-20-RND1885
98-KASHIF_HIVPR_n1_for_1hhp_open_ba5-81-100-RND2003
70-GIANNI_BIND_166_119-43-100-RND0394
315-GIANNI_BIND_166_119-33-100-RND5680

Last week the same GTS 250 failed four tasks, all of them ...TONI-HERG... TONI-HERG is bad for this GTS 250 card, so any I get for the card will be aborted by user. They failed after 38565, 15028, 19544 and 3083 seconds. That is a total of 21h of lost crunching, or 12.5% lost time, last week. The previous week it was twice that, 25% lost time, when I had more of these bad work units. So things are looking up again for the old 65nm G92 Rev A2 card.

Using Boinc 6.10.18, Driver 19539, CUDA 3000. |
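For anyone who wants to repeat the lost-time arithmetic on their own card, here is a minimal sketch. It uses the four failure run times quoted above and assumes the card could otherwise have crunched around the clock for the whole 7-day week; the variable names are made up for illustration.

```python
# Run times (seconds) of the four failed TONI-HERG tasks quoted in the post.
failed_run_times_s = [38565, 15028, 19544, 3083]

# Assumption: the GPU is available for crunching 24 hours a day, 7 days a week.
WEEK_S = 7 * 24 * 3600

lost_s = sum(failed_run_times_s)
lost_hours = lost_s / 3600
lost_fraction = lost_s / WEEK_S

print(f"{lost_s} s lost = {lost_hours:.1f} h = {lost_fraction:.1%} of the week")
```

The result lands within rounding of the 21h and 12.5% figures given in the post.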
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Between the 10th Dec and 17th Dec my GTS250 had 5 errors and 9 successes. 19h 15min were lost (69206s), or 11.5% of the time. That is slightly better than the previous week (12.5%) and still much better than the week before (25%). Since the 13th there has been only one failure, although it did fail after 9h 30min! I suspect that failure was a result of the task being run while I was using the system, so I have made sure it does not run GPUGrid when I am using it (which is not too often)! All error messages contain the following line:

MDIO ERROR: cannot open file "restart.coor"

List of tasks undertaken:

| Task | Workunit | Sent | Received | Status | Run time (s) | CPU time (s) | Claimed credit | Granted credit | Application |
|---|---|---|---|---|---|---|---|---|---|
| 1633773 | 1024720 | 15 Dec 2009 23:57:30 UTC | 16 Dec 2009 13:41:29 UTC | Completed and validated | 47,063.52 | 4,500.40 | 3,977.21 | 5,369.23 | Full-atom molecular dynamics v6.71 (cuda23) |
| 1632027 | 1023349 | 15 Dec 2009 6:17:34 UTC | 15 Dec 2009 23:57:30 UTC | Error while computing | 34,324.59 | 1,311.34 | 4,428.01 | --- | Full-atom molecular dynamics v6.71 (cuda23) |
| 1629991 | 1022109 | 14 Dec 2009 15:42:01 UTC | 15 Dec 2009 11:17:44 UTC | Completed and validated | 52,633.66 | 3,102.44 | 4,503.74 | 6,080.05 | Full-atom molecular dynamics v6.71 (cuda23) |
| 1627586 | 1020487 | 14 Dec 2009 0:18:19 UTC | 14 Dec 2009 20:42:11 UTC | Completed and validated | 55,474.79 | 3,033.25 | 4,531.91 | 6,118.08 | Full-atom molecular dynamics v6.71 (cuda23) |
| 1625604 | 1007426 | 13 Dec 2009 11:55:40 UTC | 14 Dec 2009 6:21:57 UTC | Completed and validated | 52,461.03 | 2,915.99 | 4,503.74 | 6,080.05 | Full-atom molecular dynamics v6.71 (cuda23) |
| 1624544 | 1018750 | 12 Dec 2009 11:21:41 UTC | 13 Dec 2009 11:53:58 UTC | Completed and validated | 55,578.08 | 3,299.06 | 4,531.91 | 6,118.08 | Full-atom molecular dynamics v6.71 (cuda23) |
| 1624517 | 1018739 | 12 Dec 2009 10:56:14 UTC | 12 Dec 2009 11:14:49 UTC | Error while computing | 1,015.11 | 54.30 | 4,022.81 | --- | Full-atom molecular dynamics v6.71 (cuda23) |
| 1624470 | 1018708 | 12 Dec 2009 10:28:30 UTC | 12 Dec 2009 10:34:45 UTC | Error while computing | 265.39 | 18.05 | 4,428.01 | --- | Full-atom molecular dynamics v6.71 (cuda23) |
| 1622530 | 1013606 | 11 Dec 2009 20:43:48 UTC | 12 Dec 2009 10:28:30 UTC | Error while computing | 32,402.90 | 1,903.20 | 4,531.91 | --- | Full-atom molecular dynamics v6.71 (cuda23) |
| 1620740 | 1016195 | 11 Dec 2009 7:20:21 UTC | 12 Dec 2009 5:01:01 UTC | Completed and validated | 50,106.54 | 2,207.99 | 4,022.81 | 5,430.80 | Full-atom molecular dynamics v6.71 (cuda23) |
| 1620010 | 1015882 | 12 Dec 2009 10:34:45 UTC | 12 Dec 2009 10:56:14 UTC | Error while computing | 1,199.04 | 117.81 | 3,977.21 | --- | Full-atom molecular dynamics v6.71 (cuda23) |
| 1617985 | 1014436 | 10 Dec 2009 10:43:53 UTC | 11 Dec 2009 16:01:44 UTC | Completed and validated | 45,358.59 | 5,275.05 | 3,539.96 | 4,778.94 | Full-atom molecular dynamics v6.71 (cuda23) |
| 1616001 | 1013057 | 9 Dec 2009 23:10:00 UTC | 10 Dec 2009 18:54:39 UTC | Completed and validated | 56,684.09 | 3,287.19 | 4,531.91 | 6,118.08 | Full-atom molecular dynamics v6.71 (cuda23) |

Failure 1:
________________________________________
Name: p270000-IBUCH_2_pYEEI_2011-5-20-RND2486_0
Workunit: 1015882
Created: 11 Dec 2009 1:24:50 UTC
Sent: 12 Dec 2009 10:34:45 UTC
Received: 12 Dec 2009 10:56:14 UTC
Server state: Over
Outcome: Client error
Client state: Compute error
Exit status: 1 (0x1)
Computer ID: 51279
Report deadline: 17 Dec 2009 10:34:45 UTC
Run time: 1199.038244
CPU time: 117.812
stderr out:

    <core_client_version>6.10.18</core_client_version>
    <![CDATA[
    <message>
    Incorrect function. (0x1) - exit code 1 (0x1)
    </message>
    <stderr_txt>
    # Using CUDA device 0
    # There is 1 device supporting CUDA
    # Device 0: "GeForce GTS 250"
    # Clock rate: 1.85 GHz
    # Total amount of global memory: 1073741824 bytes
    # Number of multiprocessors: 16
    # Number of cores: 128
    MDIO ERROR: cannot open file "restart.coor"
    Cuda error: Kernel [pme_fill_charges_overflow] failed in file 'fillcharges.cu' in line 97 : unknown error.
    </stderr_txt>
    ]]>

Validate state: Invalid
Claimed credit: 3977.21064814815
Granted credit: 0
Application version: Full-atom molecular dynamics v6.71 (cuda23)

Failure 2:
________________________________________
Name: 471-GIANNI_BIND_166_119-30-100-RND4009_1
Workunit: 1013606
Created: 11 Dec 2009 20:08:53 UTC
Sent: 11 Dec 2009 20:43:48 UTC
Received: 12 Dec 2009 10:28:30 UTC
Server state: Over
Outcome: Client error
Client state: Compute error
Exit status: 1 (0x1)
Computer ID: 51279
Report deadline: 16 Dec 2009 20:43:48 UTC
Run time: 32402.901728
CPU time: 1903.197
stderr out:

    <core_client_version>6.10.18</core_client_version>
    <![CDATA[
    <message>
    Incorrect function. (0x1) - exit code 1 (0x1)
    </message>
    <stderr_txt>
    # Using CUDA device 0
    # There is 1 device supporting CUDA
    # Device 0: "GeForce GTS 250"
    # Clock rate: 1.85 GHz
    # Total amount of global memory: 1073741824 bytes
    # Number of multiprocessors: 16
    # Number of cores: 128
    MDIO ERROR: cannot open file "restart.coor"
    Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.
    </stderr_txt>
    ]]>

Validate state: Invalid
Claimed credit: 4531.90972222222
Granted credit: 0
Application version: Full-atom molecular dynamics v6.71 (cuda23)

Failure 3:
________________________________________
Name: 88-KASHIF_HIVPR_n1_for_1hhp_open_ba4-78-100-RND1283_0
Workunit: 1018708
Created: 12 Dec 2009 9:52:07 UTC
Sent: 12 Dec 2009 10:28:30 UTC
Received: 12 Dec 2009 10:34:45 UTC
Server state: Over
Outcome: Client error
Client state: Compute error
Exit status: 1 (0x1)
Computer ID: 51279
Report deadline: 17 Dec 2009 10:28:30 UTC
Run time: 265.390701
CPU time: 18.04932
stderr out:

    <core_client_version>6.10.18</core_client_version>
    <![CDATA[
    <message>
    Incorrect function. (0x1) - exit code 1 (0x1)
    </message>
    <stderr_txt>
    # Using CUDA device 0
    # There is 1 device supporting CUDA
    # Device 0: "GeForce GTS 250"
    # Clock rate: 1.85 GHz
    # Total amount of global memory: 1073741824 bytes
    # Number of multiprocessors: 16
    # Number of cores: 128
    MDIO ERROR: cannot open file "restart.coor"
    Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.
    </stderr_txt>
    ]]>

Validate state: Invalid
Claimed credit: 4428.01157407407
Granted credit: 0
Application version: Full-atom molecular dynamics v6.71 (cuda23)

Failure 4:
________________________________________
Name: 34-KASHIF_HIVPR_sub_so_ba1-72-100-RND1262_0
Workunit: 1018739
Created: 12 Dec 2009 10:17:10 UTC
Sent: 12 Dec 2009 10:56:14 UTC
Received: 12 Dec 2009 11:14:49 UTC
Server state: Over
Outcome: Client error
Client state: Compute error
Exit status: 1 (0x1)
Computer ID: 51279
Report deadline: 17 Dec 2009 10:56:14 UTC
Run time: 1015.111446
CPU time: 54.30395
stderr out:

    <core_client_version>6.10.18</core_client_version>
    <![CDATA[
    <message>
    Incorrect function. (0x1) - exit code 1 (0x1)
    </message>
    <stderr_txt>
    # Using CUDA device 0
    # There is 1 device supporting CUDA
    # Device 0: "GeForce GTS 250"
    # Clock rate: 1.85 GHz
    # Total amount of global memory: 1073741824 bytes
    # Number of multiprocessors: 16
    # Number of cores: 128
    MDIO ERROR: cannot open file "restart.coor"
    Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.
    </stderr_txt>
    ]]>

Validate state: Invalid
Claimed credit: 4022.81481481481
Granted credit: 0
Application version: Full-atom molecular dynamics v6.71 (cuda23)

Failure 5:
________________________________________
Name: 89-KASHIF_HIVPR_n1_for_1hhp_open_ba4-78-100-RND7252_1
Workunit: 1023349
Created: 15 Dec 2009 5:43:05 UTC
Sent: 15 Dec 2009 6:17:34 UTC
Received: 15 Dec 2009 23:57:30 UTC
Server state: Over
Outcome: Client error
Client state: Compute error
Exit status: 1 (0x1)
Computer ID: 51279
Report deadline: 20 Dec 2009 6:17:34 UTC
Run time: 34324.593376
CPU time: 1311.344
stderr out:

    <core_client_version>6.10.18</core_client_version>
    <![CDATA[
    <message>
    Incorrect function. (0x1) - exit code 1 (0x1)
    </message>
    <stderr_txt>
    # Using CUDA device 0
    # There is 1 device supporting CUDA
    # Device 0: "GeForce GTS 250"
    # Clock rate: 1.85 GHz
    # Total amount of global memory: 1073741824 bytes
    # Number of multiprocessors: 16
    # Number of cores: 128
    MDIO ERROR: cannot open file "restart.coor"
    # Using CUDA device 0
    # There is 1 device supporting CUDA
    # Device 0: "GeForce GTS 250"
    # Clock rate: 1.85 GHz
    # Total amount of global memory: 1073741824 bytes
    # Number of multiprocessors: 16
    # Number of cores: 128
    Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.
    </stderr_txt>
    ]]>

Validate state: Invalid
Claimed credit: 4428.01157407407
Granted credit: 0
Application version: Full-atom molecular dynamics v6.71 (cuda23) |
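To see at a glance which kernels the five failures above died in, something like the following could be run over the saved stderr text. It is only a sketch: the regular expression is inferred from the five "Cuda error" lines in this post, not from any documented GPUGrid log format, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the error lines quoted in this thread (an assumption,
# not an official log format): kernel name, source file and line number.
KERNEL_ERROR = re.compile(
    r"Cuda error: Kernel \[(\w+)\] failed in file '([^']+)' in line (\d+)"
)

def tally_failing_kernels(stderr_text):
    """Count how often each CUDA kernel name appears in a 'Cuda error' line."""
    return Counter(match.group(1) for match in KERNEL_ERROR.finditer(stderr_text))

# The five error lines from Failures 1 to 5, copied from the stderr output.
sample_stderr = """\
Cuda error: Kernel [pme_fill_charges_overflow] failed in file 'fillcharges.cu' in line 97 : unknown error.
Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.
Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.
Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.
Cuda error: Kernel [PmeRealSpace_compute_forces] failed in file 'PmeRealSpace.cu' in line 172 : unknown error.
"""

kernel_counts = tally_failing_kernels(sample_stderr)
for kernel, count in kernel_counts.most_common():
    print(f"{kernel}: {count}")
```

Four of the five failures stop in the same kernel at the same source line, which the tally makes easy to spot without reading every dump.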
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
From the 18th to the 24th Dec 2009 my GTS250 successfully completed 7 tasks in a row and averaged over 7000 points per day, with tasks completing in between 46000 and 60000 seconds. On the 24th one task, 143-IBUCH_reverse1fix_pYEEI_2312-0-40-RND2977, from a known bad batch of tasks, failed after 2 seconds, and then a TONI_HERG task failed after 14,135 seconds. Surprisingly, that task succeeded on a GeForce 9600 GT despite failing on 2 additional systems. There were no failures from the 24th to the 27th. So, since the 18th Dec (9 days ago) my GTS250 has lost only 14138 seconds: (777600 - 14138) / 777600 sec = 98% successful GPU processing time! A huge improvement. Techs and Scientists - Thank You, |
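The 98% figure can be checked with a small helper like the one below. It is a sketch that assumes the card was available for crunching for the full 9 days (9 x 86400 = 777600 seconds); the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 3600

def successful_fraction(days, lost_seconds):
    """Fraction of the period not wasted on failed tasks (assumes 24/7 crunching)."""
    total_seconds = days * SECONDS_PER_DAY
    return (total_seconds - lost_seconds) / total_seconds

# 9 days since the 18th Dec, 14138 seconds lost to the two failures.
fraction = successful_fraction(days=9, lost_seconds=14138)
print(f"{fraction:.1%} successful GPU processing time")
```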