Problems with WUs
| Author | Message |
|---|---|
|
Joined: 16 Jul 09 Posts: 7 Credit: 0 RAC: 0
|
Name: 27-KASHIF_HIVPR_dim_ba4-22-100-RND7589_0
Workunit: 635752
Created: 19 Jul 2009 13:04:29 UTC
Sent: 19 Jul 2009 14:54:10 UTC
Received: 19 Jul 2009 15:42:17 UTC
Server state: Over
Outcome: Client error
Client state: Compute error
Exit status: 1 (0x1)
Computer ID: 44377
Report deadline: 24 Jul 2009 14:54:10 UTC
CPU time: 595.2172
stderr out:

    <core_client_version>6.4.5</core_client_version>
    <![CDATA[
    <message>
    process exited with code 1 (0x1, -255)
    </message>
    <stderr_txt>
    # Using CUDA device 0
    # Device 0: "GeForce GTX 260"
    # Clock rate: 1242000 kilohertz
    # Total amount of global memory: 938803200 bytes
    # Number of multiprocessors: 24
    # Number of cores: 192
    # Amber: readparm : Reading parm file parameters
    # PARM file in AMBER 7 format
    # Encounter 10-12 H-bond term
    WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
    WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
    MDIO ERROR: cannot open file "restart.coor"
    </stderr_txt>
    ]]>

Validate state: Invalid
Claimed credit: 4038.48842592593
Granted credit: 0
Application version: 6.64

All of the above is from a WU run on Linux with the recommended version 180 driver. I am running Linux Mint 7 64-bit and Windows XP Pro in dual boot on an AMD Athlon2 dual core 5600 with a GTX 260. All WUs come up with the same error. I get the same error on Windows XP Pro with the 182.06 driver (when I run that side of the computer). I hope that someone can help me out here! Thanks in advance for your time and comments. |
Hydropower Joined: 3 Apr 09 Posts: 70 Credit: 6,003,024 RAC: 0
|
Have you checked the temperature on the card? I recommend staying under 85 °C. Also, try CPU-Z and switch on the error checking mode. If you get any errors, your GPU is ready for an RMA. I have had this happen to me (under XP64), and all my problems went away after I swapped the card. Join team Bletchley Park, the innovators. |
|
Joined: 16 Jul 09 Posts: 7 Credit: 0 RAC: 0
|
Thanks, Hydropower. The temp is normally around 72 °C. I now have CPU-Z but cannot find any "error checking mode" in the program, though I will keep looking. My GPU is only a week old and works fine for graphics, but I have not yet been able to "crunch" a single WU with it. My XP is 32-bit, and I also run Linux Mint 64-bit on the same computer and get the same error on both sides. I will keep trying things. Thanks again. |
Hydropower Joined: 3 Apr 09 Posts: 70 Credit: 6,003,024 RAC: 0
|
Hi flatron, your temperature looks very good. I was mistaken about GPU-z; I meant OCCT, a performance measuring tool. Regards, H. |
Steve Dodd Joined: 26 Dec 08 Posts: 18 Credit: 4,614,833,506 RAC: 132
|
I'm also having problems with a new GPU card, a GTX 260 (Core 216), new today. I've had 3 WUs error out already (639130, 638991, and 638617). One of those errored out on someone else's computer. Using the 186.16 driver, XP (32-bit), Q6600 CPU, new power supply (650 W). No overclocking. In an air-conditioned room. |
Steve Dodd Joined: 26 Dec 08 Posts: 18 Credit: 4,614,833,506 RAC: 132
|
Addendum: getting these messages on WU completion:

7/20/2009 10:11:20 PM GPUGRID Computation for task 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0 finished
7/20/2009 10:11:20 PM GPUGRID Output file 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0_1 for task 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0 absent
7/20/2009 10:11:20 PM GPUGRID Output file 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0_2 for task 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0 absent
7/20/2009 10:11:20 PM GPUGRID Output file 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0_3 for task 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0 absent |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
Hi, for the Linux machine it seems to be a driver issue; try the 182.xx driver (although 180 should work). Are you sure that you have installed it correctly? For the Windows machine, do you have XP 64? If not, then you should also install the 182 driver. Let us know if it works. gdf |
|
Joined: 16 Jul 09 Posts: 7 Credit: 0 RAC: 0
|
Hi, thanks for the replies. On the Linux side I have now installed the 185.18.14-pkg.run driver and will be trying it out in the near future; I have a few other issues to sort out on this OS first. My Windows is 32-bit, and I have just installed the 182 driver and will wind it up soon after the OCCT tests. Will report back shortly. Thanks for the help, guys. steve |
|
Joined: 19 Mar 09 Posts: 1 Credit: 125,295,057 RAC: 37
|
I get the same errors right at the start of a WU with the 185 drivers under 64-bit Linux. |
|
Joined: 28 Feb 09 Posts: 37 Credit: 666,889 RAC: 0
|
I am using the 180.60 64-bit Linux drivers because none of the 185.X series has worked with GPUGrid... ever. Mike D
|
Steve Dodd Joined: 26 Dec 08 Posts: 18 Credit: 4,614,833,506 RAC: 132
|
No love with the 182.50 driver. Same type of error after over 11 hours :(

999594 640725 21 Jul 2009 14:14:21 UTC 22 Jul 2009 4:36:13 UTC Error while computing 831.39 4,531.91 ---

Messages in BOINC:

7/21/2009 10:52:38 PM GPUGRID Computation for task 537-GIANNI_BINDTST1-7-100-RND1474_1 finished
7/21/2009 10:52:39 PM GPUGRID Output file 537-GIANNI_BINDTST1-7-100-RND1474_1_1 for task 537-GIANNI_BINDTST1-7-100-RND1474_1 absent
7/21/2009 10:52:39 PM GPUGRID Output file 537-GIANNI_BINDTST1-7-100-RND1474_1_2 for task 537-GIANNI_BINDTST1-7-100-RND1474_1 absent
7/21/2009 10:52:39 PM GPUGRID Output file 537-GIANNI_BINDTST1-7-100-RND1474_1_3 for task 537-GIANNI_BINDTST1-7-100-RND1474_1 absent

Really bummed out. I was so looking forward to my increased RAC with the new video card :) |
|
Joined: 16 Jul 09 Posts: 7 Credit: 0 RAC: 0
|
Name: p1685000-IBUCH_6_pYEEI_carb_2207-0-3-RND3824_0
Workunit: 645477
Created: 22 Jul 2009 17:33:47 UTC
Sent: 23 Jul 2009 0:14:51 UTC
Received: 23 Jul 2009 0:17:13 UTC
Server state: Over
Outcome: Client error
Client state: Compute error
Exit status: 98 (0x62)
Computer ID: 43981
Report deadline: 28 Jul 2009 0:14:51 UTC
CPU time: 7.671875
stderr out:

    <core_client_version>6.6.36</core_client_version>
    <![CDATA[
    <message>
    - exit code 98 (0x62)
    </message>
    <stderr_txt>
    # Using CUDA device 0
    # Device 0: "GeForce GTX 260"
    # Clock rate: 1242000 kilohertz
    # Total amount of global memory: 939196416 bytes
    # Number of multiprocessors: 24
    # Number of cores: 192
    # Amber: readparm : Reading parm file parameters
    # PARM file in AMBER 7 format
    # Encounter 10-12 H-bond term
    WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
    WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
    MDIO ERROR: cannot open file "restart.coor"
    ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 11: cufftExecR2C (gridcalc1)
    called boinc_finish
    </stderr_txt>
    ]]>

Validate state: Invalid
Claimed credit: 4926.84722222222
Granted credit: 0
Application version: 6.64

Above you can see the result of the latest failure. This is on Windows XP Pro 32-bit with a GTX 260, an AMD 64 dual core 2, the latest 190 drivers, and BOINC 6.6.36. I hope we can sort this out soon. Temps are not a problem, only 72 °C. |
|
Joined: 16 Jul 09 Posts: 7 Credit: 0 RAC: 0
|
This is the latest failure after a complete reinstall of Windows, drivers (190.xxx), BOINC 6.6.36, everything! I even crunched about 60 Seti WUs; a few of them failed, but only about 5. Temps are good, averaging 76 °C. Any ideas, anyone? I want to crunch this project and not Seti.

Name: m85000-IBUCH_random_pYEEI_kxy01start_2407-2-3-RND0764_0
Workunit: 655768
Created: 26 Jul 2009 17:01:27 UTC
Sent: 26 Jul 2009 18:00:34 UTC
Received: 26 Jul 2009 23:31:09 UTC
Server state: Over
Outcome: Client error
Client state: Compute error
Exit status: 1 (0x1)
Computer ID: 43981
Report deadline: 31 Jul 2009 18:00:34 UTC
CPU time: 868.922
stderr out:

    <core_client_version>6.6.36</core_client_version>
    <![CDATA[
    <message>
    Incorrect function. (0x1) - exit code 1 (0x1)
    </message>
    <stderr_txt>
    # Using CUDA device 0
    # Device 0: "GeForce GTX 260"
    # Clock rate: 1242000 kilohertz
    # Total amount of global memory: 939196416 bytes
    # Number of multiprocessors: 24
    # Number of cores: 192
    MDIO ERROR: cannot open file "restart.coor"
    # Using CUDA device 0
    # Device 0: "GeForce GTX 260"
    # Clock rate: 1242000 kilohertz
    # Total amount of global memory: 939196416 bytes
    # Number of multiprocessors: 24
    # Number of cores: 192
    # Using CUDA device 0
    # Device 0: "GeForce GTX 260"
    # Clock rate: 1242000 kilohertz
    # Total amount of global memory: 939196416 bytes
    # Number of multiprocessors: 24
    # Number of cores: 192
    # Using CUDA device 0
    # Device 0: "GeForce GTX 260"
    # Clock rate: 1242000 kilohertz
    # Total amount of global memory: 939196416 bytes
    # Number of multiprocessors: 24
    # Number of cores: 192
    # Using CUDA device 0
    # Device 0: "GeForce GTX 260"
    # Clock rate: 1242000 kilohertz
    # Total amount of global memory: 939196416 bytes
    # Number of multiprocessors: 24
    # Number of cores: 192
    Cuda error: Kernel [fft_data_swizzle_in] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu' in line 44 : unspecified launch failure.
    </stderr_txt>
    ]]>

Validate state: Invalid
Claimed credit: 3977.21064814815
Granted credit: 0
Application version: 6.64 |
Hydropower Joined: 3 Apr 09 Posts: 70 Credit: 6,003,024 RAC: 0
|
Hi, I am sorry to hear that it is not working, and I sympathize with your frustration. Many of us 'have been there'. Did you run the OCCT test (an hour or so will usually do with the error checking option on)? If so, what was the result? |
|
Joined: 16 Jul 09 Posts: 7 Credit: 0 RAC: 0
|
Hi Hydropower, yes, I ran the OCCT tests and got 134 errors! I did not save the CSV files (silly me), but the tests are underway again now. I have no idea if 134 errors is a lot or whether they are going to stop me crunching here. As I said before, I can crunch Seti with no problem, so something works. I will post the OCCT results when they are done. |
Hydropower Joined: 3 Apr 09 Posts: 70 Credit: 6,003,024 RAC: 0
|
Wow! ONE error is enough to RMA your card, because it means that a calculation whose result is predefined has failed to produce that result. Like 100/10 = 8. Lethal for CUDA. Seti hasn't failed because it analyzes noise; with your card it probably detects signals where there aren't any (or worse, detects nothing where there was a signal). I'd RMA that card to the dealer with the OCCT results. |
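The idea behind the error check described above (a calculation with a known answer is repeated, and any mismatch counts as an error) can be sketched in a few lines. This is only an illustration under stated assumptions, not how OCCT is actually implemented: it runs on the CPU, `workload`, `count_errors` and `flaky_workload` are hypothetical names, and the "faulty device" is simulated by deliberately returning one wrong result.

```python
import hashlib


def workload(seed: int, rounds: int = 1000) -> str:
    """Deterministic computation: chain SHA-256 hashes starting from a seed."""
    data = seed.to_bytes(8, "little")
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data.hex()


def count_errors(compute, seeds, reference):
    """Re-run `compute` for every seed and count results that differ from the reference."""
    return sum(1 for seed in seeds if compute(seed) != reference[seed])


seeds = range(20)
# Reference results, assumed to have been produced on known-good hardware.
reference = {seed: workload(seed) for seed in seeds}

# Healthy case: every repeat reproduces the stored reference.
healthy_errors = count_errors(workload, seeds, reference)


def flaky_workload(seed: int) -> str:
    """Simulated faulty device (hypothetical): silently wrong for one input."""
    result = workload(seed)
    return "0" * len(result) if seed == 7 else result


faulty_errors = count_errors(flaky_workload, seeds, reference)
print("healthy:", healthy_errors, "faulty:", faulty_errors)
```

Because the workload is deterministic, a correct device reports zero mismatches no matter how often the check is repeated, which is why even a single reported error points at the hardware rather than at chance.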
|
Joined: 16 Jul 09 Posts: 7 Credit: 0 RAC: 0
|
Hi, I have run the tests twice more, with 90 and 38 errors respectively. It looks like I will have to write a nasty letter to the dealer and get another card. Thanks for the support; be back soon. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Steve, an error after 11 h means that your setup is generally fine but at some point something goes wrong. Is your CPU / RAM OC'ed? What are your GPU temps during GPU-Grid? Did you try OCCT or other stress test tools? You can also try the 190 WHQL driver, but it likely won't help if 182.50 didn't work. MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
"As I said before, I can crunch Seti with no problem, so something works." As you said before, Seti worked except for 5 out of ~60 WUs. Seti WUs are much shorter and less stressful, so chances are that you can finish some of them despite the occasional error ;) MrS |
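The point about task length can be put in rough numbers. The sketch below assumes that errors strike independently and at a constant rate per unit of GPU time, which is a simplification not measured anywhere in this thread, and the length ratios (a GPUGRID WU taking 10 or 30 times as long as a Seti WU) are purely hypothetical. Only the 5-in-60 Seti failure count comes from the posts above; `success_probability` is an illustrative helper name.

```python
def success_probability(short_success: float, length_ratio: float) -> float:
    """Chance that a task `length_ratio` times longer than the short task finishes
    without any error, assuming independent errors at a constant rate over time."""
    return short_success ** length_ratio


# About 5 of ~60 Seti WUs failed, so roughly 55 of 60 finished cleanly.
seti_success = 55 / 60

# Length ratios are hypothetical, chosen only to show the trend.
for ratio in (1, 10, 30):
    chance = success_probability(seti_success, ratio)
    print(f"task {ratio:>2}x as long as a Seti WU: {chance:.1%} chance of finishing cleanly")
```

Under these assumptions the success chance falls off exponentially with run time, so a card that looks mostly fine on short tasks can still fail nearly every long one. The sketch ignores the "less stressful" part of the argument, which would make long GPUGRID tasks fare even worse.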
Steve Dodd Joined: 26 Dec 08 Posts: 18 Credit: 4,614,833,506 RAC: 132
|
Hi ET Apes, this whole mess started with driver rev 190.xx. I haven't tried any stress tests yet. The computer is in an air-conditioned room; GPU temps are ~52 °C, well under any limit. Stock card, stock clock (PNY). My HP running a GTX250 and Vista 64-bit works fine. I've upgraded BOINC to 6.6.37 for this last test. Same result after 11 hours 44 minutes: Compute error - file blah blah blah is absent (times 3). NNT (no new tasks) until this is sorted out. |