Problems with WUs

Message boards : Graphics cards (GPUs) : Problems with WUs

flatron97
Joined: 16 Jul 09
Posts: 7
Credit: 0
RAC: 0
Message 11191 - Posted: 19 Jul 2009, 16:16:06 UTC

Name 27-KASHIF_HIVPR_dim_ba4-22-100-RND7589_0
Workunit 635752
Created 19 Jul 2009 13:04:29 UTC
Sent 19 Jul 2009 14:54:10 UTC
Received 19 Jul 2009 15:42:17 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 44377
Report deadline 24 Jul 2009 14:54:10 UTC
CPU time 595.2172
stderr out

<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 938803200 bytes
# Number of multiprocessors: 24
# Number of cores: 192
# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
MDIO ERROR: cannot open file "restart.coor"

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 4038.48842592593
Granted credit 0
application version 6.64

All of the above is from a WU run on Linux with the recommended version 180 driver.

I am dual-booting Linux Mint 7 (64-bit) and Windows XP Pro on an AMD Athlon2 dual-core 5600 with a GTX 260.
All WUs fail with the same error.

I get the same error on Windows XP Pro with the 182.06 driver (when I run that side of the computer).
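To check whether every failure really is "the same error", it can help to pull the key fields and the error/warning lines out of each task dump and compare them side by side. The following is a minimal Python sketch, assuming the plain-text "Field value" layout of the task dumps posted in this thread; `summarize_task` is a hypothetical helper, not part of BOINC or GPUGRID.

```python
import re

# Hypothetical helper. Assumes the "Field value" layout of the task dumps
# copied from the GPUGRID results pages in this thread.
FIELD_RE = re.compile(r"^(Name|Workunit|Exit status|Client state)\s+(.*)$")
# Error and warning lines as they appear in the stderr output. Matching is
# case-sensitive, so "Client state Compute error" is not counted as an error line.
ERROR_RE = re.compile(r"\b(?:ERROR|WARNING)\b|Cuda error")


def summarize_task(dump: str) -> dict:
    """Return selected header fields plus every error/warning line from stderr."""
    summary = {"errors": []}
    for raw in dump.splitlines():
        line = raw.strip()
        match = FIELD_RE.match(line)
        if match:
            summary[match.group(1)] = match.group(2)
        elif ERROR_RE.search(line):
            summary["errors"].append(line)
    return summary


if __name__ == "__main__":
    sample = """Name 27-KASHIF_HIVPR_dim_ba4-22-100-RND7589_0
Exit status 1 (0x1)
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
MDIO ERROR: cannot open file "restart.coor"
"""
    for key, value in summarize_task(sample).items():
        print(key, "->", value)
```

Run over the dumps from messages 11191 and 11264, it would show that both share the same WARNING and MDIO lines, while only the later one adds the `cufftExecR2C` error.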

I hope that someone can help me out here!
Thanks in advance for your time and comments.
Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Message 11192 - Posted: 19 Jul 2009, 17:04:28 UTC - in response to Message 11191.  

Have you checked the temperature of the card? I recommend staying under 85 °C. Also, try CPU-Z and switch on the error-checking mode. If you get any errors, your GPU is ready for an RMA. This happened to me (under XP64), and all my problems went away after I swapped the card.
Join team Bletchley Park, the innovators.
flatron97
Joined: 16 Jul 09
Posts: 7
Credit: 0
RAC: 0
Message 11193 - Posted: 19 Jul 2009, 22:19:50 UTC

Thanks, Hydropower.

The temp is normally around 72 °C. I have now got CPU-Z but cannot find any "error checking mode" in the program; I will keep looking. My GPU is only a week old and works fine for graphics, but I have not yet been able to crunch a single WU with it.
My XP is 32-bit, and I also run Linux Mint 64-bit on the same computer and get the same error on both sides.

I will keep trying things.

Thanks again
Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Message 11197 - Posted: 20 Jul 2009, 8:51:31 UTC - in response to Message 11193.  

Hi flatron, your temperature looks very good. I was mistaken about GPU-Z; I meant OCCT, a performance-measuring tool. Regards, H.
Steve Dodd
Joined: 26 Dec 08
Posts: 18
Credit: 4,614,833,506
RAC: 132
Message 11214 - Posted: 21 Jul 2009, 4:55:28 UTC

I'm also having problems with a new GPU card (GTX 260 Core 216), which I got today. I've had 3 WUs error out already (639130, 638991 and 638617). One of those also errored out on someone else's computer. Using the 186.16 driver, XP (32-bit), Q6600 CPU, a new 650 W power supply, no overclocking, in an air-conditioned room.
Steve Dodd
Joined: 26 Dec 08
Posts: 18
Credit: 4,614,833,506
RAC: 132
Message 11215 - Posted: 21 Jul 2009, 5:14:39 UTC - in response to Message 11214.  

Addendum: I'm getting these messages on WU completion:

7/20/2009 10:11:20 PM GPUGRID Computation for task 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0 finished
7/20/2009 10:11:20 PM GPUGRID Output file 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0_1 for task 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0 absent
7/20/2009 10:11:20 PM GPUGRID Output file 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0_2 for task 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0 absent
7/20/2009 10:11:20 PM GPUGRID Output file 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0_3 for task 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0 absent
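When a task ends in a compute error, BOINC logs each result file the application never produced as "absent", so three absent lines per task most likely just mean the application stopped before writing any of its output files. To tally these messages across a long event log, a small filter like the following Python sketch can help; the line format is assumed from the messages quoted above, and `absent_outputs` is a hypothetical name.

```python
import re
from collections import Counter

# Assumed format, taken from the BOINC messages quoted in this thread:
# "<date> <time> <AM/PM> <project> Output file <file> for task <task> absent"
ABSENT_RE = re.compile(r"Output file (\S+) for task (\S+) absent")


def absent_outputs(log_text: str) -> Counter:
    """Count 'Output file ... absent' messages for each task name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ABSENT_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    log = """7/20/2009 10:11:20 PM GPUGRID Computation for task TASK_A finished
7/20/2009 10:11:20 PM GPUGRID Output file TASK_A_1 for task TASK_A absent
7/20/2009 10:11:20 PM GPUGRID Output file TASK_A_2 for task TASK_A absent"""
    for task, missing in absent_outputs(log).items():
        print(f"{task}: {missing} output file(s) absent")
```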
GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1958
Credit: 629,356
RAC: 0
Message 11218 - Posted: 21 Jul 2009, 9:02:13 UTC - in response to Message 11191.  

Hi,
for the Linux machine it seems to be a driver issue; try the 182.xx driver (although 180 should work). Are you sure you have installed it correctly?

For the Windows machine, do you have XP 64? If not, you should also install the 182 driver.

Let us know if it works.

gdf
flatron97
Joined: 16 Jul 09
Posts: 7
Credit: 0
RAC: 0
Message 11228 - Posted: 21 Jul 2009, 12:45:46 UTC

Hi, thanks for the replies.
On the Linux side I have now installed the 185.18.14-pkg.run driver and will be trying it out in the near future; I have a few other issues to sort out on this OS first.

My Windows is 32-bit; I have just installed the 182 driver and will start it crunching soon after the OCCT tests.

Will report back shortly.

Thanks for the help, guys.

Steve
Raimund Barbeln
Joined: 19 Mar 09
Posts: 1
Credit: 125,295,057
RAC: 37
Message 11230 - Posted: 21 Jul 2009, 18:31:02 UTC - in response to Message 11228.  

I get the same errors right at the start of a WU with the 185 drivers under 64-bit Linux.
Michael Doerner
Joined: 28 Feb 09
Posts: 37
Credit: 666,889
RAC: 0
Message 11238 - Posted: 22 Jul 2009, 3:46:14 UTC - in response to Message 11230.  
Last modified: 22 Jul 2009, 3:46:50 UTC

I am using the 180.60 64-bit Linux drivers because none of the 185.x series has ever worked with GPUGrid.

Mike D
Steve Dodd
Joined: 26 Dec 08
Posts: 18
Credit: 4,614,833,506
RAC: 132
Message 11239 - Posted: 22 Jul 2009, 5:59:50 UTC - in response to Message 11238.  

No love with the 182.50 driver. Same type of error after over 11 hours :(

Task 999594 · Workunit 640725 · Sent 21 Jul 2009 14:14:21 UTC · Reported 22 Jul 2009 4:36:13 UTC · Error while computing · CPU time 831.39 s · Claimed credit 4,531.91 · Granted credit ---

Messages in BOINC:

7/21/2009 10:52:38 PM GPUGRID Computation for task 537-GIANNI_BINDTST1-7-100-RND1474_1 finished
7/21/2009 10:52:39 PM GPUGRID Output file 537-GIANNI_BINDTST1-7-100-RND1474_1_1 for task 537-GIANNI_BINDTST1-7-100-RND1474_1 absent
7/21/2009 10:52:39 PM GPUGRID Output file 537-GIANNI_BINDTST1-7-100-RND1474_1_2 for task 537-GIANNI_BINDTST1-7-100-RND1474_1 absent
7/21/2009 10:52:39 PM GPUGRID Output file 537-GIANNI_BINDTST1-7-100-RND1474_1_3 for task 537-GIANNI_BINDTST1-7-100-RND1474_1 absent

Really bummed out. I was soooo looking forward to my increased RAC with the new video card :)
flatron97
Joined: 16 Jul 09
Posts: 7
Credit: 0
RAC: 0
Message 11264 - Posted: 23 Jul 2009, 0:26:39 UTC

Name p1685000-IBUCH_6_pYEEI_carb_2207-0-3-RND3824_0
Workunit 645477
Created 22 Jul 2009 17:33:47 UTC
Sent 23 Jul 2009 0:14:51 UTC
Received 23 Jul 2009 0:17:13 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 98 (0x62)
Computer ID 43981
Report deadline 28 Jul 2009 0:14:51 UTC
CPU time 7.671875
stderr out

<core_client_version>6.6.36</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 24
# Number of cores: 192
# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
MDIO ERROR: cannot open file "restart.coor"
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 11: cufftExecR2C (gridcalc1)
called boinc_finish

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 4926.84722222222
Granted credit 0
application version 6.64

Above you can see the result of the latest failure.
This is on Windows XP Pro 32-bit with a GTX 260, an AMD 64 dual core 2, the latest 190-series drivers and BOINC 6.6.36.
I hope we can sort this out soon; temps are not a problem, only 72 °C.
flatron97
Joined: 16 Jul 09
Posts: 7
Credit: 0
RAC: 0
Message 11352 - Posted: 26 Jul 2009, 23:43:24 UTC

This is the latest failure after a complete reinstall of Windows, drivers (190.xxx),
BOINC 6.6.36, everything!
I even crunched about 60 SETI WUs; a few of them failed, but only about 5.
Temps are good, averaging 76 °C.
Any ideas, anyone? I want to crunch this project, not SETI.

Name m85000-IBUCH_random_pYEEI_kxy01start_2407-2-3-RND0764_0
Workunit 655768
Created 26 Jul 2009 17:01:27 UTC
Sent 26 Jul 2009 18:00:34 UTC
Received 26 Jul 2009 23:31:09 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 43981
Report deadline 31 Jul 2009 18:00:34 UTC
CPU time 868.922
stderr out

<core_client_version>6.6.36</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 24
# Number of cores: 192
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 24
# Number of cores: 192
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 24
# Number of cores: 192
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 24
# Number of cores: 192
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 24
# Number of cores: 192
Cuda error: Kernel [fft_data_swizzle_in] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu' in line 44 : unspecified launch failure.

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 3977.21064814815
Granted credit 0
application version 6.64
Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Message 11354 - Posted: 27 Jul 2009, 0:14:30 UTC - in response to Message 11352.  

Hi, I am sorry to hear that it is not working and sympathize with your frustration. Many of us have been there. Did you run the OCCT test (an hour or so with the error-checking option on will usually do)? If so, what was the result?
flatron97
Joined: 16 Jul 09
Posts: 7
Credit: 0
RAC: 0
Message 11383 - Posted: 27 Jul 2009, 16:55:17 UTC

Hi Hydropower,

Yes, I ran the OCCT tests and got 134 errors!
But I did not save the CSV files (silly me), so the tests are underway again now.
I have no idea if 134 errors is a lot, or whether they are going to stop me crunching here. As I said before, I can crunch SETI with no problem, so something works.
Will post the OCCT results when they are done.

Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Message 11385 - Posted: 27 Jul 2009, 17:34:21 UTC - in response to Message 11383.  

WOW! ONE error is enough to RMA your card, because it means that a calculation whose result is known in advance has failed to produce that result, like getting 100/10 = 8. Lethal for CUDA. SETI hasn't failed because it analyzes noise; with your card it probably detects signals where there aren't any (or worse, detects nothing where there was a signal). I'd RMA that card to the dealer with the OCCT results.
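The known-answer idea behind that error check can be illustrated with a short sketch: compute results whose correct values are known independently, and count every mismatch. On healthy hardware the count should be exactly zero, which is why even one error is a red flag. This is a simplified Python illustration, not how OCCT is actually implemented; `make_flaky_multiply` is a hypothetical stand-in for a faulty card that occasionally flips a bit.

```python
import random
from typing import Callable


def known_answer_test(compute: Callable[[int, int], int],
                      trials: int = 10_000, seed: int = 0) -> int:
    """Compare `compute` against independently known answers; return the mismatch count."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        a, b = rng.randint(1, 1000), rng.randint(1, 1000)
        expected = a * b  # reference answer from trusted arithmetic
        if compute(a, b) != expected:
            errors += 1
    return errors


def make_flaky_multiply(flip_rate: float, seed: int = 1) -> Callable[[int, int], int]:
    """Hypothetical faulty 'device': flips one random bit in a fraction of results."""
    rng = random.Random(seed)

    def multiply(a: int, b: int) -> int:
        result = a * b
        if rng.random() < flip_rate:
            result ^= 1 << rng.randrange(20)  # a single flipped bit always changes the value
        return result

    return multiply


if __name__ == "__main__":
    print("healthy:", known_answer_test(lambda a, b: a * b))
    print("faulty: ", known_answer_test(make_flaky_multiply(0.01)))
```

The important design choice is that the reference answer comes from somewhere other than the hardware under test; comparing the card's output with itself would hide the fault.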

flatron97
Joined: 16 Jul 09
Posts: 7
Credit: 0
RAC: 0
Message 11393 - Posted: 27 Jul 2009, 19:25:23 UTC

Hi,
I have run the tests twice more, with 90 and 38 errors respectively.
It looks like I will have to write a nasty letter to the dealer
and get another card.
Thanks for the support.

Be back soon.
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 11398 - Posted: 27 Jul 2009, 19:52:40 UTC

Steve,

an error after 11 h means that your setup is generally fine, but at some point something goes wrong. Are your CPU/RAM overclocked? What are your GPU temps while running GPU-Grid? Did you try OCCT or other stress-test tools? You can also try the 190 WHQL driver, but it likely won't help if 182.50 didn't work.

MrS
Scanning for our furry friends since Jan 2002
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 11400 - Posted: 27 Jul 2009, 19:58:19 UTC - in response to Message 11383.  

flatron97 wrote: "As I said before, I can crunch SETI with no problem, so something works."


As you said before, SETI worked except for 5 out of ~60 WUs. SETI WUs are much shorter and less stressful, so chances are that you can finish some of them despite the occasional error ;)
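The point about short versus long work units can be made concrete with a simple model. Assume, purely as a simplification, that a marginal card produces fatal errors at random at a constant rate λ per hour (a Poisson process). The chance that a work unit of length T finishes cleanly is then exp(-λT), so the failure rate climbs quickly with WU length. In the sketch below the 5-of-60 figure comes from this thread, but the 0.5 h SETI WU length is a hypothetical value, not a measurement.

```python
import math


def survival_probability(rate_per_hour: float, hours: float) -> float:
    """P(no fatal error in a run of `hours`), assuming errors form a Poisson process."""
    return math.exp(-rate_per_hour * hours)


def rate_from_failure_fraction(fail_fraction: float, hours: float) -> float:
    """Invert the model: error rate implied by a failure fraction for runs of `hours`."""
    return -math.log(1.0 - fail_fraction) / hours


if __name__ == "__main__":
    # 5 of ~60 SETI WUs failed (from the thread); 0.5 h per WU is a hypothetical length.
    rate = rate_from_failure_fraction(5 / 60, hours=0.5)
    for label, hours in [("short WU, 0.5 h", 0.5), ("long WU, 11 h", 11.0)]:
        print(f"{label}: P(finish cleanly) = {survival_probability(rate, hours):.2f}")
```

With these assumed numbers, a short WU finishes cleanly about 92% of the time, while an 11-hour WU does so only about 15% of the time.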

MrS
Steve Dodd
Joined: 26 Dec 08
Posts: 18
Credit: 4,614,833,506
RAC: 132
Message 11413 - Posted: 28 Jul 2009, 4:19:36 UTC - in response to Message 11398.  

Hi ET Apes,
This whole mess started with driver rev 190.xx.
I haven't tried any stress tests yet. The computer is in an air-conditioned room; GPU temps are ~52 °C, well under any limit. Stock card, stock clock (PNY).

My HP running a GTX250 & Vista 64-bit works fine.

I've upgraded BOINC to 6.6.37 for this last test. Same result after 11 hours 44 minutes: compute error, file blah blah blah is absent (times 3).

No new tasks (NNT) until this is sorted out.

©2025 Universitat Pompeu Fabra