Message boards : Graphics cards (GPUs) : Problems with Wu,s

Author Message
flatron97
Joined: 16 Jul 09
Posts: 7
Credit: 0
RAC: 0
Message 11191 - Posted: 19 Jul 2009 | 16:16:06 UTC

Name 27-KASHIF_HIVPR_dim_ba4-22-100-RND7589_0
Workunit 635752
Created 19 Jul 2009 13:04:29 UTC
Sent 19 Jul 2009 14:54:10 UTC
Received 19 Jul 2009 15:42:17 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 44377
Report deadline 24 Jul 2009 14:54:10 UTC
CPU time 595.2172
stderr out

<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 938803200 bytes
# Number of multiprocessors: 24
# Number of cores: 192
# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
MDIO ERROR: cannot open file "restart.coor"

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 4038.48842592593
Granted credit 0
application version 6.64

All of the above is from a WU run on Linux with the recommended version 180 driver.

I am running Linux Mint 7 64-bit and Win XP Pro in dual boot on an AMD Athlon2 dual core 5600 with a GTX 260.
All WUs come up with the same error.

I get the same error on Win XP Pro with the 182.06 driver (when I run that side of the comp).

I hope that someone can help me out here!
Thanks in advance for your time and comments.

Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Message 11192 - Posted: 19 Jul 2009 | 17:04:28 UTC - in response to Message 11191.

Have you checked the temperature on the card? I recommend staying under 85 °C. Also, try CPU-Z and switch on the error checking mode. If you get any errors, your GPU is ready for an RMA. I have had this happen to me (under XP64) and all my problems went away after I swapped the card.
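
For anyone who would rather log the temperature than eyeball it, here is a minimal Python sketch. It assumes a driver recent enough that `nvidia-smi` supports `--query-gpu=temperature.gpu` (older drivers may not). The 85 °C limit is only the rule of thumb given here, and the function names are made up for illustration.

```python
import subprocess

TEMP_LIMIT_C = 85  # rule-of-thumb limit from this post, not an official spec


def parse_temperatures(smi_output):
    """Turn nvidia-smi CSV output (one integer per line) into a list of ints."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def gpus_over_limit(temps, limit=TEMP_LIMIT_C):
    """Return (gpu_index, temperature) pairs for every GPU at or above the limit."""
    return [(i, t) for i, t in enumerate(temps) if t >= limit]


def read_gpu_temperatures():
    """Ask nvidia-smi for the current temperature of each GPU, in degrees Celsius."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_temperatures(out)
```

Calling `gpus_over_limit(read_gpu_temperatures())` while a WU is running should give an empty list on a card that stays under the limit.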
____________
Join team Bletchley Park, the innovators.

flatron97
Joined: 16 Jul 09
Posts: 7
Credit: 0
RAC: 0
Message 11193 - Posted: 19 Jul 2009 | 22:19:50 UTC

Thanks, Hydropower.

The temp is normally around 72 °C. I now have CPU-Z but cannot find any "error checking mode" in the program; I will keep looking. My GPU is only a week old and works fine for graphics, but I have not been able to "crunch" a single WU with it yet.
My XP is 32-bit and I also run Linux Mint 64-bit on the same comp, and I get the same error on both sides.

I will keep trying "things".

Thanks again.

Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Message 11197 - Posted: 20 Jul 2009 | 8:51:31 UTC - in response to Message 11193.

Hi flatron, your temperature looks very good. I was mistaken about CPU-Z; I meant OCCT, a performance measuring tool. Regards, H.
____________
Join team Bletchley Park, the innovators.

Steve Dodd
Joined: 26 Dec 08
Posts: 18
Credit: 4,213,100,422
RAC: 16,159,168
Message 11214 - Posted: 21 Jul 2009 | 4:55:28 UTC

I'm also having problems with a new GPU card (GTX 260 (Core 216)), new today. I've had 3 WUs error out already (639130, 638991, and 638617). One of those errored out on someone else's computer. Using the 186.16 driver, XP (32-bit), Q6600 CPU, new power supply (650 W). No overclocking. In an air-conditioned room.

Steve Dodd
Joined: 26 Dec 08
Posts: 18
Credit: 4,213,100,422
RAC: 16,159,168
Message 11215 - Posted: 21 Jul 2009 | 5:14:39 UTC - in response to Message 11214.

Addendum: I am getting these messages on WU completion:

7/20/2009 10:11:20 PM GPUGRID Computation for task 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0 finished
7/20/2009 10:11:20 PM GPUGRID Output file 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0_1 for task 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0 absent
7/20/2009 10:11:20 PM GPUGRID Output file 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0_2 for task 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0 absent
7/20/2009 10:11:20 PM GPUGRID Output file 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0_3 for task 149-KASHIF_HIVPR_sub_so_ba1-7-100-RND4022_0 absent

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 11218 - Posted: 21 Jul 2009 | 9:02:13 UTC - in response to Message 11191.

Hi,
For the Linux machine it looks like a driver issue. Try the 182.xx driver (although 180 should work). Are you sure that you have installed it correctly?
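
One way to confirm which driver the kernel actually loaded on Linux is to read `/proc/driver/nvidia/version`. Below is a minimal Python sketch. The regular expression is an assumption based on the usual layout of that file's `NVRM version:` line, and the function names are invented for the example.

```python
import re

VERSION_FILE = "/proc/driver/nvidia/version"


def parse_driver_version(text):
    """Extract the dotted version number from the 'NVRM version:' line, or None."""
    match = re.search(r"NVRM version:.*?\s(\d+(?:\.\d+)+)\s", text)
    return match.group(1) if match else None


def driver_series(version):
    """Return the major series of a driver version, e.g. '185' for '185.18.14'."""
    return version.split(".")[0]


def loaded_driver_version(path=VERSION_FILE):
    """Read the version of the NVIDIA kernel module that is currently loaded."""
    with open(path) as handle:
        return parse_driver_version(handle.read())
```

If `loaded_driver_version()` reports a different series from the one you think you installed, the old kernel module is probably still in use.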

For the Windows machine, do you have XP 64? If not then you should also install the 182 driver.

Let us know if it works.

gdf

flatron97
Joined: 16 Jul 09
Posts: 7
Credit: 0
RAC: 0
Message 11228 - Posted: 21 Jul 2009 | 12:45:46 UTC

Hi, thanks for the replies.
On the Linux side I have now installed the 185.18.14-pkg.run driver and will try it out in the near future; I have a few other issues to sort out on this OS first.

My Windows is 32-bit, and I have just installed the 182 driver and will wind it up soon after the OCCT tests.

Will report back shortly.

Thanks for the help, guys.

steve

Raimund Barbeln
Joined: 19 Mar 09
Posts: 1
Credit: 87,554,752
RAC: 425,151
Message 11230 - Posted: 21 Jul 2009 | 18:31:02 UTC - in response to Message 11228.

I get the same errors right at the start of a WU with the 185 drivers under 64-bit Linux.

Michael Doerner
Joined: 28 Feb 09
Posts: 37
Credit: 666,889
RAC: 0
Message 11238 - Posted: 22 Jul 2009 | 3:46:14 UTC - in response to Message 11230.
Last modified: 22 Jul 2009 | 3:46:50 UTC

I am using the 180.60 64-bit Linux drivers because none of the 185.x series has ever worked with GPUGrid.

Mike D

Steve Dodd
Joined: 26 Dec 08
Posts: 18
Credit: 4,213,100,422
RAC: 16,159,168
Message 11239 - Posted: 22 Jul 2009 | 5:59:50 UTC - in response to Message 11238.

No love with the 182.50 driver. Same type of error after more than 11 hours :(

Task 999594
Workunit 640725
Sent 21 Jul 2009 14:14:21 UTC
Received 22 Jul 2009 4:36:13 UTC
Status Error while computing
CPU time 831.39
Claimed credit 4,531.91
Granted credit ---

Messages in BOINC:

7/21/2009 10:52:38 PM GPUGRID Computation for task 537-GIANNI_BINDTST1-7-100-RND1474_1 finished
7/21/2009 10:52:39 PM GPUGRID Output file 537-GIANNI_BINDTST1-7-100-RND1474_1_1 for task 537-GIANNI_BINDTST1-7-100-RND1474_1 absent
7/21/2009 10:52:39 PM GPUGRID Output file 537-GIANNI_BINDTST1-7-100-RND1474_1_2 for task 537-GIANNI_BINDTST1-7-100-RND1474_1 absent
7/21/2009 10:52:39 PM GPUGRID Output file 537-GIANNI_BINDTST1-7-100-RND1474_1_3 for task 537-GIANNI_BINDTST1-7-100-RND1474_1 absent

Really bummed out. I was soooo looking forward to my increased RAC with the new video card :)

flatron97
Joined: 16 Jul 09
Posts: 7
Credit: 0
RAC: 0
Message 11264 - Posted: 23 Jul 2009 | 0:26:39 UTC

Name p1685000-IBUCH_6_pYEEI_carb_2207-0-3-RND3824_0
Workunit 645477
Created 22 Jul 2009 17:33:47 UTC
Sent 23 Jul 2009 0:14:51 UTC
Received 23 Jul 2009 0:17:13 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 98 (0x62)
Computer ID 43981
Report deadline 28 Jul 2009 0:14:51 UTC
CPU time 7.671875
stderr out

<core_client_version>6.6.36</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 24
# Number of cores: 192
# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
MDIO ERROR: cannot open file "restart.coor"
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 11: cufftExecR2C (gridcalc1)
called boinc_finish

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 4926.84722222222
Granted credit 0
application version 6.64

Above you can see the result of the latest failure.
This is on Win XP Pro 32-bit with a GTX 260, an AMD 64 dual core 2, the latest 190 drivers, and BOINC 6.6.36.
I hope we can sort this out soon. Temps are not a problem, only 72 °C.

flatron97
Joined: 16 Jul 09
Posts: 7
Credit: 0
RAC: 0
Message 11352 - Posted: 26 Jul 2009 | 23:43:24 UTC

This is the latest failure after a complete reinstall of Windows, drivers (190.xxx), BOINC 6.6.36, everything!
I even crunched about 60 SETI WUs; a few of them failed, but only about 5.
Temps are good, averaging 76 °C.
Any ideas, anyone? I want to crunch this project and not SETI.

Name m85000-IBUCH_random_pYEEI_kxy01start_2407-2-3-RND0764_0
Workunit 655768
Created 26 Jul 2009 17:01:27 UTC
Sent 26 Jul 2009 18:00:34 UTC
Received 26 Jul 2009 23:31:09 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 43981
Report deadline 31 Jul 2009 18:00:34 UTC
CPU time 868.922
stderr out

<core_client_version>6.6.36</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 24
# Number of cores: 192
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 24
# Number of cores: 192
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 24
# Number of cores: 192
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 24
# Number of cores: 192
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 24
# Number of cores: 192
Cuda error: Kernel [fft_data_swizzle_in] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu' in line 44 : unspecified launch failure.

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 3977.21064814815
Granted credit 0
application version 6.64

Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Message 11354 - Posted: 27 Jul 2009 | 0:14:30 UTC - in response to Message 11352.

Hi, I am sorry to hear that it is not working, and I sympathize with your frustration. Many of us 'have been there'. Did you run the OCCT test (an hour or so will usually do, with the error checking option on)? If so, what was the result?

flatron97
Joined: 16 Jul 09
Posts: 7
Credit: 0
RAC: 0
Message 11383 - Posted: 27 Jul 2009 | 16:55:17 UTC

Hi Hydropower,

Yes, I ran the OCCT tests and got 134 errors!
I did not save the CSV files (silly me), but the tests are underway again now.
I have no idea if 134 errors is a lot, or whether they are what stops me crunching here. As I said before, I can crunch SETI with no problem, so something works.
Will post the OCCT results when they are done.

Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Message 11385 - Posted: 27 Jul 2009 | 17:34:21 UTC - in response to Message 11383.

WOW! ONE error is enough to RMA your card, because it means that a calculation whose result is known in advance has failed to produce that result. Like 100/10 = 8. Lethal for CUDA. SETI hasn't failed because it analyzes noise; with your card it probably detects signals where there aren't any (or worse, detects nothing where there was a signal). I'd RMA that card, taking the OCCT results to the dealer.
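
To make the idea concrete, here is a small Python sketch of a known-answer test. It only illustrates the principle and is not how OCCT is implemented. It runs on the CPU, and the workload and function names are invented for the example.

```python
def sum_of_squares(n):
    """Add up 1^2 + 2^2 + ... + n^2 the slow way, one term at a time."""
    total = 0
    for k in range(1, n + 1):
        total += k * k
    return total


def expected_sum_of_squares(n):
    """Closed-form answer known in advance: n(n+1)(2n+1)/6."""
    return n * (n + 1) * (2 * n + 1) // 6


def count_errors(compute, sizes, passes=10):
    """Run `compute` repeatedly and count results that differ from the known answer."""
    errors = 0
    for _ in range(passes):
        for n in sizes:
            if compute(n) != expected_sum_of_squares(n):
                errors += 1
    return errors
```

On healthy hardware `count_errors(sum_of_squares, [10, 1000])` should return 0. Any other number means a calculation came out wrong at least once.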

____________
Join team Bletchley Park, the innovators.

flatron97
Joined: 16 Jul 09
Posts: 7
Credit: 0
RAC: 0
Message 11393 - Posted: 27 Jul 2009 | 19:25:23 UTC

Hi,
I have run the tests twice more, with 90 and 38 errors respectively.
It looks like I will have to write a nasty letter to the dealer and get another card.
Thanks for the support.

Be back soon.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 11398 - Posted: 27 Jul 2009 | 19:52:40 UTC

Steve,

An error after 11 h means that your setup is generally fine, but at some point something goes wrong. Is your CPU / RAM OC'ed? What are your GPU temps during GPU-Grid? Did you try OCCT or other stress test tools? You can also try the 190 WHQL driver, but it likely won't help if 182.50 didn't work.

MrS
____________
Scanning for our furry friends since Jan 2002

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 11400 - Posted: 27 Jul 2009 | 19:58:19 UTC - in response to Message 11383.

As I said before, I can crunch SETI with no problem, so something works


As you said before, SETI worked except for 5 out of ~60 WUs. SETI WUs are much shorter and less stressful, so chances are that you can finish some of them despite the occasional error ;)
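
A rough back-of-the-envelope version of that argument, as a Python sketch. It assumes faults hit at a constant random rate, and the 15-minute SETI WU length is a made-up figure for illustration. Only the 5-in-60 failure count and the roughly 11-hour GPUGRID runtime come from this thread.

```python
import math


def fault_rate_per_hour(failed, total, hours_per_wu):
    """Estimate a constant fault rate from the fraction of short WUs that survived."""
    survival = (total - failed) / total
    return -math.log(survival) / hours_per_wu


def survival_probability(rate_per_hour, hours):
    """Chance of getting through `hours` of crunching without a single fault."""
    return math.exp(-rate_per_hour * hours)


# 5 of ~60 SETI WUs failed (from the thread); 0.25 h per SETI WU is an assumption.
rate = fault_rate_per_hour(failed=5, total=60, hours_per_wu=0.25)

# Chance that an 11-hour GPUGRID WU finishes cleanly at that fault rate.
p_long = survival_probability(rate, hours=11)
```

With these numbers `p_long` comes out at roughly 2%, against about 92% for a single short WU, which is how a card can look fine on SETI and still almost never finish a WU here.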

MrS
____________
Scanning for our furry friends since Jan 2002

Steve Dodd
Joined: 26 Dec 08
Posts: 18
Credit: 4,213,100,422
RAC: 16,159,168
Message 11413 - Posted: 28 Jul 2009 | 4:19:36 UTC - in response to Message 11398.

Hi ET Apes,
This whole mess started with driver rev 190.xx.
I haven't tried any stress tests yet. The computer is in an air-conditioned room; GPU temps are ~52 °C, well under any limit. Stock card, stock clock (PNY).

My HP running a GTS 250 and Vista 64-bit works fine.

I've upgraded BOINC to 6.6.37 for this last test. Same result after 11 hours 44 minutes: compute error, file blah blah blah is absent (times 3).

NNT (no new tasks) until this is sorted out.

(_KoDAk_)
Joined: 18 Oct 08
Posts: 43
Credit: 6,924,807
RAC: 0
Message 11471 - Posted: 29 Jul 2009 | 14:25:36 UTC

Make a fresh copy of the database,
because http://milkyway.cs.rpi.edu lost its own database.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 11522 - Posted: 30 Jul 2009 | 20:15:15 UTC

Steve, it seems you're seeing the same problems many (all?) GT200 card owners are seeing after installing the 190 series drivers. Something has changed, and just installing an older driver doesn't help.

Apart from that, the message "output file absent" just means that no proper result file was written, since an error interrupted the calculation. It's not the cause, just a symptom.
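
As a rough illustration of looking past the symptom, here is a Python sketch that pulls the candidate error lines out of a task's stderr text. The patterns are simply the ones that appear in the stderr dumps in this thread, not an official list, and the function name is invented.

```python
import re

# Line prefixes seen in the stderr output posted in this thread (not exhaustive).
ERROR_PATTERN = re.compile(r"^(MDIO ERROR:|ERROR:|Cuda error:)")


def find_error_lines(stderr_text):
    """Return the stderr lines that report an error, in the order they occurred."""
    return [
        line.strip()
        for line in stderr_text.splitlines()
        if ERROR_PATTERN.match(line.strip())
    ]
```

Feeding it the text between `<stderr_txt>` and `</stderr_txt>` from a failed task lists the lines worth quoting when asking for help, while `WARNING:` lines and the device banner are skipped.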

MrS
____________
Scanning for our furry friends since Jan 2002

Hydropower
Joined: 3 Apr 09
Posts: 70
Credit: 6,003,024
RAC: 0
Message 11530 - Posted: 31 Jul 2009 | 7:14:31 UTC - in response to Message 11522.

the same problems many (all?) GT200 card owners are seeing after installation of the 190 series drivers.

So far mine are running okay, knock on wood. Then again, I am upgrading from a 178 version driver. Maybe there is a registry setting in the 18x series that conflicts with the 190 series?

(_KoDAk_)
Joined: 18 Oct 08
Posts: 43
Credit: 6,924,807
RAC: 0
Message 11588 - Posted: 2 Aug 2009 | 5:45:59 UTC

http://www.gpugrid.net/show_host_detail.php?hostid=31714
has a big problem with 185.85 / 190.38:

02.08.2009 8:44:19 GPUGRID Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
But there are 648 GPU results ready to send!
WTF?
Yesterday many WUs errored out in the first seconds.

TomaszPawel
Joined: 18 Aug 08
Posts: 121
Credit: 59,836,411
RAC: 0
Message 11589 - Posted: 2 Aug 2009 | 7:12:22 UTC - in response to Message 11588.

This WU is bad!

BAD WU

All hosts fail.

Please check it!
____________
POLISH NATIONAL TEAM - Join! Crunch! Win!
