Failures since upgrading to 190.38

MarkJ
Volunteer moderator
Volunteer tester

Message 11332 - Posted: 26 Jul 2009, 10:55:37 UTC

Had 6 WUs fail today since this machine was upgraded to 190.38. Interestingly, it's the only one of the 5 machines running GPUGrid that seems to be having the problem. The machine is a Win XP box with dual GTX 260s in it. Links to the WUs:

ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104:
Cuda error: Kernel [reduce4_kernel] failed in file 'reduction.cu' in line 171 :
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 11:
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104:
Cuda error: Kernel [fft_data_swizzle_in] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu'
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 11:

In the meantime I've set it to NNW (no new work) in the hope that the download issues will go away and the CUDA 2.2 app will fix things.
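For anyone wading through a pile of failed results, a rough Python sketch like the one below can tally which source locations and kernels show up in the stderr text. It is only an illustration: the `summarize` helper and both regular expressions are assumptions modelled on the error lines quoted above, and they are not part of the GPUGrid application or BOINC.

```python
import re
from collections import Counter

# Matches lines such as:  ERROR: c:\...\CPME_cufft.cu, line 104:
ERROR_RE = re.compile(r"^ERROR: (?P<path>.+?), line (?P<line>\d+):")
# Matches lines such as:  Cuda error: Kernel [reduce4_kernel] failed in file ...
KERNEL_RE = re.compile(r"^Cuda error: Kernel \[(?P<kernel>\w+)\] failed")

# The error lines from this post, copied verbatim (raw string keeps backslashes).
SAMPLE = r"""
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104:
Cuda error: Kernel [reduce4_kernel] failed in file 'reduction.cu' in line 171 :
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 11:
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104:
Cuda error: Kernel [fft_data_swizzle_in] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu'
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 11:
"""


def summarize(stderr_text):
    """Count (file, line) error sites and failing kernel names."""
    sites = Counter()
    kernels = Counter()
    for raw in stderr_text.splitlines():
        line = raw.strip()
        match = ERROR_RE.match(line)
        if match:
            # Keep only the file name, whatever the path separator is.
            filename = match.group("path").replace("\\", "/").rsplit("/", 1)[-1]
            sites[(filename, int(match.group("line")))] += 1
            continue
        match = KERNEL_RE.match(line)
        if match:
            kernels[match.group("kernel")] += 1
    return sites, kernels


if __name__ == "__main__":
    sites, kernels = summarize(SAMPLE)
    for (filename, lineno), count in sites.most_common():
        print(f"{filename}:{lineno} seen {count} time(s)")
    for kernel, count in kernels.most_common():
        print(f"kernel {kernel} failed {count} time(s)")
```

Comparing these tallies across hosts would be one way to see whether the failures cluster in the same place on every affected machine.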
BOINC blog

STE\/E
Message 11337 - Posted: 26 Jul 2009, 13:55:08 UTC
Last modified: 26 Jul 2009, 13:57:10 UTC

I checked both the boxes I mentioned in the other thread, and both were running at half speed, which I know from past experience leads to continual errors until fixed.

I uninstalled the drivers on both boxes and reinstalled them; after rebooting, both boxes were running at full speed. They may not stay that way, though, because I have one card (a GTX 260) being RMA'ed right now for the same reason. Apparently there is a fix or workaround for that problem, and if the two boxes keep dropping back to half speed I'll have to try it rather than RMA two more cards.

All 4 cards in the 2 boxes are GTX 260s, FYI. All the WUs on both boxes got trashed by the driver reinstall too, even though I had suspended GPUGrid and the WUs. It didn't matter: they were gone after BOINC started back up, and I had to download fresh ones.

Mark Henderson
Message 11338 - Posted: 26 Jul 2009, 14:18:33 UTC - in response to Message 11337.  

Poorboy, read my post "Link to prevent Nvidia 200 Downclocking". I believe that could be the problem. The 200 series goes into power-saving mode by design.

Mark Henderson
Message 11339 - Posted: 26 Jul 2009, 14:19:25 UTC - in response to Message 11337.  
Last modified: 26 Jul 2009, 14:25:34 UTC

Poorboy, read my post "Link to prevent Nvidia 200 Downclocking". I believe that could be the problem. The 200 series goes into power-saving mode by design. I thought my first 200-series card was bad too, but it was not. 3D performance mode has to be forced via software; RivaTuner is what I use.
I have been trying to get the word out, as a LOT of people have been posting BOINC-wide thinking their cards are bad.
Maybe this is not your problem, but it sounds just like my experience.



Bymark
Message 11342 - Posted: 26 Jul 2009, 16:15:23 UTC - in response to Message 11339.  
Last modified: 26 Jul 2009, 16:56:18 UTC

Joining this thread:
wuid=653971
I think it's a server problem?


On one of my XP32 boxes with a 260:

<core_client_version>6.4.7</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
# Amber: readparm : Reading parm file parameters
# PARM file in AMBER 7 format
# Encounter 10-12 H-bond term
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term.
MDIO ERROR: cannot open file "restart.coor"
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3)
called boinc_finish

</stderr_txt>
]]>

and

CPU time 280.4531
stderr out
<core_client_version>6.4.7</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
MDIO ERROR: cannot open file "restart.coor"
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3)
called boinc_finish

</stderr_txt>
]]>
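The device header that the app prints at the top of stderr is easy to sanity-check by machine. The Python sketch below pulls the `# key: value` lines into a dictionary and derives a few figures from the values shown above. `parse_device_header` and `describe` are hypothetical helper names, and the line format is assumed from these two dumps only.

```python
import re

# Header lines look like "# Clock rate: 1242000 kilohertz".
HEADER_RE = re.compile(r"^#\s*(?P<key>[^:]+):\s*(?P<value>.+)$")

# Device header copied from the dumps above.
SAMPLE_HEADER = '''
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
'''


def parse_device_header(stderr_text):
    """Collect '# key: value' lines into a dict of strings."""
    info = {}
    for raw in stderr_text.splitlines():
        match = HEADER_RE.match(raw.strip())
        if match:
            info[match.group("key").strip()] = match.group("value").strip()
    return info


def describe(info):
    """Convert the raw strings into numbers that are easier to compare."""
    clock_khz = int(info["Clock rate"].split()[0])
    mem_bytes = int(info["Total amount of global memory"].split()[0])
    multiprocessors = int(info["Number of multiprocessors"])
    cores = int(info["Number of cores"])
    return {
        "name": info["Device 0"].strip('"'),
        "clock_mhz": clock_khz / 1000,
        "memory_mib": mem_bytes / 2**20,
        "cores_per_mp": cores / multiprocessors,
    }


if __name__ == "__main__":
    print(describe(parse_device_header(SAMPLE_HEADER)))
```

On the numbers posted here this works out to 1242 MHz, 895.6875 MiB and 8 cores per multiprocessor (27 x 8 = 216). Note that the header reports the same 1242000 kHz in every failing run quoted in this thread, so this figure by itself may not reveal the downclocking being discussed.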
"Silakka"
Hello from Turku > Åbo.

STE\/E
Message 11349 - Posted: 26 Jul 2009, 19:26:14 UTC - in response to Message 11338.  
Last modified: 26 Jul 2009, 20:23:49 UTC



Yes, I was just about to see if I could get that to work. Somebody sent me the link a few days ago because I was having the same problem on another 200-series card. I didn't try it then because the card had already been sent out for RMA, and I should have a different card in the next 2 days.

I'll post later today or tomorrow whether it works for me, if I can get it set up right. Thanks.

PS: So far this doesn't seem to be working. I think I'm doing everything right, but upon reboot the settings don't hold. 2 cores just read 0 speed, and the 3rd core on the box I'm trying it on defaults back to stock speeds.

EVGA Precision Tune and GPU-Z show the same 0 speed for 2 cores and stock speed for 1 core, so about all I've managed to do so far is lose 3 more cores. If it's going to take all this jumping through hoops to run the new CUDA apps, I'm afraid the Grid project will lose a lot of participants, especially with the new CUDA projects starting up.

I've already lost 7 cores with the upgrade to the new drivers, which was supposedly needed to run the new CUDA apps, and I don't feel I can afford to lose any more.

Bymark
Message 11350 - Posted: 26 Jul 2009, 20:34:22 UTC
Last modified: 26 Jul 2009, 20:42:55 UTC

hostid=35303


Try installing the drivers again. It seems OK now on my trouble host, and all 4 of my other GPU computers are working fine with 190.38. I guess with drivers this large, an install can go wrong sometimes?
"Silakka"
Hello from Turku > Åbo.

STE\/E
Message 11351 - Posted: 26 Jul 2009, 21:21:58 UTC - in response to Message 11350.  



Already tried that, and within a few hours both boxes had trashed 4 more WUs each.

naja002
Message 11353 - Posted: 27 Jul 2009, 0:11:48 UTC

I've installed 190.38 on 2 dual-GPU rigs.

The Q6600: no problems.

The i7 920 was nothing but problems. I upgraded to 6.6.37 and that seemed to fix the issue. I reinstalled the driver. I've switched back to RT (RivaTuner), forced the driver and forced 3D performance. It has been running error-free for a couple of days now.

6.6.37

Force Driver in RT--Post #9

Force 3D Performance




STE\/E
Message 11355 - Posted: 27 Jul 2009, 0:19:07 UTC - in response to Message 11353.  






I tried Mark's fix on 4 boxes just a few hours ago, so I probably won't know whether it worked until morning. If the cards' speeds haven't dropped back by then, at least that will be longer than they have been holding their speeds so far. Usually the WUs error out within a few hours because of the speed drop.

I didn't apply the fix on 3 other boxes because so far I haven't had any problems with them, and it seems that any time I make a settings change that requires a reboot, the WUs get trashed when BOINC restarts afterwards. I didn't want to trash any more WUs today than I already had.

Bymark
Message 11357 - Posted: 27 Jul 2009, 6:26:20 UTC - in response to Message 11351.  



Nope, didn't help... I get errors within 2-3 hours.

<core_client_version>6.4.7</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
MDIO ERROR: cannot open file "restart.coor"
# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
Cuda error in file '..\cuda/cutil.h' in line 968 : unspecified launch failure.
Memory usage: host: bytes device: bytes
Assertion failed: 0, file ..\cuda/cutil.h, line 968

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>
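Two things in the result dumps above can be checked mechanically: the exit code in the `<message>` block, and how many times the device header appears (in the last dump it is printed twice, which presumably means the task was restarted once before it failed). The sketch below is a minimal Python illustration. `exit_code` and `count_starts` are hypothetical helpers, and the 32-bit masking for negative codes is an assumption about how such codes would be rendered in hex.

```python
import re

# Matches e.g. "exit code 98 (0x62)" and captures both renderings.
EXIT_RE = re.compile(r"exit code (-?\d+) \((0x[0-9a-fA-F]+)\)")


def exit_code(message):
    """Return the decimal exit code, checking it against the hex form."""
    match = EXIT_RE.search(message)
    if match is None:
        return None
    decimal = int(match.group(1))
    hexadecimal = int(match.group(2), 16)
    # Assumption: a negative code is shown in hex as an unsigned 32-bit value.
    if (decimal & 0xFFFFFFFF) != hexadecimal:
        raise ValueError(f"decimal {decimal} and hex {match.group(2)} disagree")
    return decimal


def count_starts(stderr_text):
    """Count how often the app printed its device banner."""
    return sum(
        1
        for line in stderr_text.splitlines()
        if line.strip().startswith("# Using CUDA device")
    )


if __name__ == "__main__":
    print(exit_code("- exit code 98 (0x62)"))
    print(exit_code("The system cannot find the path specified. (0x3) - exit code 3 (0x3)"))
```

A result with more than one banner and a final `unspecified launch failure` would then stand out from results that died on the first run.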


"Silakka"
Hello from Turku > Åbo.

Bymark
Message 11408 - Posted: 27 Jul 2009, 22:07:14 UTC - in response to Message 11357.  
Last modified: 27 Jul 2009, 22:17:39 UTC

hostid=35303

Almost 10 years ago I started with SETI, on a 166 MHz Pentium MMX, and it seems my BOINC career will end with SETI too: this 260 won't work with GPUGrid anymore after the update to 190.38.

"I haven't formatted the hard drive yet; everything else has been tried."
"Silakka"
Hello from Turku > Åbo.

STE\/E
Message 11409 - Posted: 27 Jul 2009, 22:49:32 UTC - in response to Message 11408.  



I don't think I'll finish my BOINC career with SETI, but the way my cards keep dropping like flies it probably won't be with GPUGrid either. I already had 6 cards down with errors, and this afternoon I found 2 more that for all practical purposes may as well be down.

It's a dual GTX 275 setup that has turned in only 3 WUs in the last 50 hours. It's not returning errors, but it's not really returning anything either, because it has slowed to a crawl, I guess. BFG isn't going to say, "Oh sure, send us the 8 cards you can't crunch with anymore and we'll send you 8 shiny new ones," so I figure I'm pretty much stuck with them, for better or worse.

Mark Henderson
Message 11410 - Posted: 27 Jul 2009, 22:58:11 UTC - in response to Message 11409.  
Last modified: 27 Jul 2009, 23:01:57 UTC

Are you all uninstalling the old Nvidia driver first from Add/Remove Programs, then uninstalling PhysX, and then running Driver Sweeper in safe mode after a reboot to remove all the old remnants before updating the Nvidia drivers?
I would suggest this if it hasn't already been tried.
I take this long route and seldom have problems.

Don't uninstall PhysX before the Nvidia drivers; I messed up doing that. Nvidia first, and then PhysX.

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Message 11422 - Posted: 28 Jul 2009, 8:39:07 UTC - in response to Message 11410.  

Roll back to 185.xx; there seem to be problems with 190.xx on some hardware.
185.xx will be fine for GPUGrid for quite a while.

gdf

Bymark
Message 11431 - Posted: 28 Jul 2009, 13:49:41 UTC - in response to Message 11422.  
Last modified: 28 Jul 2009, 14:33:23 UTC

Tried that yesterday, same result. Hostid=35303 is now on NNW.
SETI is running fine on that host (one error in 24h).
Something is strange with that computer: now it's running one SETI GPU task and 2 mc on a dual-core AMD 5600. Nice :), nothing is overclocked.
http://setiathome.berkeley.edu/results.php?hostid=4914727
and
Hostid=35303 cpuz.txt



http://personal.inet.fi/surf/tbymark/boinc/cpuz.txt
Tried that too:
Are you all uninstalling the old Nvidia driver first from Add/Remove Programs, then uninstalling PhysX, and then running Driver Sweeper in safe mode after a reboot to remove all the old remnants before updating the Nvidia drivers?
I would suggest this if it hasn't already been tried.
I take this long route and seldom have problems.

Don't uninstall PhysX before the Nvidia drivers; I messed up doing that. Nvidia first, and then PhysX.

Roll back to 185.xx; there seem to be problems with 190.xx on some hardware.
185.xx will be fine for GPUGrid for quite a while.

gdf

"Silakka"
Hello from Turku > Åbo.

STE\/E
Message 11436 - Posted: 28 Jul 2009, 15:09:44 UTC - in response to Message 11422.  



I tried that on 4 cards and still got the errors and downclocking with them. I reinstalled the 190.38 drivers and am processing Collatz WUs just fine, with no errors or downclocking; I even overclocked the cards again and they still ran fine.

I'll run that for a while and keep an eye on the forum here for a real fix for the 190.38 drivers, or try a new driver version or client as they come out and see if that fixes the cards that went south on the Grid project.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 11482 - Posted: 29 Jul 2009, 19:03:27 UTC

Am I right that there's not a single reported failure with G9x cards, and that only G200 cards are affected? And that some of those still run fine with 190.xx?

MrS
Scanning for our furry friends since Jan 2002

Zydor
Message 11483 - Posted: 29 Jul 2009, 19:25:00 UTC - in response to Message 11482.  

My 9800GTX+ has been ok with 190.38 - no failures or blips of any kind.

Regards
Zy

Steve Dodd
Message 11484 - Posted: 29 Jul 2009, 20:39:34 UTC

Question, possibly for PoorBoy: are the GTX 260 cards the Core 216 version? I'm having the same problems as everyone else getting GPUGRID WUs to run on this card (XP Home 32-bit, Q6600, stock everything). I've tried rolling back to previous versions of the driver (currently running 185.XX) with no positive results. GPU-Z doesn't show a downclocking problem.