Failures since upgrading to 190.38
| Author | Message |
|---|---|
|
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 |
Had 6 WUs fail today since the machine was upgraded to 190.38. Interestingly, it's the only machine of the 5 running GPUGrid that seems to be having the problem. The machine is a Win XP box with dual GTX 260s in it. Links to the WUs: ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: Cuda error: Kernel [reduce4_kernel] failed in file 'reduction.cu' in line 171 : ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 11: ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: Cuda error: Kernel [fft_data_swizzle_in] failed in file 'c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu' ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 11: In the meantime I've set it to NNW (no new work) in the hope that both the download issues will go away and the CUDA 2.2 app will fix things. BOINC blog |
|
Joined: 18 Sep 08 Posts: 368 Credit: 4,174,624,885 RAC: 0
|
I checked both the boxes I mentioned in the other thread, and they both were running at 1/2 speed, which I know from past experience leads to continual errors until fixed. I uninstalled the drivers on both boxes and reinstalled them; after rebooting, both boxes were running at full speed. They may not stay running like that though, because I have 1 card now (GTX 260) being RMA'ed for the same reason. Apparently there is a fix or workaround for that problem, and if the 2 boxes continue to drop back to half speed I'll have to try it rather than try to RMA 2 more cards. All 4 cards in the 2 boxes are GTX 260s, FYI ... All the WUs on both boxes got trashed too while reinstalling the drivers, even though I suspended GPUGrid and the WUs. Didn't matter, they were gone after BOINC started back up and it had to download fresh ones. |
|
Joined: 21 Dec 08 Posts: 51 Credit: 26,320,167 RAC: 0
|
Poorboy, read my post "Link to prevent Nvidia 200 Downclocking". I believe that could be the problem. The 200 goes into power saving mode by design. |
|
Joined: 21 Dec 08 Posts: 51 Credit: 26,320,167 RAC: 0
|
Poorboy, read my post "Link to prevent Nvidia 200 Downclocking". I believe that could be the problem. The 200 goes into power saving mode by design. I thought my 1st 200 series card was bad also, but it was not. 3D performance mode has to be forced via software; RivaTuner is what I use. I have been trying to get the word out, as a LOT of people have been posting BOINC-wide thinking their cards are bad. Maybe this is not your problem, but it sounds just like my experience. I checked both the boxes I mentioned in the other thread, and they both were running at 1/2 speed, which I know from past experience leads to continual errors until fixed. |
Bymark Joined: 23 Feb 09 Posts: 30 Credit: 5,897,921 RAC: 0
|
Joining this thread: wuid=653971 I think it's a server problem? On one of my XP32 boxes with a 260: <core_client_version>6.4.7</core_client_version> <![CDATA[ <message> - exit code 98 (0x62) </message> <stderr_txt> # Using CUDA device 0 # Device 0: "GeForce GTX 260" # Clock rate: 1242000 kilohertz # Total amount of global memory: 939196416 bytes # Number of multiprocessors: 27 # Number of cores: 216 # Amber: readparm : Reading parm file parameters # PARM file in AMBER 7 format # Encounter 10-12 H-bond term WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term. WARNING: parameters.cu, line 568: Found zero 10-12 H-bond term. MDIO ERROR: cannot open file "restart.coor" ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3) called boinc_finish </stderr_txt> ]]> and CPU time 280.4531 stderr out <core_client_version>6.4.7</core_client_version> <![CDATA[ <message> - exit code 98 (0x62) </message> <stderr_txt> # Using CUDA device 0 # Device 0: "GeForce GTX 260" # Clock rate: 1242000 kilohertz # Total amount of global memory: 939196416 bytes # Number of multiprocessors: 27 # Number of cores: 216 MDIO ERROR: cannot open file "restart.coor" ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3) called boinc_finish </stderr_txt> ]]> "Silakka" Hello from Turku > Åbo. |
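For anyone comparing several hosts, the device header and error lines in a stderr dump like the one above can be pulled out with a few regular expressions. This is only a rough sketch: the sample text is copied from the post, but the line breaks are an assumption (the dump is flattened here), and `parse_stderr` is a made-up helper name, not part of BOINC or the GPUGRID app.

```python
import re

# Sample copied from the stderr output quoted above; the line breaks are assumed.
STDERR = r"""# Using CUDA device 0
# Device 0: "GeForce GTX 260"
# Clock rate: 1242000 kilohertz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 27
# Number of cores: 216
MDIO ERROR: cannot open file "restart.coor"
ERROR: c:\cygwin\home\speechserver\gpumd2\src\pme\CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3)
called boinc_finish
"""

# One pattern per header field; the key names are illustrative.
PATTERNS = {
    "device": r'# Device \d+: "(.+)"',
    "clock_khz": r"# Clock rate: (\d+) kilohertz",
    "memory_bytes": r"# Total amount of global memory: (\d+) bytes",
    "multiprocessors": r"# Number of multiprocessors: (\d+)",
    "cores": r"# Number of cores: (\d+)",
}


def parse_stderr(text):
    """Collect device facts and any line containing 'ERROR' from a dump."""
    info = {"errors": []}
    for line in text.splitlines():
        line = line.strip()
        for key, pattern in PATTERNS.items():
            match = re.match(pattern, line)
            if match:
                value = match.group(1)
                # The device name stays a string; everything else is numeric.
                info[key] = value if key == "device" else int(value)
        if "ERROR" in line:
            info["errors"].append(line)
    return info


info = parse_stderr(STDERR)
print(info["device"], info["clock_khz"] / 1000, "MHz")
print(info["cores"] // info["multiprocessors"], "cores per multiprocessor")
for err in info["errors"]:
    print(err)
```

Running the same parser over the dumps from a good host and a failing host makes it easy to see whether the reported clock rate differs between them.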
|
Joined: 18 Sep 08 Posts: 368 Credit: 4,174,624,885 RAC: 0
|
Poorboy, read my post "Link to prevent Nvidia 200 Downclocking". I believe that could be the problem. The 200 goes into power saving mode by design. Yes, I was just about to see if I could get that to work; somebody sent me the link a few days ago because I was having the same problem on another 200 series card. I didn't try it then because the card was already sent out for RMA and I should have a different card in the next 2 days. I'll post later today or tomorrow whether that works for me, if I can get it set up right. Thanks. PS: So far this doesn't seem to be working. I think I'm doing everything okay, but upon reboot the settings don't hold. 2 cores will just read 0 speed and the 3rd core on the box I'm trying it on just defaults back to stock speeds. EVGA Precision Tune and GPU-Z show the same 0 speed for 2 cores and stock speed for 1 core, so about all I've managed to do so far is lose 3 more cores. If it's going to take all this jumping through hoops to run the new CUDA apps, I'm afraid the Grid project will lose a lot of participants, especially with the new CUDA projects starting up. I've lost 7 cores already with the upgrade to the new drivers to supposedly be able to run the new CUDA apps, and don't feel I can afford to lose any more. |
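The three states described in this post (stock speed, dropped speed, and a reading of 0) can be told apart with a simple threshold check. A minimal sketch under stated assumptions: the readings are hypothetical numbers typed in by hand from whatever monitoring tool is in use, the 1242 MHz reference is the clock a GTX 260 reported in a stderr dump in this thread, and the 90% tolerance is an arbitrary choice, not a documented limit.

```python
def classify_clock(observed_mhz, expected_mhz, tolerance=0.9):
    """Label one clock reading relative to the expected full-speed clock."""
    if observed_mhz <= 0:
        # A reading of 0 usually means the tool could not read the card.
        return "no reading"
    if observed_mhz < expected_mhz * tolerance:
        return "downclocked"
    return "full speed"


EXPECTED_MHZ = 1242  # clock rate reported by one GTX 260 in this thread

# Hypothetical readings for a three-GPU box, for illustration only.
samples = {"gpu0": 1242, "gpu1": 621, "gpu2": 0}

for name, mhz in samples.items():
    print(name, classify_clock(mhz, EXPECTED_MHZ))
```

Logging the label every few minutes would show whether errors start right after a card drops to the lower clock, which is the pattern reported here.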
Bymark Joined: 23 Feb 09 Posts: 30 Credit: 5,897,921 RAC: 0
|
hostid=35303 Try to install the drivers again. It seems OK now with my trouble host, and all 4 of my other GPU computers are working fine with 190.38. It seems like over gigs of drivers can go wrong sometimes? "Silakka" Hello from Turku > Åbo. |
|
Joined: 18 Sep 08 Posts: 368 Credit: 4,174,624,885 RAC: 0
|
hostid=35303 Already tried that, and within a few hours both boxes had trashed 4 more WUs each. |
|
Joined: 25 Sep 08 Posts: 111 Credit: 10,352,599 RAC: 0
|
I've installed 190.38 on 2 dual gpu rigs. The Q6600--no problems. The i7 920 was nothing but problems. I upgraded to 6.6.37 and that seemed to fix the issue. I reinstalled the driver. I've switched back to RT, forced the driver and forced 3D performance. It has been running for a couple of days now error free. 6.6.37 Force Driver in RT--Post #9 Force 3D Performance |
|
Joined: 18 Sep 08 Posts: 368 Credit: 4,174,624,885 RAC: 0
|
I've switched back to RT, forced the driver and forced 3D performance. It has been running for a couple of days now error free. I tried Mark's fix on 4 boxes just a few hours ago, so I won't really know if it worked or not until the morning, probably. If the speeds of the cards don't drop back by then, at least it will be longer than they have been holding the speeds. Usually within a few hours the WUs will error because of the speed drop. I didn't do the fix on 3 other boxes because so far I haven't been having any problems with them, and it seems any time I do a settings change that requires a reboot the WUs get trashed when BOINC restarts after the reboot, so I didn't want to trash any more WUs today than I already had. |
Bymark Joined: 23 Feb 09 Posts: 30 Credit: 5,897,921 RAC: 0
|
hostid=35303 Nope, didn't help... I get errors in 2-3 hours. <core_client_version>6.4.7</core_client_version> <![CDATA[ <message> The system cannot find the path specified. (0x3) - exit code 3 (0x3) </message> <stderr_txt> # Using CUDA device 0 # Device 0: "GeForce GTX 260" # Clock rate: 1242000 kilohertz # Total amount of global memory: 939196416 bytes # Number of multiprocessors: 27 # Number of cores: 216 MDIO ERROR: cannot open file "restart.coor" # Using CUDA device 0 # Device 0: "GeForce GTX 260" # Clock rate: 1242000 kilohertz # Total amount of global memory: 939196416 bytes # Number of multiprocessors: 27 # Number of cores: 216 Cuda error in file '..\cuda/cutil.h' in line 968 : unspecified launch failure. Memory usage: host: bytes device: bytes Assertion failed: 0, file ..\cuda/cutil.h, line 968 This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. </stderr_txt> ]]> "Silakka" Hello from Turku > Åbo. |
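With several failed results on one host, it can help to tally which failure shows up how often before blaming the driver or the server. The sketch below is only an illustration: the substrings are copied from the dumps quoted in this thread, while the labels, the shortened sample dumps, and the helper names `exit_code` and `tally` are my own assumptions.

```python
import re
from collections import Counter

# Labels on the left are my own shorthand; the substrings on the right are
# copied from the stderr dumps quoted in this thread.
SIGNATURES = {
    "missing restart file": 'cannot open file "restart.coor"',
    "cuFFT call failed": "cufftExecC2R",
    "launch failure": "unspecified launch failure",
}

EXIT_RE = re.compile(r"exit code (\d+) \(0x([0-9a-fA-F]+)\)")


def exit_code(text):
    """Return the decimal exit code, or None if the dump has none."""
    match = EXIT_RE.search(text)
    if match is None:
        return None
    decimal, from_hex = int(match.group(1)), int(match.group(2), 16)
    if decimal != from_hex:
        raise ValueError("decimal and hex exit codes disagree")
    return decimal


def tally(dumps):
    """Count how many dumps contain each known failure signature."""
    counts = Counter()
    for text in dumps:
        for label, needle in SIGNATURES.items():
            if needle in text:
                counts[label] += 1
    return counts


# Shortened versions of the two kinds of dump posted in this thread.
dumps = [
    'exit code 98 (0x62) MDIO ERROR: cannot open file "restart.coor" '
    "CPME_cufft.cu, line 104: cufftExecC2R (gridcalc3)",
    'exit code 3 (0x3) MDIO ERROR: cannot open file "restart.coor" '
    "Cuda error in file '..\\cuda/cutil.h' in line 968 : unspecified launch failure.",
]

print([exit_code(d) for d in dumps])
print(tally(dumps))
```

The "restart.coor" message appears in both kinds of dump, so on its own it does not separate the two failure modes; the cuFFT and launch-failure lines do.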
Bymark Joined: 23 Feb 09 Posts: 30 Credit: 5,897,921 RAC: 0
|
hostid=35303 Almost 10 years ago I started with SETI, on a Pentium MMX 166 MHz, and it seems like my BOINC career will end with SETI; this 260 won't work with GPUGrid anymore after the update to 190.38. "Haven't formatted the hard drive yet; everything else has been done." "Silakka" Hello from Turku > Åbo. |
|
Joined: 18 Sep 08 Posts: 368 Credit: 4,174,624,885 RAC: 0
|
hostid=35303 I don't think I'll finish my BOINC career with SETI, but the way my cards keep dropping like flies it probably won't be with GPUGrid either. Had 6 cards down already for errors and found 2 more this afternoon that for all practical purposes may as well be down. It's a dual GTX 275 setup that has turned in only 3 WUs in the last 50 hours; it's not turning in errors, but it's not really turning in anything because it's slowed to a crawl, I guess. BFG's not going to say "oh sure, send us the 8 cards you can't crunch with anymore and we'll send you 8 shiny new ones", so I figure I'm pretty much stuck with them, for better or worse. |
|
Joined: 21 Dec 08 Posts: 51 Credit: 26,320,167 RAC: 0
|
Are you all uninstalling the old Nvidia driver first from Add/Remove Programs, then uninstalling PhysX, and then running "Driver Sweeper" in safe mode after a reboot to remove all the old remnants before updating the Nvidia drivers? I would suggest this if it hasn't already been tried. I take this long route and seldom have problems. Don't uninstall PhysX before the Nvidia drivers; I messed up doing that. Nvidia first and then PhysX. |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
Roll back to 185.xx; there seem to be problems with 190.xx on some hardware. 185.xx will be fine for GPUGrid for quite a while. gdf |
Bymark Joined: 23 Feb 09 Posts: 30 Credit: 5,897,921 RAC: 0
|
Tried that yesterday, same result. Hostid=35303 is now on NNW. SETI is running fine on that host (one error in 24h). Something is strange with that computer; now it's running one SETI GPU and 2 mc on a dual-core AMD 5600. Nice :), nothing is OC'd. http://setiathome.berkeley.edu/results.php?hostid=4914727 and Hostid=35303 cpuz.txt http://personal.inet.fi/surf/tbymark/boinc/cpuz.txt Tried that too: Are you all uninstalling the old Nvidia driver first from Add/Remove Programs, then uninstalling PhysX, and then running "Driver Sweeper" in safe mode after a reboot to remove all the old remnants before updating the Nvidia drivers? I would suggest this if it hasn't already been tried. I take this long route and seldom have problems. Don't uninstall PhysX before the Nvidia drivers; I messed up doing that. Nvidia first and then PhysX. Roll back to 185.xx; there seem to be problems with 190.xx on some hardware. "Silakka" Hello from Turku > Åbo. |
|
Joined: 18 Sep 08 Posts: 368 Credit: 4,174,624,885 RAC: 0
|
Roll back to 185.xx; there seem to be problems with 190.xx on some hardware. I tried that on 4 cards and still got the errors and downclocking with them. I re-installed the 190.38 drivers and am processing the Collatz WUs just fine with no errors or downclocking; I even re-overclocked the cards and they still ran fine. I'll run that for a while and keep an eye on the forum here for a real fix for the 190.38s, or try a new driver version or client as they come out and see if that fixes the cards that went south on the Grid project. |
|
Joined: 17 Aug 08 Posts: 2705 Credit: 1,311,122,549 RAC: 0 |
Am I right that there's not a single reported failure with G9x cards, only G200 are affected? But some of them still run fine with 190.xx? MrS Scanning for our furry friends since Jan 2002 |
Zydor Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
My 9800GTX+ has been ok with 190.38 - no failures or blips of any kind. Regards Zy |
Steve Dodd Joined: 26 Dec 08 Posts: 19 Credit: 4,622,334,506 RAC: 140,836
|
Question, possibly for PoorBoy. The GTX 260 cards: are they the Core 216 version? I'm having the same problems as everyone else getting GPUGRID WUs to run on this card (XP Home 32-bit, Q6600, stock everything). I've tried rolling back to previous versions of the driver (currently running 185.XX) with no positive results. I'm not seeing a downclocking problem in GPU-Z. |
©2026 Universitat Pompeu Fabra