Lot of computation errors on all CUDA apps
| Author | Message |
|---|---|
|
Joined: 27 Oct 08 Posts: 27 Credit: 3,211,916 RAC: 0
|
Hi guys, the title says it all: all my CUDA apps are crashing while computing. I checked all my temperatures and benchmarked my graphics card to see whether I had a temperature problem. There is no problem with my 9800 GX2 (running at 80°C at 100% usage); everything is OK except BOINC CUDA computation. I'm running Windows 7 64-bit with the 196.21 GPU drivers and BOINC Manager 6.10.18. If someone could give me some help, thank you in advance (sorry for my poor English). Apps which crash: - SETI CUDA: 100% crashing (no WU finished) - Einstein CUDA: 100% crashing (crashes the moment a second GPU becomes active) - GPUGrid: 85% crashing. Some WUs completed 100%, but I'm not able to compute on both GPUs; if I do, I get compute errors on all WUs. |
liveonc Joined: 1 Jan 10 Posts: 292 Credit: 41,567,650 RAC: 0
|
Did you enable SLI? I don't own either a 9800 GX2 or a GTX 295, but I tried running my two GTX 260s in SLI once, before I read that SLI isn't supported. At that time I got 3 errors out of every 4 WUs.
|
|
Joined: 27 Oct 08 Posts: 27 Credit: 3,211,916 RAC: 0
|
I disabled the SLI switch and I'll give it a try for the next 48 hours. Thanks for the reply and the help. |
|
Joined: 15 Feb 09 Posts: 55 Credit: 3,542,733 RAC: 0
|
Your CPU overclock is a bit aggressive. Does it pass Linpack/Intel Burn Test stress testing? I had issues with my overclock that only showed up in GPU apps for some reason. Double-check your system stability if the SLI switch doesn't do anything for you. You need to be able to run FurMark and Intel Burn Test at the same time for 10-15 minutes without errors in either. Errors may not show in games or other BOINC apps, but they'll show in the CUDA apps. I had to throw a little more voltage at my CPU to get the whole thing 100% stable. C2Q, GTX 660ti |
|
Joined: 27 Oct 08 Posts: 27 Credit: 3,211,916 RAC: 0
|
Thank you Jeremy. My system is not overclocked as much as displayed (3.6 GHz), and everything has tested rock-stable for a long time. I ran CPU burn tests, GPU burn tests and memory tests. Everything seems OK, overclocked or not, and I get the same computation errors with the system at stock clocks. I just don't understand what is happening; I never had this problem before switching from Vista 64 to Windows 7 64. Maybe it's related to the GPU drivers, I just don't know. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
Perhaps one of your settings has changed, or needs to change? Check that you have disabled "Use GPU When Computer is in Use", and make sure you use "Leave Applications in Memory While Suspended" (BOINC Manager, Advanced View, Advanced Preferences: first under Processor Usage, then under Disk and Memory Usage). Most of your failures are with Beta tasks, and many fail after a short time, so it looks worse than it is. Are you running any other GPU tasks? If so, stick to one at a time (especially for Beta tests)! You might even want to disable Betas in your online preferences. One last thing to try: set BOINC to use 75% of the CPUs rather than 100% (only applicable if you are running CPU tasks as well as GPUGrid tasks). Good Luck, |
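For anyone who would rather keep these three settings in a file than click through the Manager, here is a rough Python sketch that writes them out as XML. The tag names (`run_gpu_if_user_active`, `leave_apps_in_memory`, `max_ncpus_pct`) and the `global_preferences` root are what BOINC's `global_prefs_override.xml` is understood to use; that mapping is an assumption on top of the post above, not something stated in this thread, so check it against your client version before relying on it. The function name `build_prefs_override` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(cpu_pct=75.0):
    """Return an XML string with the three settings suggested above.

    Assumption: the tag names follow BOINC's global_prefs_override.xml.
    """
    root = ET.Element("global_preferences")
    # "Use GPU When Computer is in Use" disabled
    ET.SubElement(root, "run_gpu_if_user_active").text = "0"
    # "Leave Applications in Memory While Suspended" enabled
    ET.SubElement(root, "leave_apps_in_memory").text = "1"
    # Use 75% of the CPUs rather than 100%
    ET.SubElement(root, "max_ncpus_pct").text = f"{cpu_pct:.1f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_prefs_override())
```

The Manager dialog described in the post remains the safer route; the sketch only makes the three values explicit.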
Zydor Joined: 8 Feb 09 Posts: 252 Credit: 1,309,451 RAC: 0
|
I noticed you have 196.21 loaded. NVIDIA acknowledged a bug in 196.21 within days of release that prevented overclocking. It certainly stopped my 9800GTX in its tracks; although it did not seem to affect everyone, it was (is) widespread. It may be that some side effect of that is causing your issues. Pure guess of course, but given that the bug affects overclocking, it's not impossible. They released 196.34 days afterwards, a Beta release whose only change is a fix for the overclocking bug in the 196.21 WHQL driver. Maybe worth a shot. Regards Zy |
|
Joined: 15 Feb 09 Posts: 55 Credit: 3,542,733 RAC: 0
|
Just to clarify, that bug only affects software overclocks. If the overclock is burned into the BIOS the card won't be affected by that bug. C2Q, GTX 660ti |
liveonc Joined: 1 Jan 10 Posts: 292 Credit: 41,567,650 RAC: 0
|
I must agree that 3.6 GHz is a bit aggressive on a Q6600, especially for 24/7 use. I only have mine up to 3.0 GHz. I've OC'd a Q6600 for 24/7 use up to 3.2 GHz, but unless you're liquid-cooled, I don't see how it's a good idea to OC that much alongside the 9800 GX2. I got errors, and lots of them, when I OC'd an 8800GT beyond core clock 700 MHz (vs. 600 MHz standard), shader clock 1728 MHz (vs. 1500 MHz standard) and 2000 MHz (vs. 1800 MHz standard).
|
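To put the clock figures in this post on one scale, the overclocks can be expressed as a percentage over stock. This is only a quick sketch: the function name is made up, the 2.4 GHz stock clock for the Q6600 is the chip's published spec rather than something stated in the thread, and the third 8800GT pair (2000 vs. 1800 MHz) is unlabeled in the post, so the code keeps it unlabeled too.

```python
def overclock_pct(stock_mhz, oc_mhz):
    """Percentage increase of an overclocked frequency over its stock value."""
    return (oc_mhz - stock_mhz) / stock_mhz * 100.0


# (stock MHz, overclocked MHz)
CLOCKS = {
    "8800GT core": (600, 700),
    "8800GT shader": (1500, 1728),
    "8800GT (unlabeled in the post)": (1800, 2000),
    # Assumption: Q6600 stock clock of 2400 MHz, not given in the thread.
    "Q6600 at 3.0 GHz": (2400, 3000),
    "Q6600 at 3.2 GHz": (2400, 3200),
    "Q6600 at 3.6 GHz": (2400, 3600),
}

if __name__ == "__main__":
    for name, (stock, oc) in CLOCKS.items():
        print(f"{name}: +{overclock_pct(stock, oc):.1f}%")
```

On these numbers the 8800GT limits sit roughly 11 to 17% above stock, while 3.6 GHz on a Q6600 is a 50% overclock.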
|
Joined: 27 Oct 08 Posts: 27 Credit: 3,211,916 RAC: 0
|
Thank you all for the answers. Liveonc, my GPU isn't OCed, and my CPU is very well watercooled with first-choice watercooling parts. With the OCCT test that really makes your CPU burn (the Linpack test, I think), the temperature of my hottest core stays at 57°C, which is very good under such a burn test. I really don't think it's CPU related, but I built a new machine for testing and I'm investigating where the problem is. At this moment, I can say that my suspicion of a graphics driver issue under Windows 7 64 seems to be right. I restored a Ghost image of Vista and there are no more errors with that OS. |
|
Joined: 27 Oct 08 Posts: 27 Credit: 3,211,916 RAC: 0
|
I found the answer to my question about these compute errors. Listen carefully, owners of dual-GPU video cards. The problem seems to be driver related. GPU compute apps and BOINC detect two GPUs but seem to give the same GPU two jobs, which makes one of the two jobs crash. To prevent such crashes, go to the NVIDIA control panel and deactivate the SLI switch; BOINC then still recognizes two GPUs, and the driver puts one job on each GPU chip instead of two jobs on the same GPU. |
|
Joined: 25 Oct 08 Posts: 42 Credit: 42,812,268 RAC: 0
|
I always get: 11/03/2010 16:00:16 GPUGRID Output file g106-TONI_CAPBIND99SB-16-100-RND9786_0_1 for task g106-TONI_CAPBIND99SB-16-100-RND9786_0 absent |
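If this message keeps turning up, counting how often it appears per project and task can help when reporting it. Below is a small Python sketch that picks such lines out of a saved copy of the BOINC message log. The pattern is modeled only on the single line quoted above, so treat the line layout as an assumption; the date is kept as plain text because the post does not say whether it is day/month or month/day. The names `ABSENT_RE`, `absent_outputs` and `absent_per_project` are illustrative.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in the post (an assumption).
ABSENT_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<project>.+?) Output file (?P<file>\S+) "
    r"for task (?P<task>\S+) absent$"
)

SAMPLE = (
    "11/03/2010 16:00:16 GPUGRID Output file "
    "g106-TONI_CAPBIND99SB-16-100-RND9786_0_1 for task "
    "g106-TONI_CAPBIND99SB-16-100-RND9786_0 absent"
)


def absent_outputs(lines):
    """Return one dict per 'Output file ... absent' line found."""
    hits = []
    for line in lines:
        match = ABSENT_RE.match(line.strip())
        if match:
            hits.append(match.groupdict())
    return hits


def absent_per_project(lines):
    """Count missing-output messages per project name."""
    return Counter(hit["project"] for hit in absent_outputs(lines))


if __name__ == "__main__":
    for hit in absent_outputs([SAMPLE]):
        print(hit["project"], hit["task"], hit["file"])
```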
|
Joined: 4 Sep 08 Posts: 44 Credit: 3,685,033 RAC: 0
|
I have the same problem here, but it started with Collatz WUs. I have two GPUs (a GTX260 and an 8800GT, so no SLI) and I get this error for all GPU tasks from every project. It seems that BOINC couldn't find a CUDA device. This happened without any interaction with the computer (the user was miles away :-)). Is it BOINC or driver related? A reboot solves the problem! BOINC 6.10.36, NVIDIA 196.21, Win7 64-bit |
|
Joined: 15 Feb 10 Posts: 9 Credit: 16,891,220 RAC: 0
|
Hi. I was told to post here, but I've seen many other people in this forum who have had similar problems and apparently never got them resolved... Every work unit that downloads fails in 10 seconds or less. None of my other CUDA apps crash or fail, and I have tried just about every version of my video card driver that I can find: nada. If anyone has any ideas, PLEASE let me know. Here are my system specs: OS : Windows 7/Windows Server 2008 R2 Version 6.1 Build 7600 Number of processors : 2 Processor type : x86 Intel Pentium, Level 6, Revision 5898 CPU speed: 2898 MHz Total physical memory: 4124988 KB Available physical memory: 3037680 KB Total virtual memory: 2097024 KB Available virtual memory: 1940944 KB Number of CUDA devices : 1 Device 0 : GeForce 9800 GT |
GDF Joined: 14 Mar 07 Posts: 1958 Credit: 629,356 RAC: 0 |
Try installing the 196.75 driver. gdf |
|
Joined: 24 Dec 08 Posts: 738 Credit: 200,909,904 RAC: 0 |
I had a couple die on me this morning. # There are 2 devices supporting CUDA # Device 0: "GeForce GTX 295" # Clock rate: 1.24 GHz # Total amount of global memory: 939524096 bytes # Number of multiprocessors: 30 # Number of cores: 240 # Device 1: "GeForce GTX 295" # Clock rate: 1.24 GHz # Total amount of global memory: 939524096 bytes # Number of multiprocessors: 30 # Number of cores: 240 MDIO ERROR: cannot open file "restart.coor" SWAN : FATAL : Cuda driver error 3e7 in file '../swan/swanlib_nv.cpp' in line 186. GDF, have you guys tried the 197.13 drivers? Is there any point in updating to them? Currently I am running 196.21. Now that the NDA with NVIDIA has expired, is it possible to use the CUDA 3.0 DLLs in the faint hope they will fix something? BOINC blog |
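The task output quoted above can be picked apart with a few regular expressions, which may help when comparing failures across hosts. The Python sketch below pulls out the device list and the driver error. Reading `3e7` as hexadecimal is an assumption (the application does not say which base it prints); if it is hex, the value is 999, which is `CUDA_ERROR_UNKNOWN` in the CUDA driver API, so the code itself says little about the cause. The function names are made up for illustration.

```python
import re

# The stderr text exactly as quoted in the post (flattened onto one line).
STDERR = (
    '# There are 2 devices supporting CUDA '
    '# Device 0: "GeForce GTX 295" # Clock rate: 1.24 GHz '
    '# Total amount of global memory: 939524096 bytes '
    '# Number of multiprocessors: 30 # Number of cores: 240 '
    '# Device 1: "GeForce GTX 295" # Clock rate: 1.24 GHz '
    '# Total amount of global memory: 939524096 bytes '
    '# Number of multiprocessors: 30 # Number of cores: 240 '
    'MDIO ERROR: cannot open file "restart.coor" '
    "SWAN : FATAL : Cuda driver error 3e7 in file "
    "'../swan/swanlib_nv.cpp' in line 186."
)

DEVICE_RE = re.compile(r'# Device (\d+): "([^"]+)"')
MEMORY_RE = re.compile(r"Total amount of global memory: (\d+) bytes")
ERROR_RE = re.compile(
    r"Cuda driver error ([0-9a-fA-F]+) in file '([^']+)' in line (\d+)"
)


def parse_devices(text):
    """Return (index, name, memory in MiB) for each device listed."""
    devices = DEVICE_RE.findall(text)
    memories = MEMORY_RE.findall(text)
    return [
        (int(index), name, int(mem) // 2**20)
        for (index, name), mem in zip(devices, memories)
    ]


def parse_driver_error(text):
    """Return (error code as int, source file, line number), or None."""
    match = ERROR_RE.search(text)
    if match is None:
        return None
    code_hex, path, line = match.groups()
    # Assumption: the code is printed in hexadecimal.
    return int(code_hex, 16), path, int(line)


if __name__ == "__main__":
    print(parse_devices(STDERR))
    print(parse_driver_error(STDERR))
```

Both halves of the GTX 295 report 896 MiB, so the driver is exposing two separate devices here; the failure comes later, at the driver call.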
©2026 Universitat Pompeu Fabra