Message boards : Graphics cards (GPUs) : Awful lot of xxx__cuda.exe has stopped working

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 17232 - Posted: 23 May 2010 | 6:17:25 UTC

I've been getting an awful lot of these since the switch to the 6.05 app, on two different machines. Mostly they are 6.05 tasks, but I've also been getting failures on 6.72 tasks. All machines run the 196.21 drivers under Win7 x64. Interestingly, the GTX275 seems immune; only the GTX295s are getting them.

Both suggest a bug in the code/driver/hardware combination.

From 6.05's I get the following errors:

The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# Using device 1
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939524096 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939524096 bytes
# Number of multiprocessors: 30
# Number of cores: 240
MDIO ERROR: cannot open file "restart.coor"
SWAN : FATAL : Failure executing kernel [mshake_position_kernel_1] [2] [10,1,1][64,1,1]
Assertion failed: 0, file swanlib_nv.cpp, line 281


For the 6.72s I usually get the following, though sometimes I get the same error as the 6.05s:
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using device 1
# There are 2 devices supporting CUDA
# Device 0: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939524096 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce GTX 295"
# Clock rate: 1.24 GHz
# Total amount of global memory: 939524096 bytes
# Number of multiprocessors: 30
# Number of cores: 240
ERROR: file ntnbrlist.cpp line 63: Insufficent memory available for pairlists. Set pairlistdist to match the cutoff.

____________
BOINC blog

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 17234 - Posted: 23 May 2010 | 15:47:30 UTC - in response to Message 17232.
Last modified: 23 May 2010 | 16:03:34 UTC

This error appears again and again on your failing cards, mine, and everyone else's (when running 6.72 tasks):
ERROR: file ntnbrlist.cpp line 63: Insufficent memory available for pairlists. Set pairlistdist to match the cutoff.
called boinc_finish

For the 6.05 failures you are getting this error:
Assertion failed: 0, file swanlib_nv.cpp, line 281
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
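The two failure types above can be told apart mechanically when sorting through many failed tasks. A minimal sketch, assuming you have each failed task's stderr text as a string (for example, copied from its result page); the signature labels and the `classify_failure` name are hypothetical, and the patterns only match the exact messages quoted in this thread, including the app's own spelling "Insufficent":

```python
import re

# Failure signatures copied from the stderr output quoted in this thread.
# The labels are hypothetical names for illustration only.
SIGNATURES = [
    ("swan_kernel_failure",
     re.compile(r"SWAN : FATAL : Failure executing kernel \[\w+\]")),
    # The app prints "Insufficent"; \w* also accepts the correct spelling.
    ("pairlist_memory",
     re.compile(r"ntnbrlist\.cpp line \d+: Insuffic\w* memory available for pairlists")),
    ("swanlib_assert",
     re.compile(r"Assertion failed: 0, file swanlib_nv\.cpp")),
]


def classify_failure(stderr_text: str) -> list[str]:
    """Return the label of every known signature found in one task's stderr."""
    return [label for label, pattern in SIGNATURES if pattern.search(stderr_text)]


if __name__ == "__main__":
    sample = ("SWAN : FATAL : Failure executing kernel [mshake_position_kernel_1] [2] [10,1,1][64,1,1]\n"
              "Assertion failed: 0, file swanlib_nv.cpp, line 281\n")
    print(classify_failure(sample))  # ['swan_kernel_failure', 'swanlib_assert']
```

A task can match more than one label, as the 6.05 failures do (kernel failure followed by the assertion), so the function returns a list rather than a single category.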


I was getting the same 6.72 errors on my GTX260 under Win7, so I pulled the card from the project. It had been fine with 6.03 tasks, but I could not find a driver that worked for any of the 6.72 or 6.05 tasks.

Some of the 6.72 errors look like variants of the earlier memory errors that appeared on 6.72 tasks, which were fixed by using, for example, the 196.21 driver on Vista. So some drivers currently work for some of these tasks but not for others.

I would add the operating system to your list of possible culprits (code/driver/hardware).
XP seems to be more stable than Vista or Win7.

For people with one GPU it is just a matter of trying different drivers until they find one that works for all tasks, but when you have several GPUs on several platforms it is much harder (and very slow).
Mark, at least you are using the same operating system across your systems.

By the way, your GTX275 is using a different driver (197.13), but I don’t think that makes it a good driver for all cards; it may just suit that exact card from that exact manufacturer.

I would suggest you try to go back as far as a 195.xx driver for GT240 cards under Win7.
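With several GPUs, drivers and operating systems in play, it helps to tally outcomes per combination instead of relying on memory. A minimal sketch, assuming you record one (configuration, succeeded) pair per task yourself; `Config`, `success_rates` and `working_configs` are hypothetical names, not part of BOINC or GPUGRID:

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Config(NamedTuple):
    """One hardware/software combination (hypothetical record layout)."""
    gpu: str
    driver: str
    os: str
    app: str


def success_rates(results: Iterable[tuple[Config, bool]]) -> dict[Config, tuple[int, int]]:
    """Map each Config to (successful tasks, total tasks)."""
    tally = defaultdict(lambda: [0, 0])
    for config, succeeded in results:
        tally[config][1] += 1
        if succeeded:
            tally[config][0] += 1
    return {cfg: (ok, total) for cfg, (ok, total) in tally.items()}


def working_configs(results: Iterable[tuple[Config, bool]], min_tasks: int = 5) -> list[Config]:
    """Configs with zero failures over at least min_tasks tasks, sorted."""
    return sorted(
        cfg for cfg, (ok, total) in success_rates(results).items()
        if total >= min_tasks and ok == total
    )
```

The `min_tasks` threshold (5 here, an arbitrary choice) keeps a driver from looking "good" after a single lucky task, which matters when failures come in bursts.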

MarkJ
Volunteer moderator
Volunteer tester
Message 17255 - Posted: 24 May 2010 | 12:33:56 UTC
Last modified: 25 May 2010 | 12:36:03 UTC

Well, today they all seem to work. I haven't changed anything, so maybe it was just a bad batch of WUs.

I was holding off on a driver upgrade until CUDA 3.1 is released. Then it might be worth updating, for the stability improvements if nothing else.

[edit]Spoke too soon. A couple failed after I posted this.[/edit]

MarkJ
Volunteer moderator
Volunteer tester
Message 17290 - Posted: 25 May 2010 | 12:42:01 UTC

Well, I upgraded one of the GTX295s to the 197.45 drivers. It's more stable, but still gets errors. The GTX295 still running the 196.21 drivers just had 4 fail in a row (complete with popups). Nope, make that 6.

Anyone tried the 257 drivers? They are betas and seem to be where NVIDIA is going after the 197 series. Apparently they support CUDA 3.1 as well.

skgiven
Volunteer moderator
Volunteer tester
Message 17296 - Posted: 25 May 2010 | 14:40:49 UTC - in response to Message 17290.
Last modified: 25 May 2010 | 15:19:42 UTC


MarkJ wrote:
"Anyone tried the 257 drivers? They are betas and seem to be where NVIDIA is going after the 197 series. Apparently they support CUDA 3.1 as well."


Yes.
I tried it on my GTX260 (Win7 x64, BOINC 6.10.56). No change on that card; it still fails tasks after about 9 seconds. It was failing all 6.05 and all 6.72 tasks. Up to a week or so ago it was doing well on the 6.03 WUs.

I also tried it on a GTX470 under Win XP x86 SP3, and it works fine on that card.

MarkJ
Volunteer moderator
Volunteer tester
Message 17332 - Posted: 26 May 2010 | 12:11:34 UTC - in response to Message 17296.
Last modified: 26 May 2010 | 12:12:33 UTC


In my opinion these look like bugs in the app rather than in the hardware or driver. Hopefully the guys can track them down and fix them. It gets rather annoying having all the tasks fail, with popups every time.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 17338 - Posted: 26 May 2010 | 13:19:57 UTC - in response to Message 17332.

MarkJ wrote:
"In my opinion these look like bugs in the app rather than in the hardware or driver. Hopefully the guys can track them down and fix them. It gets rather annoying having all the tasks fail, with popups every time."

That is why I stopped running GPU Grid on the GTX295 card; it runs only MW now. I only run GPU Grid on the GTX260, which still seems to run the tasks just fine.

Siegfried Niklas
Joined: 23 Feb 09
Posts: 39
Credit: 144,654,294
RAC: 0
Message 17343 - Posted: 26 May 2010 | 14:29:06 UTC

The "Never Ending Story" continues.

My GTX295 does well for days (even weeks), and then suddenly a run of failures starts.

I check my "Everest" logs: no temperature problems.
I restart, change the clocks, change drivers, change the BOINC Manager (BM) version, switch from an XP32 host to a Vista64 host (and back), and the run of failures continues.

OK, I say to myself, it's a faulty batch of WUs.

But one look at the long run of valid WUs crunched by my two heavily overclocked GTX260s, and I'm back at square one.

I wearily fall asleep, and the next day the GTX295 is doing well again, for days (weeks).

It's GPUGrid crunchers' KARMA...

MarkJ
Volunteer moderator
Volunteer tester
Message 17424 - Posted: 29 May 2010 | 1:45:59 UTC

I noticed that the GTX295s would always fail the second WU. Turning off SLI (or, as it is now called under the 197.45 drivers, Multi-GPU mode) fixes it. That explains why my GTX275 seemed immune.

This all seems to have started with the 6.05/6.72 apps. Prior to that everything was fine. Upgrading drivers (I was using 196.21, now on 197.45) doesn't make any difference.
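Both stderr logs quoted at the top of this thread begin with `# Using device 1`. One quick way to check whether failures cluster on a single device index (which would fit the "always fails the second WU" pattern) is to count failed tasks per reported index. A minimal sketch, assuming the `# Using device N` line format shown in those logs; the function names are hypothetical:

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Matches the "# Using device N" header line seen in the stderr logs above.
DEVICE_RE = re.compile(r"^# Using device (\d+)", re.MULTILINE)


def device_index(stderr_text: str) -> Optional[int]:
    """Return the CUDA device index a task reported, or None if absent."""
    match = DEVICE_RE.search(stderr_text)
    return int(match.group(1)) if match else None


def failures_by_device(failed_stderr_texts: Iterable[str]) -> Counter:
    """Count failed tasks per reported device index (None = not reported)."""
    return Counter(device_index(text) for text in failed_stderr_texts)
```

A heavy skew toward one index is only a hint, not proof: it says where the failing tasks ran, not why they failed.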

Mad Matt
Joined: 29 Aug 09
Posts: 28
Credit: 101,584,171
RAC: 0
Message 17632 - Posted: 16 Jun 2010 | 9:23:41 UTC - in response to Message 17424.

MarkJ wrote:
"I noticed that the GTX295s would always fail the second WU. Turning off SLI (or, as it is now called under the 197.45 drivers, Multi-GPU mode) fixes it. That explains why my GTX275 seemed immune."


Cheers for the hint; that did the trick for me as well. It also seems to me that WUs are running slightly faster now. Can anyone confirm or deny this? Last but not least: could you please add this to an FAQ? I was just lucky to stumble upon the information here.

Host: http://www.gpugrid.net/results.php?hostid=73368
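One way to confirm or deny the speed-up is to compare mean elapsed times from before and after the change. A minimal sketch, assuming you collect elapsed times in seconds for tasks of the same WU type from the host's task list; `compare_runtimes` is a hypothetical helper, and a difference of a few percent may be within normal run-to-run variation:

```python
from statistics import mean
from typing import Sequence


def compare_runtimes(before: Sequence[float], after: Sequence[float]) -> tuple[float, float, float]:
    """Return (mean before, mean after, percent change) for elapsed times in seconds.

    A negative percent change means tasks got faster.
    """
    mean_before = mean(before)
    mean_after = mean(after)
    percent_change = (mean_after - mean_before) * 100.0 / mean_before
    return mean_before, mean_after, percent_change
```

Mixing different WU types in one list would compare workloads rather than configurations, so keep each type separate.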




ftpd
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Message 17636 - Posted: 16 Jun 2010 | 12:29:15 UTC - in response to Message 17632.

How can I turn it off with the new 257.21 driver for the GTX295?
The choices are automatic, GTX295 A, GTX295 B, or CPU.
Multi-GPU mode can be switched on or off, but after I change it, it is always on again.

With the old drivers it also works better for SETI@home!


____________
Ton (ftpd) Netherlands

skgiven
Volunteer moderator
Volunteer tester
Message 17637 - Posted: 16 Jun 2010 | 14:00:30 UTC - in response to Message 17636.

Ton, I see you upgraded all your cards to the latest driver.
Hopefully we will soon see a performance improvement for the Fermi cards (CUDA 3010), but I am not sure you will see any gain on the non-Fermi cards. If you cannot configure the GTX295 to work in non-SLI mode, you might want to roll back the driver, but perhaps you can disable SLI in the NVIDIA Control Panel.
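"CUDA 3010" is the integer form CUDA uses for version numbers (as returned by `cudaDriverGetVersion` and defined by `CUDA_VERSION`): 1000 * major + 10 * minor, so 3010 means CUDA 3.1. A small decoding sketch (the function name is hypothetical):

```python
def decode_cuda_version(version: int) -> str:
    """Decode CUDA's integer version (1000 * major + 10 * minor) to "major.minor"."""
    major, remainder = divmod(version, 1000)
    minor = remainder // 10
    return f"{major}.{minor}"


print(decode_cuda_version(3010))  # 3.1
```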
New driver, new problems!

Good luck,

Mad Matt
Message 17644 - Posted: 16 Jun 2010 | 22:22:58 UTC - in response to Message 17636.

ftpd wrote:
"How can I turn it off with the new 257.21 driver for the GTX295?
The choices are automatic, GTX295 A, GTX295 B, or CPU.
Multi-GPU mode can be switched on or off, but after I change it, it is always on again.

With the old drivers it also works better for SETI@home!"


I'm using 197.45 here on XP x64, and everything is running perfectly. But I guess you need other drivers because you have a Fermi card?