Abrupt computer restart - Tasks stuck - Kernel not found

Jacob Klein

Message 33398 - Posted: 7 Oct 2013, 17:26:16 UTC - in response to Message 33397.  
Last modified: 7 Oct 2013, 17:27:09 UTC

I believe your issue is a separate issue.

Mine occurs as outlined in the first post within this thread:
If a GPUGrid task is in the middle of being processed, and BOINC is shut down abnormally (like a power outage, or the computer froze without the user issuing the shutdown command)...

Then when the computer/BOINC/task restarts, it can get into a loop where it crashes the driver, tries to start again (I see the "elapsed" time back off a few seconds, indicating it is retrying), crashes the driver again, and so on. It keeps crashing the driver until I abort the task. It does not affect other tasks.

I've captured a copy of the data directory when this was happening, and submitted some files to MJH, to hopefully figure out what is happening.

If you have a different issue, please consider opening a separate thread.

Thanks,
Jacob
wiyosaya

Message 33501 - Posted: 14 Oct 2013, 18:54:09 UTC

Was there a resolution to this?

I ran several WUs this past weekend on my 580 machine, which is the one that had the problem, and I did not see this issue again.
Jacob Klein

Message 33502 - Posted: 14 Oct 2013, 19:52:40 UTC - in response to Message 33501.  

There has been no recent contact from MJH, and so no resolution.

I believe the issue only happens when the computer (running a GPUGrid.net task) is interrupted (or freezes completely) without being able to shut down cleanly. I haven't seen it happen recently, because I usually shut down/restart normally, instead of an abrupt power shutoff.

Regards,
Jacob
wiyosaya

Message 33504 - Posted: 15 Oct 2013, 2:20:52 UTC

Thanks. I had no problems this past weekend. However, I did not experience any abnormal shutdowns or freezes.
Richard Haselgrove

Message 33520 - Posted: 16 Oct 2013, 18:19:43 UTC

Are we still collecting these? I had a sticking task - multiple driver restarts after a forced reboot - with 23x6-SANTI_RAP74wtCUBIC-18-34-RND6543_0

The std_err txt follows: I'll preserve the rest of the slot contents before aborting the task, in case anyone wants them.

# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140
# GPU 0 : 74C
# GPU 1 : 55C
# GPU 0 : 75C
# GPU 1 : 56C
# GPU 0 : 76C
# GPU 0 : 77C
# GPU 0 : 78C
# GPU 0 : 79C
# GPU 1 : 57C
# GPU 0 : 80C
# GPU 0 : 81C
# GPU 1 : 58C
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert 0
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert 0
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert 0
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert 0
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140
Jacob Klein

Message 33521 - Posted: 16 Oct 2013, 18:22:29 UTC - in response to Message 33520.  

I sent MJH some files, but haven't heard from him :/
Richard Haselgrove

Message 33524 - Posted: 16 Oct 2013, 21:37:25 UTC

And it's just happened again, this time with potx108-NOELIA_INS1P-0-14-RND5839_0

# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 1	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:08:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140
# GPU 0 : 76C
# GPU 1 : 56C
# GPU 1 : 57C
# GPU 1 : 58C
# GPU 1 : 59C
# GPU 1 : 60C
# GPU 1 : 61C
# GPU 0 : 77C
# GPU 1 : 62C
# GPU 1 : 63C
# GPU 0 : 78C
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert 0
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert 0
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert 0
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert 0
22:21:16 (5824): Can't acquire lockfile (32) - waiting 35s
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert 0
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert 0
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert 0
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert 0
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert 0
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# SWAN Device 0	:
#	Name		: GeForce GTX 670
#	ECC		: Disabled
#	Global mem	: 2048MB
#	Capability	: 3.0
#	PCI ID		: 0000:07:00.0
#	Device clock	: 1084MHz
#	Memory clock	: 3054MHz
#	Memory width	: 256bit
#	Driver version	: r331_00 : 33140

I seem to see similarities in

SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1963.
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1963.

in both reports. And in both cases, the first error occurs after the first restart.
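The pattern described here can be checked mechanically. What follows is only a minimal Python sketch, assuming the stderr text is available as a string. The function name `summarize_stderr` and the shortened sample are illustrative and are not part of the GPUGrid application. The error names are the CUDA driver API's names for codes 702 and 999.

```python
import re
from collections import Counter

# Names from the CUDA driver API's CUresult enum for the two codes seen above.
CUDA_ERROR_NAMES = {
    702: "CUDA_ERROR_LAUNCH_TIMEOUT",
    999: "CUDA_ERROR_UNKNOWN",
}

# Matches the FATAL lines exactly as they appear in the two reports.
FATAL_RE = re.compile(
    r"SWAN : FATAL : Cuda driver error (\d+) in file '([^']+)' in line (\d+)"
)
# The application prints one of these header lines each time it (re)starts.
# Temperature lines such as "# GPU 0 : 74C" do not match because of the "[".
START_RE = re.compile(r"^# GPU \[.*\] Platform \[.*\]")


def summarize_stderr(text):
    """Return (starts, error_counts, start_index_of_first_error).

    start_index_of_first_error is 1 for the initial start, 2 for the first
    restart, and so on; it is None if no FATAL line was found.
    """
    starts = 0
    errors = Counter()
    first_error_start = None
    for line in text.splitlines():
        if START_RE.match(line):
            starts += 1
            continue
        match = FATAL_RE.search(line)
        if match:
            errors[int(match.group(1))] += 1
            if first_error_start is None:
                first_error_start = starts
    return starts, errors, first_error_start


# Shortened, illustrative sample built from lines in the reports above.
sample = """\
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
# GPU 0 : 74C
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert 0
# GPU [GeForce GTX 670] Platform [Windows] Rev [3203] VERSION [55]
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1963.
"""

starts, errors, first_error_start = summarize_stderr(sample)
for code, count in sorted(errors.items()):
    print(code, CUDA_ERROR_NAMES.get(code, "unknown code"), count)
```

On the shortened sample, the first FATAL line belongs to the second start, that is, the first restart, which is the same pattern as in both reports.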

Interestingly, this was running in the same slot directory as the previous one (slot 4), and part of my bug report to BOINC (apart from the non-report of stderr_txt) was that the slot directory wasn't cleaned after an abort. I'll make sure that's done properly before I risk another one.
MJH

Message 33527 - Posted: 16 Oct 2013, 22:11:09 UTC - in response to Message 33521.  

Sorry guys, I've been (and still am) very busy. Jacob, thanks for the files, they were useful and I know how to fix the problem. Unfortunately, I won't have the opportunity to do any more work on the application for a while. Will keep you posted.

MJH
zombie67 [MM]

Message 33543 - Posted: 18 Oct 2013, 15:31:38 UTC

I have also been experiencing this problem. Over the past several weeks at least. Also glad to see the cause has been identified by the project. Now just waiting for a fix.

FWIW, this is the trick I use to get to the BOINC GUI controls before the crashing starts. I add these lines to cc_config.xml:

<cc_config>
<options>
<start_delay>60</start_delay>
</options>
</cc_config>

"Specify a number of seconds to delay running applications after client startup. New in 6.1.6"

No fiddling with safe mode or any of that.

http://boinc.berkeley.edu/wiki/Client_configuration
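For anyone who would rather script the edit than type it by hand, here is a minimal Python sketch. It assumes you read and write cc_config.xml yourself from your BOINC data directory. The helper name `set_start_delay` is made up for illustration and is not a BOINC tool. It only sets `<start_delay>` and leaves any other options in the file untouched.

```python
import xml.etree.ElementTree as ET


def set_start_delay(xml_text, seconds):
    """Return cc_config.xml text with <options><start_delay> set to `seconds`.

    Empty input is treated as a config file that does not exist yet.
    """
    if xml_text.strip():
        root = ET.fromstring(xml_text)
    else:
        root = ET.Element("cc_config")
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")

    # Reuse the existing <options> block if there is one.
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    # Reuse an existing <start_delay> so the file never ends up with two.
    delay = options.find("start_delay")
    if delay is None:
        delay = ET.SubElement(options, "start_delay")
    delay.text = str(seconds)

    return ET.tostring(root, encoding="unicode")


# Starting from no file at all:
print(set_start_delay("", 60))
```

Since the option delays running applications after client startup, the new value matters the next time the client starts.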

Reno, NV
Team: SETI.USA
JStateson

Message 33621 - Posted: 26 Oct 2013, 7:41:06 UTC
Last modified: 26 Oct 2013, 7:44:08 UTC

I suspect I had the same problem: the driver was resetting in a loop, eventually ending in a blue screen and memory dump. I managed to stop the GPU and spotted an MD5 checksum error message associated with some GPUGrid logo PNG file. There is probably more to it than a bad logo file download, so I reset the project and stopped future work. The problem disappeared on this GTX 570 system. Other systems are running GPUGrid OK.

I upgraded from the 327 to the 331 drivers before deciding to reset the project.


EDIT - Just realized I had a power outage right before the problem.
Jacob Klein

Message 33623 - Posted: 26 Oct 2013, 11:22:15 UTC - in response to Message 33621.  

The cause:
Happens when Windows is shut down unexpectedly, i.e. from freezing up, from the user pulling the plug, or from a power outage.

The problem:
The driver resets continuously, GPUGrid tasks do not progress normally, and sometimes Windows will BSOD because of the driver resets.

The solution:
Find a way to abort any GPUGrid tasks that are causing the problem. If Windows gives you enough time to stop BOINC when you log in, then do that: stop/suspend BOINC, abort the GPUGrid tasks, restart/resume BOINC. If Windows doesn't give you enough time, then using the <start_delay> option in cc_config.xml is a good choice, but you would have to start in safe mode (to prevent BOINC from starting) in order to create/edit that file. Then start in regular mode, and while BOINC is in the startup delay, stop/suspend BOINC, abort the GPUGrid tasks, and restart/resume BOINC.
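The suspend, abort, resume sequence can also be done from the command line with boinccmd instead of the BOINC Manager. This is only a sketch in Python that builds the command lines without running them. The helper name `recovery_commands` is made up, the project URL is an assumption and must match the URL your client is attached with, and the options used (`--set_gpu_mode`, `--task ... abort`) should be checked against `boinccmd --help` for your client version.

```python
def recovery_commands(project_url, stuck_tasks, boinccmd="boinccmd"):
    """Build boinccmd invocations: pause GPU work, abort tasks, resume GPU work.

    Nothing is executed here; the caller decides how and when to run them.
    """
    # Stop GPU computation first so the stuck task cannot reset the driver again.
    commands = [[boinccmd, "--set_gpu_mode", "never"]]
    # Abort each task that is stuck in the driver reset loop.
    for task_name in stuck_tasks:
        commands.append([boinccmd, "--task", project_url, task_name, "abort"])
    # Hand GPU scheduling back to the client's normal preferences.
    commands.append([boinccmd, "--set_gpu_mode", "auto"])
    return commands


# Example with a task name reported earlier in this thread.
# The URL below is an assumption, not taken from a client configuration.
for command in recovery_commands(
    "http://www.gpugrid.net/",
    ["23x6-SANTI_RAP74wtCUBIC-18-34-RND6543_0"],
):
    print(" ".join(command))
```

Each inner list can be passed to `subprocess.run` as is, during the startup delay, once you have confirmed which tasks are the stuck ones.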

This is a GPUGrid problem, and I hope MJH fixes it!
He says he knows how to, it's just a matter of his limited time.

Regards,
Jacob
Operator

Message 33624 - Posted: 26 Oct 2013, 12:59:18 UTC - in response to Message 33623.  


The solution:
If Windows doesn't give you enough time, then utilizing the <start_delay> option in cc_config.xml is a good choice, but you would have to start in safe mode (to prevent BOINC from starting) in order to create/edit that file, then start in regular mode, and while BOINC is in the startup delay, stop/suspend BOINC, abort the GPUGrid tasks, restart/resume BOINC.



I have edited the cc_config file to include the startup delay, and now that delay (60 seconds in my case) is applied every time I start BOINC, whether or not I had a problem before it was shut down.

So now I don't have to try and 'catch' BOINC to abort tasks, or go into safe mode or anything else. I can just abort tasks that I know will fail due to the power interruption issues I occasionally have to deal with here (mostly on my GTX590 box).

Operator
Jacob Klein

Message 33705 - Posted: 1 Nov 2013, 14:47:57 UTC
Last modified: 1 Nov 2013, 14:52:41 UTC

My computer abruptly restarted a couple times today, and I had to deal with this problem again.

An "I505-SANTI_baxbim2-18-32" task got stuck in an infinite driver reset loop, and I had to suspend GPU to get to that task to abort it. A "23x5-SANTI_RAP74wtCUBIC-20-34" task did not get stuck in the loop, and so I didn't have to abort that one.

So... this is still an ongoing problem for me.
MJH?
MJH

Message 33769 - Posted: 4 Nov 2013, 10:58:50 UTC - in response to Message 33705.  

Jacob,

I'll probably get a fix for this problem out next week.

Matt
Jacob Klein

Message 33772 - Posted: 4 Nov 2013, 14:00:46 UTC - in response to Message 33769.  

Jacob,
I'll probably get a fix for this problem out next week.
Matt


Thanks. I'm looking forward to the fix. And testing the fix should be fun too muaahahahahaha (don't get to yank power cord out of this machine very often!)
mwgiii

Message 33784 - Posted: 5 Nov 2013, 19:59:20 UTC

Please hurry MJH.
ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester

Message 33820 - Posted: 10 Nov 2013, 13:52:46 UTC - in response to Message 33772.  

And testing the fix should be fun too muaahahahahaha (don't get to yank power cord out of this machine very often!)

LOL! I recommend using a switch instead (power switch or at the PSU) as these are "debounced" (not sure this is the correct electrical engineering term.. sounds wrong).

It could also work to just kill BOINC via task manager - maybe try this before the fix is out :)

MrS
Scanning for our furry friends since Jan 2002
Chilean

Message 33841 - Posted: 12 Nov 2013, 6:50:25 UTC - in response to Message 33820.  

And testing the fix should be fun too muaahahahahaha (don't get to yank power cord out of this machine very often!)

LOL! I recommend using a switch instead (power switch or at the PSU) as these are "debounced" (not sure this is the correct electrical engineering term.. sounds wrong).

It could also work to just kill BOINC via task manager - maybe try this before the fix is out :)

MrS


You got it right.
[PUGLIA] Riccardo

Message 33845 - Posted: 12 Nov 2013, 16:58:27 UTC
Last modified: 12 Nov 2013, 16:59:49 UTC

Exactly the same for me.

3 SANTI WUs were corrupted after a power outage (and with a PSU that is about to be retired!!!)

They are 7443155, 7456552 and 7457465 among my current WUs: http://www.gpugrid.net/results.php?hostid=155107

The drivers kept crashing and Win7 kept rebooting until I was fast enough to suspend work and abort the GPUGRID WUs.
My God, it's full of stars!
skgiven
Volunteer moderator
Volunteer tester

Message 33903 - Posted: 16 Nov 2013, 11:25:06 UTC - in response to Message 33845.  

I didn't have a power outage, but the computer did restart (the WUs caused the system to restart).
On reboot the driver kept crashing while trying to run the same tasks.

43x1-SANTI_RAP74wtCUBIC-22-34-RND5480_0

SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1963.