Abrupt computer restart - Tasks stuck - Kernel not found

Message boards : Number crunching : Abrupt computer restart - Tasks stuck - Kernel not found

Profile (retired account)

Joined: 22 Dec 11
Posts: 38
Credit: 28,606,255
RAC: 0
Message 34155 - Posted: 8 Dec 2013, 16:25:11 UTC - in response to Message 34153.  
Last modified: 8 Dec 2013, 17:04:44 UTC

I will try to run some other projects now in SP mode on the Titan to see if the card and the NVIDIA driver installation are still fine.


Collatz seems to run fine on the Titan under a heavy load set through its config file. Nothing validated yet, but no obvious errors. Will try to catch a new v8.15 short run.

EDIT: Got one. Looks good so far, now at 25%. http://www.gpugrid.net/workunit.php?wuid=4978432

I will also test if the same problem occurs with the GT 650M card.


Not yet. A v8.14 short runs fine and is at 25% now.
Profile (retired account)

Joined: 22 Dec 11
Posts: 38
Credit: 28,606,255
RAC: 0
Message 34156 - Posted: 8 Dec 2013, 17:49:51 UTC - in response to Message 34155.  
Last modified: 8 Dec 2013, 18:23:49 UTC

Will try to catch a new v8.15 short run.

EDIT: Got one. Looks good so far, now at 25%. http://www.gpugrid.net/workunit.php?wuid=4978432


I699-SANTI_baxbimSPW2-12-62-RND9134

Nope, another failure: sudden reboot at 43%. After the restart some POEM OpenCL work kicked in first, so this time the NVIDIA driver and the GPUGRID workunit had no chance to crash, and I could suspend the workunit in question. The WUProp workunit was killed again, too. This at least shows that WUProp is killed by the sudden reboot, not by the video driver crashing (makes sense to me).

If you would like to receive the contents of the slot directory, please PM me an email address.

EDIT: The two POEM workunits finished OK. No indication of a hardware fault or faulty driver yet.
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 34159 - Posted: 8 Dec 2013, 20:11:18 UTC - in response to Message 34156.  

I'm also using 331.40 (which is a Beta). Probably worth updating to 331.82 (the most recent WHQL driver), but for me this wasn't happening at the beginning of last week or before that (same drivers).
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help
Profile (retired account)

Joined: 22 Dec 11
Posts: 38
Credit: 28,606,255
RAC: 0
Message 34161 - Posted: 8 Dec 2013, 22:35:28 UTC - in response to Message 34159.  

but for me this wasn't happening at the beginning of last week or before that (same drivers).


Yes, same here. Might consider an update, though...
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 34163 - Posted: 9 Dec 2013, 0:45:05 UTC - in response to Message 34161.  

I did the suggested update and I'm still getting stung.
FoldingNator

Joined: 1 Dec 12
Posts: 24
Credit: 60,122,950
RAC: 0
Message 34164 - Posted: 9 Dec 2013, 1:42:18 UTC

Something else, or maybe the same...
I was looking through my task list and found that since the BSOD and installing an older driver, all new tasks have a shorter CPU time.

Before, most tasks had CPU times of 9,000-10,000 seconds; now they are back to something more normal: 2,000-3,000 seconds.

=======================================

I'm back at my computer now and have searched the log files for the driver crash. The entry from 5 December says:

Can not find the description of Event ID 1 from source NvStreamSvc. The component that started the event may not be installed on the local computer or the installation is corrupted. You can install the component on the local computer or restore.

The following information is included in the event:
NvStreamSvc
NvVAD initialization failed [6]


The computer restarts after a bug check. The bug check is 0x00000116 (0xfffffa801270d010, 0xfffff88006940010, 0x0000000000000000, 0x000000000000000d). A dump was saved in: C:\Windows\Minidump\120513-6286-01.dmp. Report ID: 120513-6286-01.
Profile (retired account)

Joined: 22 Dec 11
Posts: 38
Credit: 28,606,255
RAC: 0
Message 34166 - Posted: 9 Dec 2013, 2:30:19 UTC - in response to Message 34164.  
Last modified: 9 Dec 2013, 2:31:01 UTC

Thanks, skgiven, for sharing the info. I guess I will refrain from it then, at least for the time being.

after the BSOD/install older driver all new tasks do have a shorter CPU time.


Is this v314.22 you are using now, FoldingNator? Unfortunately, we Titan/GTX 780 users have to stick with v331.40 or higher, I'm afraid.
FoldingNator

Joined: 1 Dec 12
Posts: 24
Credit: 60,122,950
RAC: 0
Message 34169 - Posted: 9 Dec 2013, 10:51:37 UTC - in response to Message 34166.  

Yes you're right!

Before, I had v320.18 installed, and after the clean install I chose v314.22, the always-stable driver (for GeForce Fermi 400/500-series cards). I don't know whether the lower CPU times come from the different driver or it's just a coincidence.

Maybe bad news for users with recent high-end cards. :(
TheFiend

Joined: 26 Aug 11
Posts: 100
Credit: 2,889,109,686
RAC: 424,927
Message 34172 - Posted: 9 Dec 2013, 17:05:34 UTC

Just suffered the effects of this bug today after a power failure, but it only affected my Windows 7 Pro 64-bit machine; my Windows XP Home 32-bit machine restarted its tasks OK.
Profile JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,578,903,157
RAC: 0
Message 34216 - Posted: 11 Dec 2013, 22:52:11 UTC

A power glitch caused fatal GPUGRID restarts on 5 systems a few hours ago. This is a PITA. These systems are headless, and when I bring a monitor over to fix the problem (reset GPUGRID), the BM (BOINC Manager) window can be off the edge of the screen, and by the time I get it to where I can select GPUGRID and reset the project, the damn display has reset 3 times and hung, and I have to start the process all over again.

I tried setting BAM! to suspend all GPUGRID work, but there is a timing problem: on 2 systems GPUGRID started before BM realized it was supposed to suspend the project.

This is really crappy, but rather than complain any more, I am just going to switch to prime and check this thread occasionally to see if the problem has been fixed. Once I get through the cold spell, probably in March, there should not be any more power glitches and I can put GPUGRID back online.

Maybe someone here can come up with a script to reset the project after a reboot caused by a power outage. Windows knows when the power goes out, so there should be some API or whatever that GPUGRID could use.
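A minimal sketch of such a script in Python, assuming the BOINC command-line tool `boinccmd` is installed and can reach the local client (run it from the BOINC data directory, or add `--host`/`--passwd` if your setup needs them). The GPUGRID URL below is an assumption: use the exact master URL your client reports via `boinccmd --get_project_status`. Detecting that the reboot actually followed a power loss is not attempted here; the script would be run by hand or from a startup task.

```python
import subprocess
import sys

# Assumption: the GPUGRID master URL exactly as the local BOINC client knows it.
# Verify it with `boinccmd --get_project_status` before relying on this script.
GPUGRID_URL = "http://www.gpugrid.net/"

# Per-project operations accepted by `boinccmd --project URL <op>` that this sketch allows.
PROJECT_OPS = {"suspend", "resume", "reset", "update", "nomorework", "allowmorework"}


def project_command(project_url: str, op: str, boinccmd: str = "boinccmd") -> list[str]:
    """Build the argv for one per-project boinccmd operation."""
    if op not in PROJECT_OPS:
        raise ValueError(f"unsupported project operation: {op}")
    return [boinccmd, "--project", project_url, op]


def recover_after_power_loss(project_url: str = GPUGRID_URL,
                             dry_run: bool = True) -> list[list[str]]:
    """Suspend the project, reset it, then resume it so new work can be fetched.

    'reset' throws away every task this host holds for the project,
    including tasks that were not damaged by the crash.
    With dry_run=True (the default) nothing is executed.
    """
    steps = [project_command(project_url, op) for op in ("suspend", "reset", "resume")]
    if not dry_run:
        for argv in steps:
            # check=True stops at the first failing step instead of carrying on.
            subprocess.run(argv, check=True)
    return steps


if __name__ == "__main__":
    # Only touch the client when explicitly asked: `python reset_gpugrid.py --apply`.
    apply = "--apply" in sys.argv[1:]
    steps = recover_after_power_loss(dry_run=not apply)
    print("executed:" if apply else "dry run, would execute:")
    for argv in steps:
        print("  " + " ".join(argv))
```

Run it without arguments first to see the commands, then with `--apply` once the URL is confirmed. Suspending first is only meant to narrow the window in which a stuck task can restart; if the client launches GPU tasks before the script gets to run (the same race seen with the BAM! suspension), the crash can still happen.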
Jacob Klein

Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 34217 - Posted: 11 Dec 2013, 22:58:20 UTC - in response to Message 34216.  

Beemer:

The problem is fixed on the Short Queue, with v8.15, I believe.
It has not yet been moved to the Long Queue.

You might be able to edit your GPUGrid preferences to run only the short queue for now.
TJ

Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 34219 - Posted: 11 Dec 2013, 23:18:14 UTC - in response to Message 34216.  

If you put a not too expensive UPS behind it, then you can safely shut the rigs down during a power outage (if you are near the PCs)?
Greetings from TJ
Richard Haselgrove

Joined: 11 Jul 09
Posts: 1639
Credit: 10,159,968,649
RAC: 2
Message 34220 - Posted: 11 Dec 2013, 23:41:51 UTC - in response to Message 34219.  

If you put a not too expensive UPS behind it, then you can safely shut the rigs down during a power outage (if you are near the PCs)?

If you put a quality UPS behind it, it will come with software that can switch the rigs down safely whether you are nearby or not.
Dagorath

Joined: 16 Mar 11
Posts: 509
Credit: 179,005,236
RAC: 0
Message 34222 - Posted: 12 Dec 2013, 4:20:46 UTC - in response to Message 34220.  

Check out Cyber Power brand UPS. They're more reasonably priced than the APC brand and they even provide an app for Linux that has options to (I assume their Windows app is even better):

1) on power failure, send apps the shutdown signal, then wait a few seconds, then shut down

2) not shut down immediately, because the power might return very soon, but wait until the backup battery approaches its minimum operating voltage, then shut down gracefully

3) not shut down immediately, but run a user-designated script that might, for example, suspend power-hungry apps like BOINC, then wait until the battery approaches its minimum operating voltage before shutting down, send the admin an email, send the power company a nasty email, whatever you want the script to do

4) when power returns, run a second script that could, for example, resume power-hungry apps like BOINC and send the admin an email saying the power has returned
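Options 3 and 4 could call a small hook along these lines. This is a sketch, assuming the UPS software can run an arbitrary command on power failure and on power return, and that `boinccmd` can reach the local client; the script name `boinc_power_hook.py` and its `power-fail`/`power-restore` arguments are hypothetical, since the real hook mechanism depends on the UPS software.

```python
import subprocess
import sys

# Hypothetical hook for UPS software that can run a command on power events.
# Assumed usage: boinc_power_hook.py power-fail|power-restore [--apply]
MODES = {
    "power-fail": "never",    # suspend all computing (CPU and GPU) to save battery
    "power-restore": "auto",  # hand control back to the client's own preferences
}


def boinc_mode_commands(event: str, boinccmd: str = "boinccmd") -> list[list[str]]:
    """Return the boinccmd calls that set the run mode and GPU mode for an event."""
    if event not in MODES:
        raise ValueError(f"unknown power event: {event}")
    mode = MODES[event]
    return [[boinccmd, "--set_run_mode", mode],
            [boinccmd, "--set_gpu_mode", mode]]


def main(argv: list[str]) -> int:
    apply = "--apply" in argv
    args = [a for a in argv if a != "--apply"]
    if len(args) != 1 or args[0] not in MODES:
        print("usage: boinc_power_hook.py power-fail|power-restore [--apply]")
        return 2
    for cmd in boinc_mode_commands(args[0]):
        print(" ".join(cmd))
        if apply:
            # Without --apply the commands are only printed (dry run).
            subprocess.run(cmd, check=True)
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Restoring to `auto` means "run based on preferences"; if a host is normally forced to "always", that entry in `MODES` would need changing. Setting the GPU mode explicitly is redundant while the run mode is `never`, but keeps the two settings in step when control is handed back.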

UPS saves a lot of grief, highly recommended.
BOINC <<--- credit whores, pedants, alien hunters
Jim1348

Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 34228 - Posted: 12 Dec 2013, 9:07:10 UTC - in response to Message 34222.  

Check out Cyber Power brand UPS. They're more reasonably priced than the APC brand and they even provide an app for Linux that has options to (I assume their Windows app is even better):

I like the CyberPower "Pure sine wave" series, which gives a better sine wave output than the others, which give only a stepped-sine wave approximation. The latter can cause trouble with some of the new high-efficiency PC power supplies that have power-factor correction (PFC).

I have just replaced an APC UPS 750 with a "CyberPower CP1350PFCLCD UPS 1350VA/810W PFC compatible Pure sine wave" (my second one), since the APC causes an occasional alarm fault with the 90% efficient power supply in that PC.
Profile JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,578,903,157
RAC: 0
Message 34236 - Posted: 12 Dec 2013, 13:34:11 UTC
Last modified: 12 Dec 2013, 14:11:42 UTC

I have a small APC that runs the cable modem, switch and Wi-Fi, but it cannot be used with the AC powerline Ethernet adapter because it filters out the Ethernet signal. My systems are not all in one place where they could be served by a single backup.

I am in the middle of installing Splashtop Streamer on all systems. Supposedly there is a limit of 5 systems, but I have gone past that and it has not complained. While that gives me access to the desktop while CUDA is running, it does not provide a way to reset the project before the work unit causes a crash. If all the systems had honored the suspension, I could easily command BoincTasks to reset and resume this project on all systems.

I think boinc.exe should have noticed the suspension request that I put at BAM! and done that before starting the project. I will ask this at the boinc forum.
Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 36445 - Posted: 19 Apr 2014, 16:49:05 UTC - in response to Message 36419.  
Last modified: 19 Apr 2014, 16:49:44 UTC

This thread was regarding a specific problem, as detailed in the first post.
The problem was a bug in the 8.14 version of the app.
The problem was fixed with the 8.15 version of the app, so the thread has been closed.

©2026 Universitat Pompeu Fabra