Abrupt computer restart - Tasks stuck - Kernel not found
| Author | Message |
|---|---|
(retired account) Joined: 22 Dec 11 Posts: 38 Credit: 28,606,255 RAC: 0
|
I will try to run some other projects now in SP mode on the Titan to see if the card and the nvidia driver installation are still fine. Collatz seems to run fine on the Titan under the heavy load set through the config file. Nothing validated yet, but no obvious errors. Will try to catch a new v8.15 short run. EDIT: Got one. Looks good so far, now at 25%. http://www.gpugrid.net/workunit.php?wuid=4978432 I will also test whether the same problem occurs with the GT 650M card. Not yet. A v8.14 short run is going fine and is at 25% now. |
(retired account) Joined: 22 Dec 11 Posts: 38 Credit: 28,606,255 RAC: 0
|
"Will try to catch a new v8.15 short run." I699-SANTI_baxbimSPW2-12-62-RND9134 Nope, another failure. Sudden reboot at 43%. After the restart some POEM OpenCL tasks kicked in, so the nvidia driver and the GPUGRID workunit had no chance to crash this time and I could suspend the workunit in question. The WUProp workunit was killed again, too. This shows at least that WUProp is killed by the sudden reboot, not by the video driver crashing (makes sense to me). If you would like to receive the content of the slot, please PM me an email address. EDIT: The two POEM workunits finished OK. No indication of a hardware fault or faulty driver yet. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
I'm also using 331.40 (which is a Beta). Probably worth updating to 331.82 (the most recent WHQL driver), but for me this wasn't happening at the beginning of last week or before that (same drivers). FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
(retired account) Joined: 22 Dec 11 Posts: 38 Credit: 28,606,255 RAC: 0
|
"but for me this wasn't happening at the beginning of last week or before that (same drivers)." Yes, same here. Might consider an update, though... |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
I did the suggested update and I'm still getting stung. |
|
Joined: 1 Dec 12 Posts: 24 Credit: 60,122,950 RAC: 0
|
Something else, or maybe the same... I was watching my task list and found that, after the BSOD and the install of the older driver, all new tasks have a shorter CPU time. Before, I mostly had CPU run times of 9,000-10,000 seconds; now they are back to something more normal: 2,000-3,000 seconds. ======================================= I'm back at my computer now and have searched the log files for the driver crash. The entries from 5 December say: Cannot find the description of Event ID 1 from source NvStreamSvc. The component that started the event may not be installed on the local computer, or the installation is corrupted. You can install the component on the local computer or restore it. The computer restarted after a bug check. The bug check is 0x00000116 (0xfffffa801270d010, 0xfffff88006940010, 0x0000000000000000, 0x000000000000000d). A dump was saved in: C:\Windows\Minidump\120513-6286-01.dmp. Report ID: 120513-6286-01. |
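The bug check line in the event log entry above can be pulled apart programmatically. What follows is a minimal sketch, assuming the message text is available as a plain string; `parse_bug_check` and `KNOWN_STOP_CODES` are hypothetical names used for illustration, not part of any Windows tool. Stop code 0x116 is documented by Microsoft as VIDEO_TDR_FAILURE (the display driver failed to recover from a timeout), which fits a video driver crash.

```python
import re

# Stop codes relevant to this thread; 0x116 is VIDEO_TDR_FAILURE.
KNOWN_STOP_CODES = {0x116: "VIDEO_TDR_FAILURE"}

# Matches text such as "bug check is 0x00000116 (0x..., 0x..., 0x..., 0x...)".
BUG_CHECK_RE = re.compile(
    r"bug check (?:is|was):?\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)", re.IGNORECASE
)


def parse_bug_check(message):
    """Return (stop_code, parameters, name) from an event log message, or None."""
    match = BUG_CHECK_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",") if p.strip()]
    return code, params, KNOWN_STOP_CODES.get(code, "unknown")


if __name__ == "__main__":
    text = (
        "The bug check is 0x00000116 (0xfffffa801270d010, 0xfffff88006940010, "
        "0x0000000000000000, 0x000000000000000d)."
    )
    code, params, name = parse_bug_check(text)
    print(hex(code), name, [hex(p) for p in params])
```

The lookup table is deliberately tiny; any stop code not listed is reported as "unknown" rather than guessed.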
(retired account) Joined: 22 Dec 11 Posts: 38 Credit: 28,606,255 RAC: 0
|
Thanks, skgiven, for sharing the info. I guess I will refrain from updating then, at least for the time being. "after the BSOD/install older driver all new tasks do have a shorter CPU time." Is this v314.22 you are using now, FoldingNator? Unfortunately, we Titan/GTX 780 users have to stick with v331.40 or higher, I'm afraid. |
|
Joined: 1 Dec 12 Posts: 24 Credit: 60,122,950 RAC: 0
|
Yes, you're right! Before, I had v320.18 installed, and for the clean install of a new driver I chose v314.22, which has always been stable for GF Fermi 400-500 cards. I don't know whether the lower CPU times come from the different driver or are only a coincidence. Maybe bad news for users with recent high-end cards. :( |
|
Joined: 26 Aug 11 Posts: 100 Credit: 2,889,109,686 RAC: 424,927
|
Just suffered the effects of this bug today after a power failure, but it only affected my Win 7 Pro 64 machine; my Win XP Home 32 machine restarted the tasks OK. |
JStateson Joined: 31 Oct 08 Posts: 186 Credit: 3,578,903,157 RAC: 0
|
Power glitch caused fatal gpugrid restarts on 5 systems a few hours ago. This is a PITA. These systems are headless, and when I bring a monitor over to fix the problem (a reset of gpugrid), the BM program can be off the edge of the screen. By the time I get it down to where I can select gpugrid and reset the project, the damn display has reset 3 times and hung, and I have to start the process all over again. I tried setting BAM! so that all the gpugrid projects are suspended, but there is a timing problem: 2 systems start gpugrid before BM realizes it was supposed to suspend the project. This is really crappy, but rather than complain any more I am just going to switch to prime and check this thread occasionally to see if the problem has been fixed. Once I get through the cold spell, probably in March, there should not be any more power glitches and I can put gpugrid back online. Maybe someone here can come up with a script to reset the project following a reboot after a power outage. Windows knows when the power goes out, so there should be some API or whatever that gpugrid could use. |
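A script of the kind asked for above might look like the following minimal sketch. It assumes the `boinccmd` command-line tool that ships with BOINC is on the PATH and can authenticate to the local client, and that the project URL is `http://www.gpugrid.net/` (taken from the work unit link earlier in the thread). The function names are hypothetical. The script would still have to be scheduled to run at startup, and it does nothing about the timing problem of the client starting a task before the script gets to run.

```python
import subprocess

# Assumption: project URL as it appears in the work unit link in this thread.
PROJECT_URL = "http://www.gpugrid.net/"


def build_reset_commands(project_url, boinccmd="boinccmd"):
    """Return the boinccmd invocations to suspend, reset, then resume a project."""
    return [
        [boinccmd, "--project", project_url, op]
        for op in ("suspend", "reset", "resume")
    ]


def reset_project(project_url, run=subprocess.run):
    """Run each command in order; return False at the first non-zero exit code."""
    for cmd in build_reset_commands(project_url):
        result = run(cmd)
        if result.returncode != 0:
            return False
    return True


if __name__ == "__main__":
    ok = reset_project(PROJECT_URL)
    print("project reset" if ok else "boinccmd failed")
```

The `run` parameter exists only so the command sequence can be checked without a BOINC client present; in normal use the default `subprocess.run` is what executes `boinccmd`.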
|
Joined: 11 Oct 08 Posts: 1127 Credit: 1,901,927,545 RAC: 0
|
Beemer: The problem is fixed on the Short Queue, with v8.15, I believe. It has not yet been moved to the Long Queue. You might be able to edit your GPUGrid preferences, to only do Short-Queue, for now. |
|
Joined: 26 Jun 09 Posts: 815 Credit: 1,470,385,294 RAC: 0
|
If you put a not too expensive UPS behind it, then you can safely switch the rigs down on a power outage (if you are near the PCs)? Greetings from TJ |
|
Joined: 11 Jul 09 Posts: 1639 Credit: 10,159,968,649 RAC: 2
|
"If you put a not too expensive UPS behind it, then you can safely switch the rigs down on a power outage (if you are near the PCs)?" If you put a quality UPS behind it, it will come with software that can switch the rigs down safely whether you are nearby or not. |
|
Joined: 16 Mar 11 Posts: 509 Credit: 179,005,236 RAC: 0
|
Check out Cyber Power brand UPS. They're more reasonably priced than the APC brand, and they even provide an app for Linux (I assume their Windows app is even better) that has options to: 1) on power failure, send apps the shutdown signal, wait a few seconds, then shut down; 2) not shut down immediately, because the power might return very soon, but wait until the backup battery approaches minimum operational voltage and then shut down gracefully; 3) not shut down immediately, but instead run a user-designated script that might, for example, suspend power-hungry apps like BOINC, then wait until the battery approaches minimum operational voltage before shutting down, send the admin an email, send the power company a nasty email, whatever you want the script to do; 4) when power returns, run a second script that could, for example, resume power-hungry apps like BOINC or send the admin an email saying the power has returned. A UPS saves a lot of grief, highly recommended. BOINC <<--- credit whores, pedants, alien hunters |
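The user-designated scripts in options 3 and 4 could be as small as the sketch below. It assumes the UPS software can call a script with one argument naming the event; the event names `power_fail` and `power_restored` and the file name `ups_hook.py` are hypothetical, not CyberPower's. It also assumes `boinccmd` is on the PATH, and uses its `--set_run_mode` and `--set_gpu_mode` options to pause and resume computing.

```python
import subprocess
import sys

# Hypothetical event names; real UPS software has its own way of telling
# a script which event fired.
ACTIONS = {
    "power_fail": "never",     # stop computing while on battery
    "power_restored": "auto",  # return to the user's normal preferences
}


def commands_for_event(event, boinccmd="boinccmd"):
    """Return the boinccmd invocations for a UPS event, or [] if unknown."""
    mode = ACTIONS.get(event)
    if mode is None:
        return []
    return [
        [boinccmd, "--set_run_mode", mode],
        [boinccmd, "--set_gpu_mode", mode],
    ]


def main(argv):
    """Entry point: expects exactly one argument naming the UPS event."""
    if len(argv) != 2:
        print("usage: ups_hook.py power_fail|power_restored")
        return 2
    cmds = commands_for_event(argv[1])
    if not cmds:
        print("unknown event:", argv[1])
        return 2
    for cmd in cmds:
        # check=False: a failed boinccmd call should not abort the UPS hook.
        subprocess.run(cmd, check=False)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Pausing BOINC rather than shutting down straight away matches option 3: the machine draws less power on battery and can pick up again if the mains returns quickly.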
|
Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0
|
"Check out Cyber Power brand UPS. They're more reasonably priced than the APC brand and they even provide an app for Linux that has options to (I assume their Windows app is even better):" I like the CyberPower "Pure sine wave" series, which gives a better sine wave output than the others, which give only a stepped approximation of a sine wave. The latter can cause trouble with some of the new high-efficiency PC power supplies that have power-factor correction (PFC). I have just replaced an APC UPS 750 with a "CyberPower CP1350PFCLCD UPS 1350VA/810W PFC compatible Pure sine wave" (my second one), since the APC causes an occasional alarm fault with the 90% efficient power supply in that PC. |
JStateson Joined: 31 Oct 08 Posts: 186 Credit: 3,578,903,157 RAC: 0
|
I have a small APC that runs the cable modem, switch and Wi-Fi, but it cannot be used with the AC powerline Ethernet adapter as it filters out the Ethernet signal. My systems are not all in one place where they could be serviced by a single backup. I am in the middle of installing Splashtop Streamer on all systems. Supposedly there is a limit of 5 systems, but I have gone past that and it has not complained. While that gives me access to the desktop while CUDA is running, it does not provide a solution for resetting the project before the work unit causes a crash. If all the systems had honored the suspension, then I could easily command BoincTasks to reset and resume this project on all systems. I think boinc.exe should have noticed the suspension request that I put at BAM! and acted on it before starting the project. I will ask about this at the BOINC forum. |
skgiven Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 |
This thread was regarding a specific problem, as detailed in the first post. The problem was a bug in the 8.14 version of the app. The problem was fixed with the 8.15 version of the app, so the thread has been closed. |
©2026 Universitat Pompeu Fabra