All WU fail after resuming computation
| Author | Message |
|---|---|
|
Joined: 16 Aug 12, Posts: 2, Credit: 2,257,335, RAC: 0
|
Hi, I have upgraded my PC to Debian 7 (wheezy) and installed the NVIDIA proprietary driver for my GTX 560. Since then, all WUs start correctly, but if they are suspended they fail to restart with this output: <core_client_version>7.0.27</core_client_version> <![CDATA[ <message> process exited with code 193 (0xc1, -63) </message> <stderr_txt> MDIO: cannot open file "output.restart.coor" SIGSEGV: segmentation violation Stack trace (12 frames): ../../projects/www.gpugrid.net/acemd.2868(boinc_catch_signal+0x4d)[0x56709d] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf030)[0x7f44fa258030] /lib/x86_64-linux-gnu/libc.so.6(fwrite+0x34)[0x7f44f950b034] ../../projects/www.gpugrid.net/acemd.2868[0x47f9c7] ../../projects/www.gpugrid.net/acemd.2868[0x4813a0] ../../projects/www.gpugrid.net/acemd.2868[0x492d74] ../../projects/www.gpugrid.net/acemd.2868[0x47f18a] ../../projects/www.gpugrid.net/acemd.2868[0x422c27] ../../projects/www.gpugrid.net/acemd.2868[0x408c04] ../../projects/www.gpugrid.net/acemd.2868[0x407bc9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7f44f94c0ead] ../../projects/www.gpugrid.net/acemd.2868[0x407a39] Exiting... </stderr_txt> ]]> Other projects using the GPU work correctly. Do you have any idea? |
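A note on reading the `<message>` line above: the three numbers in `193 (0xc1, -63)` are one and the same exit status byte, shown as decimal, hex, and signed 8-bit. The sketch below only reproduces that arithmetic; the function name `describe_exit_code` is made up for illustration. As far as I recall, BOINC's `error_numbers.h` names 193 `EXIT_SIGNAL`, which would fit the `boinc_catch_signal` frame in the stack trace, but treat that mapping as an assumption and check it against the BOINC source.

```python
def describe_exit_code(code: int) -> str:
    """Render a one-byte exit status as decimal, hex, and signed 8-bit."""
    if not 0 <= code <= 255:
        raise ValueError("exit status must fit in one byte")
    # Values of 128 and above wrap to negative numbers when read as signed.
    signed = code - 256 if code >= 128 else code
    return f"{code} (0x{code:x}, {signed})"


if __name__ == "__main__":
    print(describe_exit_code(193))
```

So the negative value carries no extra information; it is the same 193 seen through a signed type.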
Carlesa25 (Joined: 13 Nov 10, Posts: 328, Credit: 72,619,453, RAC: 0)
|
Hello: Have you checked that the option "Leave applications in memory while suspended" is not enabled? It usually causes problems. Greetings. |
skgiven (Joined: 23 Apr 09, Posts: 3968, Credit: 1,995,359,260, RAC: 0) |
Marco, I agree that LAIM (Leave Applications In Memory) should be on, and "use GPU when system is in use" should also be on. What driver is it, and are the recommended lib files installed? I don't know all the details, but I suggest you do two things: use fan control settings to set the fan speed and reduce the GPU temperature, and don't use all 8 CPU threads to crunch on; use 7 at most. The following FAQ might help: http://www.gpugrid.net/forum_thread.php?id=2123&nowrap=true#20169 FAQ's HOW TO: - Opt out of Beta Tests - Ask for Help |
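To answer the "what driver is it" question on a Linux box, here is a minimal sketch. It assumes the proprietary NVIDIA kernel module is loaded and exposes `/proc/driver/nvidia/version` with an `NVRM version:` line, and that "recommended lib files" means at least `libcuda` from the driver package; both are assumptions on my part, not something stated in this thread. The function names are made up for illustration.

```python
import ctypes.util
import re
from pathlib import Path
from typing import Optional

# Assumption: the proprietary NVIDIA kernel module publishes its version here.
NVIDIA_VERSION_FILE = Path("/proc/driver/nvidia/version")


def parse_driver_version(text: str) -> Optional[str]:
    """Return the first dotted number on the 'NVRM version:' line, if any."""
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            match = re.search(r"\b\d+\.\d+(?:\.\d+)*\b", line)
            return match.group(0) if match else None
    return None


def installed_driver_version() -> Optional[str]:
    """Read the version file; return None if the module is not loaded."""
    try:
        return parse_driver_version(NVIDIA_VERSION_FILE.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("driver version:", installed_driver_version() or "not found")
    # find_library returns a name such as 'libcuda.so.1', or None if missing.
    print("libcuda:", ctypes.util.find_library("cuda") or "not found")
```

If the version comes back as "not found", the kernel module is probably not loaded at all, which would be worth sorting out before looking at anything else.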
|
Joined: 17 Aug 08, Posts: 2705, Credit: 1,311,122,549, RAC: 0 |
"Leave applications in memory" should not apply to GPU tasks anyway; they're always exited to avoid problems. But the driver version could cause such issues. MrS Scanning for our furry friends since Jan 2002 |
|
Joined: 16 Aug 12, Posts: 2, Credit: 2,257,335, RAC: 0
|
I suspended the project until the release of a new version of the driver... |
|
Joined: 5 Mar 13, Posts: 348, Credit: 0, RAC: 0 |
Is this maybe related to this problem? http://www.gpugrid.net/forum_thread.php?id=3333 |
skgiven (Joined: 23 Apr 09, Posts: 3968, Credit: 1,995,359,260, RAC: 0) |
Possibly the same thing that is triggering the driver restart issue on Windows, but the Windows phenomenon seen from Vista onwards is WDDM related. Marco is using Linux (Debian 7 wheezy). I don't know what driver he is using on his system, as it's not reported on Linux rigs. My guess is missing libs, a bad driver or an upgrade issue (but from what to what I don't know, possibly to 7.1 as it came out 15th June). The SIGSEGV (segmentation violation) suggests an access/security issue, but lots of things could cause this, including hardware. It might be worth checking that the user and BOINC have the correct folder permissions (read and write). |
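The folder permission check suggested above can be sketched roughly as follows. It assumes the BOINC data directory is `/var/lib/boinc-client` (where I believe the Debian package puts it) and that the script is run as the same account the client runs under, for example via `sudo -u boinc`; both the path and the account name are assumptions, so adjust them for your install. `dirs_lacking_access` is a made-up helper name.

```python
import os
from pathlib import Path

# Assumption: Debian's boinc-client package keeps its data here.
BOINC_DATA_DIR = Path("/var/lib/boinc-client")


def dirs_lacking_access(root) -> list:
    """List directories under root that the current user cannot read,
    write and enter. A missing root is reported as a problem too."""
    mode = os.R_OK | os.W_OK | os.X_OK
    problems = []
    if not os.access(root, mode):
        problems.append(Path(root))
    # os.walk silently skips directories it cannot list, so test each
    # subdirectory by name from its parent's listing instead.
    for current, dirs, _files in os.walk(root):
        for name in dirs:
            candidate = Path(current) / name
            if not os.access(candidate, mode):
                problems.append(candidate)
    return problems


if __name__ == "__main__":
    bad = dirs_lacking_access(BOINC_DATA_DIR)
    if bad:
        for path in bad:
            print("no read/write access:", path)
    else:
        print("all directories are readable and writable")
```

This only looks at directories, since the log complains about opening `output.restart.coor`, which has to be created inside a slot directory. Note that running it as root will report no problems regardless, because root bypasses these permission checks.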
©2025 Universitat Pompeu Fabra