KASHIF_HIVPR Errors?

Author	Message
Fred J. Verster	Message 18639 - Posted: 12 Sep 2010, 12:08:11 UTC - in response to Message 18637. Last modified: 12 Sep 2010, 12:21:14 UTC
Since my 9800GTX+ started giving 'trouble', such as overheating that resulted in faults, I first got a GTX470 in exchange for repairing a PII (Compaq). Then I had the chance to buy a 'show model', which I had seen working (all kinds of simulations). I bought it for €275 (normally €485 plus VAT).
I found out that these 'monsters' need a 650W PSU at minimum; 850W is better. It draws 17A from its 8-pin and 17A from its 6-pin connector, and an additional ~6 - 10A from the mainboard (ASUS P5E).
Now I have to find a way to get the 470 to work..........
But I'm glad I made the change: for GPUGrid it's working like a charm, and on SETI@Home I can now run 3 MultiBeam tasks (0.04CPU+0.33GPU) at a time, so sometimes BOINC 6.10.58, 64-bit, runs 7 SETI tasks and/or a mix of Einstein and other projects. I use driver 258.96 and CUDA 3.1.
And it looks like those KASHIF_HIVPR WUs need compute capability 2.0 (2.1?).
Knight Who Says Ni N!
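As a rough check of the PSU figures in the post above, the quoted currents can be converted to watts. This is only a sketch: it takes the currents at face value and assumes every one of them is drawn from a 12 V rail, which the post does not state. The function names are made up for illustration.

```python
# Assumption: every current quoted in the post is drawn from a 12 V rail.
RAIL_VOLTS = 12.0


def watts(amps, volts=RAIL_VOLTS):
    """Power in watts for a given current on one rail."""
    return amps * volts


def total_draw(eight_pin_a, six_pin_a, board_a):
    """Sum the power over the three supply paths named in the post."""
    return watts(eight_pin_a) + watts(six_pin_a) + watts(board_a)


low = total_draw(17, 17, 6)    # mainboard at the low end of "~6 - 10A"
high = total_draw(17, 17, 10)  # mainboard at the high end
print(f"implied card draw: {low:.0f} W to {high:.0f} W")
```

Under those assumptions the quoted currents imply roughly 480 W to 528 W for the card alone, before the rest of the system is counted.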

mwgiii	Message 18663 - Posted: 13 Sep 2010, 23:48:11 UTC - in response to Message 18639.
All of the KASHIF_HIVPR tasks are generating errors on both of my machines. Out of the first two pages of my Tasks (40 work units), 24 work units have errored out, all of them KASHIF_HIVPR. It is killing my contributions: as ftpd said, the GPU crunching halts until I notice the error message.

skgiven (Volunteer moderator, Volunteer tester)	Message 18664 - Posted: 14 Sep 2010, 0:13:36 UTC - in response to Message 18663.
Probably best to do a system restart and then abort the download of any KASHIF_HIVPR tasks that you pick up. Hopefully you will pick up other work units.

mwgiii	Message 18665 - Posted: 14 Sep 2010, 2:01:32 UTC - in response to Message 18664.
I reboot around every other day. If I see any more KASHIF tasks, I will abort them immediately.
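The "abort any KASHIF task you pick up" advice above could be scripted rather than done by hand. The sketch below only builds the command lines and does not run anything. It assumes a client whose `boinccmd` accepts `--task URL NAME abort` (older clients may name this option differently, so check `boinccmd --help` first), and the task names in the example are hypothetical, not taken from this thread.

```python
def abort_commands(task_names, project_url, marker="KASHIF_HIVPR"):
    """Build one boinccmd argument list per task whose name contains marker.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    return [
        ["boinccmd", "--task", project_url, name, "abort"]
        for name in task_names
        if marker in name
    ]


# Hypothetical task names, for illustration only.
example_tasks = ["example1-KASHIF_HIVPR_demo-0", "example2-TONI_demo-0"]

for cmd in abort_commands(example_tasks, "http://www.gpugrid.net/"):
    print(" ".join(cmd))
```

Printing the commands first makes it easy to confirm that only the intended tasks match before passing each list to something like `subprocess.run`.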

ralle030583	Message 18693 - Posted: 15 Sep 2010, 18:45:12 UTC - in response to Message 18665. Last modified: 15 Sep 2010, 18:46:53 UTC
It also seems that all KASHIF.. tasks fail on my GeForce 9800 GT :-/ (OK, currently everything has failed because of an OC attempt, but KASHIF tasks didn't work before the OC either ^^)

zenitur	Message 18781 - Posted: 29 Sep 2010, 10:17:57 UTC
I have the same error:
http://www.gpugrid.net/result.php?resultid=3030293
http://www.gpugrid.net/result.php?resultid=3028306

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# There is 1 device supporting CUDA
# Device 0: "GeForce 9800 GT"
# Clock rate: 1.50 GHz
# Total amount of global memory: 536543232 bytes
# Number of multiprocessors: 14
# Number of cores: 112
MDIO ERROR: cannot open file "restart.coor"
# There is 1 device supporting CUDA
# Device 0: "GeForce 9800 GT"
# Clock rate: 1.50 GHz
# Total amount of global memory: 536543232 bytes
# Number of multiprocessors: 14
# Number of cores: 112
MDIO ERROR: cannot open file "restart.coor"
SWAN : FATAL : Failure executing kernel sync [transpose_float2] [700]
acemd2_6.04_x86_64-pc-linux-gnu__cuda: ../swan/swanlib_nv.cpp:203: void swanRunKernel(const char, int3, int3, size_t, ...): Assertion `0' failed.
SIGABRT: abort called
Stack trace (17 frames):
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(boinc_catch_signal+0x4d)[0x46438d]
/lib/libc.so.6(+0x324c0)[0x7f4e7810d4c0]
/lib/libc.so.6(gsignal+0x35)[0x7f4e7810d445]
/lib/libc.so.6(abort+0x180)[0x7f4e7810e860]
/lib/libc.so.6(__assert_fail+0xf1)[0x7f4e781064e1]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x459c20]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45feae]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x46032f]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45db09]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45b400]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45a864]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x428e20]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x41253c]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0xab0)[0x407f10]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0x2bb)[0x40771b]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f4e780f9d2d]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sinh+0x49)[0x407569]
Exiting...
</stderr_txt>
]]>

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# There is 1 device supporting CUDA
# Device 0: "GeForce 9800 GT"
# Clock rate: 1.50 GHz
# Total amount of global memory: 536543232 bytes
# Number of multiprocessors: 14
# Number of cores: 112
MDIO ERROR: cannot open file "restart.coor"
SWAN : FATAL : Failure executing kernel sync [PmeRealSpace_compute_forces_kernel] [700]
acemd2_6.04_x86_64-pc-linux-gnu__cuda: ../swan/swanlib_nv.cpp:203: void swanRunKernel(const char, int3, int3, size_t, ...): Assertion `0' failed.
SIGABRT: abort called
Stack trace (14 frames):
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(boinc_catch_signal+0x4d)[0x46438d]
/lib/libc.so.6(+0x324c0)[0x7f1c49b544c0]
/lib/libc.so.6(gsignal+0x35)[0x7f1c49b54445]
/lib/libc.so.6(abort+0x180)[0x7f1c49b55860]
/lib/libc.so.6(__assert_fail+0xf1)[0x7f1c49b4d4e1]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x459c20]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45d3f9]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45a864]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x428e20]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x41253c]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0xab0)[0x407f10]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0x2bb)[0x40771b]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f1c49b40d2d]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sinh+0x49)[0x407569]
Exiting...
</stderr_txt>
]]>

This happens only on KASHIF tasks. TONI tasks always work fine.
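When comparing many failed results, it can help to pull just the failing kernel name out of each stderr dump. The following is a minimal sketch that assumes the `SWAN : FATAL` line always has the shape seen in the two dumps above; the function name is made up for illustration, and what the bracketed number 700 means is not stated in this thread.

```python
import re

# Matches lines such as:
#   SWAN : FATAL : Failure executing kernel sync [transpose_float2] [700]
KERNEL_RE = re.compile(r"Failure executing kernel sync \[(\w+)\] \[(\d+)\]")


def failed_kernels(stderr_text):
    """Return (kernel_name, number) pairs for every failure line found."""
    return [(name, int(num)) for name, num in KERNEL_RE.findall(stderr_text)]


# The two failure lines quoted in the post above.
sample = """
SWAN : FATAL : Failure executing kernel sync [transpose_float2] [700]
SWAN : FATAL : Failure executing kernel sync [PmeRealSpace_compute_forces_kernel] [700]
"""

for kernel, number in failed_kernels(sample):
    print(kernel, number)
```

In the two dumps the failing kernel differs while the bracketed number is the same, which suggests the fault is not tied to one particular kernel.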

zenitur	Message 18807 - Posted: 2 Oct 2010, 19:26:01 UTC - in response to Message 18781.
I found the reason for my error: automatic suspend. After a restart, KASHIF tasks produce an error.
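One way to check the suspend-and-restart explanation against a result page is to count how often the device header appears in the stderr text. This sketch assumes the application prints that header exactly once each time it starts, which is an inference from the first dump in Message 18781 (where it appears twice) and not something the thread confirms.

```python
# Assumption: the application prints this header once per start.
HEADER = "# There is 1 device supporting CUDA"


def count_starts(stderr_text):
    """Number of times the device header appears in the stderr text."""
    return stderr_text.count(HEADER)


def was_restarted(stderr_text):
    """True if the header appears more than once, i.e. the task started again."""
    return count_starts(stderr_text) > 1


# Shortened copy of the first dump in Message 18781: two headers before the failure.
first_dump = """
# There is 1 device supporting CUDA
MDIO ERROR: cannot open file "restart.coor"
# There is 1 device supporting CUDA
MDIO ERROR: cannot open file "restart.coor"
SWAN : FATAL : Failure executing kernel sync [transpose_float2] [700]
"""

print(count_starts(first_dump), was_restarted(first_dump))
```

The literal header only covers single-GPU hosts like the ones in this thread; a host with more devices would presumably print a different count.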

Saenger	Message 18916 - Posted: 11 Oct 2010, 12:30:47 UTC
I just had this one wrecked:

stderr out
<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.34 GHz
# Total amount of global memory: 536150016 bytes
# Number of multiprocessors: 12
# Number of cores: 96
MDIO ERROR: cannot open file "restart.coor"
</stderr_txt>
]]>

I don't have the faintest idea why it was restarted (or what "restart.coor" is good for at all). I don't run other projects on the GPU in parallel, and I wasn't doing anything on the machine at that time.
Greetings from Saenger
For questions about Boinc look in the BOINC-Wiki
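The `<message>` lines in this thread give each exit code three ways, for example `193 (0xc1, -63)` and `1 (0x1, -255)`. In both cases the hex value equals the code and the last number equals the code minus 256. The sketch below parses such a line and checks that relationship; whether the client actually computes the last number this way is an assumption based only on these two examples.

```python
import re

EXIT_RE = re.compile(r"process exited with code (\d+) \((0x[0-9a-fA-F]+), (-?\d+)\)")


def parse_exit(message):
    """Return (code, hex_value, last_number) as integers, or None if no match."""
    m = EXIT_RE.search(message)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2), 16), int(m.group(3))


def is_consistent(code, hex_value, last_number):
    """Check the pattern seen in this thread: hex == code, last == code - 256."""
    return hex_value == code and last_number == code - 256


for line in (
    "process exited with code 193 (0xc1, -63)",
    "process exited with code 1 (0x1, -255)",
):
    print(line, "->", is_consistent(*parse_exit(line)))
```

So the two messages carry one piece of information each: exit code 193 for the 9800 GT failures and exit code 1 here.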