KASHIF_HIVPR Errors?

Fred J. Verster
Joined: 1 Apr 09
Posts: 58
Credit: 35,833,978
RAC: 0
Message 18639 - Posted: 12 Sep 2010, 12:08:11 UTC - in response to Message 18637.  
Last modified: 12 Sep 2010, 12:21:14 UTC

Since the 9800GTX+ started making 'trouble', such as overheating that resulted in faults, I first got a GTX470, which I received in exchange for repairing a PII (Compaq).
Then I was able to buy a 'show model' that I had seen working (all kinds of simulations). I bought it for €275 (normally €485 + VAT).
I found out that these 'monsters' need a PSU of at least 650W; 850W is better.
It draws 17A from its 8-pin and 17A from its 6-pin connector, plus an additional ~6-10A from the mainboard (ASUS P5E).
Now I have to find a way to get the 470 to work...
But I'm glad I made the change: for GPUGrid it's working like a charm, and on SETI@Home I can now run 3 MultiBeam tasks at a time (0.04 CPU + 0.33 GPU), so BOINC 6.10.58 (64-bit) sometimes runs 7 SETI tasks and/or a mix of Einstein and other projects.

I use driver 258.96 and CUDA 3.1.
And it looks like those KASHIF_HIVPR WUs need compute capability 2.0 (2.1?).

Knight Who Says Ni N!
ID: 18639 · Rating: 0
mwgiii
Joined: 22 Jan 09
Posts: 8
Credit: 988,332,833
RAC: 0
Message 18663 - Posted: 13 Sep 2010, 23:48:11 UTC - in response to Message 18639.  

All of the KASHIF_HIVPR tasks are generating errors on both of my machines.

Out of the first two pages of my Tasks (40 work units), 24 work units have errored out, all of them KASHIF_HIVPR. It is killing my contributions: as ftpd said, GPU crunching halts until I notice the error message.
ID: 18663 · Rating: 0
skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 18664 - Posted: 14 Sep 2010, 0:13:36 UTC - in response to Message 18663.  

Probably best to do a system restart and then abort the download of any KASHIF_HIVPR tasks that you pick up.
Hopefully you will pick up other work units.
ID: 18664 · Rating: 0
mwgiii
Joined: 22 Jan 09
Posts: 8
Credit: 988,332,833
RAC: 0
Message 18665 - Posted: 14 Sep 2010, 2:01:32 UTC - in response to Message 18664.  

I reboot about every other day. If I see any more KASHIF tasks, I will abort them immediately.
ID: 18665 · Rating: 0
ralle030583
Joined: 19 Aug 10
Posts: 19
Credit: 830,540
RAC: 0
Message 18693 - Posted: 15 Sep 2010, 18:45:12 UTC - in response to Message 18665.  
Last modified: 15 Sep 2010, 18:46:53 UTC

It also seems that all KASHIF tasks fail on my GeForce 9800 GT :-/
(OK, currently everything is failing because of an overclocking attempt, but KASHIF tasks didn't work before the OC either ^^)
ID: 18693 · Rating: 0
zenitur
Joined: 25 Sep 10
Posts: 2
Credit: 285,845
RAC: 0
Message 18781 - Posted: 29 Sep 2010, 10:17:57 UTC

I have the same error:

http://www.gpugrid.net/result.php?resultid=3030293
http://www.gpugrid.net/result.php?resultid=3028306

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# There is 1 device supporting CUDA
# Device 0: "GeForce 9800 GT"
# Clock rate: 1.50 GHz
# Total amount of global memory: 536543232 bytes
# Number of multiprocessors: 14
# Number of cores: 112
MDIO ERROR: cannot open file "restart.coor"
# There is 1 device supporting CUDA
# Device 0: "GeForce 9800 GT"
# Clock rate: 1.50 GHz
# Total amount of global memory: 536543232 bytes
# Number of multiprocessors: 14
# Number of cores: 112
MDIO ERROR: cannot open file "restart.coor"
SWAN : FATAL : Failure executing kernel sync [transpose_float2] [700]
acemd2_6.04_x86_64-pc-linux-gnu__cuda: ../swan/swanlib_nv.cpp:203: void swanRunKernel(const char*, int3, int3, size_t, ...): Assertion `0' failed.
SIGABRT: abort called
Stack trace (17 frames):
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(boinc_catch_signal+0x4d)[0x46438d]
/lib/libc.so.6(+0x324c0)[0x7f4e7810d4c0]
/lib/libc.so.6(gsignal+0x35)[0x7f4e7810d445]
/lib/libc.so.6(abort+0x180)[0x7f4e7810e860]
/lib/libc.so.6(__assert_fail+0xf1)[0x7f4e781064e1]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x459c20]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45feae]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x46032f]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45db09]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45b400]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45a864]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x428e20]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x41253c]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0xab0)[0x407f10]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0x2bb)[0x40771b]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f4e780f9d2d]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sinh+0x49)[0x407569]

Exiting...

</stderr_txt>
]]>

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# There is 1 device supporting CUDA
# Device 0: "GeForce 9800 GT"
# Clock rate: 1.50 GHz
# Total amount of global memory: 536543232 bytes
# Number of multiprocessors: 14
# Number of cores: 112
MDIO ERROR: cannot open file "restart.coor"
SWAN : FATAL : Failure executing kernel sync [PmeRealSpace_compute_forces_kernel] [700]
acemd2_6.04_x86_64-pc-linux-gnu__cuda: ../swan/swanlib_nv.cpp:203: void swanRunKernel(const char*, int3, int3, size_t, ...): Assertion `0' failed.
SIGABRT: abort called
Stack trace (14 frames):
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(boinc_catch_signal+0x4d)[0x46438d]
/lib/libc.so.6(+0x324c0)[0x7f1c49b544c0]
/lib/libc.so.6(gsignal+0x35)[0x7f1c49b54445]
/lib/libc.so.6(abort+0x180)[0x7f1c49b55860]
/lib/libc.so.6(__assert_fail+0xf1)[0x7f1c49b4d4e1]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x459c20]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45d3f9]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x45a864]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x428e20]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda[0x41253c]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0xab0)[0x407f10]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sin+0x2bb)[0x40771b]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7f1c49b40d2d]
../../projects/www.gpugrid.net/acemd2_6.04_x86_64-pc-linux-gnu__cuda(sinh+0x49)[0x407569]

Exiting...

</stderr_txt>
]]>

This happens only on KASHIF tasks. TONI tasks always work fine.
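
A minimal Python sketch, assuming stderr text in the format quoted above, for pulling the exit code, device name, failing kernel, and bracketed SWAN code out of a task's output, so that several failed tasks can be compared side by side. The function name `summarize_stderr` and the dictionary keys are hypothetical, chosen only for illustration; the patterns are copied from the log lines in this thread and may not match other application versions.

```python
import re

# Patterns copied from the log lines quoted in this thread (assumed stable).
EXIT_RE = re.compile(r"process exited with code (\d+)")
SWAN_RE = re.compile(
    r"SWAN : FATAL : Failure executing kernel sync \[(\w+)\] \[(\d+)\]"
)
DEVICE_RE = re.compile(r'# Device \d+: "([^"]+)"')
RESTART_MSG = 'MDIO ERROR: cannot open file "restart.coor"'


def summarize_stderr(text):
    """Summarise one task's stderr output (hypothetical helper)."""
    exit_match = EXIT_RE.search(text)
    swan_match = SWAN_RE.search(text)
    device_match = DEVICE_RE.search(text)
    return {
        "exit_code": int(exit_match.group(1)) if exit_match else None,
        "device": device_match.group(1) if device_match else None,
        # Name of the kernel reported in the SWAN fatal line, if any
        "kernel": swan_match.group(1) if swan_match else None,
        # The bracketed number after the kernel name, reported as-is
        "swan_code": int(swan_match.group(2)) if swan_match else None,
        # How many times the restart.coor message appears in the output
        "restart_misses": text.count(RESTART_MSG),
    }
```

Applied to the first log above, this would report exit code 193, device "GeForce 9800 GT", kernel `transpose_float2`, code 700, and two `restart.coor` messages.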
ID: 18781 · Rating: 0
zenitur
Joined: 25 Sep 10
Posts: 2
Credit: 285,845
RAC: 0
Message 18807 - Posted: 2 Oct 2010, 19:26:01 UTC - in response to Message 18781.  

I found the reason for my error: automatic suspend. After a restart, KASHIF tasks fail with an error.
ID: 18807 · Rating: 0
Saenger
Joined: 20 Jul 08
Posts: 134
Credit: 23,657,183
RAC: 0
Message 18916 - Posted: 11 Oct 2010, 12:30:47 UTC

I just had this one wrecked:
stderr out
<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.34 GHz
# Total amount of global memory:                 536150016 bytes
# Number of multiprocessors:                     12
# Number of cores:                               96
MDIO ERROR: cannot open file "restart.coor"

</stderr_txt>
]]>


I don't have the faintest idea why it was restarted (or what "restart.coor" is good for at all). I don't run other projects on the GPU in parallel, and I wasn't doing anything on the machine at that time.

Greetings from Saenger

For questions about Boinc look in the BOINC-Wiki
ID: 18916 · Rating: 0

©2025 Universitat Pompeu Fabra