Message boards : Number crunching : 2sec NATHAN_FAX4 failures on Linux Exit status 193 (0xc1)

skgiven (volunteer moderator, volunteer tester)
Message 24278 - Posted: 6 Apr 2012 | 16:29:56 UTC
Last modified: 6 Apr 2012 | 16:30:49 UTC

process exited with code 193 (0xc1, -63)

Recently, every *NATHAN_FAX4* task that I have run on this Linux system has failed after ~2 seconds.
All other tasks run well (no failures), including a NATHAN_CB1 task.

On the 1st and 2nd of April I ran two NATHAN_FAX4 tasks successfully, but since the 3rd they have all failed. Last night I restarted, but another NATHAN_FAX4 task failed today. I keep at least one CPU thread free. The computer was not in use when the failures occurred.

Outcome Computation error
Client state Compute error
Exit status 193 (0xc1)
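
For anyone wondering why the message line reports the same status three ways (193, 0xc1, -63): they are one byte shown as unsigned decimal, as hex, and as a signed 8-bit value. A minimal sketch of the arithmetic follows; the function name `decode_exit_status` is made up for illustration and is not part of BOINC.

```python
def decode_exit_status(status: int) -> tuple[str, int]:
    """Return the hex form and the signed 8-bit reading of an exit status."""
    if not 0 <= status <= 255:
        raise ValueError("exit status must fit in one byte")
    # Values of 128 and above wrap to negative when read as a signed byte.
    signed = status - 256 if status >= 128 else status
    return hex(status), signed


print(decode_exit_status(193))  # ('0xc1', -63)
```

This only explains the notation. It says nothing about the cause of the failure, and the stderr below reports SIGABRT.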

Stderr output

<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 470"
# Clock rate: 1.22 GHz
# Total amount of global memory: 1341718528 bytes
# Number of multiprocessors: 14
# Number of cores: 112
SIGABRT: abort called
Stack trace (13 frames):
../../projects/www.gpugrid.net/acemd.linux64.2352(boinc_catch_signal+0x4d)[0x482bed]
/lib/x86_64-linux-gnu/libc.so.6(+0x36420)[0x7ffcad27d420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7ffcad27d3a5]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7ffcad280b0b]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4935db]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x434dd0]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4312d6]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4309e7]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x414ef9]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x407c9a]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x40857e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7ffcad26830d]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x407a19]

Exiting...

</stderr_txt>
]]>

From http://boincfaq.mundayweb.com:

    Code 193 is a segmentation violation error.

    You either have problems with your memory or swap file, or the application attempts to access a memory location that it is not allowed to access, or attempts to access a memory location in a way that is not allowed (for example, attempting to write to a read-only location, or to overwrite part of the operating system).

    Use a memory checking program like memtest86+ to rigorously test your memory.
    And always when you have this error, report it on the forums of the application it happens with. It may well be an error in the application's code.



BOINC 6.12.33, GTX 470 (reference design, fan control on, temps <70 °C), repo drivers 280.13, Ubuntu 11.10, i7-2600, 8 GB DDR3, 60 GB SSD

nate
Message 24289 - Posted: 6 Apr 2012 | 23:01:33 UTC

I'm out for the Easter holidays, so I'll look into it when I can. I'd ask if you've changed anything recently, but I doubt you're one to overlook such things. Hopefully this isn't a broader problem.

Stoneageman
Message 24335 - Posted: 9 Apr 2012 | 17:11:26 UTC
Last modified: 9 Apr 2012 | 17:17:23 UTC

I get that problem with the same card each time. It's clocked a bit higher than the others on that host, so I suspect it's got a bit fussy in its old age. It's failed six times after 2 seconds each, so I'm not too bothered. Odd that it's only just started doing it, like yours. 295.33 driver.

<core_client_version>7.0.23</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# Using device 2
# There are 3 devices supporting CUDA
# Device 0: "GeForce GTX 580"
# Clock rate: 1.64 GHz
# Total amount of global memory: 1610285056 bytes
# Number of multiprocessors: 16
# Number of cores: 128
# Device 1: "GeForce GTX 570"
# Clock rate: 1.66 GHz
# Total amount of global memory: 1341849600 bytes
# Number of multiprocessors: 15
# Number of cores: 120
# Device 2: "GeForce GTX 570"
# Clock rate: 1.70 GHz
# Total amount of global memory: 1341718528 bytes
# Number of multiprocessors: 15
# Number of cores: 120
SWAN: Using synchronization method 0
SIGABRT: abort called
Stack trace (13 frames):
../../projects/www.gpugrid.net/acemd.linux64.2352(boinc_catch_signal+0x4d)[0x482bed]
/lib/x86_64-linux-gnu/libc.so.6(+0x36420)[0x7f905e0fe420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7f905e0fe3a5]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7f905e101b0b]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4935db]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x434dd0]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4312d6]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4309e7]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x414ef9]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x407c9a]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x40857e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f905e0e930d]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x407a19]

Exiting...

</stderr_txt>
]]>
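
The two stack traces posted so far can be compared mechanically. The sketch below pulls the return addresses of the acemd frames out of each trace and checks whether they match. The helper name `app_frames` and the regular expression are my own, and treating the addresses as directly comparable assumes the same acemd.linux64.2352 binary is loaded at a fixed address on both hosts.

```python
import re

# Stack traces copied from the two stderr outputs in this thread.
TRACE_GTX470 = """\
../../projects/www.gpugrid.net/acemd.linux64.2352(boinc_catch_signal+0x4d)[0x482bed]
/lib/x86_64-linux-gnu/libc.so.6(+0x36420)[0x7ffcad27d420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7ffcad27d3a5]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7ffcad280b0b]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4935db]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x434dd0]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4312d6]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4309e7]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x414ef9]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x407c9a]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x40857e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7ffcad26830d]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x407a19]
"""

TRACE_GTX570 = """\
../../projects/www.gpugrid.net/acemd.linux64.2352(boinc_catch_signal+0x4d)[0x482bed]
/lib/x86_64-linux-gnu/libc.so.6(+0x36420)[0x7f905e0fe420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7f905e0fe3a5]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7f905e101b0b]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4935db]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x434dd0]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4312d6]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x4309e7]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x414ef9]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x407c9a]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x40857e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f905e0e930d]
../../projects/www.gpugrid.net/acemd.linux64.2352[0x407a19]
"""

# One frame per line: <binary>, an optional (symbol+offset), then [0xADDRESS].
FRAME_RE = re.compile(r"^(\S+?)(?:\([^)]*\))?\[0x([0-9a-f]+)\]$")


def app_frames(trace: str, binary_hint: str = "acemd") -> list[str]:
    """Return the addresses of frames that belong to the application binary."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and binary_hint in match.group(1):
            frames.append("0x" + match.group(2))
    return frames


same = app_frames(TRACE_GTX470) == app_frames(TRACE_GTX570)
print("application frames identical:", same)
```

If the application frames match, both hosts reached abort() through the same sequence of calls in acemd. That would be consistent with a common cause, though it does not say whether the trigger is the task, the driver, or the card.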

Stoneageman
Message 24336 - Posted: 9 Apr 2012 | 18:44:08 UTC - in response to Message 24335.
Last modified: 9 Apr 2012 | 19:10:43 UTC

Just had two more fail on that card. Perhaps I'd better swap it out before the evil server blacklists it...
Just swapped it for one that's clocked even higher. I really must mark these cards up, lol

Retvari Zoltan
Message 24337 - Posted: 9 Apr 2012 | 19:47:29 UTC - in response to Message 24336.

What if you simply downclock the problematic card? The simple way would be to put it in one of your WinXP machines and use MSI Afterburner to do the job.

skgiven (volunteer moderator, volunteer tester)
Message 24342 - Posted: 9 Apr 2012 | 22:50:32 UTC - in response to Message 24336.

I was thinking my issue was due to some Linux system update, possibly a security update; the card ran NATHAN_FAX4 tasks without issue before the updates, and also when I had Windows installed on the same rig.