
Strange bind crash on PPC G5 PowerMac running Squeeze plus 2.6.39 backports kernel



Hi all,

I'm not sure whether this is the right place to raise this, but we're experiencing a problem on one of our PPC G5 Mac servers, and I was wondering if someone could point me in the right direction.

Basically, we have an old PPC G5 Mac which we have re-purposed as a wireless access point for our office. It's currently running Debian Squeeze with a 2.6.39 backports kernel. The problem is that every so often (perhaps once every 1-2 weeks) the bind daemon, which is configured to forward requests to another nameserver, locks up hard and can only be stopped with kill -9, after which it has to be restarted. AFAICT no other daemons on the server are affected.

To try to debug the issue, I've rebuilt the PPC bind .deb with debug/nostrip, but unfortunately the hang has happened again and I still can't see any symbols when attaching to the hung process, e.g.


root@cheeseburger:~# /etc/init.d/bind9 stop
Stopping domain name service...: bind9^Crndc: recv failed: operation canceled
root@cheeseburger:~# ps -ef | grep named
bind     17451     1  0 Dec01 ?        00:00:05 /usr/sbin/named -u bind
root     22050 21760  0 14:26 pts/0    00:00:00 grep named
root@cheeseburger:~# file /usr/sbin/named
/usr/sbin/named: ELF 32-bit MSB shared object, PowerPC or cisco 4500, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, with unknown capability 0x41000000 = 0x13676e75, with unknown capability 0x10000 = 0xb0401, not stripped
root@cheeseburger:~# gdb -p 17451
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 17451
Reading symbols from /usr/sbin/named...done.
0x1f8c1d7c in ?? ()
(gdb) thread apply all bt full

Thread 1 (process 17451):
#0  0x1f8c1d7c in ?? ()
No symbol table info available.
#1  0x1f8c1d68 in ?? ()
No symbol table info available.
#2  0x1fe4d36c in ?? ()
No symbol table info available.
#3  0x1fe4d468 in ?? ()
No symbol table info available.
#4  0x2032d6b4 in ?? ()
No symbol table info available.
#5  0x1f8a963c in ?? ()
No symbol table info available.
#6  0x1f8a9800 in ?? ()
No symbol table info available.
#7  0x00000000 in ?? ()
No symbol table info available.
(gdb)

root@cheeseburger:~# ldd /usr/sbin/named
        linux-vdso32.so.1 =>  (0x00100000)
        liblwres.so.60 => /usr/lib/liblwres.so.60 (0x6ff04000)
        libdns.so.69 => /usr/lib/libdns.so.69 (0x6fd58000)
        libgssapi_krb5.so.2 => /usr/lib/libgssapi_krb5.so.2 (0x6fd02000)
        libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x6fb58000)
        libbind9.so.60 => /usr/lib/libbind9.so.60 (0x6fb2b000)
        libisccfg.so.62 => /usr/lib/libisccfg.so.62 (0x6faf2000)
        libisccc.so.60 => /usr/lib/libisccc.so.60 (0x6fac9000)
        libisc.so.62 => /usr/lib/libisc.so.62 (0x6fa4e000)
        libdb-4.6.so => /usr/lib/libdb-4.6.so (0x6f8cf000)
        libldap_r-2.4.so.2 => /usr/lib/libldap_r-2.4.so.2 (0x6f85f000)
        liblber-2.4.so.2 => /usr/lib/liblber-2.4.so.2 (0x6f831000)
        libcap.so.2 => /lib/libcap.so.2 (0x6f80d000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x6f7d2000)
        libxml2.so.2 => /usr/lib/libxml2.so.2 (0x6f653000)
        libc.so.6 => /lib/libc.so.6 (0x6f4c2000)
        libGeoIP.so.1 => /usr/lib/libGeoIP.so.1 (0x6f45e000)
        libkrb5.so.3 => /usr/lib/libkrb5.so.3 (0x6f374000)
        libk5crypto.so.3 => /usr/lib/libk5crypto.so.3 (0x6f32c000)
        libcom_err.so.2 => /lib/libcom_err.so.2 (0x6f309000)
        libkrb5support.so.0 => /usr/lib/libkrb5support.so.0 (0x6f2e1000)
        libdl.so.2 => /lib/libdl.so.2 (0x6f2bd000)
        libkeyutils.so.1 => /lib/libkeyutils.so.1 (0x6f29b000)
        libresolv.so.2 => /lib/libresolv.so.2 (0x6f264000)
        libz.so.1 => /usr/lib/libz.so.1 (0x6f22e000)
        libsasl2.so.2 => /usr/lib/libsasl2.so.2 (0x6f1f2000)
        libgnutls.so.26 => /usr/lib/libgnutls.so.26 (0x6f122000)
        libattr.so.1 => /lib/libattr.so.1 (0x6f0fd000)
        /lib/ld.so.1 (0x205bf000)
        libm.so.6 => /lib/libm.so.6 (0x6f02f000)
        libtasn1.so.3 => /usr/lib/libtasn1.so.3 (0x6f00e000)
        libgcrypt.so.11 => /usr/lib/libgcrypt.so.11 (0x6ef6b000)
        libgpg-error.so.0 => /usr/lib/libgpg-error.so.0 (0x6ef46000)


Looking at the above address mappings, I'm wondering if something is getting wedged in the kernel somewhere - if so, can anyone point me towards some debug symbols that I can load into the relevant backports kernel for more information?

root@cheeseburger:~# uname -a
Linux cheeseburger 2.6.39-bpo.2-powerpc64 #1 SMP Thu Aug 4 12:38:28 UTC 2011 ppc64 GNU/Linux
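
One caveat on the address comparison: ldd starts the dynamic loader afresh, so the load addresses it prints need not match the layout of the already-running process. The live layout is in /proc/<pid>/maps. As a rough sketch only (find_mapping is a hypothetical helper name, and it assumes /proc is mounted and a POSIX shell whose arithmetic accepts hex constants), a backtrace address could be matched to its mapping like this:

```shell
#!/bin/sh
# find_mapping PID ADDR
# Print the line of /proc/PID/maps whose address range contains ADDR.
# ADDR is hexadecimal, with or without a leading 0x.
# (Hypothetical helper for illustration; not part of gdb or bind.)
find_mapping() {
    pid=$1
    addr=$((0x${2#0x}))
    while read -r range rest; do
        # Each maps line starts with "start-end" in hex, end exclusive.
        start=$((0x${range%-*}))
        end=$((0x${range#*-}))
        if [ "$addr" -ge "$start" ] && [ "$addr" -lt "$end" ]; then
            printf '%s %s\n' "$range" "$rest"
        fi
    done < "/proc/$pid/maps"
}

# Example (run as root or as the process owner):
#   find_mapping 17451 0x1f8c1d7c
```

If the matching line names a shared library, the frame is in userspace code whose symbols gdb failed to load; no output at all would mean the address is not mapped in the process.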

Note that I can probably leave the process in this state for a short while before the wireless is needed again, but at some point over the next day or so I will have to kill -9 the bind process and restart it in order to facilitate wireless access until it happens once again.
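
While the process is still in this state, one cheap check for a kernel-side wedge is the per-thread state that /proc exposes. The following is a minimal sketch, assuming /proc is mounted; thread_states is a hypothetical helper name. A thread stuck in state D (uninterruptible sleep) with a kernel symbol in wchan would point towards the kernel, whereas threads in state S would suggest a userspace deadlock.

```shell
#!/bin/sh
# thread_states PID
# Print one line per thread of PID: thread ID, scheduler state letter,
# and the kernel symbol the thread is sleeping in (wchan).
# (Hypothetical helper for illustration.)
thread_states() {
    pid=$1
    for task in /proc/"$pid"/task/*; do
        [ -d "$task" ] || continue
        tid=${task##*/}
        # The status file has a line such as "State:  S (sleeping)".
        state=$(awk '/^State:/ {print $2}' "$task/status" 2>/dev/null)
        # wchan may be empty, "0", or unreadable; show "-" if nothing is read.
        wchan=$(cat "$task/wchan" 2>/dev/null)
        printf '%s %s %s\n' "$tid" "${state:-?}" "${wchan:--}"
    done
}

# Example:
#   thread_states 17451
```

This also lists every thread of the process, which is useful here because the gdb session above only showed a single thread.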


Many thanks,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs

