Strange bind crash on PPC G5 PowerMac running Squeeze plus 2.6.39 backports kernel
Hi all,
I'm not sure whether this is the right place to raise this, but we're
experiencing a problem on one of our PPC G5 Mac servers and I was
wondering if someone could point me in the right direction.
Basically we have an old PPC G5 Mac which we have re-purposed as a
wireless access point for our office. It's currently running Debian
Squeeze with a 2.6.39 backports kernel. The problem is that every so
often (perhaps once every 1-2 weeks) the bind daemon, which is
configured to forward requests to another nameserver, hangs hard and can
only be recovered with kill -9 followed by a restart. AFAICT no other
daemons on the server are affected.
To try to debug the issue, I've rebuilt the PPC bind .deb with
debug/nostrip, but unfortunately the hang has happened again and gdb
still can't resolve any symbols in the hung process, e.g.
root@cheeseburger:~# /etc/init.d/bind9 stop
Stopping domain name service...: bind9^Crndc: recv failed: operation canceled
root@cheeseburger:~# ps -ef | grep named
bind 17451 1 0 Dec01 ? 00:00:05 /usr/sbin/named -u bind
root 22050 21760 0 14:26 pts/0 00:00:00 grep named
root@cheeseburger:~# file /usr/sbin/named
/usr/sbin/named: ELF 32-bit MSB shared object, PowerPC or cisco 4500, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, with unknown capability 0x41000000 = 0x13676e75, with unknown capability 0x10000 = 0xb0401, not stripped
root@cheeseburger:~# gdb -p 17451
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 17451
Reading symbols from /usr/sbin/named...done.
0x1f8c1d7c in ?? ()
(gdb) thread apply all bt full
Thread 1 (process 17451):
#0 0x1f8c1d7c in ?? ()
No symbol table info available.
#1 0x1f8c1d68 in ?? ()
No symbol table info available.
#2 0x1fe4d36c in ?? ()
No symbol table info available.
#3 0x1fe4d468 in ?? ()
No symbol table info available.
#4 0x2032d6b4 in ?? ()
No symbol table info available.
#5 0x1f8a963c in ?? ()
No symbol table info available.
#6 0x1f8a9800 in ?? ()
No symbol table info available.
#7 0x00000000 in ?? ()
No symbol table info available.
(gdb)
root@cheeseburger:~# ldd /usr/sbin/named
linux-vdso32.so.1 => (0x00100000)
liblwres.so.60 => /usr/lib/liblwres.so.60 (0x6ff04000)
libdns.so.69 => /usr/lib/libdns.so.69 (0x6fd58000)
libgssapi_krb5.so.2 => /usr/lib/libgssapi_krb5.so.2 (0x6fd02000)
libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8 (0x6fb58000)
libbind9.so.60 => /usr/lib/libbind9.so.60 (0x6fb2b000)
libisccfg.so.62 => /usr/lib/libisccfg.so.62 (0x6faf2000)
libisccc.so.60 => /usr/lib/libisccc.so.60 (0x6fac9000)
libisc.so.62 => /usr/lib/libisc.so.62 (0x6fa4e000)
libdb-4.6.so => /usr/lib/libdb-4.6.so (0x6f8cf000)
libldap_r-2.4.so.2 => /usr/lib/libldap_r-2.4.so.2 (0x6f85f000)
liblber-2.4.so.2 => /usr/lib/liblber-2.4.so.2 (0x6f831000)
libcap.so.2 => /lib/libcap.so.2 (0x6f80d000)
libpthread.so.0 => /lib/libpthread.so.0 (0x6f7d2000)
libxml2.so.2 => /usr/lib/libxml2.so.2 (0x6f653000)
libc.so.6 => /lib/libc.so.6 (0x6f4c2000)
libGeoIP.so.1 => /usr/lib/libGeoIP.so.1 (0x6f45e000)
libkrb5.so.3 => /usr/lib/libkrb5.so.3 (0x6f374000)
libk5crypto.so.3 => /usr/lib/libk5crypto.so.3 (0x6f32c000)
libcom_err.so.2 => /lib/libcom_err.so.2 (0x6f309000)
libkrb5support.so.0 => /usr/lib/libkrb5support.so.0 (0x6f2e1000)
libdl.so.2 => /lib/libdl.so.2 (0x6f2bd000)
libkeyutils.so.1 => /lib/libkeyutils.so.1 (0x6f29b000)
libresolv.so.2 => /lib/libresolv.so.2 (0x6f264000)
libz.so.1 => /usr/lib/libz.so.1 (0x6f22e000)
libsasl2.so.2 => /usr/lib/libsasl2.so.2 (0x6f1f2000)
libgnutls.so.26 => /usr/lib/libgnutls.so.26 (0x6f122000)
libattr.so.1 => /lib/libattr.so.1 (0x6f0fd000)
/lib/ld.so.1 (0x205bf000)
libm.so.6 => /lib/libm.so.6 (0x6f02f000)
libtasn1.so.3 => /usr/lib/libtasn1.so.3 (0x6f00e000)
libgcrypt.so.11 => /usr/lib/libgcrypt.so.11 (0x6ef6b000)
libgpg-error.so.0 => /usr/lib/libgpg-error.so.0 (0x6ef46000)
Looking at the above address mappings, I'm wondering if something is
getting wedged in the kernel somewhere - if so, can anyone point me
towards some debug symbols that I can load into the relevant backports
kernel for more information?
root@cheeseburger:~# uname -a
Linux cheeseburger 2.6.39-bpo.2-powerpc64 #1 SMP Thu Aug 4 12:38:28 UTC 2011 ppc64 GNU/Linux
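One way to tell whether the unresolved frames sit in a shared library, in
the main binary or somewhere unexpected is to look each address up in
/proc/17451/maps while the process is still hung. The addresses printed
by ldd are not necessarily the ones used by the running process, so the
live map is the safer reference. The following is only a minimal sketch:
find_mapping is a hypothetical helper name, not part of any existing
tool, and it assumes a POSIX shell whose arithmetic accepts 0x constants
(bash and dash both do).

```shell
#!/bin/sh
# find_mapping ADDR MAPS_FILE   (hypothetical helper, for illustration only)
# Print the first line of MAPS_FILE (in /proc/PID/maps format) whose
# address range contains ADDR, e.g. find_mapping 0x1f8c1d7c /proc/17451/maps
# Returns non-zero if no mapping contains the address.
find_mapping() {
    addr=$(( $1 ))
    while read -r range rest; do
        # Skip anything that does not look like "start-end".
        case $range in
            *-*) ;;
            *) continue ;;
        esac
        start=$(( 0x${range%-*} ))
        end=$(( 0x${range#*-} ))
        # Ranges in maps are half-open: start <= addr < end.
        if [ "$addr" -ge "$start" ] && [ "$addr" -lt "$end" ]; then
            printf '%s %s\n' "$range" "$rest"
            return 0
        fi
    done < "$2"
    return 1
}
```

For example, running
for a in 0x1f8c1d7c 0x1fe4d36c 0x2032d6b4 0x1f8a963c; do find_mapping "$a" /proc/17451/maps; done
should show which mapped file, if any, each frame belongs to.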
Note that I can probably leave the process in this state for a short
while before the wireless is needed again, but at some point over the
next day or so I will have to kill -9 the bind process and restart it
to restore wireless access until the hang recurs.
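While the process is still hung, it may also be worth recording what
each of its threads is doing from the kernel's point of view, since gdb
only reported a single thread above. The sketch below reads the State
field from /proc/PID/task/TID/status and the wait channel from
/proc/PID/task/TID/wchan. thread_states is a hypothetical helper name,
and the optional second argument (an alternative proc root) exists only
so that the function can be tried against a test directory.

```shell
#!/bin/sh
# thread_states PID [PROC_ROOT]   (hypothetical helper, for illustration only)
# For every thread of PID, print "TID STATE WCHAN", where STATE is the
# one-letter code from the status file and WCHAN is the kernel wait
# channel, or "?" if it cannot be read.
thread_states() {
    pid=$1
    root=${2:-/proc}
    for t in "$root/$pid"/task/*; do
        [ -d "$t" ] || continue
        # The line looks like "State:<TAB>S (sleeping)"; keep only the letter.
        state=$(sed -n 's/^State:[[:space:]]*//p' "$t/status")
        wchan=$(cat "$t/wchan" 2>/dev/null) || wchan=
        printf '%s %s %s\n' "${t##*/}" "${state%% *}" "${wchan:-?}"
    done
}
```

Running thread_states 17451 as root would list every thread of the hung
named. As a rough guide only, a thread stuck in state D would point
towards the kernel, whereas threads in state S with a futex-related wait
channel would more likely suggest a userspace deadlock.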
Many thanks,
Mark.
--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063
Sirius Labs: http://www.siriusit.co.uk/labs