
Re: bind9, openswan crashes wheezy VPS



Gregory Nowak wrote:
> I have a VPS running a fresh install of wheezy, installed by me from
> scratch (including kernel). Everything seems to be running fine,
> except for bind9 and openswan which literally crash the vps as
> explained below.

I don't know why you are having system crashes.  But no one else has
responded, so I decided to jump in.

I run a handful of VMs full time and let me assure you that they are
stable and reliable and don't crash.  Your crashes are not
intrinsically part of the Linux kernel, Debian, or anything else.
They are something unique to your environment.  And they should not be
happening.

> I'll start with bind9, since I have more info there. It's setup as a
> name server authoritative for two zones. Querying both zones works fine
> from localhost and the internet over ipv4, and ipv6. The problem comes
> up when I try to use bind9 to resolve other domains from
> localhost. When resolving certain domains, the vps literally
> crashes. I have to send it a boot request, and it boots up again

Very bizarre!  I can't guess at the reason.  But I can't believe the
problem is in the bind code itself.  It is simply a user space program
like any other, and a user space program should not be able to take
down the whole machine.  The problem is in the kernel.

> When using the stock wheezy kernel, the machine would sometimes crash
> during boot right after printing "starting bind9," before the ok that
> comes after. This was true especially if starting named without the -4
> flag to disable ipv6. There were also random crashes every couple of
> days or so when I wasn't logged into the machine watching for
> them. All this seems to have gone away after I upgraded to
> linux 3.9 from wheezy-backports, and just the query crashes remain.

I have no idea.  However, if the kernel is crashing then the problem is
in the kernel proper or in the kernel drivers, which are also part of
the kernel.  Everything else is just a symptom of the disease.

> I know someone who is with the same VPS provider and runs fedora 16 in
> his VPS. I have a shell account on his system, and have been able to
> verify for myself by using dig that it's possible to query all the
> domains I listed above using his local bind9 on his machine with no
> crashes. As far as I can tell (lspci, /proc/cpuinfo), his vps is
> configured exactly like mine as far as hardware, except for RAM and HD
> capacity. That's all the info I have on the bind9 problem.

I would contact your VPS provider support.  If you are paying for the
service and it isn't working then you should get help to get it going.

> As far as openswan, it's setup with one connection, configured as
> ...
> The machine crashes when I try to initiate a connection from a win7
> client. Nothing gets written to the logs here, so the output below is

Again, very bizarre.  But openswan won't be the problem either.  This
is just another symptom of the kernel problem.

> That's all the info I have on the openswan issue. This vps is of
> course running lots more than just bind9 and openswan. Apache,
> proftpd, postfix, spamassassin, clamav, opendkim, just to name a
> few. All of those appear to be running without problems.

Since all of your crashes appear to be network related I imagine the
problem is in the kernel network driver stack somewhere.

> As far as the vps itself, it is based on KVM/QEMU with one cpu, and
> one gig of RAM. The network card uses the virtio_net module, and the
> HD shows up as /dev/vda (I assume using the virtio_blk module, which
> is also automatically loaded).

Seems reasonable to me.  I have several that are similar.  All run fine.

> Based on the login banner I get when using out of band access, the
> host seems to be running openbsd. I'm not sure if the machine
> providing the out of band account and the host my vps is running on
> are actually one and the same though. According to /proc/cpu, the
> KVM/QEMU version seems to be 0.9.1.

That seems to be quite an old version of qemu.

  Squeeze 6 - 0.12.5
  Wheezy 7 - 1.1.2
  Unstable - 1.5.0
  Experimental - 1.6.0~rc0

Part of me says that if it worked reliably way back when 0.9.1 was
current then it should still be working reliably today.  But another
part worries that the Linux 3.9 that you are running is tickling some
bug in qemu 0.9.1 and that upgrading to something more recent will
probably fix it.  Your problem seems pretty severe, so it would almost
certainly be a bug that has already been fixed.
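
To make the version gap concrete, here is a rough Python sketch that
compares a reported qemu version against the Debian versions listed
above.  The function names are made up for illustration, and the
comparison simply ignores anything after a "~", which is a
simplification and not how dpkg really orders versions.

```python
# Versions quoted in this message.  The helper names below are
# hypothetical, chosen only for this illustration.
DEBIAN_QEMU = {
    "squeeze": "0.12.5",
    "wheezy": "1.1.2",
    "unstable": "1.5.0",
    "experimental": "1.6.0~rc0",
}


def version_key(version):
    """Turn '1.6.0~rc0' into (1, 6, 0); anything after '~' is ignored."""
    base = version.split("~", 1)[0]
    return tuple(int(part) for part in base.split("."))


def suites_newer_than(reported, known=DEBIAN_QEMU):
    """Return, sorted by name, the suites whose qemu is newer than 'reported'."""
    key = version_key(reported)
    return sorted(name for name, ver in known.items() if version_key(ver) > key)


if __name__ == "__main__":
    # 0.9.1 is the version the guest appears to report.
    print(suites_newer_than("0.9.1"))
```

Comparing tuples of integers avoids the trap of comparing the strings
directly, where "0.9.1" would sort after "0.12.5".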

> Any help in at least figuring out what is causing this, if not
> actually having a fully functional bind9 and openswan is much
> appreciated. If more info is necessary, I'll see what I can do.

To me it "feels" like an interaction between your very new Linux
kernel version 3.9 and your quite old qemu version 0.9.1.  I would try
the *oldest* stock Debian kernel you can find that still supports your
libc and other libs and see if that fixes things.  (At some point your
old kernel won't support the newer userland.  I don't know where the
compatibility lines are drawn though.)

I would get your VPS support involved.  If there are no other ideas
then I would have them move you to a host with a newer qemu 1.x
installed.  The VPS provider should be able to do this relatively
easily.  Hopefully that will work better with the newer Linux kernel.
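
When you write to support it may help to include the guest kernel
release and which virtio modules are loaded.  Here is a minimal Python
sketch that collects both.  It assumes the usual /proc/modules layout,
where the module name is the first field on each line, and the function
names are only illustrative.

```python
import platform


def virtio_modules(proc_modules_text):
    """Return the sorted names of loaded modules that start with 'virtio'.

    Expects the text of /proc/modules, where each line begins with the
    module name followed by whitespace-separated fields.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("virtio"):
            names.append(fields[0])
    return sorted(names)


def support_summary(kernel_release, proc_modules_text):
    """Build a two-line summary suitable for pasting into a support request."""
    mods = virtio_modules(proc_modules_text)
    return "kernel: %s\nvirtio modules: %s" % (
        kernel_release,
        ", ".join(mods) or "none found",
    )


if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            text = handle.read()
    except OSError:
        # Not on Linux, or /proc is not mounted: report no modules.
        text = ""
    print(support_summary(platform.release(), text))
```

This only reports what the guest sees.  The qemu version on the host is
something the provider will have to confirm.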

Good luck!
Bob
