
Re: bind9, openswan crashes wheezy VPS



On Mon, Aug 12, 2013 at 02:44:35PM -0600, Bob Proulx wrote:
> I don't know anything about why you are having system crashes.  But no
> one else responded and so I decided to jump in.

Thank you for doing so. I actually went ahead and opened bugs against
openswan and bind9 after getting no responses here in almost 24
hours. I was somewhat reluctant to do that, but if nobody here seems
to have ideas on how to further troubleshoot this, I figured the
people who build those packages, and who are probably more familiar
with how they work than general users are, would have ideas on how to
proceed.

> 
> I run a handful of VMs full time and let me assure you that they are
> stable and reliable and don't crash.  Your crashes are not
> intrinsically part of the Linux kernel, Debian, or anything else.
> They are something unique to your environment.  And they should not be
> happening.

Yes, but I figured that if at least one of these programs works fine
on Fedora 16 in the same type of environment, then there must be
something about how wheezy interacts with that environment which is
causing this. So, while it is fair to say the problem is unique to my
environment, I also think it's fair, and more precise, to say that it
has to do with how wheezy specifically interacts with that
environment.

> 
> Very bizarre!  I can't guess as to any reason why.  But I can't
> believe the problem is related to bind code itself.  It is simply a
> user space program the same as any other.  The problem is in the
> kernel.

Yes, that occurred to me as well. However, given that only two
packages out of a bunch of them are doing this so far, I thought it
made more sense to focus on those first, until I can actually trace
the problem to the kernel itself.

> I would contact your VPS provider support.  If you are paying for the
> service and it isn't working then you should get help to get it going.

Yes, I plan to do that, once I've verified as much as possible the
problem isn't exclusively on my end of things. Perhaps I've reached
that point already.

> Since all of your crashes appear to be network related I imagine the
> problem is in the kernel network driver stack somewhere.

I've thought of that as well, especially since research indicates that
the virtio_net module has had problems in the past. In fact, the most
recent batch of these seems to have been fixed earlier this month in
Linux 3.4.56 (more on that below). On the other hand, if it's
something in the network stack, why am I able, for example, to query
my VPS provider's servers for the same domains without crashes? If
it's in the network stack, then I think it's reasonable to conclude
I'd be seeing crashes regardless of which name servers I queried for
those domains. Right?
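
To make that comparison repeatable, the same question can be sent
directly to each name server in turn. What follows is only a rough
sketch using the Python standard library. The function names, the
query ID, and the choice of a plain A query over UDP are my own
assumptions for illustration, and it does not reproduce whatever
larger or more complex replies a recursive bind receives from
authoritative servers.

```python
import socket
import struct


def build_query(name, query_id=0x1234, qtype=1):
    """Build a minimal DNS query packet (recursion desired, class IN).

    qtype=1 asks for an A record. query_id is an arbitrary value
    chosen for illustration.
    """
    # Header: id, flags (RD bit set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        question += struct.pack("!B", len(encoded)) + encoded
    # Root label, then query type and class IN (1).
    question += b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def rcode_from(server, name, timeout=3.0):
    """Send one query over UDP to server:53.

    Returns the DNS response code (0 means no error), or None if the
    query timed out or the socket reported an error.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(build_query(name), (server, 53))
        data, _ = sock.recvfrom(512)
    except OSError:
        return None
    finally:
        sock.close()
    # The low four bits of the fourth header byte hold the response code.
    return (data[3] & 0x0F) if len(data) >= 4 else None
```

Calling rcode_from() once with 127.0.0.1 (the local bind) and once
with the provider's resolver address, for the same domain, would show
whether the difference follows the name server or the domain.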

> To me it "feels" like an interaction between your very new Linux
> kernel version 3.9 and your quite old qemu version 0.9.1.  I would try
> the *oldest* stock Debian kernel you can find that still supports your
> libc and other libs and see if that fixes things.  (At some point your
> old kernel won't support the newer userland.  I don't know where the
> compatibility lines are drawn though.)

I actually did do something along the same lines. I tried Linux 3.10
from unstable, and then my own build of Linux 3.10.5. Same results as
with 3.9 from wheezy-backports. I then tried my own builds of 3.4.56
and 3.0.89, and 2.6.32 from squeeze. My builds were done using the
sources from kernel.org. I was really hoping that 3.4.56 would be the
magic fix, because of the virtio_net fixes I mentioned above that went
into it. Everything from 3.4.56 down behaved the same way as 3.2.0 in
wheezy (i.e. crashes during boot when starting bind9, and crashes on
resolving the domains that make it crash). The exception was 2.6.32
from squeeze, which crashed the machine when I attempted to query my
local bind for even the domains that work on higher kernels. So, I
didn't go lower than that.

There is one thing that's been bothering me on and off through all
this, which I forgot to mention in my original post. On the Fedora
machine with the same VPS provider, I noticed there is no
virtio_ring.ko module; it simply doesn't exist on that machine. All
the kernels I tried have virtio_ring built as a module, and I couldn't
find a .config option to disable it anywhere when I was doing my build
of 3.10.5. I did a bit of research, but couldn't find a clear answer
on what exactly virtio_ring does. I keep wondering what would happen
if I could find a way to blacklist it in the initrd image. Would all
this suddenly go away, or would I end up with an unbootable system
because virtio_blk couldn't load with virtio_ring blacklisted? I would
prefer not to risk the second alternative, so it would be best if I
can simply find a Debian kernel, or build my own, without virtio_ring
altogether.
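
A missing virtio_ring.ko may only mean the code is compiled into that
kernel rather than built as a module. Here is a minimal sketch of how
that could be checked on each machine. It assumes the kernel config is
installed as /boot/config-<release> and that
/lib/modules/<release>/modules.builtin exists, which is common but not
guaranteed. The symbol name CONFIG_VIRTIO_RING is also an assumption
and may not exist in every kernel version, and the helper names are
made up for illustration.

```python
import os
import platform


def config_value(config_text, symbol):
    """Return the value set for symbol ('y', 'm', ...) or None if not set."""
    prefix = symbol + "="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


def is_builtin(modules_builtin_text, module_name):
    """True if module_name (without .ko) is listed in modules.builtin text."""
    target = module_name + ".ko"
    for line in modules_builtin_text.splitlines():
        if os.path.basename(line.strip()) == target:
            return True
    return False


def read_text(path):
    """Read a file, returning an empty string if it cannot be opened."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    release = platform.release()
    # Assumed locations; adjust if the distribution puts them elsewhere.
    cfg = read_text("/boot/config-" + release)
    builtin = read_text("/lib/modules/" + release + "/modules.builtin")
    print("CONFIG_VIRTIO_RING =", config_value(cfg, "CONFIG_VIRTIO_RING"))
    print("virtio_ring built in:", is_builtin(builtin, "virtio_ring"))
```

If the Fedora machine reports virtio_ring as built in, then both
systems are running the same code and only the packaging differs.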

> 
> I would get your VPS support involved.  If there are no other ideas
> then I would have them move you to a host with a newer qemu 1.x
> installed.  The VPS provider should be able to do this relatively
> easily.  Hopefully that will work better with the newer Linux kernel.

Yeah, it looks like I've done as much as I can to troubleshoot things
on my end. I guess I'll contact them.

> 
> Good luck!
> Bob

Thanks again for your reply and suggestions! Two or more heads are
better than one, on the Debian angle so far anyway.

Greg


-- 
web site: http://www.gregn..net
gpg public key: http://www.gregn..net/pubkey.asc
skype: gregn1
(authorization required, add me to your contacts list first)

--
Free domains: http://www.eu.org/ or mail dns-manager@EU.org

