Bug#640941: xen dom0 crash: unable to handle kernel paging request / Oops / Kernel panic

To: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Cc: 640941@bugs.debian.org
Subject: Bug#640941: xen dom0 crash: unable to handle kernel paging request / Oops / Kernel panic
From: Ian Campbell <ijc@hellion.org.uk>
Date: Thu, 15 Sep 2011 10:25:57 +0200
Message-id: <1316075170.25935.10.camel@cthulhu.hellion.org.uk>
Reply-to: Ian Campbell <ijc@hellion.org.uk>, 640941@bugs.debian.org
In-reply-to: <4E7119E6.5000402@mendix.com>
References: <4E68F796.4040709@mendix.com> <4E6BF051.3040904@mendix.com> <1315755440.5182.16.camel@dagon.hellion.org.uk> <4E7119E6.5000402@mendix.com>

On Wed, 2011-09-14 at 23:17 +0200, Hans van Kranenburg wrote:
> Hi Ian,
> 
> On 09/11/2011 05:37 PM, Ian Campbell wrote:
> > On Sun, 2011-09-11 at 01:18 +0200, Hans van Kranenburg wrote:
> >> On 09/08/2011 07:12 PM, Hans van Kranenburg wrote:

> > In the meantime disabling sendpage sounds like the best workaround.
> 
> So, we set the disable_sendpage option, did a domU reboot with drbdadm 
> down/up of the drbd devices (just to be sure, don't know where/when this 
> option is read by drbd), and after some days of hitting the disks and 
> the network with data, no kernel panics happened anymore. Yay!

Glad to hear it!

> In the post you reference with [1] you write: "I expect that other block 
> and filesystem users of the network subsystem (e.g. iSCSI) would also 
> benefit from this functionality since they will suffer from the same 
> class of issue.".
>   Part of my work in the near future is doing lenny->squeeze upgrades of 
> a couple of systems where we use lvm backed block devices for domU's 
> which are on dm-multipath on iSCSI.
>   Should I be concerned about the same issues that can happen when using 
> iSCSI on squeeze? If so, or if unknown, do you recommend specific 
> (stress)tests that we can do at the test-upgrade environment?

The strange thing is that this class of issue has always been present,
AFAIK. It seems that it just takes a very particular confluence of
circumstances (heavy load, bad luck, etc.) before anything goes
wrong, although once a particular setup is susceptible it does seem
to hit the problem quite a lot.

DRBD's use of sendpage is pretty recent (I think), which is why you only
just started seeing it. If you've been using iSCSI up to now without
problems, I wouldn't expect you to start seeing them now, although you
should obviously keep this issue in mind if you see anything weird.

> 
> [1] http://marc.info/?l=linux-netdev&m=131072801125521&w=2
> 
> What should be done with this bug report? Should I close it, as there's 
> a workaround, and there's no simple fix that can be done in squeeze, or 
> should it be hanging around to be closed when the work on this is done 
> and included in the kernel?

I think there's no harm in keeping it around as a reminder (to me) that
something needs to be done. We can close it when the fixed upstream
kernel hits Sid.

Ian.
-- 
Ian Campbell

Heuristics are bug ridden by definition.  If they didn't have bugs,
then they'd be algorithms.
