
Bug#640941: xen dom0 crash: unable to handle kernel paging request / Oops / Kernel panic



On Wed, 2011-09-14 at 23:17 +0200, Hans van Kranenburg wrote:
> Hi Ian,
> 
> On 09/11/2011 05:37 PM, Ian Campbell wrote:
> > On Sun, 2011-09-11 at 01:18 +0200, Hans van Kranenburg wrote:
> >> On 09/08/2011 07:12 PM, Hans van Kranenburg wrote:

> > In the meantime disabling sendpage sounds like the best workaround.
> 
> So, we set the disable_sendpage option, did a domU reboot with drbdadm 
> down/up of the drbd devices (just to be sure, don't know where/when this 
> option is read by drbd), and after some days of hitting the disks and 
> the network with data, no kernel panics happened anymore. Yay!

Glad to hear it!
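
On where the option is read: parameters of a loaded module are normally
visible under /sys/module/<module>/parameters/, so one way to confirm
the setting took effect is to read it back. A minimal Python sketch,
assuming drbd exports disable_sendpage there as a Y/N boolean (the path
and format are assumptions, not checked against your drbd version, and
the function name is made up for illustration):

```python
from pathlib import Path

# Assumption: drbd exposes the option at the usual sysfs location for
# module parameters. Adjust the path if your drbd version differs.
DRBD_SENDPAGE_PARAM = Path("/sys/module/drbd/parameters/disable_sendpage")


def sendpage_disabled(param_path=DRBD_SENDPAGE_PARAM):
    """Return True or False for the parameter value, or None when the
    parameter file does not exist (module not loaded, or not exported)."""
    try:
        raw = Path(param_path).read_text().strip()
    except FileNotFoundError:
        return None
    # Boolean module parameters usually read back as "Y" or "N";
    # "1" is accepted too in case the value is shown numerically.
    return raw in ("Y", "y", "1")


if __name__ == "__main__":
    state = sendpage_disabled()
    if state is None:
        print("disable_sendpage not found; is the drbd module loaded?")
    else:
        print("sendpage disabled" if state else "sendpage still enabled")
```

This only reports what the loaded module currently holds; it does not
say whether connections that were already established picked up the
change.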

> In the post you reference with [1] you write: "I expect that other block 
> and filesystem users of the network subsystem (e.g. iSCSI) would also 
> benefit from this functionality since they will suffer from the same 
> class of issue.".
>   Part of my work in the near future is doing lenny->squeeze upgrades of 
> a couple of systems where we use lvm backed block devices for domU's 
> which are on dm-multipath on iSCSI.
>   Should I be concerned about the same issues that can happen when using 
> iSCSI on squeeze? If so, or if unknown, do you recommend specific 
> (stress)tests that we can do at the test-upgrade environment?

The strange thing is that, AFAIK, this class of issue has always been
present. It seems to take a very particular confluence of circumstances
(heavy load, bad luck, etc.) before anything goes wrong, although a
setup that is susceptible tends to hit it quite often.

DRBD's use of sendpage is pretty recent (I think), which is why you only
just started seeing it. If you've been using iSCSI up to now without
problems, I wouldn't expect you to start seeing them now, although you
should obviously keep this issue in mind if you see anything weird.

> 
> [1] http://marc.info/?l=linux-netdev&m=131072801125521&w=2
> 
> What should be done with this bug report? Should I close it, as there's 
> a workaround, and there's no simple fix that can be done in squeeze, or 
> should it be hanging around to be closed when the work on this is done 
> and included in the kernel?

I think there's no harm in keeping it around as a reminder (to me) that
something needs to be done. We can close it when the fixed upstream
kernel hits Sid.

Ian.
-- 
Ian Campbell

Heuristics are bug ridden by definition.  If they didn't have bugs,
then they'd be algorithms.



