
Bug#636797: linux-image-2.6.32-5-amd64: avoid divide-by-zero ("divide error: 0000") in scheduler



On Fri, 2011-08-05 at 18:36 -0400, Daniel Kahn Gillmor wrote:
> Package: linux-2.6
> Version: 2.6.32-35
> Tags: patch
> 
> We've now seen multiple crashes during periods of heavy IO on amd64
> architecture machines running 2.6.32-5-amd64 from stock squeeze
> installs.
[...]
> This seems to be related to the kernel's upstream bug report:
> 
>   https://bugzilla.kernel.org/show_bug.cgi?id=16991
> 
> It looks like Ubuntu has done something to try to address the same bug
> in their linux-ec2 package in March:
> 
>  https://bugs.launchpad.net/linux/+bug/614853

Right.

> We've applied the attached patch (a simple workaround to ensure no
> division-by-zero) to the Debian packages for several weeks in production
> (over a month on some machines) and haven't seen a recurrence of the
> problem.
> 
> I recommend this patch for inclusion in Debian's next bugfix release.  I
> welcome feedback on it, of course.
[...]

This doesn't really fix the bug - division by zero is just a symptom of
a more fundamental problem which has yet to be identified.  As a result,
it hasn't been accepted upstream and won't be accepted in Debian.

That said, I would consider applying a variant that WARNs before 'fixing
up' the zero divisor, as a *temporary* measure to aid in understanding
the bug (more like
<https://bugzilla.kernel.org/show_bug.cgi?id=16991#c13>).
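To illustrate the "WARN, then fix up" idea in isolation, here is a small userspace C sketch. It is not kernel code and not the patch from the bugzilla comment above. `warn_on_once()` is a hypothetical stand-in for the kernel's WARN_ON_ONCE() macro, and `div_warn_fixup()` is a hypothetical helper. The point is that the zero divisor still gets patched over (so the machine doesn't oops), but a warning is emitted the first time it happens, so the underlying cause can be tracked down instead of silently hidden.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's WARN_ON_ONCE(): report the
 * condition the first time it is true, and return it either way. */
static int warned_once = 0;

static int warn_on_once(int cond, const char *what)
{
    if (cond && !warned_once) {
        warned_once = 1;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/* Hypothetical helper: divide, but warn and substitute 1 when the
 * divisor is zero, rather than trapping with "divide error". */
static uint64_t div_warn_fixup(uint64_t dividend, uint64_t divisor)
{
    if (warn_on_once(divisor == 0, "zero divisor, fixing up to 1"))
        divisor = 1;    /* temporary fix-up; the warning records that it happened */
    return dividend / divisor;
}
```

For example, `div_warn_fixup(10, 0)` returns 10 and prints the warning once; later zero divisors are still fixed up but stay quiet, which keeps the log readable while the real bug is investigated.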

I notice your 'oops' messages show 'Tainted: G W' which indicates there
was an earlier kernel warning.  What was the previous warning?

Ben.


