Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

To: paul.szabo@sydney.edu.au
Cc: 695182@bugs.debian.org
Subject: Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash
From: Ben Hutchings <ben@decadent.org.uk>
Date: Sun, 16 Dec 2012 00:58:47 +0000
Message-id: <[🔎] 1355619527.18807.95.camel@deadeye.wl.decadent.org.uk>
Reply-to: Ben Hutchings <ben@decadent.org.uk>, 695182@bugs.debian.org
In-reply-to: <[🔎] 201212160014.qBG0E961019744@como.maths.usyd.edu.au>
References: <[🔎] 201212160014.qBG0E961019744@como.maths.usyd.edu.au>

On Sun, 2012-12-16 at 11:14 +1100, paul.szabo@sydney.edu.au wrote:
> Dear Ben,
> 
> > ... I think the initial overflow occurs when calculating x
> > ...
> > setpoint and dirty are numbers of pages and are declared as long, so on
> > a system with enough memory they can presumably differ by 2^21 or more
> > (2^21 pages = 8 GB).  Shifting left by RATELIMIT_CALC_SHIFT = 10 can
> > then change the sign bit.
> > 
> > Does the attached patch fix this?
> > 
> > ...
> > 
> > Most variables in bdi_position_ratio() are declared long, which is
> > enough for a page count.  However, when converting (setpoint - dirty)
> > to a fixed-point number we left-shift by 10, and on a 32-bit system
> > with PAE it is possible to have enough dirty pages that the shift
> > overflows into the sign bit.  We need to cast to s64 before the
> > left-shift.
> > 
> > Reported-by: Paul Szabo <paul.szabo@sydney.edu.au>
> > Reference: http://bugs.debian.org/695182
> > Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> > ---
> >  mm/page-writeback.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 50f0824..8b5600e 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -559,7 +559,7 @@ static unsigned long bdi_position_ratio(struct backing_dev_i
> > nfo *bdi,
> >  	 *     => fast response on large errors; small oscillation near setpoint
> >  	 */
> >  	setpoint = (freerun + limit) / 2;
> > -	x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
> > +	x = div_s64((s64)(setpoint - dirty) << RATELIMIT_CALC_SHIFT,
> >  		    limit - setpoint + 1);
> >  	pos_ratio = x;
> >  	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
> 
> Thanks for the quick patch. I am about to test it (in a day or so).
> Initial (blind, off-the-cuff, uneducated) comments:
>  - I had "BUG_ON(x<0)" in my code, so unlikely x changed sign.

If I understand correctly, x is a ratio in the range [-1, 1].  I would
expect it to become negative before wrapping around to become positive,
but it's also conceivable that it went all the way round between two
calls to this function.  (I don't know how often it's likely to be
called.)

>  - Why not use float instead of infinite-precision integer arithmetic?

We cannot assume the presence of an FPU.  (The kernel handles FPU
emulation for userland, but not for itself.)

>  - Do we need a "smooth" function, or would an easy-to-calculate
>    step function suffice?

I don't know.

>  - Is there a check that the returned s64 pos_ratio fits into u32?

It seems to be limited to the range [0, 2] (with 10 fractional bits) but
I didn't check all the other calculations.

Ben.

-- 
Ben Hutchings
Always try to do things in chronological order;
it's less confusing that way.

Attachment: signature.asc
Description: This is a digitally signed message part

Reply to:

Follow-Ups:
- Bug#695182: [wheezy] Write couple of 1GB files for OOM crash
  - From: Jonathan Nieder <jrnieder@gmail.com>

References:
- Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash
  - From: paul.szabo@sydney.edu.au

Prev by Date: Bug#695634: linux-image-3.6-trunk-amd64: fails to execute action (hibernate) on critical battery condition - HP Folio 13-2000
Next by Date: Bug#695886: still a problem with 3.6-trunk-amd64
Previous by thread: Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash
Next by thread: Bug#695182: [wheezy] Write couple of 1GB files for OOM crash
Index(es):
- Date
- Thread