Bug#901420: ksoftirqd/0: page allocation failure: order:1, mode:0x2284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK)

To: "Luigi P. Bai" <lpb+debian@kandl.houston.tx.us>, 901420@bugs.debian.org
Subject: Bug#901420: ksoftirqd/0: page allocation failure: order:1, mode:0x2284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK)
From: Ben Hutchings <ben@decadent.org.uk>
Date: Thu, 14 Jun 2018 00:39:47 +0100
Message-id: <78fb658e4b3e9e0c7150c5b98bfb17f2f18b9923.camel@decadent.org.uk>
Reply-to: Ben Hutchings <ben@decadent.org.uk>, 901420@bugs.debian.org
In-reply-to: <5e66e76f-c4a0-8b38-29cf-c34ac26aa1c2@kandl.houston.tx.us>
References: <20180613010430.4543.10151.reportbug@qnap01.internal.kandl.houston.tx.us> <05806545ee0bea9c215401163f1a92e51b5ca2b1.camel@decadent.org.uk> <5e66e76f-c4a0-8b38-29cf-c34ac26aa1c2@kandl.houston.tx.us>

Control: tag -1 - moreinfo
Control: severity -1 normal

On Wed, 2018-06-13 at 07:01 -0500, Luigi P. Bai wrote:
> Hi Ben,
> 
> Thank you for looking at this report.
> 
> You asked what is running on the machine. The first thing I want to
> point out is that it's the same stack of processes that were running on
> the 3.x version of the kernel; the kernel was the only upgrade. The
> system is "mostly jessie" with "apt-get -t stretch install
> linux-image-orion5x".
>
> This is a QNAP NAS, so it mostly runs a steady state of apache, nagios,
> smb, cups, dnscache, mysql, postfix, slapd, and the other "regular"
> daemons. In the middle of the night it'll catch an rsync request backing
> up another machine on the network. These OOM reports seem to happen both
> at night and during the "steady state" during the day when I'm at work
> and away from the machines (this is happening on my other QNAP too).

This seems like quite a lot of services to run on a 256 MB system.

> I seem to remember the (very rare) OOM reports I'd seen in the past also
> listing the process name, number, and backtrace when the process was
> killed. Has the OOM reporting changed? Has page allocation changed from
> 3.x to 4.x to cause this? I'm not noticing that long-running processes
> are being killed, and I'm not seeing any other reports in the log files
> of processes being killed.

When handling received network packets, the kernel cannot wait for
memory to be freed up (that's what the "GFP_ATOMIC" indicates), so it
relies on the kernel memory manager keeping some memory free at all
times.  I think that the "OOM killer" is only triggered by allocation
requests that can wait for memory to be freed, which would explain why
you don't see any processes being killed, but I'm not sure.
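
To make the flags in the subject line concrete, here is a minimal Python
sketch that decodes the "mode" mask.  The bit values are an assumption:
they are a subset of the ___GFP_* constants as I recall them from
include/linux/gfp.h in Linux 4.9 (the stretch kernel), and they do
reproduce the decoding printed in the log (0x2284020 =
GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK).  Other kernel versions use
different values.  The point is that the mask lacks __GFP_DIRECT_RECLAIM,
the bit that permits an allocation to sleep while memory is reclaimed.
The helper names decode() and can_block() are hypothetical.

```python
# ASSUMPTION: a subset of the GFP bit values from Linux 4.9's gfp.h.
# These values change between kernel versions.
GFP_BITS = {
    "__GFP_HIGH": 0x20,
    "__GFP_IO": 0x40,
    "__GFP_FS": 0x80,
    "__GFP_NOWARN": 0x200,
    "__GFP_COMP": 0x4000,
    "__GFP_ZERO": 0x8000,
    "__GFP_ATOMIC": 0x80000,
    "__GFP_NOTRACK": 0x200000,
    "__GFP_DIRECT_RECLAIM": 0x400000,
    "__GFP_KSWAPD_RECLAIM": 0x2000000,
}

# GFP_ATOMIC is a composite: high priority, atomic context, may wake kswapd.
GFP_ATOMIC = (GFP_BITS["__GFP_HIGH"]
              | GFP_BITS["__GFP_ATOMIC"]
              | GFP_BITS["__GFP_KSWAPD_RECLAIM"])


def decode(mask):
    """Hypothetical helper: names of the known GFP bits set in mask."""
    return [name for name, bit in GFP_BITS.items() if mask & bit]


def can_block(mask):
    """Hypothetical helper: only __GFP_DIRECT_RECLAIM lets the caller
    sleep while the kernel reclaims memory."""
    return bool(mask & GFP_BITS["__GFP_DIRECT_RECLAIM"])


mode = 0x2284020            # taken from the log message in the subject
extra = mode & ~GFP_ATOMIC  # bits set on top of GFP_ATOMIC
print("all bits:", decode(mode))
print("GFP_ATOMIC plus:", decode(extra))
print("may block for reclaim:", can_block(mode))
```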

This error was triggered by a request for 2 physically contiguous pages
of memory ("order:1" means 2^1 pages) at a time when only single pages
were free.  It's possible that the change in behaviour is due to a
kernel structure growing to occupy 2 pages where it previously fit
into 1.
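
An order-n request asks the buddy allocator for 2^n contiguous pages, so
a system can have megabytes free and still fail an order-1 request if
all of it is in single pages.  The sketch below is a simplification: it
parses one line in the format of /proc/buddyinfo (column i counts free
blocks of order i) and ignores the per-zone watermarks the real
allocator also checks.  The sample line is hypothetical, not taken from
this report, and the 4 KiB page size is an assumption.  All function
names are illustrative.

```python
PAGE_SIZE = 4096  # ASSUMPTION: 4 KiB pages


def request_size(order):
    """An order-n request needs 2**n contiguous pages."""
    pages = 1 << order
    return pages, pages * PAGE_SIZE


def parse_buddyinfo_line(line):
    """Parse 'Node 0, zone   Normal  c0 c1 ...' into (zone, counts),
    where counts[i] is the number of free blocks of order i."""
    fields = line.split()
    i = fields.index("zone")
    return fields[i + 1], [int(x) for x in fields[i + 2:]]


def can_satisfy(counts, order):
    """Simplified check: is there any free block of at least `order`?
    (The real allocator also applies watermarks, ignored here.)"""
    return any(n > 0 for n in counts[order:])


def free_kib(counts):
    """Total free memory in KiB, summed over all block orders."""
    return sum(n * (1 << o) * PAGE_SIZE // 1024 for o, n in enumerate(counts))


# Hypothetical sample: 1500 free single pages, nothing larger.
sample = "Node 0, zone   Normal   1500   0   0   0   0   0   0   0   0   0   0"
zone, counts = parse_buddyinfo_line(sample)
print(request_size(1))                              # (2, 8192)
print(zone, free_kib(counts), "KiB free")           # Normal 6000 KiB free
print("order-1 possible:", can_satisfy(counts, 1))  # order-1 possible: False
```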

You might be able to reduce the likelihood of this error by increasing
the vm.min_free_kbytes sysctl, or by running fewer services.
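
A minimal Python sketch of inspecting and changing that setting, assuming
a Linux host with /proc mounted.  The helper names are hypothetical and
no particular value is being recommended here; any memory reserved this
way is no longer available to the services themselves.  Writing the
/proc file has the same effect as "sysctl -w vm.min_free_kbytes=N" and
lasts until reboot; to make it persistent, the setting goes in
/etc/sysctl.conf or a file under /etc/sysctl.d/.

```python
from pathlib import Path

# Standard procfs location of the vm.min_free_kbytes sysctl.
SYSCTL = Path("/proc/sys/vm/min_free_kbytes")


def get_min_free_kbytes(path=SYSCTL):
    """Hypothetical helper: read the current reserve in KiB."""
    return int(Path(path).read_text().strip())


def set_min_free_kbytes(kbytes, path=SYSCTL):
    """Hypothetical helper: write a new reserve size in KiB.
    Needs root for the real /proc file; not persistent across reboots."""
    if kbytes <= 0:
        raise ValueError("min_free_kbytes must be positive")
    Path(path).write_text(f"{kbytes}\n")
    return get_min_free_kbytes(path)


if __name__ == "__main__":
    if SYSCTL.exists():
        print("current vm.min_free_kbytes:", get_min_free_kbytes())
```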

Ben.

-- 
Ben Hutchings
The most exhausting thing in life is being insincere.
                                                 - Anne Morrow Lindberg
