
Bug#901420: ksoftirqd/0: page allocation failure: order:1, mode:0x2284020(GFP_ATOMIC|__GFP_COMP|__GFP_NOTRACK)



Control: tag -1 - moreinfo
Control: severity -1 normal

On Wed, 2018-06-13 at 07:01 -0500, Luigi P. Bai wrote:
> Hi Ben,
> 
> Thank you for looking at this report.
> 
> You asked what is running on the machine. The first thing I want to
> point out is that it's the same stack of processes that were running on
> the 3.x version of the kernel; the kernel was the only upgrade. The
> system is "mostly jessie" with "apt-get -t stretch install
> linux-image-orion5x".
>
> This is a QNAP NAS, so it mostly runs a steady state of apache, nagios,
> smb, cups, dnscache, mysql, postfix, slapd, and the other "regular"
> daemons. In the middle of the night it'll catch an rsync request backing
> up another machine on the network. These OOM reports seem to happen both
> at night and during the "steady state" during the day when I'm at work
> and away from the machines (this is happening on my other QNAP too).

This seems like quite a lot of services to run on a 256 MB system.

> I seem to remember the (very rare) OOM reports I'd seen in the past also
> listing the process name, number, and backtrace when the process was
> killed. Has the OOM reporting changed? Has page allocation changed from
> 3.x to 4.x to cause this? I'm not noticing that long running processes
> are being killed, and I'm not seeing any other reports in the log files
> of processes being killed.

When handling received network packets, the kernel cannot wait for
memory to be freed up (that's what the "GFP_ATOMIC" indicates), so it
relies on the kernel memory manager keeping some memory free at all
times.  I think that the "OOM killer" will only be triggered by
allocation requests that can wait for memory to be freed, but I'm not
sure.  That would explain why no process is reported as killed.

This error was triggered by a request for 2 adjacent pages of memory
(that's what the "order:1" indicates), when there were only single
pages of memory free.  It's possible that the change in behaviour is
due to a kernel structure growing to occupy 2 pages where it previously
fit into 1.
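
To make the page arithmetic concrete, here is a minimal sketch that
relates an allocation order to a page count and checks
/proc/buddyinfo-style text for free blocks of a given order.  The
function names are made up for illustration, the 4 KiB page size is an
assumption, and the check ignores the watermarks and reserves the real
allocator applies, so treat it as a rough indicator only.

```python
# Sketch: relate an allocation "order" to a page count, and check
# /proc/buddyinfo-style text for free blocks of at least that order.
import os

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def pages_for_order(order):
    """An order-n request asks for 2**n physically contiguous pages."""
    return 1 << order


def parse_buddyinfo(text):
    """Map (node, zone) -> list of free block counts, indexed by order."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: Node 0, zone Normal 310 0 0 ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(n) for n in parts[4:]]
    return zones


def has_free_block(counts, order):
    """True if some free block of at least the given order is listed.

    Simplification: ignores allocator watermarks and reserves.
    """
    return any(n > 0 for n in counts[order:])


if __name__ == "__main__":
    path = "/proc/buddyinfo"
    if os.path.exists(path):
        with open(path) as f:
            zones = parse_buddyinfo(f.read())
        for (node, zone), counts in zones.items():
            print(node, zone, "order-1 block free:", has_free_block(counts, 1))
```

A zone whose counts are all zero beyond the first column is in the
situation described above: single pages are free, but no 2-page block.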

You might be able to reduce the likelihood of this error by increasing
the vm.min_free_kbytes sysctl, or by running fewer services.
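
As a rough sketch of how to inspect that setting, the following reads
the current value from /proc and prints a line in sysctl.conf syntax.
The helper names are hypothetical, and doubling the current value is
only a placeholder to show the format, not a recommended figure for
this machine.

```python
# Sketch: read vm.min_free_kbytes and format a sysctl.conf entry for it.
import os

MIN_FREE_PATH = "/proc/sys/vm/min_free_kbytes"


def read_min_free_kbytes(path=MIN_FREE_PATH):
    """Return the current value of the sysctl, in kilobytes."""
    with open(path) as f:
        return int(f.read().strip())


def sysctl_conf_line(kbytes):
    """Format a persistent setting in sysctl.conf 'key = value' syntax."""
    if kbytes <= 0:
        raise ValueError("kbytes must be positive")
    return "vm.min_free_kbytes = %d" % kbytes


if __name__ == "__main__":
    if os.path.exists(MIN_FREE_PATH):
        current = read_min_free_kbytes()
        print("current value:", current)
        # Placeholder target (double the current value); an assumption
        # for illustration, not a tuned number.
        print(sysctl_conf_line(current * 2))
```

The printed line could go into /etc/sysctl.conf to persist across
reboots; "sysctl -w vm.min_free_kbytes=N" as root changes the running
value.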

Ben.

-- 
Ben Hutchings
The most exhausting thing in life is being insincere.
                                                 - Anne Morrow Lindbergh

