
Bug#628690: sshd children inherit OOM adj from context sshd is started from



I just hit this problem with sshd randomly starting child processes
with oom_adj_score of either -1000 or 0, resulting in unpredictable
oom killer behaviour on my test rigs.

Turns out the problem is a combination of openssh behaviour and a
race in the debian startup scripts.

Firstly, the startup race is described in #502444 - ifup will
restart the ssh server when an interface comes up due to the
allow-hotplug rule. This appears to have been changed from a reload
because start-stop-daemon does not handle concurrent start/reload
races sanely.

Regardless of whether that was the right fix, ssh sessions -
children of the sshd process - inherit whatever oom_adj_score the
sshd process itself inherited from the context it was started in.
That is, if you start it from the startup scripts, it has a value of
0. If it is started from udev (due to allow-hotplug and dhcp), then
it will be started with a value of -1000, as udev modifies its own
value to avoid oom killer invocations on it. If you run
'service sshd restart' from a context with a value of X, then all
future sshd sessions will run with an oom_adj_score of X.
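
To check which value a running sshd (and so its future sessions) ended
up with, something like the following could be used. This is only a
rough sketch: it assumes the kernel exposes the value as
/proc/<pid>/oom_score_adj (the procfs name for what is called
oom_adj_score above) and that pgrep from procps is installed. The
function names and the PROC_ROOT override are made up for illustration.

```shell
#!/bin/sh
# Print the OOM score adjustment of one process.
# PROC_ROOT is a hypothetical override (not a standard variable) so the
# function can be pointed at a fake tree; it defaults to the real /proc.
oom_adj_of() {
    cat "${PROC_ROOT:-/proc}/$1/oom_score_adj"
}

# Print "PID VALUE" for every running process named exactly "sshd".
# Prints nothing if no such process exists.
list_sshd_oom_adj() {
    for pid in $(pgrep -x sshd); do
        printf '%s %s\n' "$pid" "$(oom_adj_of "$pid")"
    done
}
```

Calling list_sshd_oom_adj after a boot and again after an interface
hotplug event should show whether the listener and its per-session
children have flipped between 0 and -1000.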

This is simply wrong behaviour - sshd children should always start
with a consistent oom_adj_score, not one inherited from its
startup context. That, I think, is the source of the bug, and what
needs to be corrected. It seems the kernel folk and the openssh
folk are right - this is a distro problem.

This consistent startup environment problem can be handled entirely
in the /etc/init.d/sshd startup script by resetting the oom_adj_score
to zero. If admins want a custom value, this can easily be done via a
variable in /etc/default/sshd. It's a simple fix, and then sshd will
always start child processes with a consistent environment
regardless of the context it was started from.
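
As a rough sketch of what that could look like in the init script,
assuming the kernel exposes the value as /proc/self/oom_score_adj
(range -1000 to 1000): the variable name SSHD_OOM_ADJUST, the function
name and its optional second argument are all hypothetical, not taken
from any existing Debian script.

```shell
#!/bin/sh
# Hypothetical knob: an admin could set SSHD_OOM_ADJUST in
# /etc/default/sshd, which the init script would source before this
# line. Without an override the value defaults to 0.
SSHD_OOM_ADJUST=${SSHD_OOM_ADJUST:-0}

# Write the requested value to this shell's own OOM adjustment file so
# that everything started afterwards inherits it.
# $1: value to set, $2: optional target file (illustration/testing only).
reset_oom_adj() {
    value=$1
    target=${2:-/proc/self/oom_score_adj}

    # Reject anything that is not an integer in -1000..1000.
    if ! [ "$value" -ge -1000 ] 2>/dev/null ||
       ! [ "$value" -le 1000 ] 2>/dev/null; then
        echo "invalid OOM adjustment: $value" >&2
        return 1
    fi

    # If the file is missing or not writable, leave things untouched.
    [ -w "$target" ] || return 0

    # printf is a shell builtin, so the redirection is opened by this
    # shell and /proc/self refers to the shell, not a helper process.
    printf '%s\n' "$value" > "$target"
}
```

The script would then call reset_oom_adj "$SSHD_OOM_ADJUST" just
before starting the daemon, so sshd and every session it forks start
from the same value whether the script was run at boot, from udev or
by hand.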

Further details of the analysis of the problem are here:

https://lkml.org/lkml/2011/12/1/563

And the full thread starts here:

https://lkml.org/lkml/2011/12/1/96


Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


