Re: cgroup OOM killer loop causes system to lockup (possible fix included)

To: debian-kernel@lists.debian.org, debian-user@lists.debian.org, linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org
Subject: Re: cgroup OOM killer loop causes system to lockup (possible fix included)
From: "Cal Leeming [Simplicity Media Ltd]" <cal.leeming@simplicitymedialtd.co.uk>
Date: Mon, 30 May 2011 01:24:05 +0100
Message-id: <[🔎] 4DE2E3A5.4060208@simplicitymedialtd.co.uk>
In-reply-to: <[🔎] 4DE2CB80.8040101@simplicitymedialtd.co.uk>
References: <[🔎] 4DE2CB80.8040101@simplicitymedialtd.co.uk>

Another quick update on this...

Something really really really strange has happened this time.. I just did a completely fresh re-install (exactly like last time) and the problem went away.

So, I'm going to build up the lxc / kernel support step by step (and check for the oom_adj problem after each change) and will report back once I've found the step it kicks in at.

Hopefully, my hours of frustration will prevent someone else from going through the same thing :/

Cal

On 29/05/2011 23:41, Cal Leeming [Simplicity Media Ltd] wrote:

After submitting this problem to Kernel mailing list, I found a strange (and even more curious problem).

I attempted to run both kernels on two seperate installs of Debian (both 6.0), but slightly different in packages.. One install did not have the OOM problem, yet the other one did.. And they both ran the same kernel, the same LXC container etc..

I have a feeling this may be a Debian bug..? Further to back up these claims:

Working server:
root@courtney.internal [/mnt/encstore/lxc/github01.rootfs] > for x in `find /proc -iname 'oom_adj' | xargs grep "\-17" | awk -F '/' '{print $3}'` ; do ps -p $x u --no-headers ; done
grep: /proc/3090/task/3090/oom_adj: No such file or directory
grep: /proc/3090/oom_adj: No such file or directory
root       804 0.0 0.0 16992 1112 ?        S<s Mar17   0:00 udevd --daemon
root       804 0.0 0.0 16992 1112 ?        S<s Mar17   0:00 udevd --daemon
root     25536 0.0 0.0 49164 1100 ?        Ss   Apr05   0:00 /usr/sbin/sshd
root     25536 0.0 0.0 49164 1100 ?        Ss   Apr05   0:00 /usr/sbin/sshd
root     32173 0.0 0.0 16988   960 ?        S<   21:53   0:00 udevd --daemon
root     32173 0.0 0.0 16988   960 ?        S<   21:53   0:00 udevd --daemon
root     32174 0.0 0.0 16988   916 ?        S<   21:53   0:00 udevd --daemon
root     32174 0.0 0.0 16988   916 ?        S<   21:53   0:00 udevd --daemon

Broken server:
root@vicky [/var] > for x in `find /proc -iname 'oom_adj' | xargs grep "\-17" | awk -F '/' '{print $3}'` ; do ps -p $x u --no-headers ; done
grep: /proc/4597/task/4597/oom_adj: No such file or directory
grep: /proc/4597/oom_adj: No such file or directory
root       889 0.0 0.0 16788   860 ?        S<s 09:10   0:00 udevd --daemon
root       889 0.0 0.0 16788   860 ?        S<s 09:10   0:00 udevd --daemon
root      1104 0.0 0.0 17228 1200 ?        S<   09:10   0:00 udevd --daemon
root      1104 0.0 0.0 17228 1200 ?        S<   09:10   0:00 udevd --daemon
root      1141 0.0 0.0 17120 1128 ?        S<   09:10   0:00 udevd --daemon
root      1141 0.0 0.0 17120 1128 ?        S<   09:10   0:00 udevd --daemon
root      2745 0.0 0.0 49168 1168 ?        Ss   09:10   0:00 /usr/sbin/sshd
root      2745 0.0 0.0 49168 1168 ?        Ss   09:10   0:00 /usr/sbin/sshd
root      2750 0.0 0.0 76672 3360 ?        Ss   09:11   0:00 sshd: foxx [priv]
root      2750 0.0 0.0 76672 3360 ?        Ss   09:11   0:00 sshd: foxx [priv]
foxx      2754 0.0 0.0 76672 1636 ?        S    09:11   0:00 sshd: foxx@pts/0
foxx      2754 0.0 0.0 76672 1636 ?        S    09:11   0:00 sshd: foxx@pts/0
foxx      2755 0.0 0.0 123984 1984 pts/0    Ss   09:11   0:00 -bash
foxx      2755 0.0 0.0 123984 1984 pts/0    Ss   09:11   0:00 -bash
root      2760 0.0 0.0 127176 1244 pts/0    S    09:11   0:00 sudo su
root      2760 0.0 0.0 127176 1244 pts/0    S    09:11   0:00 sudo su
root      2761 0.0 0.0 147820 1328 pts/0    S    09:11   0:00 su
root      2761 0.0 0.0 147820 1328 pts/0    S    09:11   0:00 su
root      2762 0.0 0.0 124084 2152 pts/0    S    09:11   0:00 bash
root      2762 0.0 0.0 124084 2152 pts/0    S    09:11   0:00 bash

Notice how the sshd user instances on the broken server are set to -17, yet the sshd user instances on the working server, are not :S

Any input from anyone on the below would be VERY much appreciated.

Cal

-------- Original Message --------

Subject:	Re: Fwd: cgroup OOM killer loop causes system to lockup (possible fix included)
Date:	Sun, 29 May 2011 23:24:07 +0100
From:	Cal Leeming [Simplicity Media Ltd] <cal.leeming@simplicitymedialtd.co.uk>
Organization:	Simplicity Media Ltd
To:	linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org

Some further logs:
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283


On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
>  First of all, my apologies if I have submitted this problem to the 
> wrong place, spent 20 minutes trying to figure out where it needs to 
> be sent, and was still none the wiser.
>
> The problem is related to applying memory limitations within a cgroup. 
> If the OOM killer kicks in, it gets stuck in a loop where it tries to 
> kill a process which has an oom_adj of -17. This causes an infinite 
> loop, which in turn locks up the system.
>
> May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
>
>
>  root@vicky [/home/foxx] > uname -a
> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 
> GNU/Linux
> (this happens on both the grsec patched and non patched 2.6.32.41 kernel)
>
> When this is encountered, the memory usage across the whole server is 
> still within limits (not even hitting swap).
>
> The memory configuration for the cgroup/lxc is:
> lxc.cgroup.memory.limit_in_bytes = 3000M
> lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
>
> Now, what is even more strange, is that when running under the 
> 2.6.32.28 kernel (both patched and unpatched), this problem doesn't 
> happen. However, there is a slight difference between the two kernels. 
> The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj, 
> where as the 2.6.32.41 gives a default of -17. I suspect this is the 
> root cause of why it's showing in the later kernel, but not the earlier.
>
> To test this theory, I started up the lxc on both servers, and then 
> ran a one liner which showed me all the processes with an oom_adj of -17:
>
> (the below is the older/working kernel)
> root@courtney.internal [/mnt/encstore/lxc] > uname -a
> Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT 
> 2011 x86_64 GNU/Linux
> root@courtney.internal [/mnt/encstore/lxc] > for x in `find /proc 
> -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do 
> ps -p $x --no-headers ; done
> grep: /proc/1411/task/1411/oom_adj: No such file or directory
> grep: /proc/1411/oom_adj: No such file or directory
>   804 ?        00:00:00 udevd
>   804 ?        00:00:00 udevd
> 25536 ?        00:00:00 sshd
> 25536 ?        00:00:00 sshd
> 31861 ?        00:00:00 sshd
> 31861 ?        00:00:00 sshd
> 32173 ?        00:00:00 udevd
> 32173 ?        00:00:00 udevd
> 32174 ?        00:00:00 udevd
> 32174 ?        00:00:00 udevd
>
> (the below is the newer/broken kernel)
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > uname -a
> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 
> GNU/Linux
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > for x in 
> `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print 
> $3}'` ; do ps -p $x --no-headers ; done
> grep: /proc/3118/task/3118/oom_adj: No such file or directory
> grep: /proc/3118/oom_adj: No such file or directory
>   895 ?        00:00:00 udevd
>   895 ?        00:00:00 udevd
>  1091 ?        00:00:00 udevd
>  1091 ?        00:00:00 udevd
>  1092 ?        00:00:00 udevd
>  1092 ?        00:00:00 udevd
>  2596 ?        00:00:00 sshd
>  2596 ?        00:00:00 sshd
>  2608 ?        00:00:00 sshd
>  2608 ?        00:00:00 sshd
>  2613 ?        00:00:00 sshd
>  2613 ?        00:00:00 sshd
>  2614 pts/0    00:00:00 bash
>  2614 pts/0    00:00:00 bash
>  2620 pts/0    00:00:00 sudo
>  2620 pts/0    00:00:00 sudo
>  2621 pts/0    00:00:00 su
>  2621 pts/0    00:00:00 su
>  2622 pts/0    00:00:00 bash
>  2622 pts/0    00:00:00 bash
>  2685 ?        00:00:00 lxc-start
>  2685 ?        00:00:00 lxc-start
>  2699 ?        00:00:00 init
>  2699 ?        00:00:00 init
>  2939 ?        00:00:00 rc
>  2939 ?        00:00:00 rc
>  2942 ?        00:00:00 startpar
>  2942 ?        00:00:00 startpar
>  2964 ?        00:00:00 rsyslogd
>  2964 ?        00:00:00 rsyslogd
>  2964 ?        00:00:00 rsyslogd
>  2964 ?        00:00:00 rsyslogd
>  2980 ?        00:00:00 startpar
>  2980 ?        00:00:00 startpar
>  2981 ?        00:00:00 ctlscript.sh
>  2981 ?        00:00:00 ctlscript.sh
>  3016 ?        00:00:00 cron
>  3016 ?        00:00:00 cron
>  3025 ?        00:00:00 mysqld_safe
>  3025 ?        00:00:00 mysqld_safe
>  3032 ?        00:00:00 sshd
>  3032 ?        00:00:00 sshd
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3113 ?        00:00:00 ctl.sh
>  3113 ?        00:00:00 ctl.sh
>  3115 ?        00:00:00 sleep
>  3115 ?        00:00:00 sleep
>  3116 ?        00:00:00 .memcached.bin
>  3116 ?        00:00:00 .memcached.bin
>
>
> As you can see, it is clear that the newer kernel is setting -17 by 
> default, which in turn is causing the OOM killer loop.
>
> So I began to try and find what may have caused this problem by 
> comparing the two sources...
>
> I checked the code for all references to 'oom_adj' and 'oom_adjust' in 
> both code sets, but found no obvious differences:
> grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e 
> oom_adj
>
> Then I checked for references to "-17" in all .c and .h files, and 
> found a couple of matches, but only one obvious one:
> grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
> ./include/linux/oom.h:#define OOM_DISABLE (-17)
>
> But again, a search for OOM_DISABLE came up with nothing obvious...
>
> In a last ditch attempt, I did a search for all references to 'oom' 
> (case-insensitive) in both code bases, then compared the two:
>  root@annabelle [~/lol/linux-2.6.32.28] > grep -i -R "oom" . | sort -n 
> > /tmp/annabelle.oom_adj
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > grep -i -R 
> "oom" . | sort -n > /tmp/vicky.oom_adj
>
> and this brought back (yet again) nothing obvious..
>
>
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum 
> ./include/linux/oom.h
> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>
>  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./include/linux/oom.h
> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum 
> ./mm/oom_kill.c
> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>
>  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./mm/oom_kill.c
> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>
>
>
> Could anyone please shed some light as to why the default oom_adj is 
> set to -17 now (and where it is actually set)? From what I can tell, 
> the fix for this issue will either be:
>
>   1. Allow OOM killer to override the decision of ignoring oom_adj ==
>      -17 if an unrecoverable loop is encountered.
>   2. Change the default back to 0.
>
> Again, my apologies if this bug report is slightly unorthodox, or 
> doesn't follow usual procedure etc. I can assure you I have tried my 
> absolute best to give all the necessary information though.
>
> Cal
>

Reply to:

Follow-Ups:
- Re: cgroup OOM killer loop causes system to lockup (possible fix included)
  - From: "Cal Leeming [Simplicity Media Ltd]" <cal.leeming@simplicitymedialtd.co.uk>

References:
- cgroup OOM killer loop causes system to lockup (possible fix included)
  - From: "Cal Leeming [Simplicity Media Ltd]" <cal.leeming@simplicitymedialtd.co.uk>

Prev by Date: Bug#627837: linux-2.6: Aufs apparently silently dropped, breaking debian-live
Next by Date: Bug#627837: linux-2.6: Aufs apparently silently dropped, breaking debian-live
Previous by thread: cgroup OOM killer loop causes system to lockup (possible fix included)
Next by thread: Re: cgroup OOM killer loop causes system to lockup (possible fix included)
Index(es):
- Date
- Thread