Also looks like an Ubuntu bug was raised about this a while back: 
     
    https://lists.ubuntu.com/archives/ubuntu-server-bugs/2009-June/013957.html 
     
    https://bugs.launchpad.net/ubuntu/+source/dhcp/+bug/390556 
     
    On 30/05/2011 21:03, Cal Leeming [Simplicity Media Ltd] wrote:
    
      
      More strangeness.. 
       
      If I keep the kernel module loaded, but disable the entry in
      /etc/network/interfaces for eth0, the oom_adj problem disappears.
      But then ofc, I'm left with no network interface. I then tried
      binding /etc/ssh/sshd_config to only listen on 127.0.0.1..
      effectively bypassing the eth0 interface, whilst still allowing it
      to be loaded. But the problem still happens. 
       
      As a last ditch attempt, I tried restarting the ssh daemon from
      the command line. 
       
      I have pretty much explored almost every single option I can think
      of.. I now have absolutely no idea why this is happening. 
       
      So, I've opted for the hacky fix of putting the following into
      /etc/profile and /etc/bash.bashrc: 
       
      echo 0 > /proc/self/oom_adj 
       
      This ensures all shell sessions start off with an oom_adj of 0.
      This is *EXTREMELY* hacky, and I'd certainly hope to see that the
      debian devs fix this issue in the future. 
       
      Cal 
       
       
      On 30/05/2011 20:27, Cal Leeming [Simplicity Media Ltd] wrote:
      
        
        Bit more info: 
         
        root@vicky:~# ethtool -i eth0 
        driver: bnx2 
        version: 2.0.2 
        firmware-version: 5.0.11 NCSI 2.0.5 
        bus-info: 0000:02:00.0 
        root@vicky:~# 
         
        Btw, to make 100% sure it was the bnx2 module which was causing
        the issue, I removed bnx2 from auto loading at boot time (on the
        broken server), ran the same check, and the issue had been
        resolved. But this meant I had no network card lol. 
         
        I also tried disabling MSI by using kernel boot param:
        pci=nomsi, and this didn't seem to do anything. 
         
        Cal 
         
         
        On 30/05/2011 18:37, Cal Leeming [Simplicity Media Ltd] wrote:
        
          
           
          Here is an strace of the SSH process (which is somehow
          inheriting the -17 oom_adj on all forked user instances) 
           
          (broken server - with bnx2 module loaded) 
          [pid  2200] [    7f13a09c9cb0] open("/proc/self/oom_adj",
          O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...> 
          [pid  2120] [    7f13a09c9f00] write(7,
          "\0\0\2\240\n\n\n\nPort 22\n\n\n\nProtocol 2\n\nH"..., 680
          <unfinished ...> 
          [pid  2200] [    7f13a09c9cb0] <... open resumed> ) = 9 
          [pid  2120] [    7f13a09c9f00] <... write resumed> ) =
          680 
          [pid  2120] [    7f13a09c9e40] close(7 <unfinished ...> 
          [pid  2200] [    7f13a09c9844] fstat(9,  <unfinished
          ...> 
          [pid  2120] [    7f13a09c9e40] <... close resumed> ) = 0 
          [pid  2200] [    7f13a09c9844] <... fstat resumed>
          {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 
          [pid  2120] [    7f13a09c9e40] close(8 <unfinished ...> 
          [pid  2200] [    7f13a09d2a2a] mmap(NULL, 4096,
          PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0
          <unfinished ...> 
          [pid  2120] [    7f13a09c9e40] <... close resumed> ) = 0 
          [pid  2200] [    7f13a09d2a2a] <... mmap resumed> ) =
          0x7f13a25a6000 
          [pid  2120] [    7f13a09c9e40] close(4 <unfinished ...> 
          [pid  2200] [    7f13a09c9f00] write(9, "-17\n", 4
          <unfinished ...> 
           
           
          (working server - with bnx2 module unloaded) 
          [pid  1323] [    7fae577fbe40] close(7) = 0 
          [pid  1631] [    7fae577fbcb0] open("/proc/self/oom_adj",
          O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...> 
          [pid  1323] [    7fae577fbf00] write(8, "\0\0\2\217\0", 5
          <unfinished ...> 
          [pid  1631] [    7fae577fbcb0] <... open resumed> ) = 10 
          [pid  1323] [    7fae577fbf00] <... write resumed> ) = 5 
          [pid  1323] [    7fae577fbf00] write(8,
          "\0\0\2\206\n\n\n\nPort 22\n\n\n\nProtocol 2\n\nH"..., 654
          <unfinished ...> 
          [pid  1631] [    7fae577fb844] fstat(10,  <unfinished
          ...> 
          [pid  1323] [    7fae577fbf00] <... write resumed> ) =
          654 
          [pid  1631] [    7fae577fb844] <... fstat resumed>
          {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 
          [pid  1323] [    7fae577fbe40] close(8) = 0 
          [pid  1631] [    7fae57804a2a] mmap(NULL, 4096,
          PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0
          <unfinished ...> 
          [pid  1323] [    7fae577fbe40] close(9 <unfinished ...> 
          [pid  1631] [    7fae57804a2a] <... mmap resumed> ) =
          0x7fae593d9000 
          [pid  1323] [    7fae577fbe40] <... close resumed> ) = 0 
          [pid  1323] [    7fae577fbe40] close(5 <unfinished ...> 
          [pid  1631] [    7fae577fbf00] write(10, "0\n", 2
          <unfinished ...> 
           
          The two servers are *EXACT* duplicates of each other,
          completely fresh Debian installs, with exactly the same
          packages installed. 
           
          As you can see, the working server sends "0" into the oom_adj
          and the broken one sends "-17". 
           
           
          On 30/05/2011 15:25, Cal Leeming [Simplicity Media Ltd] wrote:
          
            
            To eliminate the possibility of this being hardware related,
            or ISO related, I performed two new fresh installs. One on
            the (previously working) server, and  
             
            another one (using the firmware modified ISO - from the
            debian 'unofficial' downloads page) into a VirtualBox
            machine. I then installed openssh-server on both  
             
            servers, rebooted, and checked the results of both. 
             
            Firmware modified ISO + openssh-server - OOM bug not
            present. 
            Non firmware modified ISO + openssh-server - OOM bug not
            present. 
             
            So, a fresh install on a VirtualBox machine (using the exact
            same install CD ISO as the broken server), and the old
            machine (using the official release),  
             
            shows that the bug is not happening..  The only thing that I
            can think is different, is that the broken server uses
            'firmware-bnx2'.. so I purged the bnx2  
             
            package, removed the bnx*.ko files from /lib/modules, ran
            update-initramfs, and then rebooted (i then confirmed it was
            removed by checking ifconfig and  
             
            lsmod). 
             
            And guess what.. IT FUCKING WORKED. 
             
            So, this problem seems to be caused by the firmware-bnx2
            module being loaded.. some how, that module is causing -17
            oom_adj to be set for everything..
            WTF!!!!!!!!!!!!!!!!!!!!!!! 
             
             
            On 30/05/2011 14:09, Cal Leeming [Simplicity Media Ltd]
            wrote:
            
              
              
              Please note, the way I am detecting if the oom_adj bug is
              present, is by checking if the sshd user processes are
              -17. If they are, then the bug is present. 
               
              ATTEMPT 1: 
               
              apt-get install hashalot cryptsetup make g++ cpp automake
              ncurses-dev bison flex curl build-essential kernel-package
              locales locales-all 
               - oom problem not found 
               
              (pre-built from previous install) 
              dpkg -i
              linux-headers-2.6.32.41-grsec_2.6.32.41-grsec-10.00.Custom_amd64.deb
linux-image-2.6.32.41-grsec_2.6.32.41-grsec-10.00.Custom_amd64.deb 
               - oom problem not found 
               
              (booted into new kernel) 
              shutdown -r now 
               - oom problem DETECTED!!! 
               
              (booted into old kernel) 
              dpkg --purge linux-image-2.6.32.41-grsec
              linux-headers-2.6.32.41-grsec 
              shutdown -r now 
               - oom problem DETECTED!!! Hmmmm.... 
               
              apt-get purge hashalot cryptsetup make g++ cpp automake
              ncurses-dev bison flex curl build-essential kernel-package
              locales locales-all 
              shutdown -r now 
               - oom problem STILL DETECTED!!! WTF!!!!!!!!!!!!!!! 
               
               
              So, it would appear that either the kernel install or the
              "apt-get" caused irreversible changes on the system... 
               
              REINSTALL ATTEMPT 2: 
              (this time, I'm going to use tripwire to detect filesystem
              changes between each reboot) 
               
              Installed openssh-server and configured tripwire. 
              Made test modification to /usr/sbin and ran tripwire, to
              ensure its logging correctly. 
               - oom problem not found 
               
              rebooted 
               - oom problem DETECTED!! 
               
              Purged openssh-server 
               - oom problem not found 
               
              Reinstalled openssh-server 
               - oom problem not found 
               
              rebooted 
               - oom problem DETECTED 
               
              So, we've now pinpointed it down to the openssh package...
              This makes sense, because the oom_adj is inherited from
              whatever binary the process forks from.. In  
               
              this case, the LXCs are started up from an ssh session,
              which is why it inherits the ssh -17 oom_adj. 
               
              Here is what tripwire is reporting has changed: 
               
              Modified: 
              "/usr/sbin" 
              "/usr/sbin/sshd" 
              "/usr/lib" 
              "/usr/lib/openssh" 
              "/usr/lib/openssh/sftp-server" 
              "/usr/lib/sftp-server" 
              "/var/run/acpid.pid" 
              "/var/run/acpid.socket" 
              "/var/run/atd.pid" 
              "/var/run/crond.pid" 
              "/var/run/crond.reboot" 
              "/var/run/exim4/exim.pid" 
              "/var/run/mdadm/monitor.pid" 
              "/var/run/portmap.pid" 
              "/var/run/portmap_mapping" 
              "/var/run/rpc.statd.pid" 
              "/var/run/rsyslogd.pid" 
              "/var/run/sm-notify.pid" 
              "/var/log/dmesg.0" 
              "/etc" 
              "/etc/adjtime" 
              "/etc/default" 
              "/etc/default/ssh" 
              "/etc/lvm/cache" 
              "/etc/lvm/cache/.cache" 
              "/etc/mtab" 
              "/etc/network/if-up.d" 
              "/etc/network/if-up.d/openssh-server" 
              "/etc/network/run" 
              "/etc/network/run/ifstate" 
              "/etc/pam.d" 
              "/etc/pam.d/sshd" 
              "/etc/passwd-" 
              "/etc/shadow-" 
              "/etc/ssh" 
              "/etc/ssh/ssh_host_dsa_key" 
              "/etc/ssh/ssh_host_dsa_key.pub" 
              "/etc/ssh/ssh_host_rsa_key" 
              "/etc/ssh/ssh_host_rsa_key.pub" 
              "/etc/ssh/sshd_config" 
              "/etc/tripwire" 
              "/etc/ufw" 
              "/etc/ufw/applications.d" 
              "/etc/ufw/applications.d/openssh-server" 
              "/etc/init.d" 
              "/etc/init.d/.depend.boot" 
              "/etc/init.d/.depend.start" 
              "/etc/init.d/.depend.stop" 
              "/etc/init.d/ssh" 
              "/etc/rc2.d" 
              "/etc/rc2.d/S18ssh" 
              "/etc/rc3.d" 
              "/etc/rc3.d/S18ssh" 
              "/etc/rc4.d" 
              "/etc/rc4.d/S18ssh" 
              "/etc/rc5.d" 
              "/etc/rc5.d/S18ssh" 
              "/etc/passwd" 
              "/etc/shadow" 
              "/root" 
              "/root/.nano_history" 
               
              Added: 
              "/var/log/dmesg.1.gz" 
              "/var/log/dmesg.2.gz" 
              "/root/.bash_history" 
               
              Removed: 
              "/etc/nologin" 
               
              I thought maybe the reason this is happening is because
              ssh (when being started from rc.d) is inheriting the -17
              from init.. but it appears init doesn't have  
               
              -17: 
               
              root@vicky:~# cat /proc/1/oom_adj 
              0 
               
              Then I thought maybe its because the ssh versions are
              diff, but they are not: 
               
              root@vicky:/home/foxx# dpkg -l | grep openssh-server 
              ii  openssh-server                     
              1:5.5p1-6                    secure shell (SSH) server,
              for secure access from remote machines 
               
               root@courtney.internal
              [/home/foxx] > dpkg -l | grep openssh-server 
              ii  openssh-server                     
              1:5.5p1-6                    secure shell (SSH) server,
              for secure access from remote machines 
               
              foxx@vicky:~$ md5sum /usr/sbin/sshd 
              f8c11462e8f2a7bf80e212e06041492b  /usr/sbin/sshd 
               
               root@courtney.internal
              [/home/foxx] > md5sum /usr/sbin/sshd 
              f8c11462e8f2a7bf80e212e06041492b  /usr/sbin/sshd 
               
              Then I made sure that the sshd_config's matched, and that
              I was using the same login process for both (shared key
              only)... they both matched, yet the problem  
               
              still happens.. 
               
              Then I thought maybe it's inheriting from the sshd server
              process, but turns out that isn't it either: 
               
              (broken server) 
              root      1583  0.0  0.0  49168  1140 ?        Ss  
              12:42   0:00 /usr/sbin/sshd 
              root@vicky:~# cat /proc/1583/oom_adj 
              -17 
               
              (working server) 
              root      2105  0.0  0.0  49184  1152 ?        Ss  
              00:47   0:00 /usr/sbin/sshd 
               root@courtney.internal
              [/home/foxx] > cat /proc/2105/oom_adj 
              -17 
               
              So, I looked through the process tree to see where it was
              inheriting from at what stage.. 
               
              (working server) 
               root@courtney.internal
              [/home/foxx] > ps faux | grep sshd 
              (-17) root      2105  0.0  0.0  49184  1152 ?        Ss  
              00:47   0:00 /usr/sbin/sshd 
              (0)   root      4735  0.0  0.0  76668  3356 ?        Ss  
              12:47   0:00  \_ sshd: foxx [priv] 
                    foxx      4740  0.0  0.0  76668  1644 ?        S   
              12:47   0:00      \_ sshd: foxx@pts/0 
                    root      4757  0.0  0.0 112344   876 pts/0    S+  
              12:48   0:00                          \_ grep sshd 
               
              (broken server) 
              foxx@vicky:~$ ps faux | grep sshd 
              (-17) root      1583  0.0  0.0  49168  1140 ?        Ss  
              12:42   0:00 /usr/sbin/sshd 
              (-17) root      1616  0.0  0.0  70488  3376 ?        Ss  
              12:43   0:00  \_ sshd: root@pts/0 
              (-17) root      1685  0.2  0.0  70488  3292 ?        Ss  
              12:50   0:00  \_ sshd: foxx [priv] 
                    foxx      1688  0.0  0.0  70488  1576 ?        S   
              12:50   0:00      \_ sshd: foxx@pts/1 
                    foxx      1715  0.0  0.0   7544   840 pts/1    S+  
              12:50   0:00              \_ grep sshd 
               
              As you can see, the line where it says "sshd: foxx [priv]"
              is causing the -17 oom_adj. Accoridng to the
              documentation, this appears to be where the privilege  
               
              seperation takes place.  
               
              So, now I started to check the ssh packages themselves,
              and make sure the repos are exactly the same on both
              servers. At this point, I realise that the  
               
              working server is slightly out of date on the following
              packages: 
               
               root@courtney.internal
              [/home/foxx] > md5sum /etc/apt/sources.list 
              00bcf3cf28e2994f9b512f0a8ffb0765  /etc/apt/sources.list 
               
              root@vicky:/etc# md5sum /etc/apt/sources.list 
              00bcf3cf28e2994f9b512f0a8ffb0765  /etc/apt/sources.list 
               
               root@courtney.internal
              [/home/foxx] > apt-get upgrade 
              The following packages will be upgraded: 
                bind9-host dnsutils exim4 exim4-base exim4-config
              exim4-daemon-light host isc-dhcp-client isc-dhcp-common
              libbind9-60 libdns69 libisc62 libisccc60 
                libisccfg62 liblwres60 linux-base
              linux-image-2.6.32-5-amd64 linux-libc-dev sd-agent 
               
              The one that springs immediately to my attention is
              linux-base.  
               
              (working server) 
               root@courtney.internal
              [/home/foxx] > dpkg -l | grep linux-base 
              ii  linux-base                         
              2.6.32-31                    Linux image base package 
               
              (broken server) 
              root@vicky:/etc# dpkg -l | grep linux-base 
              ii  linux-base                         
              2.6.32-34squeeze1            Linux image base package 
               
              Sooooo, I bite the bullet, and perform an upgrade of
              linux-base on the working server... 
               root@courtney.internal
              [/home/foxx] > apt-get install linux-base 
              Setting up linux-base (2.6.32-34squeeze1) ... 
               
              I then re-run the dryrun upgrade command this to make sure
              its upgraded: 
                bind9-host dnsutils exim4 exim4-base exim4-config
              exim4-daemon-light host isc-dhcp-client isc-dhcp-common
              libbind9-60 libdns69 libisc62 libisccc60 
                libisccfg62 liblwres60 linux-image-2.6.32-5-amd64
              linux-libc-dev sd-agent 
              (as you can see its disappeared from the list) 
               
              I then reboot the server.. and wait for the longest 3
              minutes of my life.. But guess what... it didn't break :/
              So, I bite another bullet, and upgrade the  
               
              remaining packages on the server: 
                bind9-host dnsutils exim4 exim4-base exim4-config
              exim4-daemon-light host isc-dhcp-client isc-dhcp-common
              libbind9-60 libdns69 libisc62 libisccc60 
                libisccfg62 liblwres60 linux-image-2.6.32-5-amd64
              linux-libc-dev sd-agent 
               
              I then make sure both servers are running the exact same
              stock kernel from Debian (as the working server was using
              an old kernel). 
               
              root@vicky:/etc# dpkg -l | grep linux | grep image 
              ii  linux-base                         
              2.6.32-34squeeze1            Linux image base package 
              ii  linux-image-2.6-amd64              
              2.6.32+29                    Linux 2.6 for 64-bit PCs
              (meta-package) 
              ii  linux-image-2.6.32-5-amd64         
              2.6.32-34squeeze1            Linux 2.6.32 for 64-bit PCs 
               
               root@courtney.internal
              [/home/foxx] > dpkg -l | grep linux | grep image 
              ii  linux-base                         
              2.6.32-34squeeze1            Linux image base package 
              ii  linux-image-2.6-amd64              
              2.6.32+29                    Linux 2.6 for 64-bit PCs
              (meta-package) 
              ii  linux-image-2.6.32-5-amd64         
              2.6.32-34squeeze1            Linux 2.6.32 for 64-bit PCs 
               
              root@vicky:/etc# uname -a 
              Linux vicky 2.6.32-5-amd64 #1 SMP Wed May 18 23:13:22 UTC
              2011 x86_64 GNU/Linux 
               
               root@courtney.internal
              [/home/foxx] > uname -a 
              Linux courtney.internal 2.6.32-5-amd64 #1 SMP Wed May 18
              23:13:22 UTC 2011 x86_64 GNU/Linux 
               
              After another long 3 minute wait, I test for the oom
              bug... and guess what.. it's STILL NOT DOING IT!!! :( 
               
              So now I check for differences in the /etc/pam.d... I
              notice pam_cap.so is missing in common-auth on the broken
              server: 
               
               root@courtney.internal
              [/etc/pam.d] > dpkg -l | grep cap 
              ii  libcap2                            
              1:2.19-3                     support for getting/setting
              POSIX.1e capabilities 
              ii  libcap2-bin                        
              1:2.19-3                     basic utility programs for
              using capabilities 
               
              Broken server: 
                â PAM profiles to
              enable:                                                                                                                          
              â 
               
              â                                                                                                                                                  
              â 
                â    [*] Unix
              authentication                                                                                                                       
              â 
               
              Working server: 
                â    [*] Unix
              authentication                                                                                                                       
              â 
                â    [*] Inheritable Capabilities
              Management                                                                                                       
              â 
               
              So, I install 'libcap2-bin' on the broken server, reboot..
              and still no god damn luck. At this point /etc/pam.d on
              both servers are matching (md5sum matches  
               
              up on all files).. So now, I decide to check all files
              relating to openssh-server.. again, all matches up fine. 
               
              Then I start to get really pissed off, and check the
              md5sum for all files in /etc: 
               
              (left is working server, right is broken) 
               root@courtney.internal
              [/etc/pam.d] > diff /tmp/etcmd5-courtney
              /tmp/etcmd5-vicky -y --suppress-common-lines 
              a81fbd39142e18d5ed1fb8a7b3ecce71 
              /etc/adjtime                |
              fa9192c6cdaab85ec952576ab3139fd1  /etc/adjtime 
              7fcee51274f69cdf5d4c8b7be799637b 
              /etc/apt/trustdb.gpg        |
              1319acca28ae6475a915ca0684d0cd62  /etc/apt/trustdb.gpg 
              a3710991fcce0b1574586450c81095e1 
              /etc/apt/trusted.gpg        |
              d802712c9255f13bea3bea87b83180b1  /etc/apt/trusted.gpg 
              366d165a9f5de024d3a21b9dc51de057 
              /etc/bash.bashrc            |
              5b3c3bc73d236e4e1b6f9b6c1ed5964e  /etc/bash.bashrc 
              109789e71a8cf8e21302bf44b5b716f7 
              /etc/blkid.tab              |
              aa0de4c3c85ae212f6c59b6b89b21b9a  /etc/blkid.tab 
              2de357d9da09d738c179e8d269696b9c 
              /etc/blkid.tab.old          |
              aa0de4c3c85ae212f6c59b6b89b21b9a  /etc/blkid.tab.old 
              22133b5bd4023d48b50863d2b1c7b25e 
              /etc/console-setup/cached.k |
              bdd92d16a8172f9e7ea3f866b59b4fc4 
              /etc/console-setup/cached.k 
              b88b0f0a4d3b4beec0cc4b09b6c2aaea 
              /etc/cron.daily/ntp         < 
              4e5aa59f38b520c5a45d3fdc7cdec46c 
              /etc/cron.daily/sysstat     |
              d1e8b20b26a33a6a0781d59bc312442e  /etc/cron.daily/tripwire 
              455c3c071b6daabb4e4490828975034c 
              /etc/cron.d/sysstat         < 
              1cffe509bba095a0f7ece99a971e6e9a 
              /etc/crypttab               < 
              756141f7eacf1a272a2f6e51646b3aa4 
              /etc/default/cryptdisks     < 
              6bba39eb6c39aef755f1fadb48ded5a5 
              /etc/default/lxc            < 
              cd7a62fbb18fa8fe5893dee93064b328 
              /etc/default/ntp            < 
              e0d7efac23e911c65f44b08de446e837 
              /etc/default/rsync          < 
              21614b7a3d91ee06750feedbfdaec898 
              /etc/default/sysstat        < 
              fbc234ecd0f7e8bc1c394bbde5867be1 
              /etc/dhcp/dhclient-exit-hoo < 
              1a2b9d0a869e2aa885ae3621c557fb95 
              /etc/dpkg/shlibs.default    < 
              84b1e69080569cc5c613a50887af5200 
              /etc/dpkg/shlibs.override   < 
              297521889d690871ec9d89c5eeff745a 
              /etc/emacs/site-start.d/50a < 
              c10e4cfb6fb21c281f04105d9482736a 
              /etc/exim4/update-exim4.con |
              2109f7e59c5d7ab2431aad0f095e2e34 
              /etc/exim4/update-exim4.con 
              0bec0044c716f14083f72c42af543d16 
              /etc/fstab                  |
              ab91b08889eb01fa6cd2364eba136ae8  /etc/fstab 
              c9457cf5b2196da67d5ac816d1c86a4f 
              /etc/fuse.conf              < 
              e030dc891a3af1f7779429b5c0554c98 
              /etc/gdb/gdbinit            < 
              5d151dd5c443ed7b2a5ded95740bf00d 
              /etc/glusterd/glusterd.info < 
              f8ab4b0d63d43e8385e8e0a7b0b0fdba 
              /etc/group                  |
              6c8ccd77ad88953d80a0c8230feb43b0  /etc/group 
              b2eb12a6eb86aa16730fcc78f3856f99 
              /etc/group-                 |
              f8366cb252dab0be81d315dbb0bfd54d  /etc/group- 
              1721fcab19363ee66aa66829b9876f0e 
              /etc/gshadow                |
              9a53c91ec405b6c2589b257d1f610e11  /etc/gshadow 
              b43f2208e196045e2e4eff32a32a43cb 
              /etc/gshadow-               |
              5db46dc414e73d989833f1718646ec40  /etc/gshadow- 
              289108902ba56c6d3d10392b994f5063 
              /etc/hostname               |
              be7724203e323a7c97fe531e3662521c  /etc/hostname 
              f6c39850a5646ce96a62f8bbfadcab12 
              /etc/hosts                  |
              e7358f34e94f27ce975c2beb64a5fd31  /etc/hosts 
              e9d8dadacde9e17f0a9b19951109bd15 
              /etc/init.d/cryptdisks      |
              94c8893c0233f51b5b35c44afcc9d064  /etc/init.d/.depend.boot 
              a544f8db0b5b71722ebf28cd29d5c99f 
              /etc/init.d/cryptdisks-earl |
              018bfdbf3ce7d5000d4771861558084c 
              /etc/init.d/.depend.start 
              47f49c3084a87495a4b21b16d62f08ce 
              /etc/init.d/.depend.boot    |
              efcaecb9b1dbf7ee6999ce6d7fe6cbce  /etc/init.d/.depend.stop 
              c32f1bac3bf2ead96ef4328f8fa8b6a4 
              /etc/init.d/.depend.start   < 
              7a5306deeb6f58cf9fddc44176e944b2 
              /etc/init.d/.depend.stop    < 
              0a911b5d7bdf62f4a27f59544859a25c 
              /etc/init.d/enc             < 
              d0b8cce6d932e1cd90812ce32c3f81a4 
              /etc/init.d/fuse            < 
              88def01f8a173be24e1347d215d713f1 
              /etc/init.d/github01        < 
              a8e1b7caac5f373ae3ee68cfc6703c4c 
              /etc/init.d/lxc             < 
              1593209e2edaef7930940759b07caee1 
              /etc/init.d/ntp             < 
              9d74671cca3077de30a6cbed26d4cd0e 
              /etc/init.d/rsync           < 
              7702ad8bd63cbe13b8bb455199435191 
              /etc/init.d/screen-cleanup  < 
              ee350831ec30475b16b8bda31a3f24de 
              /etc/init.d/sd-agent        < 
              8fb5289db2c7f67aa9347ae7e8b445dd 
              /etc/init.d/sws01           < 
              e3cf21d607c6852e2e5013524c657c6e 
              /etc/init.d/sysstat         < 
              bc93dd93f82749814f2bde70d9428c0d 
              /etc/initramfs-tools/conf.d < 
              98fc27159733746de9890831d51d95d4 
              /etc/iptables.up.rules      < 
              02e24efb09d5343d249780465b59bfd6 
              /etc/ld.so.cache            |
              ac8d6701fefa22e490c2c956a047f224  /etc/ld.so.cache 
              3459aad5fab462e1c126997b4ac449bb 
              /etc/logcheck/ignore.d.serv < 
              32d3f9199b3438cd41ed3cb1122135b7 
              /etc/lvm/backup/default     |
              c0adb70d988e7ad895b42aa23644dce0  /etc/lvm/backup/mech 
              a5ab01460cb3dba4daedd86002bdba67 
              /etc/lvm/cache/.cache       |
              4436f6d9c98cb918c4757f48446ddefc  /etc/lvm/backup/ssd 
                                                                           
              > a96425623ae854ba979f7fd8b901bd21 
              /etc/lvm/cache/.cache 
              f51730e7056489397b8b2a9a9b99662c 
              /etc/mailname               |
              1cc22bdbd9f0dd61c3dbdf484f5e1954  /etc/mailname 
              2b0e1a3f52757145d65c353ca49d7756 
              /etc/mdadm/mdadm.conf       |
              75ddc14817703ef468326041acd8bfb1  /etc/mdadm/mdadm.conf 
              2ed9e1f60da7f3f87faca9879d0a6531 
              /etc/mtab                   |
              d6eb5404b14026e3cb6377fdbfa36a07  /etc/mtab 
              0a38058aafd42b7b4105505379194e1b 
              /etc/nanorc                 |
              fc57b93c907fefbccf12317d40b4a204  /etc/nanorc 
              d8e3272886cfc3ca3e103840f17c28b3 
              /etc/network/interfaces     |
              11aed0887d5bd14a426e2a4f7c7d4f4a  /etc/network/interfaces 
              0925a154d88cf0bc2d0f588ab7c670c9 
              /etc/network/run/ifstate    |
              40a731c20283789c9916b1978b8b56b8  /etc/network/run/ifstate 
              1e47bfedf84da1cdfaa45ee016fb3609 
              /etc/networks               |
              5293c479ab40a68b4fa7a6eeff755256  /etc/networks 
              3e250ecaf470e1d3a2b68edd5de46bfd 
              /etc/ntp.conf               < 
              a3bf39d554578558648717be72405bb4 
              /etc/passwd                 |
              ec13c66df3dee36481a8c3d432e54d8f  /etc/passwd 
              c495660bf88840c35e5c3ede628e5e5d 
              /etc/passwd-                |
              ec13c66df3dee36481a8c3d432e54d8f  /etc/passwd- 
              b08c4faf56551a861d7ae6858ac52b2e 
              /etc/profile                |
              b94c2e3df2a779ac12080942df4d86ea  /etc/profile 
              fe0b86955e4eb444f17f54d086580b1f 
              /etc/resolv.conf            |
              2bc8c1c0361ac0fae5581bcaf8d7f136  /etc/resolv.conf 
              c22ef5f592ae97d3152e1d58657e2c8a 
              /etc/rssh.conf              < 
              f4aa40956bb6f150815b4d60a505760c 
              /etc/screenrc               < 
              78b737784042d973d6bed47e7667b1bb 
              /etc/sd-agent/config.cfg    < 
              4eccd6267f438812bfa1d4eb8ac05217 
              /etc/shadow                 |
              e2f45652caa1cbb84c778adc75f7545b  /etc/shadow 
              676a49b9dbe67ce8be7a2921f7e10570 
              /etc/shadow-                |
              e2f45652caa1cbb84c778adc75f7545b  /etc/shadow- 
              3c1144bd2727cf00af012021fa3de4c5 
              /etc/shells                 |
              0e85c87e09d716ecb03624ccff511760  /etc/shells 
              9fa92b39192a027af603fbff3d2f42eb 
              /etc/siege/siegerc          < 
              fb778297a8e612868e41225cf4db7c9d 
              /etc/siege/urls.txt         < 
              813856cf9d8c29095b3a4e19d92d3da0 
              /etc/ssh/ssh_host_dsa_key   |
              3f4beaeb582ce81b42cca475e65dc75a 
              /etc/ssh/ssh_host_dsa_key 
              75d221c8d4abe42699ff813e5a1e8cc7 
              /etc/ssh/ssh_host_dsa_key.p |
              55e34345f7a2e1ac5ec7ce78543487e7 
              /etc/ssh/ssh_host_dsa_key.p 
              b85a52219856a7ecf95d625a1bee5068 
              /etc/ssh/ssh_host_rsa_key   |
              70af0ef16b661edd96c317058ef55a78 
              /etc/ssh/ssh_host_rsa_key 
              3aea4190a19facc76222e69c5700f5ac 
              /etc/ssh/ssh_host_rsa_key.p |
              b8524bc48e4d5c71c69888b452e8d6ae 
              /etc/ssh/ssh_host_rsa_key.p 
              16e9567a6298125264967d276e6a139f 
              /etc/sudoers                |
              c5dab0f2771411ed7e67d6dab60a311f  /etc/sudoers 
              a86605ae7354f25d8060bcb5ad83edf7 
              /etc/sysctl.conf            |
              2c6f89fdb09aeac5735144497a261782  /etc/sysctl.conf 
              e52dbe02e5da26d9be965373676e9355 
              /etc/sysstat/sysstat        < 
              fa92b01baa2130e26822c30fb27ac56e 
              /etc/sysstat/sysstat.ioconf < 
                                                                           
              > d87271b624ab8e93c2e51cd59bade631 
              /etc/tripwire/site.key 
                                                                           
              > 8f6ebb12f511b46fbf0203d978c3bf01 
              /etc/tripwire/tw.cfg 
                                                                           
              > 1821c7a0d207a168f1d7c766f238e816 
              /etc/tripwire/twcfg.txt 
                                                                           
              > 717b4afa4f0f8614f3947441a3ddf422 
              /etc/tripwire/tw.pol 
                                                                           
              > 92c9b38e95c90eebf1d746633a81909c 
              /etc/tripwire/tw.pol.bak 
                                                                           
              > d08d31fa833b50d2fb9a37f97b07cbd0 
              /etc/tripwire/twpol.txt 
                                                                           
              > fdbfa3e0879f0d959bbdfd5601ef4d4f 
              /etc/tripwire/vicky-local.k 
              aeb6fe5dcfc873b0632ba92345ed16e2 
              /etc/udev/rules.d/70-persis |
              0fdf03b558e118edcf8ce29abaf296f1 
              /etc/udev/rules.d/70-persis 
              24cc33b9f96e3189b7e34cf5484cb99f 
              /etc/udev/rules.d/70-persis |
              4a49e7ddeacbb3ded8bb3968f219362c 
              /etc/udev/rules.d/70-persis 
               
              I patched up the /etc/init.d/.depend.* files so the 'ssh'
              entries are matching.. still. no. luck 
               
              It's now 24 hours of trying to fix this issue, and I'm
              getting extremely pissed off :@ 
               
              If anyone could offer up some advice, it would be VERY
              much appreciated. 
               
              On 30/05/2011 01:24, Cal Leeming [Simplicity Media Ltd]
              wrote:
              
                
                Another quick update on this... 
                 
                Something really really really strange has happened this
                time.. I just did a completely fresh re-install (exactly
                like last time) and the problem went away. 
                 
                So, I'm going to build up the lxc / kernel support step
                by step (and check for the oom_adj problem after each
                change) and will report back once I've found the step it
                kicks in at. 
                 
                Hopefully, my hours of frustration will prevent someone
                else from going through the same thing :/ 
                 
                Cal 
                 
                On 29/05/2011 23:41, Cal Leeming [Simplicity Media Ltd]
                wrote:
                
                  
                  After submitting this problem to Kernel mailing list,
                  I found a strange (and even more curious problem). 
                   
                  I attempted to run both kernels on two seperate
                  installs of Debian (both 6.0), but slightly different
                  in packages.. One install did not have the OOM
                  problem, yet the other one did.. And they both ran the
                  same kernel, the same LXC container etc.. 
                   
                  I have a feeling this may be a Debian bug..? Further
                  to back up these claims: 
                   
                  Working server: 
                   root@courtney.internal
                  [/mnt/encstore/lxc/github01.rootfs] > for x in
                  `find /proc -iname 'oom_adj' | xargs grep "\-17"  |
                  awk -F '/' '{print $3}'` ; do ps -p $x u --no-headers
                  ; done 
                  grep: /proc/3090/task/3090/oom_adj: No such file or
                  directory 
                  grep: /proc/3090/oom_adj: No such file or directory 
                  root       804  0.0  0.0  16992  1112 ?        S<s 
                  Mar17   0:00 udevd --daemon 
                  root       804  0.0  0.0  16992  1112 ?        S<s 
                  Mar17   0:00 udevd --daemon 
                  root     25536  0.0  0.0  49164  1100 ?        Ss  
                  Apr05   0:00 /usr/sbin/sshd 
                  root     25536  0.0  0.0  49164  1100 ?        Ss  
                  Apr05   0:00 /usr/sbin/sshd 
                  root     32173  0.0  0.0  16988   960 ?        S<  
                  21:53   0:00 udevd --daemon 
                  root     32173  0.0  0.0  16988   960 ?        S<  
                  21:53   0:00 udevd --daemon 
                  root     32174  0.0  0.0  16988   916 ?        S<  
                  21:53   0:00 udevd --daemon 
                  root     32174  0.0  0.0  16988   916 ?        S<  
                  21:53   0:00 udevd --daemon 
                   
                  Broken server: 
                   root@vicky [/var] > for x in `find /proc -iname
                  'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print
                  $3}'` ; do ps -p $x u --no-headers ; done 
                  grep: /proc/4597/task/4597/oom_adj: No such file or
                  directory 
                  grep: /proc/4597/oom_adj: No such file or directory 
                  root       889  0.0  0.0  16788   860 ?        S<s 
                  09:10   0:00 udevd --daemon 
                  root       889  0.0  0.0  16788   860 ?        S<s 
                  09:10   0:00 udevd --daemon 
                  root      1104  0.0  0.0  17228  1200 ?        S<  
                  09:10   0:00 udevd --daemon 
                  root      1104  0.0  0.0  17228  1200 ?        S<  
                  09:10   0:00 udevd --daemon 
                  root      1141  0.0  0.0  17120  1128 ?        S<  
                  09:10   0:00 udevd --daemon 
                  root      1141  0.0  0.0  17120  1128 ?        S<  
                  09:10   0:00 udevd --daemon 
                  root      2745  0.0  0.0  49168  1168 ?        Ss  
                  09:10   0:00 /usr/sbin/sshd 
                  root      2745  0.0  0.0  49168  1168 ?        Ss  
                  09:10   0:00 /usr/sbin/sshd 
                  root      2750  0.0  0.0  76672  3360 ?        Ss  
                  09:11   0:00 sshd: foxx [priv] 
                  root      2750  0.0  0.0  76672  3360 ?        Ss  
                  09:11   0:00 sshd: foxx [priv] 
                  foxx      2754  0.0  0.0  76672  1636 ?        S   
                  09:11   0:00 sshd: foxx@pts/0 
                  foxx      2754  0.0  0.0  76672  1636 ?        S   
                  09:11   0:00 sshd: foxx@pts/0 
                  foxx      2755  0.0  0.0 123984  1984 pts/0    Ss  
                  09:11   0:00 -bash 
                  foxx      2755  0.0  0.0 123984  1984 pts/0    Ss  
                  09:11   0:00 -bash 
                  root      2760  0.0  0.0 127176  1244 pts/0    S   
                  09:11   0:00 sudo su 
                  root      2760  0.0  0.0 127176  1244 pts/0    S   
                  09:11   0:00 sudo su 
                  root      2761  0.0  0.0 147820  1328 pts/0    S   
                  09:11   0:00 su 
                  root      2761  0.0  0.0 147820  1328 pts/0    S   
                  09:11   0:00 su 
                  root      2762  0.0  0.0 124084  2152 pts/0    S   
                  09:11   0:00 bash 
                  root      2762  0.0  0.0 124084  2152 pts/0    S   
                  09:11   0:00 bash 
                   
                  Notice how the sshd user instances on the broken
                  server are set to -17, yet the sshd user instances on
                  the working server, are not :S 
                   
                  Any input from anyone on the below would be VERY much
                  appreciated. 
                   
                  Cal 
                   
                  -------- Original Message --------
                  
                   
                   
                  Some further logs:
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server 
invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451]  
[<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
>  First of all, my apologies if I have submitted this problem to the 
> wrong place, spent 20 minutes trying to figure out where it needs to 
> be sent, and was still none the wiser.
>
> The problem is related to applying memory limitations within a cgroup. 
> If the OOM killer kicks in, it gets stuck in a loop where it tries to 
> kill a process which has an oom_adj of -17. This causes an infinite 
> loop, which in turn locks up the system.
>
> May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
> May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of 
> memory: kill process 6016 (java) score 0 or a child
>
>
>  root@vicky [/home/foxx] > uname -a
> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 
> GNU/Linux
> (this happens on both the grsec patched and non patched 2.6.32.41 kernel)
>
> When this is encountered, the memory usage across the whole server is 
> still within limits (not even hitting swap).
>
> The memory configuration for the cgroup/lxc is:
> lxc.cgroup.memory.limit_in_bytes = 3000M
> lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
>
> Now, what is even more strange, is that when running under the 
> 2.6.32.28 kernel (both patched and unpatched), this problem doesn't 
> happen. However, there is a slight difference between the two kernels. 
> The 2.6.32.28 kernel gives a default of 0 in the /proc/X/oom_adj, 
> where as the 2.6.32.41 gives a default of -17. I suspect this is the 
> root cause of why it's showing in the later kernel, but not the earlier.
>
> To test this theory, I started up the lxc on both servers, and then 
> ran a one liner which showed me all the processes with an oom_adj of -17:
>
> (the below is the older/working kernel)
> root@courtney.internal [/mnt/encstore/lxc] > uname -a
> Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT 
> 2011 x86_64 GNU/Linux
> root@courtney.internal [/mnt/encstore/lxc] > for x in `find /proc 
> -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print $3}'` ; do 
> ps -p $x --no-headers ; done
> grep: /proc/1411/task/1411/oom_adj: No such file or directory
> grep: /proc/1411/oom_adj: No such file or directory
>   804 ?        00:00:00 udevd
>   804 ?        00:00:00 udevd
> 25536 ?        00:00:00 sshd
> 25536 ?        00:00:00 sshd
> 31861 ?        00:00:00 sshd
> 31861 ?        00:00:00 sshd
> 32173 ?        00:00:00 udevd
> 32173 ?        00:00:00 udevd
> 32174 ?        00:00:00 udevd
> 32174 ?        00:00:00 udevd
>
> (the below is the newer/broken kernel)
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > uname -a
> Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 
> GNU/Linux
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > for x in 
> `find /proc -iname 'oom_adj' | xargs grep "\-17"  | awk -F '/' '{print 
> $3}'` ; do ps -p $x --no-headers ; done
> grep: /proc/3118/task/3118/oom_adj: No such file or directory
> grep: /proc/3118/oom_adj: No such file or directory
>   895 ?        00:00:00 udevd
>   895 ?        00:00:00 udevd
>  1091 ?        00:00:00 udevd
>  1091 ?        00:00:00 udevd
>  1092 ?        00:00:00 udevd
>  1092 ?        00:00:00 udevd
>  2596 ?        00:00:00 sshd
>  2596 ?        00:00:00 sshd
>  2608 ?        00:00:00 sshd
>  2608 ?        00:00:00 sshd
>  2613 ?        00:00:00 sshd
>  2613 ?        00:00:00 sshd
>  2614 pts/0    00:00:00 bash
>  2614 pts/0    00:00:00 bash
>  2620 pts/0    00:00:00 sudo
>  2620 pts/0    00:00:00 sudo
>  2621 pts/0    00:00:00 su
>  2621 pts/0    00:00:00 su
>  2622 pts/0    00:00:00 bash
>  2622 pts/0    00:00:00 bash
>  2685 ?        00:00:00 lxc-start
>  2685 ?        00:00:00 lxc-start
>  2699 ?        00:00:00 init
>  2699 ?        00:00:00 init
>  2939 ?        00:00:00 rc
>  2939 ?        00:00:00 rc
>  2942 ?        00:00:00 startpar
>  2942 ?        00:00:00 startpar
>  2964 ?        00:00:00 rsyslogd
>  2964 ?        00:00:00 rsyslogd
>  2964 ?        00:00:00 rsyslogd
>  2964 ?        00:00:00 rsyslogd
>  2980 ?        00:00:00 startpar
>  2980 ?        00:00:00 startpar
>  2981 ?        00:00:00 ctlscript.sh
>  2981 ?        00:00:00 ctlscript.sh
>  3016 ?        00:00:00 cron
>  3016 ?        00:00:00 cron
>  3025 ?        00:00:00 mysqld_safe
>  3025 ?        00:00:00 mysqld_safe
>  3032 ?        00:00:00 sshd
>  3032 ?        00:00:00 sshd
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3097 ?        00:00:00 mysqld.bin
>  3113 ?        00:00:00 ctl.sh
>  3113 ?        00:00:00 ctl.sh
>  3115 ?        00:00:00 sleep
>  3115 ?        00:00:00 sleep
>  3116 ?        00:00:00 .memcached.bin
>  3116 ?        00:00:00 .memcached.bin
>
>
> As you can see, it is clear that the newer kernel is setting -17 by 
> default, which in turn is causing the OOM killer loop.
>
> So I began to try and find what may have caused this problem by 
> comparing the two sources...
>
> I checked the code for all references to 'oom_adj' and 'oom_adjust' in 
> both code sets, but found no obvious differences:
> grep -R -e oom_adjust -e oom_adj . | sort | grep -R -e oom_adjust -e 
> oom_adj
>
> Then I checked for references to "-17" in all .c and .h files, and 
> found a couple of matches, but only one obvious one:
> grep -R "\-17" . | grep -e ".c:" -e ".h:" -e "\-17" | wc -l
> ./include/linux/oom.h:#define OOM_DISABLE (-17)
>
> But again, a search for OOM_DISABLE came up with nothing obvious...
>
> In a last ditch attempt, I did a search for all references to 'oom' 
> (case-insensitive) in both code bases, then compared the two:
>  root@annabelle [~/lol/linux-2.6.32.28] > grep -i -R "oom" . | sort -n 
> > /tmp/annabelle.oom_adj
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > grep -i -R 
> "oom" . | sort -n > /tmp/vicky.oom_adj
>
> and this brought back (yet again) nothing obvious..
>
>
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum 
> ./include/linux/oom.h
> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>
>  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./include/linux/oom.h
> 2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
>
>  root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41] > md5sum 
> ./mm/oom_kill.c
> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>
>  root@annabelle [~/lol/linux-2.6.32.28] > md5sum ./mm/oom_kill.c
> 1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
>
>
>
> Could anyone please shed some light as to why the default oom_adj is 
> set to -17 now (and where it is actually set)? From what I can tell, 
> the fix for this issue will either be:
>
>   1. Allow OOM killer to override the decision of ignoring oom_adj ==
>      -17 if an unrecoverable loop is encountered.
>   2. Change the default back to 0.
>
> Again, my apologies if this bug report is slightly unorthodox, or 
> doesn't follow usual procedure etc. I can assure you I have tried my 
> absolute best to give all the necessary information though.
>
> Cal
>
                 
                 
               
               
             
             
           
           
         
         
       
       
     
     
  
 |