
Bug#345260: Apache2-mpm-worker 2.0.55-3 crashing box, memory leak?



Package: apache2-mpm-worker
Version: 2.0.55-3

We're running Apache 2.0.55 (distro testing package, version 2.0.55-3) on Linux 2.6.14.3. We use the worker MPM, mod_actions, mod_rewrite, mod_layout and FastCGI running PHP. The box serves about 40-50 requests per second during the day. Once in a while, one of the apache2 processes rapidly starts consuming a lot of memory, freezing the system and crashing it, without any chance to stop or trace it. We weren't able to find out what is causing this: error.log doesn't say anything useful, and because of how suddenly the apache2 process misbehaves, tracing is also very hard.

Attached below is a 'top' screen captured right before the actual crash, as well as some syslog output from before another server reboot.

----------

 top - 21:15:21 up  1:29,  2 users,  load average: 17.18, 9.77, 6.55
 Tasks: 116 total,   2 running, 114 sleeping,   0 stopped,   0 zombie
 Cpu(s):  1.3% us,  5.3% sy,  0.0% ni,  0.0% id, 90.7% wa,  1.3% hi,  1.3% si
 Mem:    905776k total,   898128k used,     7648k free,      404k buffers
 Swap:        0k total,        0k used,        0k free,    18236k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   130 root      15   0     0    0    0 S  3.0  0.0   0:00.13 pdflush
  6485 www-data  21   0  881m 600m 1100 S  1.3 67.9   0:04.83 apache2
  9959 www-data  16   0  284m 8404 1124 S  1.0  0.9   0:00.57 apache2
  1483 root      17   0  2620  544  412 D  0.3  0.1   0:01.05 syslogd
  4789 www-data  25   0  293m  16m 1136 S  0.3  1.9   0:06.94 apache2
  7270 root      16   0  2196 1100  820 R  0.3  0.1   0:01.30 top
  9790 www-data  17   0 17500 5552 1272 D  0.3  0.6   0:00.43 php
 10015 www-data  16   0  284m 8448 1096 S  0.3  0.9   0:00.59 apache2

----------

Dec 28 03:13:35 werewolf kernel: Out of Memory: Killed process 1080 (apache2).
Dec 28 03:13:35 werewolf kernel: Out of Memory: Killed process 25021 (apache2).
Dec 28 03:13:55 werewolf kernel: Out of Memory: Killed process 25084 (apache2).
Dec 28 03:13:55 werewolf kernel: Out of Memory: Killed process 25116 (apache2).
Dec 28 03:14:28 werewolf kernel: Out of Memory: Killed process 25087 (apache2).
Dec 28 03:14:29 werewolf kernel: Out of Memory: Killed process 25164 (apache2).

----------

apache2.conf (parts):

ServerLimit 12

<IfModule worker.c>
StartServers         3
MaxClients         384
MinSpareThreads     20
MaxSpareThreads     50
ThreadsPerChild     32
MaxRequestsPerChild  10000
</IfModule>
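
For what it's worth, my reading of the worker MPM sizing rules (not verified, so correct me if I'm wrong) is that these values are at least self-consistent:

 # worker MPM: MaxClients must not exceed ServerLimit * ThreadsPerChild
 #   ServerLimit * ThreadsPerChild = 12 * 32 = 384 = MaxClients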

AddOutputFilterByType LAYOUT text/html application/x-httpd-php
LayoutEndTag </BODY>
LayoutFooter /some_document.html

FastCgiServer /usr/lib/cgi-bin/php4
FastCgiConfig -idle-timeout 15
AddHandler php-fastcgi .php .php3 .php4
<Location /cgi-bin/php>
    SetHandler fastcgi-script
</Location>
Action php-fastcgi /cgi-bin/php
AddType application/x-httpd-php .php .php3 .php4
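
For clarity, here is how I understand a request for a .php file flows through the FastCGI part of this config (just my reading of the mod_actions/mod_fastcgi docs):

 # 1. AddHandler tags .php/.php3/.php4 requests with the php-fastcgi handler
 # 2. Action internally redirects php-fastcgi requests to /cgi-bin/php
 # 3. <Location /cgi-bin/php> hands those requests to mod_fastcgi (fastcgi-script)
 # 4. mod_fastcgi dispatches them to the FastCgiServer php process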

I've added these two lines:

ulimit -S -m 25600
ulimit -H -m 25600

at the top of the apache2ctl script (which really does nothing but call "/usr/sbin/apache2 -k start"), without any result: top still shows apache2 processes with resident size >= 30MB. The bug has already crashed our server twice; here's the top output from the last 4 seconds before one of the crashes:

 top - 22:03:16 up 1 day,  1:33,  2 users,  load average: 8.26, 6.90, 4.71
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 23439 www-data  19   0  372m  94m 1820 S 28.3 10.7   0:08.90 apache2

 top - 22:03:18 up 1 day,  1:33,  2 users,  load average: 9.76, 7.23, 4.83
 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 23439 www-data  19   0  551m 273m 1820 S 53.1 30.9   0:09.97 apache2
  132 root      15   0     0    0    0 D  9.9  0.0   0:27.16 kswapd0

This apache2 process was spawned after 22:03:14; it then consumed 273MB of memory in about 4 seconds and finally froze our box (probably taking the rest of the memory).

You'll understand that in this situation it's hard to deliver any debug information. I'm surprised that ulimit doesn't work with apache; maybe we're using it the wrong way?
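
One guess, in case someone can confirm it: as far as I know, Linux 2.6 does not actually enforce RLIMIT_RSS, which is what "ulimit -m" sets, so those two lines may simply be no-ops. If that's right, limiting the virtual address space instead should really be enforced (untested sketch; 256MB is just an example value):

 # RLIMIT_AS (address space, "ulimit -v") is enforced where RLIMIT_RSS is not
 ulimit -S -v 262144   # soft limit, in KB (= 256MB)
 ulimit -H -v 262144   # hard limit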

We have some info that may be useful, though. From time to time, apache2 processes appear that consume a lot of memory. They grow over time but mostly exit normally, so I'm not sure they're related to this bug, but I guess it's better than nothing. Below is a trace of one such process.

 # ls /proc/9200/task
 9200  9211  9214  9215  9224  9233

 # strace -p 9200
 Process 9200 attached - interrupt to quit
 read(4, 0xbfc73f57, 1)                  = ? ERESTARTSYS (To be restarted)
 --- SIGTERM (Terminated) @ 0 (0) ---
 sigreturn()                             = ? (mask now [HUP INT QUIT USR1 ALRM STKFLT CHLD CONT TSTP TTIN TTOU URG XCPU XFSZ VTALRM PROF WINCH IO PWR])
 futex(0xb7024bf8, FUTEX_WAIT, 9202, NULL) = 0
 futex(0xb6824bf8, FUTEX_WAIT, 9203, NULL) = 0
 futex(0xb5824bf8, FUTEX_WAIT, 9205, NULL) = 0
 munmap(0xb7025000, 8388608)             = 0
 futex(0xb5024bf8, FUTEX_WAIT, 9206, NULL) = 0
 munmap(0xa6825000, 8388608)             = 0
 munmap(0xb6825000, 8388608)             = 0
 munmap(0xb6025000, 8388608)             = 0
 munmap(0xb5825000, 8388608)             = 0
 munmap(0xb5025000, 8388608)             = 0
 futex(0xb2824bf8, FUTEX_WAIT, 9211, NULL) = 0
 munmap(0xb4825000, 8388608)             = 0
 munmap(0xb4025000, 8388608)             = 0
 munmap(0xb3825000, 8388608)             = 0
 munmap(0xb3025000, 8388608)             = 0

 == more than 100 entries ==

 munmap(0xa562d000, 430080)              = 0
 munmap(0xa5a07000, 360448)              = 0
 munmap(0xa5d5e000, 315392)              = 0
 munmap(0xa604d000, 274432)              = 0
 munmap(0xa63dc000, 221184)              = 0
 munmap(0xa6547000, 180224)              = 0
 munmap(0xa6724000, 139264)              = 0
 brk(0x894d000)                          = 0x894d000
 brk(0x889e000)                          = 0x889e000
 brk(0x87ef000)                          = 0x87ef000
 brk(0x87e5000)                          = 0x87e5000
 brk(0x84c2000)                          = 0x84c2000
 brk(0x84c0000)                          = 0x84c0000
 brk(0x828a000)                          = 0x828a000
 brk(0x8286000)                          = 0x8286000
 brk(0x81a3000)                          = 0x81a3000
 brk(0x819b000)                          = 0x819b000
 write(7, "www.example.com 84."..., 1018) = 1018
 write(8, "111.111.111.111 - - [28/Dec/2005:"..., 3930) = 3930
 exit_group(0)                           = ?
 Process 9200 detached

 # strace -p 9211
 Process 9211 attached - interrupt to quit
 poll([{fd=24, events=POLLOUT, revents=POLLOUT}], 1, 60000) = 1
 sendfile(24, 25, [1982387], 1012962)    = 13140
 poll([{fd=24, events=POLLOUT, revents=POLLOUT}], 1, 60000) = 1
 sendfile(24, 25, [1995527], 999822)     = 13140

 == about 70 entries more ==

 poll([{fd=24, events=POLLOUT, revents=POLLOUT}], 1, 60000) = 1
 sendfile(24, 25, [2967887], 27462)      = 13140
 poll([{fd=24, events=POLLOUT, revents=POLLOUT}], 1, 60000) = 1
 sendfile(24, 25, [2981027], 14322)      = 13140
 poll([{fd=24, events=POLLOUT, revents=POLLOUT}], 1, 60000) = 1
 sendfile(24, 25, [2994167], 1182)       = 1182
 gettimeofday({1135804066, 798228}, NULL) = 0
 gettimeofday({1135804066, 798456}, NULL) = 0
 times({tms_utime=82, tms_stime=79, tms_cutime=0, tms_cstime=0}) = 1718650698
 gettimeofday({1135804066, 798886}, NULL) = 0
 shutdown(24, 1 /* send */)              = 0
 poll([{fd=24, events=POLLIN, revents=POLLIN}], 1, 2000) = 1
 read(24, "", 512)                       = 0
 close(24)                               = 0
 close(25)                               = 0
 gettimeofday({1135804068, 189043}, NULL) = 0
 gettimeofday({1135804068, 189074}, NULL) = 0
 _exit(0)                                = ?
 Process 9211 detached

The same poll/sendfile pattern shows up in all of these processes.
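
If it would help, next time this happens I can try attaching with strace's follow option, so that a single session also captures threads created after the attach, instead of tracing each task id separately (a sketch; <pid> stands for whichever apache2 process is misbehaving):

 # -f follows new threads/forks, -tt timestamps every call, -o logs to a file
 strace -f -tt -o /tmp/apache2.trace -p <pid>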

Please help us. :) We'll be happy with any sane workaround.

The box runs Debian testing, fully up to date, with every package coming from the distro's testing branch.




