
Bug#345260: Apache2-mpm-worker 2.0.55-3 crashing box, memory leak?



Package: apache2-mpm-worker
Version: 2.0.55-3

We're running Apache 2.0.55 (distro testing package, version 2.0.55-3) on Linux 2.6.14.3. We use the worker MPM, mod_actions, mod_rewrite, mod_layout and FastCGI running PHP. The box serves about 40-50 requests per second during the day. Once in a while, one of the apache2 processes rapidly starts consuming a lot of memory, freezing the system and crashing it, without any chance to stop or trace it. We weren't able to find out what is causing this: error.log doesn't say anything useful, and because of how suddenly the apache2 process misbehaves, tracing is also very hard.

Attached below is a 'top' screen captured right before the actual crash, as well as some syslog output from before another server reboot.

----------

 top - 21:15:21 up  1:29,  2 users,  load average: 17.18, 9.77, 6.55
 Tasks: 116 total,   2 running, 114 sleeping,   0 stopped,   0 zombie
 Cpu(s):  1.3% us,  5.3% sy,  0.0% ni,  0.0% id, 90.7% wa,  1.3% hi,  1.3% si
 Mem:    905776k total,   898128k used,     7648k free,      404k buffers
 Swap:        0k total,        0k used,        0k free,    18236k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   130 root      15   0     0    0    0 S  3.0  0.0   0:00.13 pdflush
  6485 www-data  21   0  881m 600m 1100 S  1.3 67.9   0:04.83 apache2
  9959 www-data  16   0  284m 8404 1124 S  1.0  0.9   0:00.57 apache2
  1483 root      17   0  2620  544  412 D  0.3  0.1   0:01.05 syslogd
  4789 www-data  25   0  293m  16m 1136 S  0.3  1.9   0:06.94 apache2
  7270 root      16   0  2196 1100  820 R  0.3  0.1   0:01.30 top
  9790 www-data  17   0 17500 5552 1272 D  0.3  0.6   0:00.43 php
 10015 www-data  16   0  284m 8448 1096 S  0.3  0.9   0:00.59 apache2

----------

Dec 28 03:13:35 werewolf kernel: Out of Memory: Killed process 1080 (apache2).
Dec 28 03:13:35 werewolf kernel: Out of Memory: Killed process 25021 (apache2).
Dec 28 03:13:55 werewolf kernel: Out of Memory: Killed process 25084 (apache2).
Dec 28 03:13:55 werewolf kernel: Out of Memory: Killed process 25116 (apache2).
Dec 28 03:14:28 werewolf kernel: Out of Memory: Killed process 25087 (apache2).
Dec 28 03:14:29 werewolf kernel: Out of Memory: Killed process 25164 (apache2).

----------

apache2.conf (parts):

ServerLimit 12

<IfModule worker.c>
StartServers         3
MaxClients         384
MinSpareThreads     20
MaxSpareThreads     50
ThreadsPerChild     32
MaxRequestsPerChild  10000
</IfModule>
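
For what it's worth, my reading of the worker MPM sizing rules (not verified, so correct me if I'm wrong) is that these values are at least self-consistent:

 # worker MPM: MaxClients must not exceed ServerLimit * ThreadsPerChild
 #   ServerLimit * ThreadsPerChild = 12 * 32 = 384 = MaxClients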

AddOutputFilterByType LAYOUT text/html application/x-httpd-php
LayoutEndTag </BODY>
LayoutFooter /some_document.html

FastCgiServer /usr/lib/cgi-bin/php4
FastCgiConfig -idle-timeout 15
AddHandler php-fastcgi .php .php3 .php4
<Location /cgi-bin/php>
    SetHandler fastcgi-script
</Location>
Action php-fastcgi /cgi-bin/php
AddType application/x-httpd-php .php .php3 .php4
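
For clarity, here is how I understand a request for a .php file flows through the FastCGI part of this config (just my reading of the mod_actions/mod_fastcgi docs):

 # 1. AddHandler tags .php/.php3/.php4 requests with the php-fastcgi handler
 # 2. Action internally redirects php-fastcgi requests to /cgi-bin/php
 # 3. <Location /cgi-bin/php> hands those requests to mod_fastcgi (fastcgi-script)
 # 4. mod_fastcgi dispatches them to the FastCgiServer php process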

I've added these two lines:

ulimit -S -m 25600
ulimit -H -m 25600

at the top of the apache2ctl script (which really does nothing but call "/usr/sbin/apache2 -k start"), without any result: top still shows apache2 processes with resident size >= 30MB. The bug has already crashed our server twice; here's the top output from the last 4 seconds before one of the crashes:

 top - 22:03:16 up 1 day,  1:33,  2 users,  load average: 8.26, 6.90, 4.71
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 23439 www-data  19   0  372m  94m 1820 S 28.3 10.7   0:08.90 apache2

 top - 22:03:18 up 1 day,  1:33,  2 users,  load average: 9.76, 7.23, 4.83
 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 23439 www-data  19   0  551m 273m 1820 S 53.1 30.9   0:09.97 apache2
  132 root      15   0     0    0    0 D  9.9  0.0   0:27.16 kswapd0

This apache2 process was spawned after 22:03:14; it then consumed 273MB of memory in about 4 seconds and finally froze our box (probably taking the rest of the memory).

You'll understand that in this situation it's hard to deliver any debug information. I'm surprised that ulimit doesn't work with apache; maybe we're using it the wrong way?
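
One guess, in case someone can confirm it: as far as I know, Linux 2.6 does not actually enforce RLIMIT_RSS, which is what "ulimit -m" sets, so those two lines may simply be no-ops. If that's right, limiting the virtual address space instead should really be enforced (untested sketch; 256MB is just an example value):

 # RLIMIT_AS (address space, "ulimit -v") is enforced where RLIMIT_RSS is not
 ulimit -S -v 262144   # soft limit, in KB (= 256MB)
 ulimit -H -v 262144   # hard limit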

We have some info that may be useful, though. From time to time, apache2 processes appear that consume a lot of memory. They grow over time but mostly exit normally, so I'm not sure they're related to this bug, but I guess it's better than nothing. Below is a trace of one such process.

 # ls /proc/9200/task
 9200  9211  9214  9215  9224  9233

 # strace -p 9200
 Process 9200 attached - interrupt to quit
 read(4, 0xbfc73f57, 1)                  = ? ERESTARTSYS (To be restarted)
 --- SIGTERM (Terminated) @ 0 (0) ---
 sigreturn()                             = ? (mask now [HUP INT QUIT USR1 ALRM STKFLT CHLD CONT TSTP TTIN TTOU URG XCPU XFSZ VTALRM PROF WINCH IO PWR])
 futex(0xb7024bf8, FUTEX_WAIT, 9202, NULL) = 0
 futex(0xb6824bf8, FUTEX_WAIT, 9203, NULL) = 0
 futex(0xb5824bf8, FUTEX_WAIT, 9205, NULL) = 0
 munmap(0xb7025000, 8388608)             = 0
 futex(0xb5024bf8, FUTEX_WAIT, 9206, NULL) = 0
 munmap(0xa6825000, 8388608)             = 0
 munmap(0xb6825000, 8388608)             = 0
 munmap(0xb6025000, 8388608)             = 0
 munmap(0xb5825000, 8388608)             = 0
 munmap(0xb5025000, 8388608)             = 0
 futex(0xb2824bf8, FUTEX_WAIT, 9211, NULL) = 0
 munmap(0xb4825000, 8388608)             = 0
 munmap(0xb4025000, 8388608)             = 0
 munmap(0xb3825000, 8388608)             = 0
 munmap(0xb3025000, 8388608)             = 0

 == more than 100 entries ==

 munmap(0xa562d000, 430080)              = 0
 munmap(0xa5a07000, 360448)              = 0
 munmap(0xa5d5e000, 315392)              = 0
 munmap(0xa604d000, 274432)              = 0
 munmap(0xa63dc000, 221184)              = 0
 munmap(0xa6547000, 180224)              = 0
 munmap(0xa6724000, 139264)              = 0
 brk(0x894d000)                          = 0x894d000
 brk(0x889e000)                          = 0x889e000
 brk(0x87ef000)                          = 0x87ef000
 brk(0x87e5000)                          = 0x87e5000
 brk(0x84c2000)                          = 0x84c2000
 brk(0x84c0000)                          = 0x84c0000
 brk(0x828a000)                          = 0x828a000
 brk(0x8286000)                          = 0x8286000
 brk(0x81a3000)                          = 0x81a3000
 brk(0x819b000)                          = 0x819b000
 write(7, "www.example.com 84."..., 1018) = 1018
 write(8, "111.111.111.111 - - [28/Dec/2005:"..., 3930) = 3930
 exit_group(0)                           = ?
 Process 9200 detached

 # strace -p 9211
 Process 9211 attached - interrupt to quit
 poll([{fd=24, events=POLLOUT, revents=POLLOUT}], 1, 60000) = 1
 sendfile(24, 25, [1982387], 1012962)    = 13140
 poll([{fd=24, events=POLLOUT, revents=POLLOUT}], 1, 60000) = 1
 sendfile(24, 25, [1995527], 999822)     = 13140

 == about 70 entries more ==

 poll([{fd=24, events=POLLOUT, revents=POLLOUT}], 1, 60000) = 1
 sendfile(24, 25, [2967887], 27462)      = 13140
 poll([{fd=24, events=POLLOUT, revents=POLLOUT}], 1, 60000) = 1
 sendfile(24, 25, [2981027], 14322)      = 13140
 poll([{fd=24, events=POLLOUT, revents=POLLOUT}], 1, 60000) = 1
 sendfile(24, 25, [2994167], 1182)       = 1182
 gettimeofday({1135804066, 798228}, NULL) = 0
 gettimeofday({1135804066, 798456}, NULL) = 0
 times({tms_utime=82, tms_stime=79, tms_cutime=0, tms_cstime=0}) = 1718650698
 gettimeofday({1135804066, 798886}, NULL) = 0
 shutdown(24, 1 /* send */)              = 0
 poll([{fd=24, events=POLLIN, revents=POLLIN}], 1, 2000) = 1
 read(24, "", 512)                       = 0
 close(24)                               = 0
 close(25)                               = 0
 gettimeofday({1135804068, 189043}, NULL) = 0
 gettimeofday({1135804068, 189074}, NULL) = 0
 _exit(0)                                = ?
 Process 9211 detached

The same poll/sendfile pattern shows up in all of these processes.
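
If it would help, next time this happens I can try attaching with strace's follow option, so that a single session also captures threads created after the attach, instead of tracing each task id separately (a sketch; <pid> stands for whichever apache2 process is misbehaving):

 # -f follows new threads/forks, -tt timestamps every call, -o logs to a file
 strace -f -tt -o /tmp/apache2.trace -p <pid>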

Please help us. :) We'll be happy with any sane workaround.

The box runs Debian testing, fully up to date, with every package coming from the distro's testing branch.




