server memory & swap eaten up then freaks out
I have a mail and Samba PDC server that, after a couple of days, runs out
of RAM then swap then freaks out with oom killer kicking in, at which
point it becomes very unresponsive and needs rebooting.
Does anyone have any ideas of what could be causing this please?
The server has a Pentium 4 era Intel Xeon CPU with 2GB RAM and 2GB swap.
It runs Debian 4.0 Etch with volatile updates. The issue happens with
Debian 4.0 Etch kernel 2.6.18 or EtchnHalf kernel 2.6.24. This server has
operated fine for the past 2 years on Debian 3.1 Sarge, only having been
upgraded to
Etch 2 months ago.
The mail server is setup as per
http://workaround.org/articles/ispmail-etch/; the Samba setup is as per
http://thegoldenear.org/toolbox/unices/samba-3-pdc-print-server-debian-etch.html.
The symptom is that all available physical RAM is used up (as reported by
free), then swap is used up, then the server freaks out, giving out of
memory errors continually to the screen (see the actual message below).
The server seems to operate at its maximum capacity of physical RAM for a
day or two before eating into swap.
Initially when I was seeing this, much of the time, but not all of the
time, there were a _lot_ of 'imap' instances running. But now I don't see
that at all and yet it crashes all the same.
There are around 10 concurrent users, each with up to 5 mailboxes open in
Thunderbird 2. Thunderbird's concurrent 'Maximum number of server
connections to cache' is set to 1 (previously 5 but I've gotten this
turned down recently, though, according to
http://kb.mozillazine.org/IMAP_servers, should be OK with dovecot).
I would have thought 2GB of RAM would be plenty, is that true?
I ran memtest86+ and the memory tested OK.
Here's the output of various tools and syslog, from back when I was
getting lots of instances of imap:
#top:
11598 vmail 18 0 4700 352 132 R 6 0.0 1:16.50 imap
11309 vmail 18 0 4696 316 116 R 6 0.0 1:19.11 imap
16184 vmail 18 0 63144 500 276 D 4 0.0 1:05.17 imap
23445 vmail 18 0 63144 7112 276 R 4 0.3 0:42.32 imap
27232 vmail 18 0 63144 7004 112 D 4 0.3 0:28.84 imap
20573 vmail 18 0 63144 524 276 D 3 0.0 0:45.58 imap
11865 vmail 18 0 4696 240 48 R 3 0.0 1:17.85 imap
13167 vmail 18 0 5868 520 276 D 3 0.0 1:15.14 imap
14976 vmail 18 0 63320 544 280 D 3 0.0 1:11.70 imap
16272 vmail 18 0 63148 520 276 D 3 0.0 1:09.01 imap
25847 vmail 18 0 63144 7172 276 D 3 0.3 0:31.96 imap
5260 root 16 0 2892 1316 456 R 3 0.1 0:02.92 top
4534 vmail 18 0 4824 452 276 D 3 0.0 2:04.29 imap
4633 vmail 18 0 4824 460 276 D 3 0.0 1:57.42 imap
13094 vmail 18 0 4700 480 276 D 3 0.0 1:11.33 imap
23244 vmail 18 0 63148 488 276 D 3 0.0 0:45.67 imap
24505 vmail 18 0 63148 7240 276 D 3 0.3 0:40.54 imap
26400 vmail 18 0 63148 7156 276 D 3 0.3 0:32.45 imap
27212 vmail 18 0 63144 7184 276 D 3 0.3 2:18.90 imap
27226 vmail 18 0 63148 7232 276 D 3 0.3 0:28.54 imap
2724 amavis 18 0 55196 10m 44 R 2 0.5 6:32.14 amavisd-new
3751 root 18 0 9668 516 136 D 2 0.0 7:02.11 miniserv.pl
6401 vmail 18 0 4828 576 276 D 2 0.0 1:46.62 imap
23341 vmail 18 0 63148 7068 276 D 2 0.3 0:39.53 imap
28622 vmail 18 0 63148 7080 112 D 2 0.3 0:26.98 imap
4563 vmail 18 0 4824 288 112 D 2 0.0 1:56.91 imap
9561 vmail 18 0 4700 308 112 D 2 0.0 1:31.07 imap
21819 vmail 18 0 63144 364 112 D 2 0.0 0:46.23 imap
25949 vmail 18 0 63148 6924 112 D 2 0.3 0:33.42 imap
9885 vmail 18 0 4700 324 112 D 1 0.0 1:26.72 imap
10127 vmail 18 0 4700 308 112 D 1 0.0 1:23.44 imap
# free -m
total used free shared buffers cached
Mem: 2027 1976 50 0 226 612
-/+ buffers/cache: 1138 889
Swap: 2572 0 2572
# vmstat -s
2076388 total memory
2023448 used memory
1395056 active memory
444268 inactive memory
52940 free memory
241696 buffer memory
500268 swap cache
2634620 total swap
56 used swap
2634564 free swap
33797 non-nice user cpu ticks
5477 nice user cpu ticks
731695 system cpu ticks
19954325 idle cpu ticks
378484 IO-wait cpu ticks
1368 IRQ cpu ticks
4493 softirq cpu ticks
0 stolen cpu ticks
3196946 pages paged in
6119050 pages paged out
0 pages swapped in
14 pages swapped out
17992684 interrupts
10515665 CPU context switches
1237926809 boot time
25277 forks
>From /var/log/syslog:
Mar 14 12:58:20 localhost kernel: oom-killer: gfp_mask=0x201d2, order=0
Mar 14 12:58:20 localhost kernel: [<c01444e3>] out_of_memory+0x25/0x13a
Mar 14 12:58:20 localhost kernel: [<c014594c>] __alloc_pages+0x1f5/0x275
Mar 14 12:58:20 localhost kernel: [<c0146afb>]
__do_page_cache_readahead+0xc8/0x1e8
Mar 14 12:58:20 localhost kernel: [<c0141674>] find_get_page+0x18/0x38
Mar 14 12:58:20 localhost kernel: [<c015bb3b>]
__find_get_block_slow+0xfb/0x105
Mar 14 12:58:20 localhost kernel: [<c015be64>] __find_get_block+0x15c/0x166
Mar 14 12:58:20 localhost kernel: [<c0146c61>]
blockable_page_cache_readahead+0x46/0x99
Mar 14 12:58:20 localhost kernel: [<c0146df3>]
page_cache_readahead+0xb2/0x177
Mar 14 12:58:20 localhost kernel: [<c0141d09>]
do_generic_mapping_read+0x137/0x42a
Mar 14 12:58:20 localhost kernel: [<c014283b>]
__generic_file_aio_read+0x16b/0x1b2
Mar 14 12:58:20 localhost kernel: [<c0141524>] file_read_actor+0x0/0xca
Mar 14 12:58:20 localhost kernel: [<c01428bd>]
generic_file_aio_read+0x3b/0x42
Mar 14 12:58:20 localhost kernel: [<c0159e7f>] do_sync_read+0xb6/0xf1
Mar 14 12:58:20 localhost kernel: [<c012d8b1>]
autoremove_wake_function+0x0/0x2d
Mar 14 12:58:20 localhost kernel: [<c0159dc9>] do_sync_read+0x0/0xf1
Mar 14 12:58:20 localhost kernel: [<c015a788>] vfs_read+0x9f/0x141
Mar 14 12:58:20 localhost kernel: [<c015aca2>] sys_pread64+0x44/0x5a
Mar 14 12:58:20 localhost kernel: [<c0102c11>] sysenter_past_esp+0x56/0x79
Mar 14 12:58:20 localhost kernel: Mem-info:
Mar 14 12:58:20 localhost kernel: DMA per-cpu:
Mar 14 12:58:20 localhost kernel: cpu 0 hot: high 0, batch 1 used:0
Mar 14 12:58:20 localhost kernel: cpu 0 cold: high 0, batch 1 used:0
Mar 14 12:58:20 localhost kernel: cpu 1 hot: high 0, batch 1 used:0
Mar 14 12:58:20 localhost kernel: cpu 1 cold: high 0, batch 1 used:0
Mar 14 12:58:20 localhost kernel: cpu 2 hot: high 0, batch 1 used:0
Mar 14 12:58:20 localhost kernel: cpu 2 cold: high 0, batch 1 used:0
Mar 14 12:58:20 localhost kernel: cpu 3 hot: high 0, batch 1 used:0
Mar 14 12:58:20 localhost kernel: cpu 3 cold: high 0, batch 1 used:0
Mar 14 12:58:20 localhost kernel: DMA32 per-cpu: empty
Mar 14 12:58:20 localhost kernel: Normal per-cpu:
Mar 14 12:58:20 localhost kernel: cpu 0 hot: high 186, batch 31 used:0
Mar 14 12:58:20 localhost kernel: cpu 0 cold: high 62, batch 15 used:54
Mar 14 12:58:20 localhost kernel: cpu 1 hot: high 186, batch 31 used:27
Mar 14 12:58:20 localhost kernel: cpu 1 cold: high 62, batch 15 used:61
Mar 14 12:58:20 localhost kernel: cpu 2 hot: high 186, batch 31 used:49
Mar 14 12:58:20 localhost kernel: cpu 2 cold: high 62, batch 15 used:59
Mar 14 12:58:20 localhost kernel: cpu 3 hot: high 186, batch 31 used:8
Mar 14 12:58:20 localhost kernel: cpu 3 cold: high 62, batch 15 used:19
Mar 14 12:58:20 localhost kernel: HighMem per-cpu:
Mar 14 12:58:20 localhost kernel: cpu 0 hot: high 186, batch 31 used:69
Mar 14 12:58:20 localhost kernel: cpu 0 cold: high 62, batch 15 used:48
Mar 14 12:58:20 localhost kernel: cpu 1 hot: high 186, batch 31 used:170
Mar 14 12:58:20 localhost kernel: cpu 1 cold: high 62, batch 15 used:15
Mar 14 12:58:20 localhost kernel: cpu 2 hot: high 186, batch 31 used:162
Mar 14 12:58:20 localhost kernel: cpu 2 cold: high 62, batch 15 used:12
Mar 14 12:58:20 localhost kernel: cpu 3 hot: high 186, batch 31 used:174
Mar 14 12:58:20 localhost kernel: cpu 3 cold: high 62, batch 15 used:49
Mar 14 12:58:20 localhost kernel: Free pages: 45488kB (496kB HighMem)
Mar 14 12:58:20 localhost kernel: Active:263348 inactive:216164 dirty:0
writeback:0 unstable:0 free:11372 slab:14025 mapped:144 pagetables:11530
Mar 14 12:58:20 localhost kernel: DMA free:8192kB min:68kB low:84kB
high:100kB active:2968kB inactive:2160kB present:16384kB
pages_scanned:6530 all_unreclaimable? yes
Mar 14 12:58:20 localhost kernel: lowmem_reserve[]: 0 0 880 2031
Mar 14 12:58:20 localhost kernel: DMA32 free:0kB min:0kB low:0kB high:0kB
active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Mar 14 12:58:20 localhost kernel: lowmem_reserve[]: 0 0 880 2031
Mar 14 12:58:20 localhost kernel: Normal free:36800kB min:3756kB
low:4692kB high:5632kB active:402344kB inactive:338216kB present:901120kB
pages_scanned:806676 all_unreclaimable? yes
Mar 14 12:58:20 localhost kernel: lowmem_reserve[]: 0 0 0 9212
Mar 14 12:58:20 localhost kernel: HighMem free:496kB min:512kB low:1740kB
high:2968kB active:648080kB inactive:524280kB present:1179136kB
pages_scanned:1275911 all_unreclaimable? yes
Mar 14 12:58:20 localhost kernel: lowmem_reserve[]: 0 0 0 0
Mar 14 12:58:20 localhost kernel: DMA: 6*4kB 219*8kB 123*16kB 23*32kB
2*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 8192kB
Mar 14 12:58:20 localhost kernel: DMA32: empty
Mar 14 12:58:20 localhost kernel: Normal: 2772*4kB 296*8kB 1151*16kB
22*32kB 2*64kB 2*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB =
36800kB
Mar 14 12:58:20 localhost kernel: HighMem: 0*4kB 14*8kB 0*16kB 0*32kB
0*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 496kB
Mar 14 12:58:20 localhost kernel: Swap cache: add 948264, delete 948263,
find 97950/132027, race 0+1030
Mar 14 12:58:20 localhost kernel: Free swap = 0kB
Mar 14 12:58:20 localhost kernel: Total swap = 2634620kB
Mar 14 12:58:20 localhost kernel: Free swap: 0kB
Mar 14 12:58:20 localhost kernel: 524160 pages of RAM
Mar 14 12:58:20 localhost kernel: 294784 pages of HIGHMEM
Mar 14 12:58:20 localhost kernel: 5063 reserved pages
Mar 14 12:58:20 localhost kernel: 22710 pages shared
Mar 14 12:58:20 localhost kernel: 1 pages swap cached
Mar 14 12:58:20 localhost kernel: 0 pages dirty
Mar 14 12:58:20 localhost kernel: 0 pages writeback
Mar 14 12:58:20 localhost kernel: 144 pages mapped
Mar 14 12:58:20 localhost kernel: 14025 pages slab
Mar 14 12:58:20 localhost kernel: 11530 pages pagetables
Mar 14 12:58:20 localhost kernel: 524160 pages of RAM
Mar 14 12:58:20 localhost kernel: 294784 pages of HIGHMEM
Mar 14 12:58:20 localhost kernel: 5063 reserved pages
Mar 14 12:58:20 localhost kernel: 22715 pages shared
Mar 14 12:58:20 localhost kernel: 1 pages swap cached
Mar 14 12:58:20 localhost kernel: 0 pages dirty
Mar 14 12:58:20 localhost kernel: 0 pages writeback
Mar 14 12:58:20 localhost kernel: 144 pages mapped
Mar 14 12:58:20 localhost kernel: 14025 pages slab
Mar 14 12:58:20 localhost kernel: 11530 pages pagetables
Mar 14 12:58:20 localhost kernel: Out of Memory: Kill process 29845
(mysqld) score 16080 and children.
Mar 14 12:58:20 localhost kernel: Out of memory: Killed process 29845
(mysqld).
Mar 14 12:58:20 localhost kernel: Out of Memory: Kill process 3410
(dovecot) score 8662 and children.
Mar 14 12:58:20 localhost kernel: Out of memory: Killed process 3511
(pop3-login).
Mar 14 12:58:20 localhost kernel: Out of Memory: Kill process 3410
(dovecot) score 8661 and children.
Mar 14 12:58:20 localhost kernel: Out of memory: Killed process 2925 (imap).
Mar 14 12:58:22 localhost mysqld_safe[5237]: Number of processes running
now: 0
Mar 14 12:58:22 localhost mysqld_safe[5239]: restarted
Mar 14 12:58:33 localhost mysqld[5242]: 090314 12:58:32 InnoDB: Database
was not shut down normally!
Mar 14 12:58:33 localhost mysqld[5242]: InnoDB: Starting crash recovery.
Mar 14 12:58:33 localhost mysqld[5242]: InnoDB: Reading tablespace
information from the .ibd files...
Mar 14 12:58:33 localhost mysqld[5242]: InnoDB: Restoring possible
half-written data pages from the doublewrite
Mar 14 12:58:33 localhost mysqld[5242]: InnoDB: buffer...
Mar 14 12:58:37 localhost fetchmail[3695]: awakened at Sat 14 Mar 2009
12:58:34 GMT
Mar 14 12:58:38 localhost mysqld[5242]: 090314 12:58:36 InnoDB: Starting
log scan based on checkpoint at
Mar 14 12:58:38 localhost mysqld[5242]: InnoDB: log sequence number 0 68717.
Mar 14 12:58:48 localhost mysqld[5242]: InnoDB: Doing recovery: scanned up
to log sequence number 0 68717
qMar 14 12:58:56 localhost dovecot: auth-worker(default): mysql: Connect
failed to 127.0.0.1 (mailserver): Can't connect to MySQL server on
'127.0.0.1' (111) - waiting for 1 seconds before retry
Mar 14 12:58:56 localhost dovecot: auth-worker(default):
sql(letters@DOMAIN.org,IPADDRESS.111): Password query failed: Not
connected to database
Mar 14 12:58:57 localhost fetchmail[3695]: 145 messages (144 seen) for
info@DOMAIN.org at mail.MAILHOST.co.uk (154739969 octets).
Mar 14 12:58:57 localhost dovecot: auth-worker(default):
sql(letters@DOMAIN.org,IPADDRESS.111): Password query failed: Not
connected to database
Mar 14 12:58:59 localhost fetchmail[3695]: reading message
info@DOMAIN.org@mail.MAILHOST.co.uk:145 of 145 (2501 octets) (log message
incomplete)
Mar 14 12:58:59 localhost fetchmail[3695]: connection to localhost:smtp
[127.0.0.1/25] failed: Connection refused.
Mar 14 12:58:59 localhost fetchmail[3695]: SMTP connect to localhost failed
Mar 14 12:58:59 localhost fetchmail[3695]: SMTP transaction error while
fetching from info@DOMAIN.org@mail.MAILHOST.co.uk and delivering to SMTP
host localhost
Mar 14 12:59:00 localhost fetchmail[3695]: Query status=10 (SMTP)
Mar 14 12:59:00 localhost fetchmail[3695]: sleeping at Sat 14 Mar 2009
12:59:00 GMT for 60 seconds
Mar 14 12:59:00 localhost mysqld[5242]: InnoDB: Last MySQL binlog file
position 0 232, file name /var/log/mysql/mysql-bin.000902
Mar 14 12:59:01 localhost mysqld[5242]: 090314 12:59:01 InnoDB: Started;
log sequence number 0 68717
Mar 14 12:59:09 localhost mysqld[5242]: 090314 12:59:04 [Note] Recovering
after a crash using /var/log/mysql/mysql-bin
Mar 14 12:59:10 localhost mysqld[5242]: 090314 12:59:08 [Note] Starting
crash recovery...
Mar 14 12:59:10 localhost mysqld[5242]: 090314 12:59:08 [Note] Crash
recovery finished.
Mar 14 12:59:26 localhost dovecot: auth-worker(default): mysql: Connect
failed to 127.0.0.1 (mailserver): Can't connect to MySQL server on
'127.0.0.1' (111) - waiting for 75 seconds before retry
--
Pete Boyd
Open Plan IT - http://openplanit.co.uk
The Golden Ear - http://thegoldenear.org
Reply to: