[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

server memory & swap eaten up then freaks out



I have a mail and Samba PDC server that, after a couple of days, runs out
of RAM then swap then freaks out with oom killer kicking in, at which
point it becomes very unresponsive and needs rebooting.

Does anyone have any ideas of what could be causing this please?

The server has a Pentium 4 era Intel Xeon CPU with 2GB RAM and 2GB swap.
It runs Debian 4.0 Etch with volatile updates. The issue happens with
Debian 4.0 Etch kernel 2.6.18 or EtchnHalf kernel 2.6.24. This server has
operated fine for the past 2 years on Debian 3.1 Sarge, only having been
upgraded to
Etch 2 months ago.

The mail server is setup as per
http://workaround.org/articles/ispmail-etch/; the Samba setup is as per
http://thegoldenear.org/toolbox/unices/samba-3-pdc-print-server-debian-etch.html.

The symptom is that all available physical RAM is used up (as reported by
free), then swap is used up, then the server freaks out, giving out of
memory errors continually to the screen (see the actual message below).

The server seems to operate at its maximum capacity of physical RAM for a
day or two before eating into swap.

Initially when I was seeing this, much of the time, but not all of the
time, there were a _lot_ of 'imap' instances running. But now I don't see
that at all and yet it crashes all the same.

There are around 10 concurrent users, each with up to 5 mailboxes open in
Thunderbird 2. Thunderbird's concurrent 'Maximum number of server
connections to cache' is set to 1 (previously 5 but I've gotten this
turned down recently, though, according to
http://kb.mozillazine.org/IMAP_servers, should be OK with dovecot).

I would have thought 2GB of RAM would be plenty, is that true?

I ran memtest86+ and the memory tested OK.

Here's the output of various tools and syslog, from back when I was
getting lots of instances of imap:

#top:

11598 vmail     18   0  4700  352  132 R    6  0.0   1:16.50 imap
11309 vmail     18   0  4696  316  116 R    6  0.0   1:19.11 imap
16184 vmail     18   0 63144  500  276 D    4  0.0   1:05.17 imap
23445 vmail     18   0 63144 7112  276 R    4  0.3   0:42.32 imap
27232 vmail     18   0 63144 7004  112 D    4  0.3   0:28.84 imap
20573 vmail     18   0 63144  524  276 D    3  0.0   0:45.58 imap
11865 vmail     18   0  4696  240   48 R    3  0.0   1:17.85 imap
13167 vmail     18   0  5868  520  276 D    3  0.0   1:15.14 imap
14976 vmail     18   0 63320  544  280 D    3  0.0   1:11.70 imap
16272 vmail     18   0 63148  520  276 D    3  0.0   1:09.01 imap
25847 vmail     18   0 63144 7172  276 D    3  0.3   0:31.96 imap
 5260 root      16   0  2892 1316  456 R    3  0.1   0:02.92 top
 4534 vmail     18   0  4824  452  276 D    3  0.0   2:04.29 imap
 4633 vmail     18   0  4824  460  276 D    3  0.0   1:57.42 imap
13094 vmail     18   0  4700  480  276 D    3  0.0   1:11.33 imap
23244 vmail     18   0 63148  488  276 D    3  0.0   0:45.67 imap
24505 vmail     18   0 63148 7240  276 D    3  0.3   0:40.54 imap
26400 vmail     18   0 63148 7156  276 D    3  0.3   0:32.45 imap
27212 vmail     18   0 63144 7184  276 D    3  0.3   2:18.90 imap
27226 vmail     18   0 63148 7232  276 D    3  0.3   0:28.54 imap
 2724 amavis    18   0 55196  10m   44 R    2  0.5   6:32.14 amavisd-new
 3751 root      18   0  9668  516  136 D    2  0.0   7:02.11 miniserv.pl
 6401 vmail     18   0  4828  576  276 D    2  0.0   1:46.62 imap
23341 vmail     18   0 63148 7068  276 D    2  0.3   0:39.53 imap
28622 vmail     18   0 63148 7080  112 D    2  0.3   0:26.98 imap
 4563 vmail     18   0  4824  288  112 D    2  0.0   1:56.91 imap
 9561 vmail     18   0  4700  308  112 D    2  0.0   1:31.07 imap
21819 vmail     18   0 63144  364  112 D    2  0.0   0:46.23 imap
25949 vmail     18   0 63148 6924  112 D    2  0.3   0:33.42 imap
 9885 vmail     18   0  4700  324  112 D    1  0.0   1:26.72 imap
10127 vmail     18   0  4700  308  112 D    1  0.0   1:23.44 imap


# free -m
             total       used       free     shared    buffers     cached
Mem:          2027       1976         50          0        226        612
-/+ buffers/cache:       1138        889
Swap:         2572          0       2572


# vmstat -s
      2076388  total memory
      2023448  used memory
      1395056  active memory
       444268  inactive memory
        52940  free memory
       241696  buffer memory
       500268  swap cache
      2634620  total swap
           56  used swap
      2634564  free swap
        33797 non-nice user cpu ticks
         5477 nice user cpu ticks
       731695 system cpu ticks
     19954325 idle cpu ticks
       378484 IO-wait cpu ticks
         1368 IRQ cpu ticks
         4493 softirq cpu ticks
            0 stolen cpu ticks
      3196946 pages paged in
      6119050 pages paged out
            0 pages swapped in
           14 pages swapped out
     17992684 interrupts
     10515665 CPU context switches
   1237926809 boot time
        25277 forks

>From /var/log/syslog:

Mar 14 12:58:20 localhost kernel: oom-killer: gfp_mask=0x201d2, order=0
Mar 14 12:58:20 localhost kernel:  [<c01444e3>] out_of_memory+0x25/0x13a
Mar 14 12:58:20 localhost kernel:  [<c014594c>] __alloc_pages+0x1f5/0x275
Mar 14 12:58:20 localhost kernel:  [<c0146afb>]
__do_page_cache_readahead+0xc8/0x1e8
Mar 14 12:58:20 localhost kernel:  [<c0141674>] find_get_page+0x18/0x38
Mar 14 12:58:20 localhost kernel:  [<c015bb3b>]
__find_get_block_slow+0xfb/0x105
Mar 14 12:58:20 localhost kernel:  [<c015be64>] __find_get_block+0x15c/0x166
Mar 14 12:58:20 localhost kernel:  [<c0146c61>]
blockable_page_cache_readahead+0x46/0x99
Mar 14 12:58:20 localhost kernel:  [<c0146df3>]
page_cache_readahead+0xb2/0x177
Mar 14 12:58:20 localhost kernel:  [<c0141d09>]
do_generic_mapping_read+0x137/0x42a
Mar 14 12:58:20 localhost kernel:  [<c014283b>]
__generic_file_aio_read+0x16b/0x1b2
Mar 14 12:58:20 localhost kernel:  [<c0141524>] file_read_actor+0x0/0xca
Mar 14 12:58:20 localhost kernel:  [<c01428bd>]
generic_file_aio_read+0x3b/0x42
Mar 14 12:58:20 localhost kernel:  [<c0159e7f>] do_sync_read+0xb6/0xf1
Mar 14 12:58:20 localhost kernel:  [<c012d8b1>]
autoremove_wake_function+0x0/0x2d
Mar 14 12:58:20 localhost kernel:  [<c0159dc9>] do_sync_read+0x0/0xf1
Mar 14 12:58:20 localhost kernel:  [<c015a788>] vfs_read+0x9f/0x141
Mar 14 12:58:20 localhost kernel:  [<c015aca2>] sys_pread64+0x44/0x5a
Mar 14 12:58:20 localhost kernel:  [<c0102c11>] sysenter_past_esp+0x56/0x79
Mar 14 12:58:20 localhost kernel: Mem-info:
Mar 14 12:58:20 localhost kernel: DMA per-cpu:
Mar 14 12:58:20 localhost kernel: cpu 0 hot: high 0, batch 1 used:0
Mar 14 12:58:20 localhost kernel: cpu 0 cold: high 0, batch 1 used:0
Mar 14 12:58:20 localhost kernel: cpu 1 hot: high 0, batch 1 used:0
Mar 14 12:58:20 localhost kernel: cpu 1 cold: high 0, batch 1 used:0
Mar 14 12:58:20 localhost kernel: cpu 2 hot: high 0, batch 1 used:0
Mar 14 12:58:20 localhost kernel: cpu 2 cold: high 0, batch 1 used:0
Mar 14 12:58:20 localhost kernel: cpu 3 hot: high 0, batch 1 used:0
Mar 14 12:58:20 localhost kernel: cpu 3 cold: high 0, batch 1 used:0
Mar 14 12:58:20 localhost kernel: DMA32 per-cpu: empty
Mar 14 12:58:20 localhost kernel: Normal per-cpu:
Mar 14 12:58:20 localhost kernel: cpu 0 hot: high 186, batch 31 used:0
Mar 14 12:58:20 localhost kernel: cpu 0 cold: high 62, batch 15 used:54
Mar 14 12:58:20 localhost kernel: cpu 1 hot: high 186, batch 31 used:27
Mar 14 12:58:20 localhost kernel: cpu 1 cold: high 62, batch 15 used:61
Mar 14 12:58:20 localhost kernel: cpu 2 hot: high 186, batch 31 used:49
Mar 14 12:58:20 localhost kernel: cpu 2 cold: high 62, batch 15 used:59
Mar 14 12:58:20 localhost kernel: cpu 3 hot: high 186, batch 31 used:8
Mar 14 12:58:20 localhost kernel: cpu 3 cold: high 62, batch 15 used:19
Mar 14 12:58:20 localhost kernel: HighMem per-cpu:
Mar 14 12:58:20 localhost kernel: cpu 0 hot: high 186, batch 31 used:69
Mar 14 12:58:20 localhost kernel: cpu 0 cold: high 62, batch 15 used:48
Mar 14 12:58:20 localhost kernel: cpu 1 hot: high 186, batch 31 used:170
Mar 14 12:58:20 localhost kernel: cpu 1 cold: high 62, batch 15 used:15
Mar 14 12:58:20 localhost kernel: cpu 2 hot: high 186, batch 31 used:162
Mar 14 12:58:20 localhost kernel: cpu 2 cold: high 62, batch 15 used:12
Mar 14 12:58:20 localhost kernel: cpu 3 hot: high 186, batch 31 used:174
Mar 14 12:58:20 localhost kernel: cpu 3 cold: high 62, batch 15 used:49
Mar 14 12:58:20 localhost kernel: Free pages:       45488kB (496kB HighMem)
Mar 14 12:58:20 localhost kernel: Active:263348 inactive:216164 dirty:0
writeback:0 unstable:0 free:11372 slab:14025 mapped:144 pagetables:11530
Mar 14 12:58:20 localhost kernel: DMA free:8192kB min:68kB low:84kB
high:100kB active:2968kB inactive:2160kB present:16384kB
pages_scanned:6530 all_unreclaimable? yes
Mar 14 12:58:20 localhost kernel: lowmem_reserve[]: 0 0 880 2031
Mar 14 12:58:20 localhost kernel: DMA32 free:0kB min:0kB low:0kB high:0kB
active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Mar 14 12:58:20 localhost kernel: lowmem_reserve[]: 0 0 880 2031
Mar 14 12:58:20 localhost kernel: Normal free:36800kB min:3756kB
low:4692kB high:5632kB active:402344kB inactive:338216kB present:901120kB
pages_scanned:806676 all_unreclaimable? yes
Mar 14 12:58:20 localhost kernel: lowmem_reserve[]: 0 0 0 9212
Mar 14 12:58:20 localhost kernel: HighMem free:496kB min:512kB low:1740kB
high:2968kB active:648080kB inactive:524280kB present:1179136kB
pages_scanned:1275911 all_unreclaimable? yes
Mar 14 12:58:20 localhost kernel: lowmem_reserve[]: 0 0 0 0
Mar 14 12:58:20 localhost kernel: DMA: 6*4kB 219*8kB 123*16kB 23*32kB
2*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 8192kB
Mar 14 12:58:20 localhost kernel: DMA32: empty
Mar 14 12:58:20 localhost kernel: Normal: 2772*4kB 296*8kB 1151*16kB
22*32kB 2*64kB 2*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB =
36800kB
Mar 14 12:58:20 localhost kernel: HighMem: 0*4kB 14*8kB 0*16kB 0*32kB
0*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 496kB
Mar 14 12:58:20 localhost kernel: Swap cache: add 948264, delete 948263,
find 97950/132027, race 0+1030
Mar 14 12:58:20 localhost kernel: Free swap  = 0kB
Mar 14 12:58:20 localhost kernel: Total swap = 2634620kB
Mar 14 12:58:20 localhost kernel: Free swap:            0kB
Mar 14 12:58:20 localhost kernel: 524160 pages of RAM
Mar 14 12:58:20 localhost kernel: 294784 pages of HIGHMEM
Mar 14 12:58:20 localhost kernel: 5063 reserved pages
Mar 14 12:58:20 localhost kernel: 22710 pages shared
Mar 14 12:58:20 localhost kernel: 1 pages swap cached
Mar 14 12:58:20 localhost kernel: 0 pages dirty
Mar 14 12:58:20 localhost kernel: 0 pages writeback
Mar 14 12:58:20 localhost kernel: 144 pages mapped
Mar 14 12:58:20 localhost kernel: 14025 pages slab
Mar 14 12:58:20 localhost kernel: 11530 pages pagetables
Mar 14 12:58:20 localhost kernel: 524160 pages of RAM
Mar 14 12:58:20 localhost kernel: 294784 pages of HIGHMEM
Mar 14 12:58:20 localhost kernel: 5063 reserved pages
Mar 14 12:58:20 localhost kernel: 22715 pages shared
Mar 14 12:58:20 localhost kernel: 1 pages swap cached
Mar 14 12:58:20 localhost kernel: 0 pages dirty
Mar 14 12:58:20 localhost kernel: 0 pages writeback
Mar 14 12:58:20 localhost kernel: 144 pages mapped
Mar 14 12:58:20 localhost kernel: 14025 pages slab
Mar 14 12:58:20 localhost kernel: 11530 pages pagetables
Mar 14 12:58:20 localhost kernel: Out of Memory: Kill process 29845
(mysqld) score 16080 and children.
Mar 14 12:58:20 localhost kernel: Out of memory: Killed process 29845
(mysqld).
Mar 14 12:58:20 localhost kernel: Out of Memory: Kill process 3410
(dovecot) score 8662 and children.
Mar 14 12:58:20 localhost kernel: Out of memory: Killed process 3511
(pop3-login).
Mar 14 12:58:20 localhost kernel: Out of Memory: Kill process 3410
(dovecot) score 8661 and children.
Mar 14 12:58:20 localhost kernel: Out of memory: Killed process 2925 (imap).
Mar 14 12:58:22 localhost mysqld_safe[5237]: Number of processes running
now: 0
Mar 14 12:58:22 localhost mysqld_safe[5239]: restarted
Mar 14 12:58:33 localhost mysqld[5242]: 090314 12:58:32  InnoDB: Database
was not shut down normally!
Mar 14 12:58:33 localhost mysqld[5242]: InnoDB: Starting crash recovery.
Mar 14 12:58:33 localhost mysqld[5242]: InnoDB: Reading tablespace
information from the .ibd files...
Mar 14 12:58:33 localhost mysqld[5242]: InnoDB: Restoring possible
half-written data pages from the doublewrite
Mar 14 12:58:33 localhost mysqld[5242]: InnoDB: buffer...
Mar 14 12:58:37 localhost fetchmail[3695]: awakened at Sat 14 Mar 2009
12:58:34 GMT
Mar 14 12:58:38 localhost mysqld[5242]: 090314 12:58:36  InnoDB: Starting
log scan based on checkpoint at
Mar 14 12:58:38 localhost mysqld[5242]: InnoDB: log sequence number 0 68717.
Mar 14 12:58:48 localhost mysqld[5242]: InnoDB: Doing recovery: scanned up
to log sequence number 0 68717
qMar 14 12:58:56 localhost dovecot: auth-worker(default): mysql: Connect
failed to 127.0.0.1 (mailserver): Can't connect to MySQL server on
'127.0.0.1' (111) - waiting for 1 seconds before retry
Mar 14 12:58:56 localhost dovecot: auth-worker(default):
sql(letters@DOMAIN.org,IPADDRESS.111): Password query failed: Not
connected to database
Mar 14 12:58:57 localhost fetchmail[3695]: 145 messages (144 seen) for
info@DOMAIN.org at mail.MAILHOST.co.uk (154739969 octets).
Mar 14 12:58:57 localhost dovecot: auth-worker(default):
sql(letters@DOMAIN.org,IPADDRESS.111): Password query failed: Not
connected to database
Mar 14 12:58:59 localhost fetchmail[3695]: reading message
info@DOMAIN.org@mail.MAILHOST.co.uk:145 of 145 (2501 octets) (log message
incomplete)
Mar 14 12:58:59 localhost fetchmail[3695]: connection to localhost:smtp
[127.0.0.1/25] failed: Connection refused.
Mar 14 12:58:59 localhost fetchmail[3695]: SMTP connect to localhost failed
Mar 14 12:58:59 localhost fetchmail[3695]: SMTP transaction error while
fetching from info@DOMAIN.org@mail.MAILHOST.co.uk and delivering to SMTP
host localhost
Mar 14 12:59:00 localhost fetchmail[3695]: Query status=10 (SMTP)
Mar 14 12:59:00 localhost fetchmail[3695]: sleeping at Sat 14 Mar 2009
12:59:00 GMT for 60 seconds
Mar 14 12:59:00 localhost mysqld[5242]: InnoDB: Last MySQL binlog file
position 0 232, file name /var/log/mysql/mysql-bin.000902
Mar 14 12:59:01 localhost mysqld[5242]: 090314 12:59:01  InnoDB: Started;
log sequence number 0 68717
Mar 14 12:59:09 localhost mysqld[5242]: 090314 12:59:04 [Note] Recovering
after a crash using /var/log/mysql/mysql-bin
Mar 14 12:59:10 localhost mysqld[5242]: 090314 12:59:08 [Note] Starting
crash recovery...
Mar 14 12:59:10 localhost mysqld[5242]: 090314 12:59:08 [Note] Crash
recovery finished.
Mar 14 12:59:26 localhost dovecot: auth-worker(default): mysql: Connect
failed to 127.0.0.1 (mailserver): Can't connect to MySQL server on
'127.0.0.1' (111) - waiting for 75 seconds before retry




-- 
Pete Boyd

Open Plan IT - http://openplanit.co.uk
The Golden Ear - http://thegoldenear.org



Reply to: