[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#386056: linux-image-2.6.17-2-amd64: Out of Memory syslog kernel message with more than 2Gb of RAM



Package: linux-image-2.6.17-2-amd64
Version: 2.6.17-8
Severity: critical
Justification: breaks unrelated software

The specifications of the machine in question before the bug was found:
    AMD64 3700+ processor (single core)
    ASUS A8N-SLI deluxe motherboard
    2Gb of 333MHz DDR in two 1Gb sticks
    Debian Etch (dist-upgrade as of August 31)
    Debian stock kernel 2.6.17-2-amd64
    VMWare 5.5 using Any-to-Any patch, installed and running properly

Everything was happy until I added another 1Gb of RAM to the machine (in
the form of two 512Mb 400MHz DDR sticks).  Upon doing so, the VMWare GUI
launches properly but starting a VM crashes the entire X session and a
number of other programs with the following output to syslog:

-------------------------------------------------------------------------------
Sep  4 00:40:31 localhost kernel: Active:73720 inactive:32596 dirty:683
writeback:0 unstable:0 free:648340 slab:9099 mapped:57706
pagetables:1790
Sep  4 00:40:31 localhost kernel: Node 0 DMA free:24kB min:24kB low:28kB
high:36kB active:0kB inactive:0kB present:11292kB pages_scanned:0
all_unreclaimable? yes
Sep  4 00:40:31 localhost kernel: lowmem_reserve[]: 0 3014 3014 3014
Sep  4 00:40:31 localhost kernel: Node 0 DMA32 free:2593336kB min:7012kB
low:8764kB high:10516kB active:294880kB inactive:130384kB
present:3086500kB pages_scanned:0 all_unreclaimable? no
Sep  4 00:40:31 localhost kernel: lowmem_reserve[]: 0 0 0 0
Sep  4 00:40:31 localhost kernel: Node 0 Normal free:0kB min:0kB low:0kB
high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0
all_unreclaimable? no
Sep  4 00:40:31 localhost kernel: lowmem_reserve[]: 0 0 0 0
Sep  4 00:40:31 localhost kernel: Node 0 HighMem free:0kB min:128kB
low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0
all_unreclaimable? no
Sep  4 00:40:31 localhost kernel: lowmem_reserve[]: 0 0 0 0
Sep  4 00:40:31 localhost kernel: Node 0 DMA: 0*4kB 1*8kB 1*16kB 0*32kB
0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 24kB
Sep  4 00:40:31 localhost kernel: Node 0 DMA32: 2506*4kB 19942*8kB
21602*16kB 16772*32kB 10139*64kB 4347*128kB 843*256kB 109*512kB
29*1024kB 11*2048kB 3*4096kB = 2593336kB
Sep  4 00:40:31 localhost kernel: Node 0 Normal: empty
Sep  4 00:40:31 localhost kernel: Node 0 HighMem: empty
Sep  4 00:40:31 localhost kernel: Swap cache: add 44, delete 44, find
0/0, race 0+0
Sep  4 00:40:31 localhost kernel: Free swap  = 3855420kB
Sep  4 00:40:31 localhost kernel: Total swap = 3855592kB
Sep  4 00:40:31 localhost kernel: Free swap:       3855420kB
Sep  4 00:40:31 localhost kernel: 786416 pages of RAM
Sep  4 00:40:31 localhost kernel: 13657 reserved pages
Sep  4 00:40:31 localhost kernel: 120890 pages shared
Sep  4 00:40:31 localhost kernel: 0 pages swap cached
Sep  4 00:40:31 localhost kernel: Out of Memory: Kill process 21606
(kdeinit) score 200185 and children.
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21634
(artsd).
Sep  4 00:40:31 localhost kernel: Out of Memory: Kill process 21606
(kdeinit) score 191734 and children.
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21638
(konqueror).
Sep  4 00:40:31 localhost kernel: Out of Memory: Kill process 21606
(kdeinit) score 165438 and children.
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21639
(konqueror).
Sep  4 00:40:31 localhost kernel: Out of Memory: Kill process 21606
(kdeinit) score 139152 and children.
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21640
(kwrite).
Sep  4 00:40:31 localhost kernel: Out of Memory: Kill process 21606
(kdeinit) score 124096 and children.
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21641
(konqueror).
Sep  4 00:40:31 localhost kernel: Out of Memory: Kill process 21606
(kdeinit) score 97760 and children.
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21642
(konqueror).
Sep  4 00:40:31 localhost kernel: Out of Memory: Kill process 21606
(kdeinit) score 71478 and children.
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21643
(konqueror).
Sep  4 00:40:31 localhost kernel: Out of Memory: Kill process 21644
(gaim) score 58404 and children.
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21651
(gaim).
Sep  4 00:40:31 localhost kernel: Out of Memory: Kill process 21606
(kdeinit) score 45197 and children.
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21644
(gaim).
Sep  4 00:40:31 localhost kernel: Out of Memory: Kill process 21623
(kdesktop) score 30577 and children.
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21623
(kdesktop).
Sep  4 00:40:31 localhost kernel: Out of Memory: Kill process 21606
(kdeinit) score 30482 and children.
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21823
(wrapper-gtk24.s).
Sep  4 00:40:31 localhost kernel: Out of Memory: Kill process 21613
(kded) score 30037 and children.
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21613
(kded).
Sep  4 00:40:31 localhost kernel: Out of Memory: Kill process 21606
(kdeinit) score 29581 and children.
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21856
(klauncher).
Sep  4 00:40:31 localhost kernel: Out of Memory: Kill process 21658
(knotify) score 28944 and children.
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21658
(knotify).
Sep  4 00:40:31 localhost kernel: Out of Memory: Kill process 21489
(FahCore_a0.exe) score 28370 and children.
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21489
(FahCore_a0.exe).
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21490
(FahCore_a0.exe).
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21491
(FahCore_a0.exe).
Sep  4 00:40:31 localhost kernel: Out of memory: Killed process 21492
(FahCore_a0.exe).
Sep  4 00:40:32 localhost kernel: Out of Memory: Kill process 21625
(kicker) score 26543 and children.
Sep  4 00:40:32 localhost kernel: Out of memory: Killed process 21625
(kicker).
Sep  4 00:40:32 localhost kernel: Out of Memory: Kill process 21654
(klipper) score 22490 and children.
Sep  4 00:40:32 localhost kernel: Out of memory: Killed process 21654
(klipper).
Sep  4 00:40:32 localhost kernel: Out of Memory: Kill process 21636
(kaccess) score 21457 and children.
Sep  4 00:40:32 localhost kernel: Out of memory: Killed process 21636
(kaccess).
Sep  4 00:40:32 localhost kernel: Out of Memory: Kill process 21620
(ksmserver) score 21218 and children.
Sep  4 00:40:32 localhost kernel: Out of memory: Killed process 21620
(ksmserver).
Sep  4 00:40:32 localhost kernel: Out of Memory: Kill process 21606
(kdeinit) score 19539 and children.
Sep  4 00:40:32 localhost kernel: Out of memory: Killed process 21606
(kdeinit).
Sep  4 00:40:32 localhost kernel: Out of Memory: Kill process 21609
(dcopserver) score 18792 and children.
Sep  4 00:40:32 localhost kernel: Out of memory: Killed process 21609
(dcopserver).
Sep  4 00:40:32 localhost kernel: Out of Memory: Kill process 21853
(vmware) score 11787 and children.
Sep  4 00:40:32 localhost kernel: Out of memory: Killed process 21855
(vmware-vmx).
Sep  4 00:40:36 localhost kdm_greet[21878]: Can't open default user face
-------------------------------------------------------------------------------

The following appears later in the syslog:

-------------------------------------------------------------------------------
Sep  4 00:42:28 localhost kernel: DMA memory shortage. Temporarily
falling back on virtual DMA
Sep  4 00:43:26 localhost kernel: vmware: page allocation failure.
order:3, mode:0x10d1
Sep  4 00:43:26 localhost kernel:
Sep  4 00:43:26 localhost kernel: Call Trace:
<ffffffff8020e081>{__alloc_pages+680}
<ffffffff8023a566>{__get_free_pages+14}
Sep  4 00:43:26 localhost kernel:
<ffffffff88160d7d>{:floppy:floppy_open+295}
<ffffffff802b6590>{do_open+160}
Sep  4 00:43:26 localhost kernel:        <ffffffff802b64e8>{bdget+276}
<ffffffff802b6ab2>{blkdev_open+0}
Sep  4 00:43:26 localhost kernel:
<ffffffff802b6ad3>{blkdev_open+33} <ffffffff8021cadf>{__dentry_open+217}
Sep  4 00:43:26 localhost kernel:
<ffffffff80225701>{do_filp_open+42}
<ffffffff80214751>{get_unused_fd+102}
Sep  4 00:43:26 localhost kernel:
<ffffffff8021859e>{do_sys_open+68} <ffffffff8025ccd2>{ia32_sysret+0}
Sep  4 00:43:26 localhost kernel: Mem-info:
Sep  4 00:43:26 localhost kernel: Node 0 DMA per-cpu:
Sep  4 00:43:26 localhost kernel: cpu 0 hot: high 0, batch 1 used:0
Sep  4 00:43:26 localhost kernel: cpu 0 cold: high 0, batch 1 used:0
Sep  4 00:43:26 localhost kernel: Node 0 DMA32 per-cpu:
Sep  4 00:43:26 localhost kernel: cpu 0 hot: high 186, batch 31 used:159
Sep  4 00:43:26 localhost kernel: cpu 0 cold: high 62, batch 15 used:11
Sep  4 00:43:26 localhost kernel: Node 0 Normal per-cpu: empty
Sep  4 00:43:26 localhost kernel: Node 0 HighMem per-cpu: empty
Sep  4 00:43:26 localhost kernel: Free pages:     2313696kB (0kB
HighMem)
Sep  4 00:43:26 localhost kernel: Active:126170 inactive:48987 dirty:596
writeback:0 unstable:0 free:578424 slab:9283 mapped:115120
pagetables:2412
Sep  4 00:43:26 localhost kernel: Node 0 DMA free:32kB min:24kB low:28kB
high:36kB active:0kB inactive:0kB present:11292kB pages_scanned:0
all_unreclaimable? yes
Sep  4 00:43:26 localhost kernel: lowmem_reserve[]: 0 3014 3014 3014
Sep  4 00:43:26 localhost kernel: Node 0 DMA32 free:2313664kB min:7012kB
low:8764kB high:10516kB active:504680kB inactive:195948kB
present:3086500kB pages_scanned:0 all_unreclaimable? no
-------------------------------------------------------------------------------

I am hoping this is the right place to submit this information.  Given
as the problem only seems to be reproducable with VMWare (thus requiring
a tainted kernel), I am submitting here instead of LKML.  Removal of the
extra 1Gb of DDR (reducing the system to 2Gb again) fixes the problem;
at 2Gb, VMWare runs properly again.  The problem is not unique to a
specific VM: two VMs (one Windows 2000, one Debian IA32 Sarge) both
exhibited the same behavior, as did a new one.

I would not mark this problem with such high severity except that the
following output to the syslog halted my tinkering for the evening:

-------------------------------------------------------------------------------
Sep  4 01:06:01 localhost kernel: md: syncing RAID array md0
Sep  4 01:06:01 localhost kernel: md: minimum _guaranteed_
reconstruction speed: 1000 KB/sec/disc.
Sep  4 01:06:01 localhost kernel: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for reconstruction.
Sep  4 01:06:01 localhost kernel: md: using 128k window, over a total of
312571136 blocks.
-------------------------------------------------------------------------------

This happened immediately after my fourth X session crash as a result of
starting VMWare.  In this case, whatever memory problem is arising
apparently caused my RAID5 to go out of sync.  mdadm --detail /dev/md0
reported the array's status as "clean, resyncing".  I did not feel
comfortable proceeding until the array reconstruction had completed.

The following is the output of a few potentially helpful commands from
my system:

-------------------------------------------------------------------------------
root@nestor:~# free
             total       used       free     shared    buffers
cached
Mem:       3091600    1342420    1749180          0       7528
834596
-/+ buffers/cache:     500296    2591304
Swap:      3855592        116    3855476
root@nestor:~# cat /proc/meminfo
MemTotal:      3091600 kB
MemFree:       1740884 kB
Buffers:          7884 kB
Cached:         840740 kB
SwapCached:         16 kB
Active:         803144 kB
Inactive:       457132 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      3091600 kB
LowFree:       1740884 kB
SwapTotal:     3855592 kB
SwapFree:      3855476 kB
Dirty:             112 kB
Writeback:           0 kB
Mapped:         543992 kB
Slab:            50144 kB
CommitLimit:   5401392 kB
Committed_AS:   829868 kB
PageTables:       9344 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    298580 kB
VmallocChunk: 34359432699 kB
root@nestor:~# cat /proc/vmstat
nr_dirty 121
nr_writeback 0
nr_unstable 0
nr_page_table_pages 2338
nr_mapped 129519
nr_slab 12908
pgpgin 79491310
pgpgout 58817848
pswpin 18
pswpout 44
pgalloc_high 0
pgalloc_normal 0
pgalloc_dma32 67361665
pgalloc_dma 590
pgfree 67755244
pgactivate 4297035
pgdeactivate 2885886
pgfault 34368037
pgmajfault 46515
pgrefill_high 0
pgrefill_normal 0
pgrefill_dma32 8859331
pgrefill_dma 0
pgsteal_high 0
pgsteal_normal 0
pgsteal_dma32 18486611
pgsteal_dma 0
pgscan_kswapd_high 0
pgscan_kswapd_normal 0
pgscan_kswapd_dma32 21019296
pgscan_kswapd_dma 0
pgscan_direct_high 0
pgscan_direct_normal 0
pgscan_direct_dma32 1526688
pgscan_direct_dma 0
pginodesteal 526399
slabs_scanned 874310528
kswapd_steal 17458257
kswapd_inodesteal 10148069
pageoutrun 149720
allocstall 6287
pgrotated 2698
nr_bounce 0
root@nestor:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0    116 1568544  14812 1015812  0    0   899   665   99  176 15  4 77  4
 2  0    116 1568544  14812 1015812  0    0     0     0 1539 2059 59 41  0  0
 1  0    116 1568420  14812 1015812  0    0     4     0 1416 2223 61 39  0  0
-------------------------------------------------------------------------------

More information available upon request.  I'm not overly familiar with
the bug reporting process, so my e-mail is zep01@bahj.com (just in case
more information is needed and I'm not responding to questions or
somesuch).

Cheers!

-- System Information:
Debian Release: testing/unstable
  APT prefers testing
  APT policy: (600, 'testing'), (500, 'stable'), (400, 'unstable')
Architecture: amd64 (x86_64)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.17-2-amd64
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)

Versions of packages linux-image-2.6.17-2-amd64 depends on:
ii  e2fsprogs                     1.39-1     ext2 file system utilities and lib
ii  initramfs-tools [linux-initra 0.76       tools for generating an initramfs
ii  module-init-tools             3.2.2-3    tools for managing Linux kernel mo

linux-image-2.6.17-2-amd64 recommends no packages.

-- debconf information:
  linux-image-2.6.17-2-amd64/postinst/kimage-is-a-directory:
  shared/kernel-image/really-run-bootloader: true
  linux-image-2.6.17-2-amd64/preinst/abort-overwrite-2.6.17-2-amd64:
  linux-image-2.6.17-2-amd64/preinst/overwriting-modules-2.6.17-2-amd64: true
  linux-image-2.6.17-2-amd64/postinst/old-system-map-link-2.6.17-2-amd64: true
  linux-image-2.6.17-2-amd64/preinst/failed-to-move-modules-2.6.17-2-amd64:
  linux-image-2.6.17-2-amd64/preinst/abort-install-2.6.17-2-amd64:
  linux-image-2.6.17-2-amd64/postinst/depmod-error-2.6.17-2-amd64: false
  linux-image-2.6.17-2-amd64/preinst/elilo-initrd-2.6.17-2-amd64: true
  linux-image-2.6.17-2-amd64/postinst/bootloader-error-2.6.17-2-amd64:
  linux-image-2.6.17-2-amd64/prerm/would-invalidate-boot-loader-2.6.17-2-amd64: true
  linux-image-2.6.17-2-amd64/preinst/lilo-has-ramdisk:
  linux-image-2.6.17-2-amd64/postinst/old-dir-initrd-link-2.6.17-2-amd64: true
  linux-image-2.6.17-2-amd64/preinst/lilo-initrd-2.6.17-2-amd64: true
  linux-image-2.6.17-2-amd64/preinst/already-running-this-2.6.17-2-amd64:
  linux-image-2.6.17-2-amd64/postinst/old-initrd-link-2.6.17-2-amd64: true
  linux-image-2.6.17-2-amd64/postinst/depmod-error-initrd-2.6.17-2-amd64: false
  linux-image-2.6.17-2-amd64/postinst/create-kimage-link-2.6.17-2-amd64: true
  linux-image-2.6.17-2-amd64/prerm/removing-running-kernel-2.6.17-2-amd64: true
  linux-image-2.6.17-2-amd64/postinst/bootloader-test-error-2.6.17-2-amd64:
  linux-image-2.6.17-2-amd64/preinst/initrd-2.6.17-2-amd64:
  linux-image-2.6.17-2-amd64/preinst/bootloader-initrd-2.6.17-2-amd64: true



Reply to: