--- Begin Message ---
- To: Debian Bug Tracking System <submit@bugs.debian.org>
- Subject: linux-doc-2.6.18: Kernel oopses on xen-amd64 SMP while dealing with LVM snapshots
- From: Ingo Juergensmann <ij@2006.bluespice.org>
- Date: Thu, 11 Jan 2007 14:34:09 +0100
- Message-id: <20070111133409.8133.88371.reportbug@wbs-euserv>
Package: linux-doc-2.6.18
Severity: important
Hi!
The subject already says it:
Debian kernel 2.6.18-3-xen-vserver-amd64 causes kernel oopses (and
therefore probably data loss or broken systems) when using LVM snapshots
with Xen on a dual-core amd64 machine.
More details:
The machine has an AMD Athlon(tm) 64 X2 Dual Core Processor 3800+, 2 GB RAM
and two identical 250 GB hard disks set up as software RAID1. One RAID1
partition is set up for LVM to host the disks for the Xen domUs. It's Etch.
I'm using LVM snapshots to back up the running Xen domains. When removing
the snapshots again, the kernel oopses like this:
Jan 11 11:36:40 wbs-euserv kernel: ----------- [cut here ] --------- [please bite here ] ---------
Jan 11 11:36:40 wbs-euserv kernel: Kernel BUG at mm/slab.c:595
Jan 11 11:36:40 wbs-euserv kernel: invalid opcode: 0000 [1] SMP
Jan 11 11:36:40 wbs-euserv kernel: CPU 0
Jan 11 11:36:40 wbs-euserv kernel: Modules linked in: xfrm4_mode_tunnel esp4 netloop tun ipv6 ipt_LOG ipt_iprange xt_physdev ipt_ULOG ipt_recent ipt_REJECT xt_tcpudp xt_state ip_conntrack nfnetlink iptable_filter iptable_mangle ip_tables x_tables bridge deflate zlib_deflate twofish serpent aes blowfish des sha256 sha1 crypto_null af_key ext3 jbd mbcache loop snd_mpu401 snd_mpu401_uart snd_rawmidi snd_seq_device snd analog irtty_sir i2c_nforce2 sir_dev parport_pc gameport soundcore psmouse irda floppy parport serial_core serio_raw crc_ccitt evdev i2c_core pcspkr xfs dm_mirror dm_snapshot dm_mod raid1 md_mod ide_generic sd_mod sata_nv libata scsi_mod ehci_hcd amd74xx forcedeth generic ide_core ohci_hcd fan
Jan 11 11:36:40 wbs-euserv kernel: Pid: 7579, comm: lvremove Not tainted 2.6.18-3-xen-vserver-amd64 #1
Jan 11 11:36:40 wbs-euserv kernel: RIP: e030:[<ffffffff80207119>] [<ffffffff80207119>] kmem_cache_free+0x58/0xca
Jan 11 11:36:40 wbs-euserv kernel: RSP: e02b:ffff88006f79fc68 EFLAGS: 00010202
Jan 11 11:36:40 wbs-euserv kernel: RAX: 0000000000000068 RBX: 0000000000000000 RCX: 0000000000000000
Jan 11 11:36:40 wbs-euserv kernel: RDX: ffff8800035b0e70 RSI: ffff880070c42588 RDI: ffff88007fc45840
Jan 11 11:36:40 wbs-euserv kernel: RBP: ffff880070c42588 R08: ffff88006f79e000 R09: 0000000000000000
Jan 11 11:36:40 wbs-euserv kernel: R10: ffff8800720ab200 R11: ffff88007386f880 R12: ffff88007fc45840
Jan 11 11:36:40 wbs-euserv kernel: R13: 0000000000003020 R14: 0000000000000302 R15: 0000000000000800
Jan 11 11:36:40 wbs-euserv kernel: FS: 00002ae26846bc80(0000) GS:ffffffff804d3000(0000) knlGS:0000000000000000
Jan 11 11:36:40 wbs-euserv kernel: CS: e033 DS: 0000 ES: 0000
Jan 11 11:36:40 wbs-euserv kernel: Process lvremove (pid: 7579[#0], threadinfo ffff88006f79e000, task ffff88007386f880)
Jan 11 11:36:40 wbs-euserv kernel: Stack: 0000000000000000 0000000000000000 ffff880072233ee8 ffffc200001ef020
Jan 11 11:36:40 wbs-euserv kernel: 0000000000003020 ffffffff880dc9d4 ffff88007fc45840 ffff880072233e80
Jan 11 11:36:40 wbs-euserv kernel: ffffc200001a4080 0000000000000000
Jan 11 11:36:40 wbs-euserv kernel: Call Trace:
Jan 11 11:36:40 wbs-euserv kernel: [<ffffffff880dc9d4>] :dm_snapshot:exit_exception_table+0x3c/0x67
Jan 11 11:36:40 wbs-euserv kernel: [<ffffffff880cef4a>] :dm_mod:dev_remove+0x0/0xb5
Jan 11 11:36:40 wbs-euserv kernel: [<ffffffff880dcaa5>] :dm_snapshot:snapshot_dtr+0xa6/0xe6
Jan 11 11:36:40 wbs-euserv kernel: [<ffffffff880cc856>] :dm_mod:dm_table_put+0x58/0xc5
Jan 11 11:36:40 wbs-euserv kernel: [<ffffffff880cb92a>] :dm_mod:dm_put+0x90/0x151
Jan 11 11:36:40 wbs-euserv kernel: [<ffffffff880cefec>] :dm_mod:dev_remove+0xa2/0xb5
Jan 11 11:36:40 wbs-euserv kernel: [<ffffffff880cf52d>] :dm_mod:ctl_ioctl+0x213/0x25e
Jan 11 11:36:40 wbs-euserv kernel: [<ffffffff802431fa>] do_ioctl+0x55/0x6b
Jan 11 11:36:40 wbs-euserv kernel: [<ffffffff80231e5a>] vfs_ioctl+0x364/0x38b
Jan 11 11:36:40 wbs-euserv kernel: [<ffffffff8024dacf>] sys_ioctl+0x59/0x78
Jan 11 11:36:40 wbs-euserv kernel: [<ffffffff8025ed5e>] system_call+0x86/0x8b
Jan 11 11:36:40 wbs-euserv kernel: [<ffffffff8025ecd8>] system_call+0x0/0x8b
Jan 11 11:36:40 wbs-euserv kernel:
Jan 11 11:36:40 wbs-euserv kernel:
Jan 11 11:36:40 wbs-euserv kernel: Code: 0f 0b 68 05 a6 41 80 c2 53 02 4c 39 62 28 74 0a 0f 0b 68 05
Jan 11 11:36:40 wbs-euserv kernel: RIP [<ffffffff80207119>] kmem_cache_free+0x58/0xca
Jan 11 11:36:40 wbs-euserv kernel: RSP <ffff88006f79fc68>
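To make the path through the device-mapper code easier to see, the call trace in an oops like the one above can be reduced to (module, symbol) pairs. The following is only a rough sketch: the regular expression is an assumption based on the line format of this particular log, and the names FRAME_RE, parse_frames and SAMPLE are made up for illustration. Frames without a ":module:" prefix are treated as core kernel symbols.

```python
import re

# Assumed frame format, taken from the trace in this report:
#   [<ffffffff880dc9d4>] :dm_snapshot:exit_exception_table+0x3c/0x67
#   [<ffffffff802431fa>] do_ioctl+0x55/0x6b
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_frames(log_text):
    """Return (module, symbol) pairs for the frames after 'Call Trace:'."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if not in_trace:
            # Skip everything up to and including the 'Call Trace:' header.
            in_trace = "Call Trace:" in line
            continue
        match = FRAME_RE.search(line)
        if match is None:
            # First non-frame line ends the trace, so the trailing
            # 'RIP [<...>]' line is not counted as a frame.
            break
        frames.append((match.group("module") or "kernel", match.group("symbol")))
    return frames


# Shortened excerpt of the trace from this report.
SAMPLE = """\
kernel: Call Trace:
kernel: [<ffffffff880dc9d4>] :dm_snapshot:exit_exception_table+0x3c/0x67
kernel: [<ffffffff880dcaa5>] :dm_snapshot:snapshot_dtr+0xa6/0xe6
kernel: [<ffffffff880cefec>] :dm_mod:dev_remove+0xa2/0xb5
kernel: [<ffffffff802431fa>] do_ioctl+0x55/0x6b
kernel:
kernel: RIP [<ffffffff80207119>] kmem_cache_free+0x58/0xca
"""

if __name__ == "__main__":
    for module, symbol in parse_frames(SAMPLE):
        print(f"{module:12s} {symbol}")
```

Read this way, the trace shows lvremove entering through the ioctl path into dm_mod, then the snapshot destructor (snapshot_dtr) and exit_exception_table in dm_snapshot, where kmem_cache_free hits the BUG at mm/slab.c:595.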
The funny thing is: this happens only on the AMD64 X2 machine. We have
a second machine which is nearly identical to the above:
an AMD Athlon(tm) 64 Processor 3500+, 2 GB RAM, 2x 120 GB disks, software RAID1 + LVM.
The second machine runs the same procedure for its Xen domains: snapshot,
backup, remove snapshot - but it doesn't show *any* kernel oopses in its
logfiles. This makes me guess that it could have something to do with
the dual core CPU on the AMD64 X2 machine.
(A very short test on a dual Xeon i386 machine didn't show oopses either,
but that was not tested in any depth.)
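For anyone trying to reproduce this, the snapshot, backup, remove-snapshot cycle can be sketched roughly as below. This is an illustration only, not the exact procedure used on these machines: the volume group and volume names, the snapshot size, the backup directory and the use of dd for the backup step are all assumptions. The sketch only prints the commands and does not run them.

```python
import shlex


def snapshot_cycle(vg, lv, snap_size="1G", backup_dir="/backup"):
    """Build the three commands of one snapshot/backup/remove cycle.

    All arguments are hypothetical placeholders; nothing is executed here.
    """
    snap = f"{lv}-snap"
    return [
        # 1. Create a snapshot of the domU's logical volume.
        ["lvcreate", "--snapshot", "--size", snap_size,
         "--name", snap, f"/dev/{vg}/{lv}"],
        # 2. Back up the snapshot (dd is an assumed stand-in for the real backup).
        ["dd", f"if=/dev/{vg}/{snap}", f"of={backup_dir}/{lv}.img", "bs=1M"],
        # 3. Remove the snapshot again; this is the step where lvremove oopsed.
        ["lvremove", "-f", f"/dev/{vg}/{snap}"],
    ]


if __name__ == "__main__":
    # Example with made-up names: volume group 'vg0', domU volume 'domu1'.
    for cmd in snapshot_cycle("vg0", "domu1"):
        print(shlex.join(cmd))
```

Printing instead of executing keeps the sketch safe to run on a machine without LVM; on a test system the printed commands could be run by hand, one domU at a time, to see at which removal the oops appears.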
Regards,
Ingo
-- System Information:
Debian Release: 4.0
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-3-xen-vserver-amd64
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
--- End Message ---