Bug#542250: repeatable crashes while copying 500G from NFS mount to local logical volume
Package: linux-image-2.6.26-2-xen-amd64
Version: 2.6.26-17lenny2
Severity: normal
I'm running several servers (Dual Core AMD Opteron, 16G of ram) with
Xen.
Dom0 is debian lenny adm64.
Today I tried to run a large backup in dom0 - copy NFS-mounted partition
to a local volume group.
buki:~# cd backup/
buki:~/backup# mount homes:/srv/homes src
buki:~/backup# mount /dev/vg/homes-backup dst
buki:~/backup# (cd src && tar cpf - .) | (cd dst && tar xpf -)
This resulted into server crash in a few minutes.
Crash is reproducible - if after reboot I try the same, same crash
happens soon with similary-looking backtrace.
[ 712.450538] ------------[ cut here ]------------
[ 712.450538] kernel BUG at drivers/xen/core/spinlock.c:74!
[ 712.450538] invalid opcode: 0000 [1] SMP
[ 712.450538] CPU 0
[ 712.450538] Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs sctp libcrc32c ipv6 lock_dlm gfs2 dlm configfs bridge 8021q dm_emc dm_round_robin dm_multipath loop psmouse pcspkr serio_raw k8temp i2c_amd8111 i2c_amd756 i2c_core button amd_rng rng_core shpchp pci_hotplug evdev reiserfs dm_mirror dm_log dm_snapshot dm_mod raid456 async_xor async_memcpy async_tx xor raid1 md_mod ide_cd_mod cdrom ide_pci_generic sd_mod amd74xx ide_core floppy lpfc scsi_transport_fc scsi_tgt tg3 e100 mii sata_sil ata_generic ohci_hcd libata scsi_mod dock thermal processor fan thermal_sys
[ 712.450538] Pid: 5636, comm: tar Not tainted 2.6.26-2-xen-amd64 #1
[ 712.450538] RIP: e030:[<ffffffff8037fa9c>] [<ffffffff8037fa9c>] xen_spin_wait+0x90/0x139
[ 712.450538] RSP: e02b:ffffffff80595e38 EFLAGS: 00010246
[ 712.450538] RAX: ffff88001b1adfd8 RBX: ffff880001bbda20 RCX: ffff88008162e000
[ 712.450538] RDX: ffffffffff5f7000 RSI: 0000000000008483 RDI: 0000000000000000
[ 712.450538] RBP: 0000000000000103 R08: 00000000dc38d979 R09: ffffffffffffffff
[ 712.450538] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 712.450538] R13: ffff88001fe7f698 R14: ffff88001fe7f698 R15: 0000000000000000
[ 712.450538] FS: 00007f3ef40eb6e0(0000) GS:ffffffff80539000(0000) knlGS:0000000000000000
[ 712.450538] CS: e033 DS: 0000 ES: 0000
[ 712.450538] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 712.450538] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 712.450538] Process tar (pid: 5636, threadinfo ffff88001b1ac000, task ffff88001e270280)
[ 712.450538] Stack: ffff88001da47480 ffff88001fe7f698 0000000000008483 0000000000000000
[ 712.450538] 0000000000000013 ffffffff80435886 0000000000000000 ffff88001d8f8700
[ 712.450538] 0000000000000000 ffffffffa00854f4 0000000000000000 ffffc2000001ac00
[ 712.450538] Call Trace:
[ 712.450538] <IRQ> [<ffffffff80435886>] ? _spin_lock+0x3a/0x42
[ 712.450538] [<ffffffffa00854f4>] ? :sata_sil:sil_interrupt+0x2b/0x275
[ 712.450538] [<ffffffff8025f0e6>] ? handle_IRQ_event+0x47/0x8e
[ 712.450538] [<ffffffff802605a2>] ? handle_level_irq+0xae/0x116
[ 712.450538] [<ffffffff8020e13e>] ? do_IRQ+0x4e/0x9a
[ 712.450538] [<ffffffff8037d4cc>] ? evtchn_do_upcall+0x13c/0x1fc
[ 712.450538] [<ffffffff8020bbde>] ? do_hypervisor_callback+0x1e/0x30
[ 712.450538] <EOI> [<ffffffff802e48d4>] ? dummy_inode_setattr+0x0/0x3
[ 712.450538] [<ffffffff8037cbf9>] ? xen_poll_irq+0x67/0x74
[ 712.450538] [<ffffffff8037fb06>] ? xen_spin_wait+0xfa/0x139
[ 712.450538] [<ffffffff80435cfd>] ? lock_kernel+0x4f/0x63
[ 712.450538] [<ffffffffa01dcd12>] ? :reiserfs:reiserfs_setattr+0x25/0x27d
[ 712.450538] [<ffffffff8031156f>] ? __up_read+0x13/0x8a
[ 712.450538] [<ffffffffa01f4064>] ? :reiserfs:reiserfs_getxattr+0xa9/0xbe
[ 712.450538] [<ffffffff8029e7e3>] ? notify_change+0x174/0x2f5
[ 712.450538] [<ffffffff80288db5>] ? chown_common+0x8c/0xa1
[ 712.450538] [<ffffffff802a1923>] ? mnt_want_write+0x31/0x86
[ 712.450538] [<ffffffff80288f83>] ? sys_chown+0x4b/0x6f
[ 712.450538] [<ffffffff8029a9d8>] ? dput+0x35/0x13e
[ 712.450538] [<ffffffff802ab10e>] ? sys_futimesat+0x88/0x9e
[ 712.450538] [<ffffffff8020b528>] ? system_call+0x68/0x6d
[ 712.450538] [<ffffffff8020b4c0>] ? system_call+0x0/0x6d
[ 712.450538]
[ 712.450538]
[ 712.450538] Code: 30 fa 58 80 4c 39 2c 08 75 04 0f 0b eb fe 48 c7 c0 40 fa 58 80 eb 1f 65 48 8b 04 25 10 00 00 00 66 f7 80 44 e0 ff ff 00 ff 75 04 <0f> 0b eb fe 48 c7 c0 30 fa 58 80 48 8d 1c 08 48 83 3b 00 74 04
[ 712.450538] RIP [<ffffffff8037fa9c>] xen_spin_wait+0x90/0x139
[ 712.450538] RSP <ffffffff80595e38>
[ 712.450538] ---[ end trace 73e60cdc01c1f34c ]---
[ 712.450538] Kernel panic - not syncing: Aiee, killing interrupt handler!
Server info:
buki:~# cat /proc/version
Linux version 2.6.26-2-xen-amd64 (Debian 2.6.26-17lenny2) (dannf@debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP Fri Aug 14 10:19:53 UTC 2009
buki:~# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 33
model name : Dual Core AMD Opteron(tm) Processor 280
stepping : 2
cpu MHz : 2391.532
cache size : 1024 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu de tsc msr pae cx8 apic mtrr cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good pni lahf_lm cmp_legacy
bogomips : 4785.06
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 33
model name : Dual Core AMD Opteron(tm) Processor 280
stepping : 2
cpu MHz : 2391.532
cache size : 1024 KB
physical id : 0
siblings : 1
core id : 1
cpu cores : 1
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu de tsc msr pae cx8 apic mtrr cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good pni lahf_lm cmp_legacy
bogomips : 4785.06
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
processor : 2
vendor_id : AuthenticAMD
cpu family : 15
model : 33
model name : Dual Core AMD Opteron(tm) Processor 280
stepping : 2
cpu MHz : 2391.532
cache size : 1024 KB
physical id : 1
siblings : 1
core id : 0
cpu cores : 1
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu de tsc msr pae cx8 apic mtrr cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good pni lahf_lm cmp_legacy
bogomips : 4785.06
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
processor : 3
vendor_id : AuthenticAMD
cpu family : 15
model : 33
model name : Dual Core AMD Opteron(tm) Processor 280
stepping : 2
cpu MHz : 2391.532
cache size : 1024 KB
physical id : 1
siblings : 1
core id : 1
cpu cores : 1
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu de tsc msr pae cx8 apic mtrr cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good pni lahf_lm cmp_legacy
bogomips : 4785.06
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
Reply to: