
Bug#629865: xen-linux-system-2.6.26-2-xen-amd64 causes system crash when using aacraid driver



Package: xen-linux-system-2.6.26-2-xen-amd64
Version: 2.6.26-26lenny1
Severity: critical
Justification: breaks the whole system


Similar to http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=596419, we have several Debian lenny systems running Xen that crash in the aacraid driver ('drivers/scsi/aacraid/aachba.c:2825') when running version 2.6.26-26lenny1 of the kernel. In our case the crash happens every few days to once a week, while an IO- and CPU-heavy backup is running under 'duplicity'.

After the crash, Xen notices that Domain 0 has died and reboots the machine, printing this to the console: "(XEN) Domain 0 crashed: rebooting machine in 5 seconds". After the reboot there are no useful entries in kern.log, syslog, etc. This is presumably because the crashing aacraid driver is what gives the kernel access to the RAID array we log to, so nothing can be written.

Below are stack traces from two servers running exactly the same kernel version and dependencies:

"[311083.335680] ------------[ cut here ]------------
[311083.335707] kernel BUG at drivers/scsi/aacraid/aachba.c:2825!
[311083.335736] invalid opcode: 0000 [1] SMP 
[311083.335764] CPU 0 
[311083.335764] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables bridge ipv6 ipmi_devintf ipmi_si ipmi_msghandler xenblktap loop psmouse serio_raw i2c_i801 i2c_core pcspkr button joydev evdev ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod ide_pci_generic ide_core ata_generic sd_mod usbhid hid ff_memless ata_piix libata aacraid uhci_hcd ehci_hcd dock igb scsi_mod thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
[311083.336104] Pid: 31747, comm: duplicity Not tainted 2.6.26-2-xen-amd64 #1
[311083.336104] RIP: e030:[<ffffffffa007cb4b>]  [<ffffffffa007cb4b>] :aacraid:aac_build_sgraw+0x51/0x116
[311083.340104] RSP: e02b:ffffffff80595c50  EFLAGS: 00010082
[311083.340104] RAX: 00000000fffffff4 RBX: 0000000000000000 RCX: 00000000fffffff4
[311083.340104] RDX: ffff8800521d4000 RSI: ffff8800521d4000 RDI: ffff88007f443870
[311083.340104] RBP: ffff88007ca08034 R08: 0000000000000000 R09: ffffffff80595700
[311083.340104] R10: 0000000000000000 R11: 000001f496193157 R12: 00000000fffffff4
[311083.340104] R13: ffff88004c44c5c0 R14: ffff88007c8e0780 R15: ffff88004c44c5c0
[311083.340104] FS:  00007f50404596e0(0000) GS:ffffffff8053a000(0000) knlGS:0000000000000000
[311083.340104] CS:  e033 DS: 0000 ES: 0000
[311083.340104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[311083.340104] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[311083.340104] Process duplicity (pid: 31747, threadinfo ffff88007db8c000, task ffff88007f54c840)
[311083.340104] Stack:  0000000300001000 0000000000000000 ffff88007ca08020 0000000000040000
[311083.340104]  0000000000000000 ffff88007c8e0780 ffff88004c44c5c0 ffffffffa007d800
[311083.340104]  0000000000000200 ffffffffa002287a 000012127d1657d8 00000000177fd6c1
[311083.340104] Call Trace:
[311083.340104]  <IRQ>  [<ffffffffa007d800>] ? :aacraid:aac_write_raw_io+0x90/0xea
[311083.340104]  [<ffffffffa002287a>] ? :scsi_mod:scsi_init_sgtable+0x6b/0x95
[311083.340104]  [<ffffffffa007c0e8>] ? :aacraid:aac_scsi_cmd+0xd40/0x10f2
[311083.340104]  [<ffffffff80235c38>] ? lock_timer_base+0x26/0x4b
[311083.340104]  [<ffffffff80235dde>] ? __mod_timer+0xd4/0xe3
[311083.340104]  [<ffffffffa007a77c>] ? :aacraid:aac_queuecommand+0x6e/0x7d
[311083.340104]  [<ffffffffa001dcbd>] ? :scsi_mod:scsi_dispatch_cmd+0x1da/0x26c
[311083.340104]  [<ffffffffa0023f0e>] ? :scsi_mod:scsi_request_fn+0x303/0x436
[311083.340104]  [<ffffffff8030070d>] ? __blk_run_queue+0x71/0xcf
[311083.340104]  [<ffffffff8030078c>] ? blk_run_queue+0x21/0x34
[311083.340104]  [<ffffffffa0022735>] ? :scsi_mod:scsi_next_command+0x2d/0x39
[311083.340104]  [<ffffffffa0022957>] ? :scsi_mod:scsi_end_request+0x74/0x82
[311083.340104]  [<ffffffffa002364b>] ? :scsi_mod:scsi_io_completion+0x1c0/0x3bf
[311083.340104]  [<ffffffff802ff180>] ? blk_done_softirq+0x97/0xa5
[311083.340104]  [<ffffffff8037d4ff>] ? startup_pirq+0xfe/0x109
[311083.340104]  [<ffffffff80231c98>] ? __do_softirq+0x77/0x103
[311083.340104]  [<ffffffff8020c13c>] ? call_softirq+0x1c/0x28
[311083.340104]  [<ffffffff8020e092>] ? do_softirq+0x55/0xbb
[311083.340104]  [<ffffffff8020e175>] ? do_IRQ+0x7d/0x9a
[311083.340104]  [<ffffffff8037df18>] ? evtchn_do_upcall+0x13c/0x1fc
[311083.340104]  [<ffffffff8020bbde>] ? do_hypervisor_callback+0x1e/0x30
[311083.340104]  <EOI>  [<ffffffff8037d1e6>] ? force_evtchn_callback+0xa/0xb
[311083.340104]  [<ffffffff8026467d>] ? find_get_page+0x68/0x6f
[311083.340104]  [<ffffffff802661b3>] ? generic_file_aio_read+0x1b4/0x4b7
[311083.340104]  [<ffffffff8028a583>] ? do_sync_read+0xc9/0x10c
[311083.340104]  [<ffffffff8020e7bc>] ? get_nsec_offset+0x9/0x2c
[311083.340104]  [<ffffffff8023f6ad>] ? autoremove_wake_function+0x0/0x2e
[311083.340104]  [<ffffffff804350f3>] ? thread_return+0x3e/0xdb
[311083.340104]  [<ffffffff8028ad74>] ? vfs_read+0xaa/0x152
[311083.340104]  [<ffffffff8028b155>] ? sys_read+0x45/0x6e
[311083.340104]  [<ffffffff8020b528>] ? system_call+0x68/0x6d
[311083.340104]  [<ffffffff8020b4c0>] ? system_call+0x0/0x6d
[311083.340104] 
[311083.340104] 
[311083.340104] Code: 00 00 c7 46 0c 00 00 00 00 c7 46 10 00 00 00 00 c7 46 14 00 00 00 00 c7 46 18 00 00 00 00 e8 d1 76 fa ff 83 f8 00 41 89 c4 7d 04 <0f> 0b eb fe 75 08 45 31 ff e9 a7 00 00 00 49 8b bd b0 00 00 00 
[311083.340104] RIP  [<ffffffffa007cb4b>] :aacraid:aac_build_sgraw+0x51/0x116
[311083.340104]  RSP <ffffffff80595c50>
[311083.340104] ---[ end trace 2a3216fa63bee17b ]---"

And the second:

"[358983.871010] ------------[ cut here ]------------
[358983.871010] kernel BUG at drivers/scsi/aacraid/aachba.c:2825!
[358983.871010] invalid opcode: 0000 [1] SMP 
[358983.871010] CPU 0 
[358983.871010] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables bridge ipv6 ipmi_devintf ipmi_si ipmi_msghandler xenblktap loop i2c_i801 psmouse i2c_core serio_raw pcspkr button joydev evdev ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod sg sr_mod cdrom ide_pci_generic ide_core ata_generic sd_mod usbhid hid ff_memless ata_piix libata aacraid dock ehci_hcd uhci_hcd scsi_mod igb thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
[358983.871010] Pid: 21082, comm: duplicity Not tainted 2.6.26-2-xen-amd64 #1
[358983.871010] RIP: e030:[<ffffffffa007cb4b>]  [<ffffffffa007cb4b>] :aacraid:aac_build_sgraw+0x51/0x116
[358983.871010] RSP: e02b:ffffffff80595c50  EFLAGS: 00010082
[358983.871010] RAX: 00000000fffffff4 RBX: 0000000000000000 RCX: 00000000fffffff4
[358983.871010] RDX: ffff88007dfa8800 RSI: 0000000000000001 RDI: ffff88007dfa8800
[358983.871010] RBP: ffff88007c50a834 R08: ffff880008366000 R09: ffffffff80595700
[358983.871010] R10: 0000000000000000 R11: 000001775f51db99 R12: 00000000fffffff4
[358983.871010] R13: ffff88007d032440 R14: ffff88007c4509d8 R15: ffff88007d032440
[358983.871010] FS:  00007f12fe4806e0(0000) GS:ffffffff8053a000(0000) knlGS:0000000000000000
[358983.871010] CS:  e033 DS: 0000 ES: 0000
[358983.871010] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[358983.871010] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[358983.871010] Process duplicity (pid: 21082, threadinfo ffff88007de0c000, task ffff88007451e480)
[358983.871010] Stack:  0000000300001000 0000000000000000 ffff88007c50a820 0000000000040000
[358983.871010]  0000000000000000 ffff88007c4509d8 ffff88007d032440 ffffffffa007d800
[358983.871010]  0000000000000200 ffffffffa003887a 00006c6c7d6997d8 0000000001308cdc
[358983.871010] Call Trace:
[358983.871010]  <IRQ>  [<ffffffffa007d800>] ? :aacraid:aac_write_raw_io+0x90/0xea
[358983.871010]  [<ffffffffa003887a>] ? :scsi_mod:scsi_init_sgtable+0x6b/0x95
[358983.871010]  [<ffffffffa007c0e8>] ? :aacraid:aac_scsi_cmd+0xd40/0x10f2
[358983.871010]  [<ffffffff80235c38>] ? lock_timer_base+0x26/0x4b
[358983.871010]  [<ffffffff80235dde>] ? __mod_timer+0xd4/0xe3
[358983.871010]  [<ffffffffa007a77c>] ? :aacraid:aac_queuecommand+0x6e/0x7d
[358983.871010]  [<ffffffffa0033cbd>] ? :scsi_mod:scsi_dispatch_cmd+0x1da/0x26c
[358983.871010]  [<ffffffffa0039f0e>] ? :scsi_mod:scsi_request_fn+0x303/0x436
[358983.871010]  [<ffffffff8030070d>] ? __blk_run_queue+0x71/0xcf
[358983.871010]  [<ffffffff8030078c>] ? blk_run_queue+0x21/0x34
[358983.871010]  [<ffffffffa0038735>] ? :scsi_mod:scsi_next_command+0x2d/0x39
[358983.871010]  [<ffffffffa0038957>] ? :scsi_mod:scsi_end_request+0x74/0x82
[358983.871010]  [<ffffffffa003964b>] ? :scsi_mod:scsi_io_completion+0x1c0/0x3bf
[358983.871010]  [<ffffffff802ff180>] ? blk_done_softirq+0x97/0xa5
[358983.871010]  [<ffffffff8037d4ff>] ? startup_pirq+0xfe/0x109
[358983.871010]  [<ffffffff80231c98>] ? __do_softirq+0x77/0x103
[358983.871010]  [<ffffffff8020c13c>] ? call_softirq+0x1c/0x28
[358983.871010]  [<ffffffff8020e092>] ? do_softirq+0x55/0xbb
[358983.871010]  [<ffffffff8020e175>] ? do_IRQ+0x7d/0x9a
[358983.871010]  [<ffffffff8037df18>] ? evtchn_do_upcall+0x13c/0x1fc
[358983.871010]  [<ffffffff8020bbde>] ? do_hypervisor_callback+0x1e/0x30
[358983.871010]  <EOI> 
[358983.871010] 
[358983.871010] Code: 00 00 c7 46 0c 00 00 00 00 c7 46 10 00 00 00 00 c7 46 14 00 00 00 00 c7 46 18 00 00 00 00 e8 d1 d6 fb ff 83 f8 00 41 89 c4 7d 04 <0f> 0b eb fe 75 08 45 31 ff e9 a7 00 00 00 49 8b bd b0 00 00 00 
[358983.871010] RIP  [<ffffffffa007cb4b>] :aacraid:aac_build_sgraw+0x51/0x116
[358983.871010]  RSP <ffffffff80595c50>
[358983.871010] ---[ end trace 264cd7428e0ff025 ]---"
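
One possible reading of the register dumps, offered as an assumption rather than a diagnosis: in both traces RAX, RCX and R12 hold 00000000fffffff4. The bytes just before the faulting instruction in the Code line (83 f8 00 41 89 c4 7d 04 <0f> 0b) look like a compare of EAX against zero followed by a jump over the trap when the value is non-negative, which would fit a check on a 32-bit return value. If that is right, the value is -12, which is ENOMEM on Linux. This has not been confirmed against the driver source. The small Python sketch below only does the arithmetic; the helper name as_signed32 is made up for illustration.

```python
import errno


def as_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# RAX as printed in both traces (RCX and R12 hold the same value).
rax = 0x00000000FFFFFFF4

# Assumption: the register holds a 32-bit C int return value.
code = as_signed32(rax)

# Map the negated value to a symbolic errno name, if one exists on this platform.
name = errno.errorcode.get(-code, "unknown")

print(code, name)
```

If the assumption holds, both servers hit the same failed check with the same error value, which would be consistent with a resource shortage under heavy IO rather than corrupted data.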

Any suggestions are greatly appreciated. I will try a hand-compiled aacraid driver to see whether the issue goes away, but that is purely an experiment; I would of course prefer a proper fix.

Best regards,

Tim Vaillancourt

-- System Information:
Debian Release: 5.0.8
  APT prefers oldstable
  APT policy: (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26-2-xen-amd64 (SMP w/16 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash

Versions of packages xen-linux-system-2.6.26-2-xen-amd64 depends on:
ii  linux-image-2.6.26-2-xen 2.6.26-26lenny1 Linux 2.6.26 image on AMD64, oldst
ii  xen-hypervisor-3.2-1-amd 3.2.1-2         The Xen Hypervisor on AMD64

xen-linux-system-2.6.26-2-xen-amd64 recommends no packages.

xen-linux-system-2.6.26-2-xen-amd64 suggests no packages.

-- no debconf information


