Hey there,

I am not sure if this is the best place to chime in, but here goes: I am having the same issue on lenny/5.0 with version 2.6.26-26lenny1 of xen-linux-system-2.6.26-2-xen-amd64, on several identical servers running Xen. The aacraid driver seems to be at fault, and it dies on the same line of 'drivers/scsi/aacraid/aachba.c' each time. This seems to occur during periods of extreme IO/CPU load when we run duplicity on our database data, but that is only an observation and could be incorrect.

Here are two traces from two servers experiencing this issue:

[311083.335680] ------------[ cut here ]------------
[311083.335707] kernel BUG at drivers/scsi/aacraid/aachba.c:2825!
[311083.335736] invalid opcode: 0000 [1] SMP
[311083.335764] CPU 0
[311083.335764] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables bridge ipv6 ipmi_devintf ipmi_si ipmi_msghandler xenblktap loop psmouse serio_raw i2c_i801 i2c_core pcspkr button joydev evdev ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod ide_pci_generic ide_core ata_generic sd_mod usbhid hid ff_memless ata_piix libata aacraid uhci_hcd ehci_hcd dock igb scsi_mod thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
[311083.336104] Pid: 31747, comm: duplicity Not tainted 2.6.26-2-xen-amd64 #1
[311083.336104] RIP: e030:[<ffffffffa007cb4b>] [<ffffffffa007cb4b>] :aacraid:aac_build_sgraw+0x51/0x116
[311083.340104] RSP: e02b:ffffffff80595c50 EFLAGS: 00010082
[311083.340104] RAX: 00000000fffffff4 RBX: 0000000000000000 RCX: 00000000fffffff4
[311083.340104] RDX: ffff8800521d4000 RSI: ffff8800521d4000 RDI: ffff88007f443870
[311083.340104] RBP: ffff88007ca08034 R08: 0000000000000000 R09: ffffffff80595700
[311083.340104] R10: 0000000000000000 R11: 000001f496193157 R12: 00000000fffffff4
[311083.340104] R13: ffff88004c44c5c0 R14: ffff88007c8e0780 R15: ffff88004c44c5c0
[311083.340104] FS: 00007f50404596e0(0000) GS:ffffffff8053a000(0000) knlGS:0000000000000000
[311083.340104] CS: e033 DS: 0000 ES: 0000
[311083.340104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[311083.340104] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[311083.340104] Process duplicity (pid: 31747, threadinfo ffff88007db8c000, task ffff88007f54c840)
[311083.340104] Stack: 0000000300001000 0000000000000000 ffff88007ca08020 0000000000040000
[311083.340104] 0000000000000000 ffff88007c8e0780 ffff88004c44c5c0 ffffffffa007d800
[311083.340104] 0000000000000200 ffffffffa002287a 000012127d1657d8 00000000177fd6c1
[311083.340104] Call Trace:
[311083.340104] <IRQ> [<ffffffffa007d800>] ? :aacraid:aac_write_raw_io+0x90/0xea
[311083.340104] [<ffffffffa002287a>] ? :scsi_mod:scsi_init_sgtable+0x6b/0x95
[311083.340104] [<ffffffffa007c0e8>] ? :aacraid:aac_scsi_cmd+0xd40/0x10f2
[311083.340104] [<ffffffff80235c38>] ? lock_timer_base+0x26/0x4b
[311083.340104] [<ffffffff80235dde>] ? __mod_timer+0xd4/0xe3
[311083.340104] [<ffffffffa007a77c>] ? :aacraid:aac_queuecommand+0x6e/0x7d
[311083.340104] [<ffffffffa001dcbd>] ? :scsi_mod:scsi_dispatch_cmd+0x1da/0x26c
[311083.340104] [<ffffffffa0023f0e>] ? :scsi_mod:scsi_request_fn+0x303/0x436
[311083.340104] [<ffffffff8030070d>] ? __blk_run_queue+0x71/0xcf
[311083.340104] [<ffffffff8030078c>] ? blk_run_queue+0x21/0x34
[311083.340104] [<ffffffffa0022735>] ? :scsi_mod:scsi_next_command+0x2d/0x39
[311083.340104] [<ffffffffa0022957>] ? :scsi_mod:scsi_end_request+0x74/0x82
[311083.340104] [<ffffffffa002364b>] ? :scsi_mod:scsi_io_completion+0x1c0/0x3bf
[311083.340104] [<ffffffff802ff180>] ? blk_done_softirq+0x97/0xa5
[311083.340104] [<ffffffff8037d4ff>] ? startup_pirq+0xfe/0x109
[311083.340104] [<ffffffff80231c98>] ? __do_softirq+0x77/0x103
[311083.340104] [<ffffffff8020c13c>] ? call_softirq+0x1c/0x28
[311083.340104] [<ffffffff8020e092>] ? do_softirq+0x55/0xbb
[311083.340104] [<ffffffff8020e175>] ? do_IRQ+0x7d/0x9a
[311083.340104] [<ffffffff8037df18>] ? evtchn_do_upcall+0x13c/0x1fc
[311083.340104] [<ffffffff8020bbde>] ? do_hypervisor_callback+0x1e/0x30
[311083.340104] <EOI> [<ffffffff8037d1e6>] ? force_evtchn_callback+0xa/0xb
[311083.340104] [<ffffffff8026467d>] ? find_get_page+0x68/0x6f
[311083.340104] [<ffffffff802661b3>] ? generic_file_aio_read+0x1b4/0x4b7
[311083.340104] [<ffffffff8028a583>] ? do_sync_read+0xc9/0x10c
[311083.340104] [<ffffffff8020e7bc>] ? get_nsec_offset+0x9/0x2c
[311083.340104] [<ffffffff8023f6ad>] ? autoremove_wake_function+0x0/0x2e
[311083.340104] [<ffffffff804350f3>] ? thread_return+0x3e/0xdb
[311083.340104] [<ffffffff8028ad74>] ? vfs_read+0xaa/0x152
[311083.340104] [<ffffffff8028b155>] ? sys_read+0x45/0x6e
[311083.340104] [<ffffffff8020b528>] ? system_call+0x68/0x6d
[311083.340104] [<ffffffff8020b4c0>] ? system_call+0x0/0x6d
[311083.340104]
[311083.340104]
[311083.340104] Code: 00 00 c7 46 0c 00 00 00 00 c7 46 10 00 00 00 00 c7 46 14 00 00 00 00 c7 46 18 00 00 00 00 e8 d1 76 fa ff 83 f8 00 41 89 c4 7d 04 <0f> 0b eb fe 75 08 45 31 ff e9 a7 00 00 00 49 8b bd b0 00 00 00
[311083.340104] RIP [<ffffffffa007cb4b>] :aacraid:aac_build_sgraw+0x51/0x116
[311083.340104] RSP <ffffffff80595c50>
[311083.340104] ---[ end trace 2a3216fa63bee17b ]---

And the second:

[358983.871010] ------------[ cut here ]------------
[358983.871010] kernel BUG at drivers/scsi/aacraid/aachba.c:2825!
[358983.871010] invalid opcode: 0000 [1] SMP
[358983.871010] CPU 0
[358983.871010] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables bridge ipv6 ipmi_devintf ipmi_si ipmi_msghandler xenblktap loop i2c_i801 psmouse i2c_core serio_raw pcspkr button joydev evdev ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod sg sr_mod cdrom ide_pci_generic ide_core ata_generic sd_mod usbhid hid ff_memless ata_piix libata aacraid dock ehci_hcd uhci_hcd scsi_mod igb thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
[358983.871010] Pid: 21082, comm: duplicity Not tainted 2.6.26-2-xen-amd64 #1
[358983.871010] RIP: e030:[<ffffffffa007cb4b>] [<ffffffffa007cb4b>] :aacraid:aac_build_sgraw+0x51/0x116
[358983.871010] RSP: e02b:ffffffff80595c50 EFLAGS: 00010082
[358983.871010] RAX: 00000000fffffff4 RBX: 0000000000000000 RCX: 00000000fffffff4
[358983.871010] RDX: ffff88007dfa8800 RSI: 0000000000000001 RDI: ffff88007dfa8800
[358983.871010] RBP: ffff88007c50a834 R08: ffff880008366000 R09: ffffffff80595700
[358983.871010] R10: 0000000000000000 R11: 000001775f51db99 R12: 00000000fffffff4
[358983.871010] R13: ffff88007d032440 R14: ffff88007c4509d8 R15: ffff88007d032440
[358983.871010] FS: 00007f12fe4806e0(0000) GS:ffffffff8053a000(0000) knlGS:0000000000000000
[358983.871010] CS: e033 DS: 0000 ES: 0000
[358983.871010] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[358983.871010] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[358983.871010] Process duplicity (pid: 21082, threadinfo ffff88007de0c000, task ffff88007451e480)
[358983.871010] Stack: 0000000300001000 0000000000000000 ffff88007c50a820 0000000000040000
[358983.871010] 0000000000000000 ffff88007c4509d8 ffff88007d032440 ffffffffa007d800
[358983.871010] 0000000000000200 ffffffffa003887a 00006c6c7d6997d8 0000000001308cdc
[358983.871010] Call Trace:
[358983.871010] <IRQ> [<ffffffffa007d800>] ? :aacraid:aac_write_raw_io+0x90/0xea
[358983.871010] [<ffffffffa003887a>] ? :scsi_mod:scsi_init_sgtable+0x6b/0x95
[358983.871010] [<ffffffffa007c0e8>] ? :aacraid:aac_scsi_cmd+0xd40/0x10f2
[358983.871010] [<ffffffff80235c38>] ? lock_timer_base+0x26/0x4b
[358983.871010] [<ffffffff80235dde>] ? __mod_timer+0xd4/0xe3
[358983.871010] [<ffffffffa007a77c>] ? :aacraid:aac_queuecommand+0x6e/0x7d
[358983.871010] [<ffffffffa0033cbd>] ? :scsi_mod:scsi_dispatch_cmd+0x1da/0x26c
[358983.871010] [<ffffffffa0039f0e>] ? :scsi_mod:scsi_request_fn+0x303/0x436
[358983.871010] [<ffffffff8030070d>] ? __blk_run_queue+0x71/0xcf
[358983.871010] [<ffffffff8030078c>] ? blk_run_queue+0x21/0x34
[358983.871010] [<ffffffffa0038735>] ? :scsi_mod:scsi_next_command+0x2d/0x39
[358983.871010] [<ffffffffa0038957>] ? :scsi_mod:scsi_end_request+0x74/0x82
[358983.871010] [<ffffffffa003964b>] ? :scsi_mod:scsi_io_completion+0x1c0/0x3bf
[358983.871010] [<ffffffff802ff180>] ? blk_done_softirq+0x97/0xa5
[358983.871010] [<ffffffff8037d4ff>] ? startup_pirq+0xfe/0x109
[358983.871010] [<ffffffff80231c98>] ? __do_softirq+0x77/0x103
[358983.871010] [<ffffffff8020c13c>] ? call_softirq+0x1c/0x28
[358983.871010] [<ffffffff8020e092>] ? do_softirq+0x55/0xbb
[358983.871010] [<ffffffff8020e175>] ? do_IRQ+0x7d/0x9a
[358983.871010] [<ffffffff8037df18>] ? evtchn_do_upcall+0x13c/0x1fc
[358983.871010] [<ffffffff8020bbde>] ? do_hypervisor_callback+0x1e/0x30
[358983.871010] <EOI>
[358983.871010]
[358983.871010] Code: 00 00 c7 46 0c 00 00 00 00 c7 46 10 00 00 00 00 c7 46 14 00 00 00 00 c7 46 18 00 00 00 00 e8 d1 d6 fb ff 83 f8 00 41 89 c4 7d 04 <0f> 0b eb fe 75 08 45 31 ff e9 a7 00 00 00 49 8b bd b0 00 00 00
[358983.871010] RIP [<ffffffffa007cb4b>] :aacraid:aac_build_sgraw+0x51/0x116
[358983.871010] RSP <ffffffff80595c50>
[358983.871010] ---[ end trace 264cd7428e0ff025 ]---

Any help is very much appreciated!

Cheers guys,

--
Tim Vaillancourt
System Administrator
FillZ Inc.
"Microsoft gives you Windows, Open-source gives you the whole house."