[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#626593: linux-image-2.6.32-5-amd64: BUG during disk hot-plugging when setting the elevator via udev



On 15:07 Sun 15 May     , Ben Hutchings wrote:
> On Fri, 2011-05-13 at 16:34 +0300, Apollon Oikonomopoulos wrote:
> [...]
> > Having upgraded from lenny to squeeze last week, we encountered the following
> > crash during a SCSI bus rescan that added new disks to a system:
> > 
> > [ 1258.343275] ------------[ cut here ]------------
> > [ 1258.343280] sd 0:0:0:226: [sdgv] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> > [ 1258.343287] kernel BUG at /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/sysfs/file.c:539!
> > [ 1258.343289] invalid opcode: 0000 [#2] SMP
> > [ 1258.343292] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/0000:10:00.0/host0/rport-0:0-0/target0:0:0/0:0:0:216/block/sdgn/removable
> > [ 1258.343295] CPU 4
> > [ 1258.343296] Modules linked in: kvm_intel kvm nf_conntrack_ipv6 ip6table_filter ip6_tables xt_tcpudp xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q garp bridge stp bonding dm_round_robin dm_multipath scsi_dh ipmi_poweroff ipmi_devintf radeon ttm drm_kms_helper snd_pcm ipmi_si drm ipmi_msghandler i2c_algo_bit i5k_amb i2c_core snd_timer psmouse i5000_edac snd soundcore snd_page_alloc hpwdt hpilo serio_raw edac_core pcspkr rng_core evdev shpchp container pci_hotplug button processor ext3 jbd mbcache dm_mod sd_mod crc_t10dif usbhid hid uhci_hcd qla2xxx scsi_transport_fc tg3 ehci_hcd bnx2 scsi_tgt usbcore nls_base cciss libphy scsi_mod thermal thermal_sys [last unloaded: scsi_wait_scan]
> > [ 1258.343332] Pid: 12287, comm: async/20 Tainted: G      D W  2.6.32-5-amd64 #1 ProLiant BL460c G1
> [...]
> 
> We really need to see the first BUG message after boot.  The 'D' here
> indicates that this is not the first.
> 
> Ben.

Hi Ben,

You're right, this was not the first occurence. I was trying to forcibly
reproduce the problem by adding and removing ~1.5k SCSI disks to the system,
which caused a lot of WARNINGS like the following to appear first:

May 12 18:13:27 hn-11 kernel: [  513.803626] ------------[ cut here ]------------
May 12 18:13:27 hn-11 kernel: [  513.803636] WARNING: at /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/sysfs/sysfs.h:139 __sysfs_get+0x20/0x28()
May 12 18:13:27 hn-11 kernel: [  513.803639] Hardware name: ProLiant BL460c G1
May 12 18:13:27 hn-11 kernel: [  513.803641] Modules linked in: kvm_intel kvm nf_conntrack_ipv6 ip6table_filter ip6_tables xt_tcpudp xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q garp bridge stp bonding dm_round_robin dm_multipath scsi_dh ipmi_poweroff ipmi_devintf radeon ttm snd_pcm drm_kms_helper snd_timer drm i2c_algo_bit snd soundcore snd_page_alloc i2c_core hpwdt i5k_amb ipmi_si i5000_edac pcspkr rng_core ipmi_msghandler hpilo psmouse edac_core serio_raw evdev shpchp container pci_hotplug button processor ext3 jbd mbcache dm_mod sd_mod crc_t10dif usbhid hid uhci_hcd qla2xxx scsi_transport_fc cciss ehci_hcd tg3 libphy usbcore scsi_tgt bnx2 nls_base scsi_mod thermal thermal_sys [last unloaded: scsi_wait_scan]
May 12 18:13:27 hn-11 kernel: [  513.803725] Pid: 24914, comm: async/17 Not tainted 2.6.32-5-amd64 #1
May 12 18:13:27 hn-11 kernel: [  513.803727] Call Trace:
May 12 18:13:27 hn-11 kernel: [  513.803733]  [<ffffffff8113efad>] ? __sysfs_get+0x20/0x28
May 12 18:13:27 hn-11 kernel: [  513.803736]  [<ffffffff8113efad>] ? __sysfs_get+0x20/0x28
May 12 18:13:27 hn-11 kernel: [  513.803740]  [<ffffffff8104db34>] ? warn_slowpath_common+0x77/0xa3
May 12 18:13:27 hn-11 kernel: [  513.803744]  [<ffffffff8113efad>] ? __sysfs_get+0x20/0x28
May 12 18:13:27 hn-11 kernel: [  513.803747]  [<ffffffff8113f0f1>] ? __sysfs_add_one+0x2b/0x84
May 12 18:13:27 hn-11 kernel: [  513.803750]  [<ffffffff8113f1a0>] ? sysfs_add_one+0x19/0xe4
May 12 18:13:27 hn-11 kernel: [  513.803754]  [<ffffffff8113ec39>] ? sysfs_add_file_mode+0x4e/0x7f
May 12 18:13:27 hn-11 kernel: [  513.803759]  [<ffffffff811761aa>] ? elv_register_queue+0x4f/0x6f
May 12 18:13:27 hn-11 kernel: [  513.803764]  [<ffffffff8118024b>] ? blk_register_queue+0x7f/0xcc
May 12 18:13:27 hn-11 kernel: [  513.803768]  [<ffffffff81184021>] ? add_disk+0xb8/0x108
May 12 18:13:27 hn-11 kernel: [  513.803776]  [<ffffffffa01566be>] ? sd_probe_async+0x119/0x1d8 [sd_mod]
May 12 18:13:27 hn-11 kernel: [  513.803781]  [<ffffffff810698a7>] ? async_thread+0x0/0x20d
May 12 18:13:27 hn-11 kernel: [  513.803784]  [<ffffffff810699a6>] ? async_thread+0xff/0x20d
May 12 18:13:27 hn-11 kernel: [  513.803789]  [<ffffffff81049fee>] ? default_wake_function+0x0/0x9
May 12 18:13:27 hn-11 kernel: [  513.803792]  [<ffffffff810698a7>] ? async_thread+0x0/0x20d
May 12 18:13:27 hn-11 kernel: [  513.803795]  [<ffffffff81064721>] ? kthread+0x79/0x81
May 12 18:13:27 hn-11 kernel: [  513.803800]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
May 12 18:13:27 hn-11 kernel: [  513.803803]  [<ffffffff810646a8>] ? kthread+0x0/0x81
May 12 18:13:27 hn-11 kernel: [  513.803806]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
May 12 18:13:27 hn-11 kernel: [  513.803808] ---[ end trace 293bee1f4c673c1d ]---
May 12 18:13:27 hn-11 kernel: [  513.803829] ------------[ cut here ]------------
May 12 18:13:27 hn-11 kernel: [  513.804176] sd 1:0:3:161: [sdbhz] 10485760 512-byte logical blocks: (5.36 GB/5.00 GiB)
May 12 18:13:27 hn-11 kernel: [  513.804410] sd 1:0:3:161: [sdbhz] Write Protect is off
May 12 18:13:27 hn-11 kernel: [  513.804418] sd 1:0:3:161: [sdbhz] Mode Sense: bd 00 00 08
May 12 18:13:27 hn-11 kernel: [  513.804789] sd 1:0:3:161: [sdbhz] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
May 12 18:13:27 hn-11 kernel: [  513.816268] kernel BUG at /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/sysfs/file.c:539!
May 12 18:13:27 hn-11 kernel: [  513.816268] invalid opcode: 0000 [#1] SMP
May 12 18:13:27 hn-11 kernel: [  513.816268] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/0000:10:00.0/host0/rport-0:0-0/target0:0:0/0:0:0:79/state
May 12 18:13:27 hn-11 kernel: [  513.816268] CPU 5
May 12 18:13:27 hn-11 kernel: [  513.816268] Modules linked in: kvm_intel kvm nf_conntrack_ipv6 ip6table_filter ip6_tables xt_tcpudp xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q garp bridge stp bonding dm_round_robin dm_multipath scsi_dh ipmi_poweroff ipmi_devintf radeon ttm snd_pcm drm_kms_helper snd_timer drm i2c_algo_bit snd soundcore snd_page_alloc i2c_core hpwdt i5k_amb ipmi_si i5000_edac pcspkr rng_core ipmi_msghandler hpilo psmouse edac_core serio_raw evdev shpchp container pci_hotplug button processor ext3 jbd mbcache dm_mod sd_mod crc_t10dif usbhid hid uhci_hcd qla2xxx scsi_transport_fc cciss ehci_hcd tg3 libphy usbcore scsi_tgt bnx2 nls_base scsi_mod thermal thermal_sys [last unloaded: scsi_wait_scan]
May 12 18:13:27 hn-11 kernel: [  513.834234] Pid: 24914, comm: async/17 Tainted: G        W  2.6.32-5-amd64 #1 ProLiant BL460c G1
May 12 18:13:27 hn-11 kernel: [  513.834234] RIP: 0010:[<ffffffff8113ecff>]  [<ffffffff8113ecff>] sysfs_create_file+0x13/0x21
May 12 18:13:27 hn-11 kernel: [  513.864414] RSP: 0018:ffff880422125dd8  EFLAGS: 00010246
May 12 18:13:27 hn-11 kernel: [  513.864414] RAX: 0000000000000000 RBX: ffffffff81485598 RCX: ffff88042e0e4000
May 12 18:13:27 hn-11 kernel: [  513.864414] RDX: 0000000000000000 RSI: ffffffff81485598 RDI: 0000000000000000
May 12 18:13:27 hn-11 kernel: [  513.864414] RBP: ffff880410b26110 R08: 0000000000000000 R09: 0000000047d20d3e
May 12 18:13:27 hn-11 kernel: [  513.864414] R10: 0000000000000292 R11: ffffffff81027d54 R12: 0000000000000000
May 12 18:13:27 hn-11 kernel: [  513.864414] R13: 0000000000000000 R14: ffff880406db5c58 R15: 0000000000000000
May 12 18:13:27 hn-11 kernel: [  513.864414] FS:  0000000000000000(0000) GS:ffff88000fd40000(0000) knlGS:0000000000000000
May 12 18:13:27 hn-11 kernel: [  513.864414] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
May 12 18:13:27 hn-11 kernel: [  513.864414] CR2: 00007f4adc08c5e8 CR3: 00000004067c4000 CR4: 00000000000426e0
May 12 18:13:27 hn-11 kernel: [  513.864414] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 12 18:13:27 hn-11 kernel: [  513.864414] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 12 18:13:27 hn-11 kernel: [  513.864414] Process async/17 (pid: 24914, threadinfo ffff880422124000, task ffff880413653f90)
May 12 18:13:27 hn-11 kernel: [  513.864414] Stack:
May 12 18:13:27 hn-11 kernel: [  513.864414]  ffffffff811761aa 0000000000000000 ffff880417718bf8 ffff8804177188d0
May 12 18:13:27 hn-11 kernel: [  513.864414] <0> ffffffff8118024b ffff880406db5c00 ffff880406db5c00 ffff880417738000
May 12 18:13:27 hn-11 kernel: [  513.864414] <0> ffff880417738128 0000000000000001 ffffffff81184021 ffff880406db5c00
May 12 18:13:27 hn-11 kernel: [  513.864414] Call Trace:
May 12 18:13:27 hn-11 kernel: [  513.864414]  [<ffffffff811761aa>] ? elv_register_queue+0x4f/0x6f
May 12 18:13:27 hn-11 kernel: [  513.864414]  [<ffffffff8118024b>] ? blk_register_queue+0x7f/0xcc
May 12 18:13:27 hn-11 kernel: [  513.864414]  [<ffffffff81184021>] ? add_disk+0xb8/0x108
May 12 18:13:27 hn-11 kernel: [  513.864414]  [<ffffffffa01566be>] ? sd_probe_async+0x119/0x1d8 [sd_mod]
May 12 18:13:27 hn-11 kernel: [  513.864414]  [<ffffffff810698a7>] ? async_thread+0x0/0x20d
May 12 18:13:27 hn-11 kernel: [  513.864414]  [<ffffffff810699a6>] ? async_thread+0xff/0x20d
May 12 18:13:27 hn-11 kernel: [  513.864414]  [<ffffffff81049fee>] ? default_wake_function+0x0/0x9
May 12 18:13:27 hn-11 kernel: [  513.864414]  [<ffffffff810698a7>] ? async_thread+0x0/0x20d
May 12 18:13:27 hn-11 kernel: [  513.864414]  [<ffffffff81064721>] ? kthread+0x79/0x81
May 12 18:13:27 hn-11 kernel: [  513.864414]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
May 12 18:13:27 hn-11 kernel: [  513.864414]  [<ffffffff810646a8>] ? kthread+0x0/0x81
May 12 18:13:27 hn-11 kernel: [  513.864414]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
May 12 18:13:27 hn-11 kernel: [  513.864414] Code: 74 0f 48 89 ef e8 24 07 00 00 eb 05 bb fe ff ff ff 89 d8 5b 5d 41 5c c3 48 85 ff 74 0e 48 8b 7f 30 48 85 ff 74 05 48 85 f6 75 04 <0f> 0b eb fe ba 02 00 00 00 e9 5d ff ff ff 55 53 48 89 fb 48 c7
May 12 18:13:27 hn-11 kernel: [  513.864414] RIP  [<ffffffff8113ecff>] sysfs_create_file+0x13/0x21
May 12 18:13:27 hn-11 kernel: [  513.864414]  RSP <ffff880422125dd8>
May 12 18:13:27 hn-11 kernel: [  513.951869] ---[ end trace 293bee1f4c673c1e ]---


The following is the backtrace from the first crash we experienced, where the
BUG appeared directly on an untainted kernel. In general, when trying to
reproduce the bug, it was purely random whether the BUG or the WARNING would
appear first.

May 11 22:57:29 hn-11 kernel: [179009.323165] sd 1:0:2:209: [sdbbf] Attached SCSI disk
May 11 22:57:33 hn-11 kernel: [179009.330102]  unknown partition table
May 11 22:57:33 hn-11 kernel: [179009.331182] ------------[ cut here ]------------
May 11 22:57:33 hn-11 kernel: [179009.342116] kernel BUG at /build/buildd-linux-2.6_2.6.32-31-amd64-vrfdM4/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/sysfs/file.c:539!
May 11 22:57:33 hn-11 kernel: [179009.342116] invalid opcode: 0000 [#1] SMP 
May 11 22:57:33 hn-11 kernel: [179009.342116] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/0000:10:00.0/host0/rport-0:0-2/target0:0:2/fc_transport/target0:0:2/node_name
May 11 22:57:33 hn-11 kernel: [179009.707269] CPU 4 
May 11 22:57:33 hn-11 kernel: [179009.707269] Modules linked in: tun kvm_intel kvm nf_conntrack_ipv6 ip6table_filter ip6_tables xt_tcpudp xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q garp bridge stp bonding dm_round_robin dm_multipath scsi_dh ipmi_poweroff ipmi_devintf radeon ttm drm_kms_helper snd_pcm snd_timer snd soundcore snd_page_alloc drm i2c_algo_bit ipmi_si i5000_edac ipmi_msghandler i2c_core hpilo hpwdt edac_core psmouse pcspkr rng_core serio_raw i5k_amb evdev shpchp pci_hotplug button container processor ext3 jbd mbcache dm_mod sd_mod crc_t10dif usbhid hid uhci_hcd qla2xxx tg3 cciss libphy scsi_transport_fc ehci_hcd scsi_tgt bnx2 usbcore nls_base scsi_mod thermal thermal_sys [last unloaded: scsi_wait_scan]
May 11 22:57:33 hn-11 kernel: [179009.885838] Pid: 13134, comm: async/0 Not tainted 2.6.32-5-amd64 #1 ProLiant BL460c G1
May 11 22:57:33 hn-11 kernel: [179009.885838] RIP: 0010:[<ffffffff8113f7ce>]  [<ffffffff8113f7ce>] sysfs_create_file+0x13/0x21
May 11 22:57:33 hn-11 kernel: [179009.885838] RSP: 0018:ffff8800c2695dd8  EFLAGS: 00010246
May 11 22:57:33 hn-11 kernel: [179009.885838] RAX: 0000000000000000 RBX: ffffffff814896b0 RCX: 0000000000000001
May 11 22:57:33 hn-11 kernel: [179009.885838] RDX: 0000000000000000 RSI: ffffffff814896b0 RDI: 0000000000000000
May 11 22:57:33 hn-11 kernel: [179009.885838] RBP: ffff880385d74b10 R08: 0000000000000000 R09: ffff8800c2695c70
May 11 22:57:33 hn-11 kernel: [179009.885838] R10: 00000000fffffff4 R11: ffff880426ae4410 R12: 0000000000000000
May 11 22:57:33 hn-11 kernel: [179009.885838] R13: 0000000000000000 R14: ffff880426d97c58 R15: 0000000000000000
May 11 22:57:33 hn-11 kernel: [179009.885838] FS:  0000000000000000(0000) GS:ffff88000fd00000(0000) knlGS:0000000000000000
May 11 22:57:33 hn-11 kernel: [179009.885838] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
May 11 22:57:33 hn-11 kernel: [179009.885838] CR2: 00007f3714d71ae0 CR3: 000000021d863000 CR4: 00000000000426e0
May 11 22:57:33 hn-11 kernel: [179009.885838] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 11 22:57:33 hn-11 kernel: [179011.482707] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 11 22:57:33 hn-11 kernel: [179011.482707] Process async/0 (pid: 13134, threadinfo ffff8800c2694000, task ffff8802471b7810)
May 11 22:57:33 hn-11 kernel: [179011.482707] Stack:
May 11 22:57:33 hn-11 kernel: [179011.482707]  ffffffff81176d5e 0000000000000000 ffff88040508c9a8 ffff88040508c680
May 11 22:57:33 hn-11 kernel: [179011.482707] <0> ffffffff81180dff ffff880426d97c00 ffff880426d97c00 ffff880326311800
May 11 22:57:33 hn-11 kernel: [179011.482707] <0> ffff880326311928 0000000000000001 ffffffff81184c41 ffff880426d97c00
May 11 22:57:33 hn-11 kernel: [179011.482707] Call Trace:
May 11 22:57:33 hn-11 kernel: [179011.482707]  [<ffffffff81176d5e>] ? elv_register_queue+0x4f/0x6f
May 11 22:57:33 hn-11 kernel: [179011.482707]  [<ffffffff81180dff>] ? blk_register_queue+0x7f/0xcc
May 11 22:57:33 hn-11 kernel: [179011.482707]  [<ffffffff81184c41>] ? add_disk+0xb8/0x108
May 11 22:57:33 hn-11 kernel: [179011.482707]  [<ffffffffa015170e>] ? sd_probe_async+0x111/0x1d0 [sd_mod]
May 11 22:57:33 hn-11 kernel: [179011.482707]  [<ffffffff81069dd7>] ? async_thread+0x0/0x20d
May 11 22:57:33 hn-11 kernel: [179011.482707]  [<ffffffff81069ed6>] ? async_thread+0xff/0x20d
May 11 22:57:33 hn-11 kernel: [179011.482707]  [<ffffffff8104a42f>] ? default_wake_function+0x0/0x9
May 11 22:57:33 hn-11 kernel: [179011.482707]  [<ffffffff81069dd7>] ? async_thread+0x0/0x20d
May 11 22:57:33 hn-11 kernel: [179011.482707]  [<ffffffff81064bc9>] ? kthread+0x79/0x81
May 11 22:57:33 hn-11 kernel: [179011.482707]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
May 11 22:57:33 hn-11 kernel: [179011.482707]  [<ffffffff81064b50>] ? kthread+0x0/0x81
May 11 22:57:33 hn-11 kernel: [179011.482707]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
May 11 22:57:33 hn-11 kernel: [179011.482707] Code: 74 0f 48 89 ef e8 25 07 00 00 eb 05 bb fe ff ff ff 89 d8 5b 5d 41 5c c3 48 85 ff 74 0e 48 8b 7f 30 48 85 ff 74 05 48 85 f6 75 04 <0f> 0b eb fe ba 02 00 00 00 e9 5d ff ff ff 55 53 48 89 fb 48 c7 
May 11 22:57:33 hn-11 kernel: [179011.482707] RIP  [<ffffffff8113f7ce>] sysfs_create_file+0x13/0x21
May 11 22:57:33 hn-11 kernel: [179011.482707]  RSP <ffff8800c2695dd8>
May 11 22:57:33 hn-11 kernel: [179013.112125] ---[ end trace d6c359162ef351cc ]---

Thanks,
Apollon



Reply to: