[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#688224: linux-image-3.2.0-3-amd64: deadlock when failing disks while reshape from md RAID1 to RAID5



Package: src:linux
Version: 3.2.23-1
Severity: normal
Tags: upstream

Dear Maintainer,

When growing RAID1 two-device array to three device RAID5:

        # mdadm --manage /dev/md2 --add /dev/sdc4
        mdadm: added /dev/sde4

        # mdadm --grow /dev/md2 --level=raid5 --raid-devices=3
        mdadm: level of /dev/md2 changed to raid5
        mdadm: Need to backup 128K of critical section..

If during this operation two disks marked failed with mdadm or being disconnected, array goes to FAILED state,
but reshape operation not interrupted. This leads to situation when array can't be stopped and system can't
recovery without reboot.

  # mdadm --manage /dev/md2 --fail /dev/sde4
  mdadm: set /dev/sdb4 faulty in /dev/md2
  # mdadm --manage /dev/md2 --fail /dev/sdb4 
  mdadm: set /dev/sdc4 faulty in /dev/md2

  # mdadm -D /dev/md2 
  /dev/md2:
          Version : 1.2
    Creation Time : Thu Sep 20 15:27:48 2012
       Raid Level : raid5
       Array Size : 58116992 (55.42 GiB 59.51 GB)
    Used Dev Size : 58116992 (55.42 GiB 59.51 GB)
     Raid Devices : 3
    Total Devices : 3
      Persistence : Superblock is persistent

      Update Time : Thu Sep 20 15:59:00 2012
            State : active, FAILED, reshaping 
   Active Devices : 1
  Working Devices : 1
   Failed Devices : 2
    Spare Devices : 0

           Layout : left-symmetric
       Chunk Size : 64K

   Reshape Status : 6% complete
    Delta Devices : 1, (2->3)

             Name : md-test:2  (local to host md-test)
             UUID : ef6ebcd3:68b2ec22:3c762511:e4343695
           Events : 77

      Number   Major   Minor   RaidDevice State
         0       8        4        0      active sync   /dev/sda4
         1       8       20        1      faulty spare rebuilding   /dev/sdb4
         2       8       36        2      faulty spare rebuilding   /dev/sdc4

  # mdadm --stop /dev/md2
  mdadm: failed to stop array /dev/md2: Device or resource busy
  Perhaps a running process, mounted filesystem or active volume group?

  # cat /proc/mdstat 
  Personalities : [raid1] [raid6] [raid5] [raid4] 
  md2 : active raid5 sdc4[2](F) sdb4[1](F) sda4[0]
        58116992 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/2] [U__]
        [=>...................]  reshape =  6.6% (3850816/58116992) finish=11936.9min speed=75K/sec
      
  md1 : active raid1 sda3[1]
        9991040 blocks super 1.2 [1/1] [U]
      
  md0 : active raid1 sda2[1]
        9991040 blocks super 1.2 [1/1] [U]
      
  unused devices: <none>


-- Package-specific info:
** Version:
Linux version 3.2.0-3-amd64 (Debian 3.2.23-1) (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-8) ) #1 SMP Mon Jul 23 02:45:17 UTC 2012

** Command line:
BOOT_IMAGE=/boot/vmlinuz-3.2.0-3-amd64 root=UUID=977b1927-3130-4d3b-9afa-94ca8e0eeb1a ro fastboot elevator=deadline

** Tainted: C (1024)
 * Module from drivers/staging has been loaded.

** Kernel log:
[    6.151219] ACPI: Power Button [PWRB]
[    6.151526] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
[    6.151650] ACPI: Power Button [PWRF]
[    6.188399] input: BTC USB Multimedia Keyboard as /devices/pci0000:00/0000:00:1d.0/usb5/5-1/5-1:1.0/input/input2
[    6.188782] hid-generic 0003:046D:C312.0001: input,hidraw0: USB HID v1.10 Keyboard [BTC USB Multimedia Keyboard] on usb-0000:00:1d.0-1/input0
[    6.205890] input: BTC USB Multimedia Keyboard as /devices/pci0000:00/0000:00:1d.0/usb5/5-1/5-1:1.1/input/input3
[    6.206440] hid-generic 0003:046D:C312.0002: input,hiddev0,hidraw1: USB HID v1.10 Device [BTC USB Multimedia Keyboard] on usb-0000:00:1d.0-1/input1
[    6.222692] hid-generic 0003:10C4:EA90.0003: hiddev0,hidraw2: USB HID v1.01 Device [Silicon Laboratories CP2112 HID USB-to-SMBus Bridge] on usb-0000:00:1d.2-1/input0
[    6.265568] ACPI Warning: 0x0000000000000860-0x000000000000087f SystemIO conflicts with Region \_SB_.PCI0.GFX0.TCOI 1 (20120320/utaddress-251)
[    6.265803] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[    6.265931] lpc_ich: Resource conflict(s) found affecting iTCO_wdt
[    6.331792] input: PC Speaker as /devices/platform/pcspkr/input/input4
[    6.377767] [drm] Initialized drm 1.1.0 20060810
[    6.443634] microcode: CPU0 sig=0x106ca, pf=0x8, revision=0x107
[    6.681532] microcode: CPU1 sig=0x106ca, pf=0x8, revision=0x107
[    6.692433] microcode: CPU2 sig=0x106ca, pf=0x8, revision=0x107
[    6.702013] microcode: CPU3 sig=0x106ca, pf=0x8, revision=0x107
[    6.709041] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[    6.842291] i915 0000:00:02.0: setting latency timer to 64
[    6.865081] i915 0000:00:02.0: irq 52 for MSI/MSI-X
[    6.865108] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[    6.865180] [drm] Driver supports precise vblank timestamp query.
[    6.865334] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    6.904457] Error: Driver 'pcspkr' is already registered, aborting...
[    6.920032] [drm] GMBUS [i915 gmbus panel] timed out, falling back to bit banging on pin 3
[    6.936276] i915: fixme: max PWM is zero
[    6.953543] [drm] initialized overlay support
[    7.128450] fbcon: inteldrmfb (fb0) is primary device
[    7.224711] Console: switching to colour frame buffer device 128x48
[    7.233387] fb0: inteldrmfb frame buffer device
[    7.233465] drm: registered panic notifier
[    7.233558] i915: No ACPI video bus found
[    7.233710] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[   16.108147] EXT4-fs (md0): re-mounted. Opts: (null)
[   16.336548] bonding: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
[   16.401728] it87: Found IT8721F chip at 0xa10, revision 3
[   17.461356] EXT4-fs (sdd2): mounted filesystem with ordered data mode. Opts: (null)
[   17.782440] bonding: bond0: Setting MII monitoring interval to 100.
[   17.786234] bonding: bond0: Setting down delay to 200.
[   17.790013] bonding: bond0: Setting up delay to 200.
[   17.798939] bonding: bond0: setting mode to balance-alb (6).
[   17.807985] bonding: bond0: Adding slave eth0.
[   17.910389] e1000e 0000:00:19.0: irq 42 for MSI/MSI-X
[   18.012153] e1000e 0000:00:19.0: irq 42 for MSI/MSI-X
[   18.013604] bonding: bond0: enslaving eth0 as an active interface with a down link.
[   18.024712] bonding: bond0: Adding slave eth1.
[   18.108801] bonding: bond0: enslaving eth1 as an active interface with a down link.
[   18.301198] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[   19.677030] e1000e: eth1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
[   19.680744] e1000e 0000:02:00.0: eth1: 10/100 speed: disabling TSO
[   19.700033] bonding: bond0: link status up for interface eth1, enabling it in 0 ms.
[   19.703840] bonding: bond0: link status definitely up for interface eth1, 100 Mbps full duplex.
[   19.707661] bonding: bond0: making interface eth1 the new active one.
[   19.713118] device eth1 entered promiscuous mode
[   19.716921] bonding: bond0: first active interface up!
[   19.721300] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[   29.700041] device eth1 left promiscuous mode
[   30.945778] RPC: Registered named UNIX socket transport module.
[   30.949739] RPC: Registered udp transport module.
[   30.953573] RPC: Registered tcp transport module.
[   30.957382] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   30.992868] FS-Cache: Loaded
[   31.051875] NFS: Registering the id_resolver key type
[   31.055807] Key type id_resolver registered
[   31.059529] Key type id_legacy registered
[   31.063267] FS-Cache: Netfs 'nfs' registered for caching
[   31.087150] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[   31.848101] Rounding down aligned max_sectors from 4294967295 to 8388600
[   32.090060] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[   32.123109] NFSD: starting 90-second grace period
[   37.737966] md1: detected capacity change from 10230824960 to 0
[   37.741866] md: md1 stopped.
[   37.745665] md: unbind<sda3>
[   37.752187] md: export_rdev(sda3)
[   37.957543] md: md1 still in use.
[   37.981719] md: bind<sda3>
[   38.014776] md/raid1:md1: active with 1 out of 1 mirrors
[   38.020343] md1: detected capacity change from 0 to 10230824960
[   38.030287]  md1: unknown partition table
[   39.659279] md: md1 switched to read-write mode.
[   59.359456] md: bind<sda4>
[   59.359820] md: bind<sdb4>
[   59.385469] md/raid1:md2: not clean -- starting background reconstruction
[   59.385598] md/raid1:md2: active with 2 out of 2 mirrors
[   59.385743] md2: detected capacity change from 0 to 59511799808
[   59.404420]  md2: unknown partition table
[   71.771545] md: md2 switched to read-write mode.
[   71.771836] md: resync of RAID array md2
[   71.771956] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[   71.772124] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[   71.772346] md: using 128k window, over a total of 58116992k.
[ 1394.048132] md: bind<sdc4>
[ 1531.947085] md: md2: resync done.
[ 1532.087136] RAID1 conf printout:
[ 1532.087146]  --- wd:2 rd:2
[ 1532.087152]  disk 0, wo:0, o:1, dev:sda4
[ 1532.087157]  disk 1, wo:0, o:1, dev:sdb4
[ 1532.094874] RAID1 conf printout:
[ 1532.094883]  --- wd:2 rd:2
[ 1532.094891]  disk 0, wo:0, o:1, dev:sda4
[ 1532.094898]  disk 1, wo:0, o:1, dev:sdb4
[ 1551.543537] async_tx: api initialized (async)
[ 1551.550439] xor: automatically using best checksumming function:
[ 1551.588021]    generic_sse:  5454.000 MB/sec
[ 1551.660235] raid6: sse2x1     124 MB/s
[ 1551.728216] raid6: sse2x2     240 MB/s
[ 1551.796124] raid6: sse2x4     473 MB/s
[ 1551.796217] raid6: using algorithm sse2x4 (473 MB/s)
[ 1551.796301] raid6: using ssse3x2 recovery algorithm
[ 1551.805710] md: raid6 personality registered for level 6
[ 1551.805843] md: raid5 personality registered for level 5
[ 1551.805956] md: raid4 personality registered for level 4
[ 1551.806568] md/raid:md2: device sdb4 operational as raid disk 1
[ 1551.806678] md/raid:md2: device sda4 operational as raid disk 0
[ 1551.807497] md/raid:md2: allocated 2224kB
[ 1551.812169] md/raid:md2: raid level 5 active with 2 out of 2 devices, algorithm 2
[ 1551.812301] RAID conf printout:
[ 1551.812305]  --- level:5 rd:2 wd:2
[ 1551.812310]  disk 0, o:1, dev:sda4
[ 1551.812314]  disk 1, o:1, dev:sdb4
[ 1551.870074] RAID conf printout:
[ 1551.870082]  --- level:5 rd:3 wd:3
[ 1551.870087]  disk 0, o:1, dev:sda4
[ 1551.870091]  disk 1, o:1, dev:sdb4
[ 1551.870095]  disk 2, o:1, dev:sdc4
[ 1551.870329] md: reshape of RAID array md2
[ 1551.870431] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 1551.870541] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
[ 1551.870730] md: using 128k window, over a total of 58116992k.
[ 1551.999047] md: md_do_sync() got signal ... exiting
[ 1552.016750] md: reshape of RAID array md2
[ 1552.016863] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 1552.016992] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
[ 1552.017182] md: using 128k window, over a total of 58116992k.
[ 1926.214903] md/raid:md2: Disk failure on sdc4, disabling device.
[ 1926.214903] md/raid:md2: Operation continuing on 3 devices.
[ 1926.290565] md: md2: reshape done.
[ 1926.376486] md: reshape of RAID array md2
[ 1926.376588] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 1926.376717] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
[ 1926.376927] md: using 128k window, over a total of 58116992k.
[ 1929.016353] md/raid:md2: Disk failure on sdb4, disabling device.
[ 1929.016353] md/raid:md2: Operation continuing on 2 devices.
[ 1929.022339] md: md2: reshape done.
[ 1932.969727] nfsd: last server has exited, flushing export cache
[ 2073.881290] md: md2 still in use.
[ 2074.085991] md: md2 still in use.
[ 2074.289493] md: md2 still in use.
[ 2074.492936] md: md2 still in use.
[ 2074.696384] md: md2 still in use.
[ 2074.900822] md: md2 still in use.
[ 2075.104037] md: md2 still in use.
[ 2075.307261] md: md2 still in use.
[ 2075.510435] md: md2 still in use.
[ 2075.713487] md: md2 still in use.
[ 2075.916456] md: md2 still in use.
[ 2076.120593] md: md2 still in use.
[ 2076.323435] md: md2 still in use.
[ 2076.526270] md: md2 still in use.
[ 2076.728997] md: md2 still in use.
[ 2076.931695] md: md2 still in use.
[ 2077.134460] md: md2 still in use.
[ 2077.338088] md: md2 still in use.
[ 2077.540697] md: md2 still in use.
[ 2077.743244] md: md2 still in use.
[ 2077.945793] md: md2 still in use.
[ 2078.148286] md: md2 still in use.
[ 2078.350743] md: md2 still in use.
[ 2078.553346] md: md2 still in use.
[ 2078.755750] md: md2 still in use.
[ 2160.908103] INFO: task md2_resync:6333 blocked for more than 120 seconds.
[ 2160.910500] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2160.912954] md2_resync      D ffff88007f593680     0  6333      2 0x00000000
[ 2160.915481]  ffff880036c7b8d0 0000000000000046 ffff880079975c70 ffff88007c004ea0
[ 2160.918013]  0000000000013680 ffff880079975fd8 ffff880079975fd8 ffff880036c7b8d0
[ 2160.920523]  ffff88007ad02618 ffff88007ad02400 ffff88007ad02618 ffffffffa020ba70
[ 2160.922916] Call Trace:
[ 2160.925394]  [<ffffffffa0201bca>] ? md_do_sync+0x9a1/0xb8b [md_mod]
[ 2160.927810]  [<ffffffff81053a8b>] ? add_wait_queue+0x3c/0x3c
[ 2160.930263]  [<ffffffffa01fe6ed>] ? md_thread+0x101/0x11f [md_mod]
[ 2160.932680]  [<ffffffffa01fe5ec>] ? md_rdev_init+0xf9/0xf9 [md_mod]
[ 2160.934932]  [<ffffffff8105342e>] ? kthread+0x7d/0x85
[ 2160.937262]  [<ffffffff81366ae4>] ? kernel_thread_helper+0x4/0x10
[ 2160.939537]  [<ffffffff810533b1>] ? kthread_freezable_should_stop+0x3c/0x3c
[ 2160.941840]  [<ffffffff81366ae0>] ? gs_change+0x13/0x13
[ 2160.944154] INFO: task vgs:6637 blocked for more than 120 seconds.
[ 2160.946414] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2160.948796] vgs             D ffff88007f513680     0  6637   6465 0x00000000
[ 2160.951123]  ffff88007b6488f0 0000000000000086 ffff88007c7cb200 ffff88007c005590
[ 2160.953478]  0000000000013680 ffff880079637fd8 ffff880079637fd8 ffff88007b6488f0
[ 2160.955808]  ffff88007bbc3de0 ffff88007f513680 ffff88007b6488f0 ffff88007b6488f0
[ 2160.958143] Call Trace:
[ 2160.960515]  [<ffffffff8135ff22>] ? io_schedule+0x56/0x6d
[ 2160.962785]  [<ffffffff8112a691>] ? do_blockdev_direct_IO+0x9b8/0xa81
[ 2160.965162]  [<ffffffff81128802>] ? blkdev_max_block+0x2b/0x2b
[ 2160.967538]  [<ffffffff811280c8>] ? blkdev_direct_IO+0x4e/0x53
[ 2160.969959]  [<ffffffff81128802>] ? blkdev_max_block+0x2b/0x2b
[ 2160.972503]  [<ffffffff810bbf53>] ? generic_file_aio_read+0xeb/0x5b5
[ 2160.974985]  [<ffffffff811104c5>] ? dput+0x26/0xf4
[ 2160.977529]  [<ffffffff81115c4f>] ? mntput_no_expire+0x2a/0x134
[ 2160.980168]  [<ffffffff8110b4c4>] ? do_last+0x67d/0x717
[ 2160.983376]  [<ffffffff810fff0c>] ? do_sync_read+0xb4/0xec
[ 2160.986835]  [<ffffffff811005e6>] ? vfs_read+0x9f/0xe6
[ 2160.989560]  [<ffffffff81100672>] ? sys_read+0x45/0x6b
[ 2160.992242]  [<ffffffff813657f9>] ? system_call_fastpath+0x16/0x1b

** Model information
sys_vendor: Gigabyte
product_name: MN525MI
product_version: 01234567
chassis_vendor: Gigabyte
chassis_version: 01234567
bios_vendor: American Megatrends Inc.
bios_version: F2
board_vendor: Gigabyte
board_name: MN525MI
board_version: 01234567

** Loaded modules:
iscsi_target_mod
target_core_mod
configfs
nfsd
nfs
nfs_acl
auth_rpcgss
fscache
lockd
sunrpc
it87
hwmon_vid
bonding
i915
snd_pcm
snd_page_alloc
snd_timer
video
snd
drm_kms_helper
soundcore
drm
psmouse
i2c_i801
i2c_algo_bit
coretemp
serio_raw
i2c_core
iTCO_wdt
pcspkr
evdev
iTCO_vendor_support
processor
button
thermal_sys
dm_mod
raid1
md_mod
ext4
crc16
jbd2
mbcache
aufs(C)
sg
sd_mod
crc_t10dif
usbhid
hid
usb_storage
uas
ahci
libahci
libata
scsi_mod
uhci_hcd
xhci_hcd
ehci_hcd
usbcore
usb_common
e1000e

** PCI devices:
not available

** USB devices:
not available


-- System Information:
Debian Release: wheezy/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-3-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
ANSI_X3.4-1968) (ignored: LC_ALL set to en_US.UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages linux-image-3.2.0-3-amd64 depends on:
ii  debconf [debconf-2.0]                   1.5.46
ii  initramfs-tools [linux-initramfs-tool]  0.107
ii  kmod                                    9-2
ii  linux-base                              3.5
ii  module-init-tools                       9-2

Versions of packages linux-image-3.2.0-3-amd64 recommends:
ii  firmware-linux-free  3.1

Versions of packages linux-image-3.2.0-3-amd64 suggests:
pn  debian-kernel-handbook  <none>
ii  grub-pc                 1.99-22.1
pn  linux-doc-3.2           <none>

Versions of packages linux-image-3.2.0-3-amd64 is related to:
pn  firmware-atheros        <none>
pn  firmware-bnx2           <none>
pn  firmware-bnx2x          <none>
pn  firmware-brcm80211      <none>
pn  firmware-intelwimax     <none>
pn  firmware-ipw2x00        <none>
pn  firmware-ivtv           <none>
pn  firmware-iwlwifi        <none>
pn  firmware-libertas       <none>
pn  firmware-linux          <none>
pn  firmware-linux-nonfree  <none>
pn  firmware-myricom        <none>
pn  firmware-netxen         <none>
pn  firmware-qlogic         <none>
pn  firmware-ralink         <none>
pn  firmware-realtek        <none>
pn  xen-hypervisor          <none>

-- debconf information:
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = "en_US.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
  linux-image-3.2.0-3-amd64/postinst/depmod-error-initrd-3.2.0-3-amd64: false
  linux-image-3.2.0-3-amd64/postinst/ignoring-ramdisk:
  linux-image-3.2.0-3-amd64/postinst/missing-firmware-3.2.0-3-amd64:
  linux-image-3.2.0-3-amd64/prerm/removing-running-kernel-3.2.0-3-amd64: true


Reply to: