
Bug#791794: RAID device not active during boot




When all disks are available during boot, the system starts without problems:

   >  ls -l /dev/disk/by-uuid/ 
   total 0
   lrwxrwxrwx 1 root root 10 Jul 13 18:15 2138f67e-7b9e-4960-80d3-2ac2ce31d882 -> ../../sdc2
   lrwxrwxrwx 1 root root 10 Jul 13 18:15 21a660eb-729d-48fe-b9e3-140ae0ee79f4 -> ../../sdd2
   lrwxrwxrwx 1 root root  9 Jul 13 18:15 c4263f89-eb0c-4372-90ae-ce1a1545613e -> ../../md0
   lrwxrwxrwx 1 root root 10 Jul 13 18:15 cbeaebcb-2c55-48c0-b6bd-d5e8a5c4ac06 -> ../../sdb2
   lrwxrwxrwx 1 root root 10 Jul 13 18:15 ff2bae51-c5b8-41e3-855b-68ee57b61c0c -> ../../sda2


When starting the system with only two (instead of four) disks, I'm dropped into the emergency shell with the following error message:

   ALERT!  /dev/disk/by-uuid/c4263f89-eb0c-4372-90ae-ce1a1545613e does not exist.  Dropping to a shell!

... which seems to be consistent with the fact that the UUID for  /dev/md0  is not available ...

   (initramfs)  ls -l /dev/disk/by-uuid/
   total 0
   lrwxrwxrwx    1 0        0               10 Jul 13 15:20 cbeaebcb-2c55-48c0-b6bd-d5e8a5c4ac06 -> ../../sdb2 
   lrwxrwxrwx    1 0        0               10 Jul 13 15:20 ff2bae51-c5b8-41e3-855b-68ee57b61c0c -> ../../sda2


... which in turn is caused by the RAID device itself being inactive at that time:

   (initramfs)  cat /proc/mdstat
   Personalities :
   md0 : inactive sdb1[5](S) sda1[6](S)
         39028736 blocks super 1.2 
 
   unused devices: <none>


In order to re-activate  /dev/md0  I use the following commands:

   (initramfs)  mdadm --stop /dev/md0
   [  178.719551] md: md0 stopped.
   [  178.722463] md: unbind<sdb1>
   [  178.725386] md: export_rdev(sdb1) 
   [  178.728804] md: unbind<sda1> 
   [  178.731711] md: export_rdev(sda1) 
   mdadm: stopped /dev/md0

   (initramfs)  mdadm --assemble /dev/md0 
   [  214.171191] md: md0 stopped. 
   [  214.184471] md: bind<sda1> 
   [  214.195838] md: bind<sdb1> 
   [  214.218253] md: raid1 personality registered for level 1
   [  214.226156] md/raid1:md0: active with 1 out of 3 mirrors
   [  214.231651] md0: detected capacity change from 0 to 19982581760 
   [  214.247893]  md0: unknown partition table
   mdadm: /dev/md0 has been started with 1 drive (out of 3) and 1 spare.

   (initramfs)  cat /proc/mdstat
   Personalities : [raid1]
   md0 : active (auto-read-only) raid1 sdb1[5] sda1[6](S) 
         19514240 blocks super 1.2 [3/1] [U__]
 
   unused devices: <none>


... which makes the RAID device available in /dev/disk/by-uuid/ again:

   (initramfs)  ls -l /dev/disk/by-uuid/
   total 0
   lrwxrwxrwx    1 0        0                9 Jul 13 15:24 c4263f89-eb0c-4372-90ae-ce1a1545613e -> ../../md0
   lrwxrwxrwx    1 0        0               10 Jul 13 15:20 cbeaebcb-2c55-48c0-b6bd-d5e8a5c4ac06 -> ../../sdb2
   lrwxrwxrwx    1 0        0               10 Jul 13 15:20 ff2bae51-c5b8-41e3-855b-68ee57b61c0c -> ../../sda2


Now, if I  exit  the emergency shell, the system is able to boot without problems.

In bug report #784070 it is mentioned that "with the version of mdadm shipping with Debian Jessie, the --run parameter seems to be ignored when used in conjunction with --scan. According to the man page it is supposed to activate all arrays even if they are degraded. But instead, any arrays that are degraded are marked as 'inactive'. If the root filesystem is on one of those inactive arrays, the boot process is halted."
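
For reference, the invocation in question should be something like the following (just a sketch; the exact arguments used by the initramfs hook may differ):

   # --run is documented to start the arrays even when degraded,
   # but with the Jessie mdadm the degraded array stays inactive:
   mdadm --assemble --scan --run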

As suggested in the bug report (see message #109) I have changed the file  /usr/share/initramfs-tools/scripts/local-top/mdadm  and used the command  update-initramfs -u  in order to update  /boot/initrd.img-3.16.*  (you might first want to make a copy of that file before updating it).
After a reboot the system is able to start even if some of the disks belonging to the RAID array are missing (see the boot log from the serial console below):

   ...
   Begin: Running /scripts/init-premount ... done.
   Begin: Mounting root file system ... Begin: Running /scripts/local-top ... Begin: Assembling all MD arrays ... [   24.799665] random
   : nonblocking pool is initialized
   Failure: failed to assemble all arrays.
   done.
   Begin: Assembling all MD arrays ... Warning: failed to assemble all arrays...attempting individual starts
   Begin: attempting mdadm --run md0 ... [   24.883069] md: raid1 personality registered for level 1
   [   24.889111] md/raid1:md0: active with 2 out of 3 mirrors
   [   24.894598] md0: detected capacity change from 0 to 19982581760 
   mdadm: started array /dev/md/0 
   [   24.908255]  md0: unknown partition table
   Success: started md0 
   done. 
   done.  
   Begin: Running /scripts/local-premount ... done. 
   Begin: Checking root file system ... fsck from util-linux 2.25.2 
   /dev/md0: clean, 36905/1220608 files, 398026/4878560 blocks
   done.
   ...
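
Judging from the "attempting individual starts" messages above, the change adds a fallback that retries each array individually with  mdadm --run  after the initial  mdadm --assemble --scan --run  has failed. The real patch is in message #109 of bug #784070; the following is only a rough sketch of that logic, not the actual code (the log_* helpers come from the initramfs-tools  /scripts/functions  file):

   # Hypothetical sketch of the fallback in /scripts/local-top/mdadm,
   # modelled on the boot messages above; see message #109 for the real patch.
   if ! mdadm --assemble --scan --run; then
      log_warning_msg "failed to assemble all arrays...attempting individual starts"
      for dev in /dev/md?*; do
         array="${dev#/dev/}"
         log_begin_msg "attempting mdadm --run $array"
         if mdadm --run "$dev"; then
            log_success_msg "started $array"
         else
            log_failure_msg "failed to start $array"
         fi
         log_end_msg
      done
   fi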


Problem solved ...
... and many thanks to Phil.



PS:
There is still one thing I do not understand:
The file  etc/mdadm/mdadm.conf  (within initrd.img.*) contains a UUID (see below) ...

   ARRAY /dev/md/0 metadata=1.2 UUID=92da2301:37626555:6e73a527:3ccc045f name=debian:0
      spares=1

... which seems to be different from the output of  ls -l /dev/disk/by-uuid:

   lrwxrwxrwx 1 root root  9 Jul 14 11:27 c4263f89-eb0c-4372-90ae-ce1a1545613e -> ../../md0
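
Possibly these are simply two different identifiers: the UUID in  mdadm.conf  is the RAID array UUID stored in the md superblock, while  /dev/disk/by-uuid/  lists the UUID of the filesystem that lives on  /dev/md0 . Something like this should show both (untested sketch):

   # array UUID (should match mdadm.conf):
   mdadm --detail /dev/md0 | grep UUID
   # filesystem UUID (should match /dev/disk/by-uuid/):
   blkid /dev/md0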
















