
Bug#791794: RAID device not active during boot



Hendrik Boom <hendrik@topoi.pooq.com> writes:

> On Sat, Jul 11, 2015 at 10:16:15AM +0200, Nagel, Peter (IFP) wrote:
>> The problem might be related to
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=789152.
>> However, in my case everything seems to be fine as long as all
>> hard disks (within the RAID) are working.
>> The problem appears only if, during boot, one (or more) of the disks
>> in the RAID device has a problem.
>> 
>> The problem might be related to the fact that jessie comes with a
>> new init system which handles failing "auto" mounts during boot more
>> strictly. If it fails to mount an "auto" mount, systemd will drop to
>> an emergency shell rather than continuing the boot - see the release
>> notes (section 5.6.1):
>> https://www.debian.org/releases/stable/amd64/release-notes/ch-information.en.html#systemd-upgrade-default-init-system
>
> Would a temporary work-around be to use another init system?

It seems very unlikely to me that the init system has anything to do
with this.

If switching away from systemd "fixes" it, then that would suggest that
the file system in question is not actually needed for boot, and so
should be marked as such in fstab (with nofail or noauto).
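
For example, an fstab line along these lines (the UUID and mount point
here are just placeholders, not taken from this report) tells systemd to
carry on booting even if that mount fails:

  # /etc/fstab -- UUID and mount point are placeholders
  UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /srv/data  ext4  defaults,nofail  0  2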

If getting past that point (by using sysvinit, which ignores the
failure) results in an operational RAID (presumably running in degraded
mode), then that would suggest that the array is only being started by
the /etc/init.d/mdadm script, and that the scripts in the initramfs are
not doing it, which would normally be a consequence of something being
wrong with /etc/mdadm/mdadm.conf when the initramfs was built.
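
If that turns out to be the case, the usual fix is roughly this (run as
root; a sketch rather than a recipe, but these are the standard Debian
tools):

  # compare what the running kernel sees with what mdadm.conf says
  mdadm --detail --scan
  cat /etc/mdadm/mdadm.conf

  # if the ARRAY lines are missing or wrong, correct /etc/mdadm/mdadm.conf
  # and rebuild the initramfs so the copy embedded in it is updated too
  update-initramfs -u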

The underlying problem is the failure to bring up the RAID in the
initramfs, which happens before systemd gets involved.
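
One way to confirm that would be to break into the initramfs and see
what state the array is in before the root file system gets mounted.
Something like this (a sketch, assuming the stock initramfs-tools
busybox shell):

  # boot with break=premount on the kernel command line, then at the
  # (initramfs) prompt:
  cat /proc/mdstat          # is the array assembled at all?
  mdadm --assemble --scan   # try assembling from the initramfs' mdadm.conf
  cat /proc/mdstat          # did that change anything?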

>
>> 
>> For example:
>> If you have installed your system to a RAID1 device and the system
>> is faced with a power failure which might (at the same time) damage
>> one of the hard disks in that RAID1 device, your system will (during
>> boot) drop to an emergency shell rather than boot from the remaining
>> hard disk(s).
>> I found that during boot (for some reason) the RAID device is no
>> longer active and therefore not available within /dev/disk/by-uuid
>> (which causes the drop to the emergency shell).
>> 
>> A quick fix (to boot the system) would be to re-activate the RAID
>> device (e.g. /dev/md0) from the emergency shell ...
>> 
>> mdadm --stop /dev/md0
>> mdadm --assemble /dev/md0
>> 
>> ... and to exit the shell.
>> 
>> Nevertheless, it would be nice if the system would boot
>> automatically (as it is known to happen under wheezy) in order to
>> be able to use e.g. a spare disk for data synchronization.
>
> After all, isn't it the whole point of a RAID1 that it can keep going when 
> one of its hard drives fails?

Exactly, which is what suggests to me that it's been broken by other
means -- the fact that one can apparently start it by hand tells you
that it's basically working, so I'd think the described symptoms point
strongly towards a duff mdadm.conf in the initramfs.
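
A quick way to check that theory would be to look at the copy of
mdadm.conf that actually ended up inside the initramfs, roughly like
this (untested, and assuming a plain gzip-compressed image as jessie
normally produces):

  # does the initramfs contain an mdadm.conf at all?
  lsinitramfs /boot/initrd.img-$(uname -r) | grep mdadm

  # unpack it somewhere temporary and read the embedded mdadm.conf
  # (an image with a prepended microcode archive needs a bit more care)
  mkdir /tmp/ird && cd /tmp/ird
  zcat /boot/initrd.img-$(uname -r) | cpio -id --quiet
  cat etc/mdadm/mdadm.conf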

N.B. I've not had very much to do with systemd, so am in no sense an
expert on it, but I've been using software RAID and initrds since
almost as soon as they were available, and the idea that this would be
down to systemd does not ring true.

Cheers, Phil.
-- 
|)|  Philip Hands  [+44 (0)20 8530 9560]  HANDS.COM Ltd.
|-|  http://www.hands.com/    http://ftp.uk.debian.org/
|(|  Hugo-Klemm-Strasse 34,   21075 Hamburg,    GERMANY


