Peter Nagel <peter.nagel@kit.edu> writes:
> On 11.07.2015 18:40, Philip Hands wrote:
>>
>> ... which is what suggests to me that it's been broken by other
>> means -- the fact that one can apparently start it by hand tells you
>> that it's basically working, so I'd think the described symptoms point
>> strongly towards duff mdadm.conf in the initramfs.
>>
>> N.B. I've not had very much to do with systemd, so am in no sense an
>> expert about that, but I've been using software raid and initrd's since
>> almost as soon as they were available, and the idea that this would be
>> down to systemd does not ring true.
>
> Thanks for pointing this out.
> Hopefully, someone is able to solve this problem.

Well, yes -- _you_ can hopefully.

  0) (just in case you've not already done so, check all the bits
     suggested in the warning that you quoted initially, about the
     contents of /proc/... etc.)

  1) on the system when booted up, check the current state of your
     /etc/mdadm/mdadm.conf

     Compare it with the output of:

       mdadm --examine --scan

     If there are significant differences (other than the missing disk),
     then fix them.
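
A minimal sketch of the comparison in step 1, assuming that the array UUIDs are the part worth checking first. The function names (array_uuids, uuid_mismatch) and the /tmp/scan.conf path are made up for illustration; only the ARRAY line layout is taken from the mdadm.conf shown in this mail.

```shell
#!/bin/sh
# Sketch: list the UUID= words found on ARRAY lines, one per line, sorted.
# Reads mdadm.conf-style text on stdin.
array_uuids() {
    grep '^ARRAY' | tr ' \t' '\n\n' | grep '^UUID=' | sort -u
}

# Compare the arrays named in two mdadm.conf-style files.
# Returns 0 when both name the same set of UUIDs, 1 otherwise.
uuid_mismatch() {
    conf_uuids=$(array_uuids < "$1")
    scan_uuids=$(array_uuids < "$2")
    if [ "$conf_uuids" = "$scan_uuids" ]; then
        echo "same arrays in $1 and $2"
    else
        echo "MISMATCH"
        echo "--- $1"
        echo "$conf_uuids"
        echo "--- $2"
        echo "$scan_uuids"
        return 1
    fi
}
```

As root, that would be used along the lines of "mdadm --examine --scan > /tmp/scan.conf" followed by "uuid_mismatch /etc/mdadm/mdadm.conf /tmp/scan.conf". It only looks at UUIDs, so differences in device names or metadata versions still need a look by eye.
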

  2) have a look at your initrd, thus:

       mkdir /tmp/initrd ; cd /tmp/initrd ; zcat /boot/initrd.img-* | cpio -iv --no-absolute-filenames

     (of course, being an ARM thing, you probably have some sort of
     uInitrd thing as well, so I guess it's possible to break things
     between the initrd.img and that, but someone who knows about such
     things would need to tell you about that).

     Anyway, you should have something like this:

       /tmp/initrd$ find . -name mdadm\*
       ./scripts/local-top/mdadm
       ./etc/mdadm
       ./etc/mdadm/mdadm.conf
       ./etc/modprobe.d/mdadm.conf
       ./conf/mdadm
       ./sbin/mdadm

     so, take a look at that lot to see if you can spot what's up.

     As an example, this is what I see on a little amd64 RAID box with
     Jessie, which I have to hand:

       root@linhost-th:/tmp/initrd# cat conf/mdadm
       MD_HOMEHOST='linhost-th'
       MD_DEVS=all
       root@linhost-th:/tmp/initrd# cat etc/mdadm/mdadm.conf
       HOMEHOST <system>
       ARRAY /dev/md/2 metadata=1.2 UUID=00e84ce1:d96de981:375caa64:dac234f9 name=grml:2
       ARRAY /dev/md/3 metadata=1.2 UUID=c9871cb8:46a3dd98:d9505965:5bd7dfe2 name=grml:3

     (I tend to number my md's to match the partitions they sit on,
     hence the 2 & 3)
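
If it helps, the check in step 2 can be sketched as a small function that walks an unpacked initrd and reports which of the mdadm files from the find output above are absent. The function name check_initrd_mdadm is hypothetical, and the file list is simply the one from my amd64 box, so an ARM initrd may legitimately differ.

```shell
#!/bin/sh
# Sketch: report which mdadm-related files are present in an unpacked
# initrd tree. $1 = directory the initrd was unpacked into (default ".").
# Returns 1 if any of the listed files is missing.
check_initrd_mdadm() {
    root=${1:-.}
    missing=0
    for f in scripts/local-top/mdadm etc/mdadm/mdadm.conf \
             conf/mdadm sbin/mdadm; do
        if [ -e "$root/$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            missing=1
        fi
    done
    # Not necessarily an error, but worth a look when arrays fail to start.
    if ! grep -q '^ARRAY' "$root/etc/mdadm/mdadm.conf" 2>/dev/null; then
        echo "note     no ARRAY lines in etc/mdadm/mdadm.conf"
    fi
    return "$missing"
}
```

Run as "check_initrd_mdadm /tmp/initrd" after the unpack step. A clean result only says the files exist; their contents still want reading.
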

  3) save a copy of your old initrd.img somewhere, then run:

       update-initramfs -u

     and try a reboot -- if it works, unpack both initrd's in adjacent
     directories, and use diff -ur to spot what changed, and report back
     here.
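
For the comparison in step 3, the full picture is still "diff -ur old new" on the two unpacked trees. As a rough sketch of narrowing that down, the function below diffs only the mdadm-related files. The name initrd_mdadm_diff and the choice of three paths are assumptions for illustration, based on the file list from my own box.

```shell
#!/bin/sh
# Sketch: unified diff of the mdadm-related files in two unpacked
# initrd trees. $1 = old tree, $2 = new tree.
# Returns 0 if those files are identical, 1 if anything differs.
initrd_mdadm_diff() {
    old=$1
    new=$2
    status=0
    for p in conf/mdadm etc/mdadm/mdadm.conf scripts/local-top/mdadm; do
        if [ -e "$old/$p" ] && [ -e "$new/$p" ]; then
            diff -u "$old/$p" "$new/$p" || status=1
        elif [ -e "$old/$p" ] || [ -e "$new/$p" ]; then
            # Present in exactly one of the two trees.
            echo "only in one tree: $p"
            status=1
        fi
    done
    return "$status"
}
```

With the old and new images unpacked into, say, /tmp/initrd.old and /tmp/initrd.new (hypothetical directory names), "initrd_mdadm_diff /tmp/initrd.old /tmp/initrd.new" prints the changes most likely to matter for the RAID setup.
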

  4) If it didn't work, once in the emergency shell, try running:

       sh -x /scripts/local-top/mdadm

     and see if you can see why it's not working when starting things by
     hand does.
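
Since the trace from step 4 tends to scroll off the console, here is a minimal sketch of keeping it in a file. The function name trace_script and the default log path are made up; I'm assuming there is somewhere writable from the emergency shell, which you'd need to check on your box.

```shell
#!/bin/sh
# Sketch: run a script under "sh -x" and keep the trace for later.
# $1 = script to trace
# $2 = file to write the trace to (hypothetical default below)
trace_script() {
    script=$1
    log=${2:-/tmp/trace.log}
    # "sh -x" writes the trace to stderr, so fold it into the log too.
    sh -x "$script" > "$log" 2>&1
    status=$?
    echo "exit status: $status" >> "$log"
    return "$status"
}
```

In the emergency shell it is probably quicker to type the one line in the middle directly, e.g. "sh -x /scripts/local-top/mdadm > some-file 2>&1", and then page through the result.
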

  5) If that fails to be diagnostic, is there anything hiding in your
     u-boot configuration that might be causing this? (assuming this box
     has u-boot)
HTH
Cheers, Phil.

P.S. While you have the initrd unpacked, you might want to note that:

  root@linhost-th:/tmp/initrd# grep -r systemd .
  ./init:# Mount /usr only if init is systemd (after reading symlink)
  ./init:if [ "${checktarget##*/}" = systemd ] && read_fstab_entry /usr; then
  ./scripts/init-top/udev:/lib/systemd/systemd-udevd --daemon --resolve-names=never
  ./etc/lvm/lvm.conf: # systemd's socket-based service activation or run as an initscripts service
  ./lib/udev/rules.d/63-md-raid-arrays.rules:# Tell systemd to run mdmon for our container, if we need it.
  Binary file ./lib/systemd/systemd-udevd matches
  Binary file ./lib/x86_64-linux-gnu/libselinux.so.1 matches
  Binary file ./bin/kmod matches
  Binary file ./bin/udevadm matches

while the scripts on the initrd image are systemd-aware, its init
is actually a shell script -- so you're running busybox as your init
at this point.
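
A quick way to confirm that on your own image is to look at the first line of init in the unpacked tree. This is only a sketch, and the function name init_interpreter is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the interpreter line of the init in an unpacked initrd.
# $1 = directory the initrd was unpacked into (default ".").
# Returns 1 if init does not start with "#!", i.e. is not a script.
init_interpreter() {
    init=${1:-.}/init
    if [ "$(head -c 2 "$init")" = '#!' ]; then
        head -n 1 "$init"
    else
        echo "not a script: $init"
        return 1
    fi
}
```

Called as "init_interpreter /tmp/initrd", it prints the interpreter line if init is a script, and says so if it is not.
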

Also:

  root@linhost-th:/tmp/initrd# grep -r 'Gave up waiting for' .
  ./scripts/local: echo "Gave up waiting for $2 device. Common problems:"

this is the script that's dropping you into the emergency shell.

The thing that starts the shell is the panic() function from
scripts/functions -- I can see that that will do a timed reboot if
you've got panic=... on the kernel command line, but otherwise not.

Would you have something like that on your command line? (as
mentioned in the warning you quoted, /proc/cmdline tells you)

If not, do you perhaps have a hardware watchdog, or some such?
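
To answer the panic= question quickly, something like the following sketch would do. The function name panic_setting is made up; it reads /proc/cmdline by default and takes an alternative file as its argument, which is only there so it can be tried out on a saved copy.

```shell
#!/bin/sh
# Sketch: print the value of panic= from a kernel command line, if any.
# $1 = file holding the command line (default /proc/cmdline).
# Returns 1 and prints nothing when no panic= parameter is present.
panic_setting() {
    value=$(tr ' ' '\n' < "${1:-/proc/cmdline}" |
            sed -n 's/^panic=//p' | tail -n 1)
    [ -n "$value" ] && echo "$value"
}
```

If it prints a number, that would fit the timed reboot described above. If it prints nothing, the reboot is coming from somewhere else, which is where the watchdog question comes in.
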
--
|)| Philip Hands [+44 (0)20 8530 9560] HANDS.COM Ltd.
|-| http://www.hands.com/ http://ftp.uk.debian.org/
|(| Hugo-Klemm-Strasse 34, 21075 Hamburg, GERMANY