
Re: xen, raid and initramfs failure



reviving an old thread as I have more to add...

ping martin krafft...


On Sat, Mar 31, 2007 at 11:45:38AM +0200, martin f krafft wrote:
> also sprach Andrew Sackville-West <andrew@farwestbilliards.com> [2007.03.30.1920 +0200]:
> > Failure: failed to load Module 0 no such module
> > Failure: failed to load Module 1 no such module
> > Failure: failed to load Module 5 no such module
> 
> I don't even know what creates those messages.

These messages are created by the initramfs script local-top/mdadm.

> 
> > looks like it should just iterate through the list and load the
> > modules. I have confirmed that it works the way I expect in bash,
> > but it doesn't work properly when booting. for some reason the
> > module names seem to get replaced with just the numbers "0" "1"
> > and "5".
> 
> No, I don't think this is what's happening, but I also don't know
> what is going on.

It is indeed what is happening. The local-top/mdadm script sources
conf/md.conf. That file resets the value of MD_MODULES from stuff like
"raid0 raid1..." to "0 1 5", and thus it bombs out at boot.
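A minimal sketch of the clobbering, using a stand-in for /conf/md.conf
(the demo file path and its exact contents are my assumptions, based on
the set -x trace below):

```shell
#!/bin/sh
# local-top/mdadm first sets MD_MODULES to real module names...
MD_MODULES='linear multipath raid0 raid1 raid456 raid5 raid6 raid10'

# ...then sources conf/md.conf. Stand-in for that file, matching
# the values seen in the boot trace:
cat > /tmp/md.conf.demo <<'EOF'
MD_HOMEHOST=bigmomma
MD_LEVELS='5 1 1 1 0 5'
MD_MODULES='0 1 5'
EOF

. /tmp/md.conf.demo

# MD_MODULES now holds bare raid *levels*, so the later loop ends up
# running "modprobe 0", "modprobe 1", "modprobe 5" -- the observed failure.
echo "MD_MODULES=$MD_MODULES"
```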

> 
> > I have hacked the script and rebuilt my initrds by commenting out
> > the above section and just putting in a bunch of modprobes and it
> > works. But clearly something wacky is going on here. 
> 
> Can you edit the script, add set -x at the top and post the output?

I've done that. Here is the relevant output from one of my DomU's when booting:


Begin: Running /scripts/local-top ...
+ PREREQ=udev_helper
+ prereqs
+ echo udev_helper
+ exit 0
+ PREREQ=udev_helper
+ . /scripts/functions
+ [ -e /scripts/local-top/md ]
+ MDADM=/sbin/mdadm
+ [ -x /sbin/mdadm ]
+ MD_DEVS=all
+ MD_MODULES=linear multipath raid0 raid1 raid456 raid5 raid6 raid10

note that MD_MODULES is correctly set here but...

+ [ -s /conf/md.conf ]
+ . /conf/md.conf

.../conf/md.conf gets sourced here and ...

+ MD_HOMEHOST=bigmomma
+ MD_DEVPAIRS=/dev/md1:5 /dev/md0:1 /dev/md11:1 /dev/md12:1
/dev/md10:0 /dev/md2:5
+ MD_LEVELS=5 1 1 1 0 5
+ MD_DEVS=all
+ MD_MODULES=0 1 5

... that resets MD_MODULES here, before the modprobes begin below...

+ verbose
+ return 0
+ log_begin_msg Loading MD modules
+ [ -x /sbin/usplash_write ]
+ _log_msg Begin: Loading MD modules ...
+ [ n = y ]
+ echo Begin: Loading MD modules ...
Begin: Loading MD modules ...
+ modprobe --syslog -v 0
modprobe: FATAL: Module 0 not found.

+ log_failure_msg failed to load module 0.
+ _log_msg Failure: failed to load module 0.
+ [ n = y ]
+ echo Failure: failed to load module 0.
Failure: failed to load module 0.
+ modprobe --syslog -v 1
modprobe: FATAL: Module 1 not found.

+ log_failure_msg failed to load module 1.
+ _log_msg Failure: failed to load module 1.
+ [ n = y ]
+ echo Failure: failed to load module 1.
Failure: failed to load module 1.
+ modprobe --syslog -v 5
modprobe: FATAL: Module 5 not found.

+ log_failure_msg failed to load module 5.
+ _log_msg Failure: failed to load module 5.
+ [ n = y ]
+ echo Failure: failed to load module 5.
Failure: failed to load module 5.
+ log_end_msg
+ [ -x /sbin/usplash_write ]
+ _log_msg Done.
+ [ n = y ]
+ echo Done.
Done.
+ update_progress
+ [ -d /dev/.initramfs ]
+ [ -z 2 ]
+ PROGRESS_STATE=3
+ echo PROGRESS_STATE=3
+ [ -x /sbin/usplash_write ]
+ [ ! -f /proc/mdstat ]
+ verbose
+ return 0
+ panic cannot initialise MD subsystem (/proc/mdstat missing)
+ [ -x /sbin/usplash_write ]
+ [  = 0 ]

Clearly there is a conflict in the way md.conf is generated, and I
frankly don't know where that happens.
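For what it's worth, here is a hedged sketch of the kind of defensive fix I
mean (this is my own idea, not anything from the mdadm package): map bare
raid levels back to module names before modprobing. The level-to-module
mapping is an assumption on my part (e.g. that raid456 covers levels 4/5/6):

```shell
#!/bin/sh
# Translate a raid level (as left in MD_MODULES by the bad md.conf)
# back into a kernel module name; pass real module names through.
map_level_to_module() {
    case "$1" in
        0) echo raid0 ;;
        1) echo raid1 ;;
        4|5|6) echo raid456 ;;
        10) echo raid10 ;;
        linear|multipath|raid*) echo "$1" ;;  # already a module name
    esac
}

MD_MODULES='0 1 5'   # what sourcing md.conf left us with
MODULES=''
for m in $MD_MODULES; do
    MODULES="$MODULES $(map_level_to_module "$m")"
done
echo "would modprobe:$MODULES"
```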

> 
> > 2. How do I unpack my initrd to actually look at the script that is in
> >    the initrd (maybe it gets changed somehow?) so I can check that out
> >    directly. 
> 
> zcat initrd.img | cpio -i
> 
> > 3. is this a bug? 
> 
> We'll see.

I'm not sure what really brought this all about, but I suspect that
perhaps this machine got something messed up in its raid configuration
at the end of the etch release cycle. Honestly, I haven't spent much
time digging into this problem, as my uptimes are so long that booting
just doesn't happen that much. It came up recently due to some minor
problems in one of my DomU's, and I took it offline for an fsck. Upon
reboot, I noticed the set -x output streaming by and went back to take
a look. My DomU's use the same initrd as Dom0 (I should probably fix
that), and they end up providing a great test bed for this problem, as
I don't have to take the whole system down to troubleshoot it.

I haven't had a kernel upgrade since the problem surfaced, so no
opportunity yet to see if it has been resolved. But I now have
multiple initrd's floating around for this system, so I can test it a
little more easily.

If there is more i can do to determine if this is a real problem, or
just some local anomaly I'm experiencing, please let me know.

A
