Etch Software RAID Upgrade Trouble & Suggested Installer Improvements
Various mishaps when recovering a botched software RAID system.
The rescue functionality of the installer should be improved.
After a somewhat nightmarish (yet finally successful)
upgrade of my main workhorse PC to Linux software RAID,
I have decided to make this list of suggested improvements.
Following the list is a more detailed account of the reasons.
This is in no way meant to diminish or belittle the nice
work that Debian folks have done so far; I appreciate that
very much. However, addressing one or another of
these points might help other users in the future.
1. Rescue mode needs MD devices
The rescue mode of the installer needs a step
to activate MD devices. Currently, only the plain
disk partitions are visible; that's no help.
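As a rough sketch of what such an activation step might run by hand, assuming mdadm and the raid1 kernel module are available in the rescue shell (the names run, activate_md and DRY_RUN are made up for illustration, and by default the commands are only printed):

```shell
#!/bin/sh
# Sketch only: activate software RAID arrays from a rescue shell.
# Assumption: mdadm and the raid1 module exist in that environment.

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

activate_md() {
    run modprobe raid1              # load RAID 1 support
    run mdadm --examine --scan      # list arrays found in the superblocks
    run mdadm --assemble --scan     # assemble the arrays mdadm knows about
    run cat /proc/mdstat            # show which md devices are now active
}
```

With DRY_RUN=0 and root privileges, activate_md runs the commands for real. Depending on the mdadm version, the assemble step may need a config file that names the devices to scan.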
2. Netinstall image needs a ping
There should be a ping command available on the
netinstall image. Otherwise, for a multi-card PC
it is hard to check whether the right interface
has been configured correctly.
3. Netinstall's ifconfig needs to set MAC address
The ifconfig on the netinstall image (from busybox)
does not allow setting the hardware Ethernet address.
In some scenarios this is important and necessary.
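For reference, a sketch of what setting the hardware address looks like where the tools allow it, assuming the iproute "ip" command is present. The interface name and address in the usage note are placeholders, and set_mac, run and DRY_RUN are illustrative names; by default the commands are only printed:

```shell
#!/bin/sh
# Sketch only: set the MAC address of an interface.
# Assumption: the "ip" tool from iproute is installed.
# The full (non-busybox) ifconfig equivalent would be:
#   ifconfig eth0 hw ether 00:11:22:33:44:55

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# set_mac IFACE MAC
set_mac() {
    case "$2" in
        ??:??:??:??:??:??) ;;                    # crude shape check only
        *) echo "bad MAC address: $2" >&2; return 1 ;;
    esac
    run ip link set dev "$1" down                # take the interface down first
    run ip link set dev "$1" address "$2"
    run ip link set dev "$1" up
}
```

Usage would be something like "set_mac eth0 00:11:22:33:44:55", with DRY_RUN=0 and root privileges to actually apply it.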
4. Netinstall image should have some packages
I'm not sure about this ... but having grub, a
kernel and a modules package would have been
an immense help.
5. Rescue functionality needs improvement
The rescue functionality of the installer is
nice but not very useful in practice.
Polishing the rescue system would have helped me
in many situations before, not just this case.
I would love to have more of a standalone
system (from RAMDISK and/or "Live"-CD).
In particular, the fact that one can't run many
elementary Linux commands (tar, gzip, networking,
e2fsck, mke2fs, dd, nfs-mount...) without going
far along in the install process is a hindrance.
And the point where the actual installation gets
manipulated by the installer is not always clear.
6. Grub's built-in documentation is incomprehensible
Grub is one of those tools that one needs to work
with when the box isn't running. Grub's and
grub-install's help are not practically useful.
7. There needs to be a command to copy all data
Between cp, tar, rsync & friends there are dozens
of variations on how to copy the files of a
running system to another location, but none
does all of the following out of the box:
- leave out lost+found
- leave out /proc, /sys, the automatic /dev
- copy all "real" files
- copy the on-disk /dev hidden under the mounted devfs
(using mount --bind or so)
There is a real need for a good program that does this;
IMHO that program should be cp.
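Until such a program exists, one of the many variations can be sketched as follows. It assumes GNU tar; copy_fs is an illustrative name, not an existing tool:

```shell
#!/bin/sh
# Sketch only: copy one filesystem tree to another location.
# Assumption: GNU tar (for --one-file-system and --exclude).

# copy_fs SRC DST
copy_fs() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # --one-file-system: do not descend into other mounted filesystems
    #   (/proc, /sys, a mounted /dev); the empty mount points are kept.
    # --exclude=./lost+found: skip the top-level lost+found directory.
    tar -C "$src" --one-file-system --exclude=./lost+found -cf - . |
        tar -C "$dst" -xpf -
}
```

To also pick up the on-disk /dev that is hidden under the mounted one, the source can be a bind mount of the root filesystem, for example (as root, with placeholder paths): "mount --bind / /mnt/rootbind", then "copy_fs /mnt/rootbind /mnt/newroot", then "umount /mnt/rootbind".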
8. hdparm's error messages unsatisfying
When some ATA drivers are not loaded, the hdparm command
does not let you set DMA mode for a drive. Unfortunately
the error message is not very helpful in locating and
fixing the problem.
9. cdrecord's miserable state is well known
Like the majority of other Linux users, I wonder when
$ burn_my_iso_to_cd <iso-file> /dev/cdrom
will work as expected.
Now, on to the specifics. Here is the account of what
happened to me and how I arrived at those suggestions.
A) The upgrade
I decided to buy another IDE disk for my workhorse PC,
to mirror the old one (Software RAID 1) and get some
additional (un-mirrored) space on the new disk for
junk data (VDR movies etc.)
Being an old Debian user, I surely could do that
in-flight without a backup ... :-)
(Some sins get instant punishment).
B) The guide
I followed the excellent guide in
- create degraded RAID on new disk
- copy data to new disk
- modify initrd, fstab, grub
- test booting new system
- re-format old disk and add to RAID
- finalize initrd, fstab, grub
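The RAID-specific steps of such a procedure can be sketched as below. This is not taken from the guide; the device names, the ext3 filesystem and the function names are assumptions for illustration, and by default the commands are only printed:

```shell
#!/bin/sh
# Sketch only: build a degraded RAID 1 on the new disk, complete it later.
# Assumptions: /dev/hda1 = partition on the old disk, /dev/hdc1 = partition
# on the new disk, ext3 as filesystem. Adjust all of these.
OLD=${OLD:-/dev/hda1}
NEW=${NEW:-/dev/hdc1}
MD=${MD:-/dev/md0}

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Step 1: mirror with one member "missing", so only the new disk is used.
create_degraded() {
    run mdadm --create "$MD" --level=1 --raid-devices=2 missing "$NEW"
    run mkfs.ext3 "$MD"
}

# Step 5: after the old disk has been repartitioned, add it to the array.
complete_mirror() {
    run mdadm --add "$MD" "$OLD"
    run cat /proc/mdstat            # shows the reconstruction progress
}
```

The copy, initrd, fstab and grub steps in between are left out here on purpose, since they depend on the individual system.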
C) Trouble begins
It was at the testing stage, having successfully booted
into the degraded RAID system on the new disk, that
I decided to record a movie.
Re-formatting the old disk and adding it to the RAID,
I noticed that the system became very unresponsive and xine
had trouble writing the movie to disk. I found out that
the DMA was turned off and reconstruction of the RAID
took a lot of CPU and disk activity.
I could not set the DMA mode with hdparm; apparently some
modules for that were missing. (I can't reproduce this now,
since the DMA is miraculously turned on.)
D) The fatal mistake
I had to stop recording since the movie would get chopped
and RAID reconstruction would take forever (20 h).
I decided to reboot to get the DMA working and forgot that
I had just re-formatted the /boot partition on the old disk,
so grub would not find any chain loader, obviously.
E) The painful recovery
- Grub wouldn't load anything, the system did not boot.
- I tried a sarge installer CD that didn't recognize the
md signatures of the partitions.
- I couldn't figure out how to run the grub installer from
a mounted pseudo-root directory where the devices were
named differently (old /dev/hde vs. new /dev/sda for SATA).
- An old Knoppix allowed me to configure the router
functionality and download the installer image.
- To burn the image, I had to download k3b since
I couldn't figure out either cdrecord or cdrdao
within reasonable time (external USB CD writer,
since the laptop's built-in writer is broken).
- The rescue mode of the netinst RC1 CD didn't
let me choose the MD partitions as the root device.
- I could not get to the Internet since my cable
modem only responds to a certain MAC address that
can't be set with the ifconfig on netinst.
- Finally running the install process far enough
to get the md devices mounted (it's unclear how
to do that manually instead of using the partitioner),
I had access to a ping and a working ifconfig
to get Internet access.
- From the Internet, I could then download grub
and install it manually after fighting against
/proc and /sys.
- The installer had overwritten my /etc/fstab
which I then fixed.
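For anyone in the same spot, the manual grub step can be sketched roughly like this. It assumes the md device is already active, that /boot is part of (or mounted inside) the target root, and that grub is installed in the target; the function name, mount point and device arguments are placeholders, and by default the commands are only printed:

```shell
#!/bin/sh
# Sketch only: reinstall grub from a rescue shell via chroot.
# Assumptions: the md array is assembled, grub exists inside the target
# system, and the device names in its device.map match the rescue system.

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# reinstall_grub ROOT_DEVICE BOOT_DISK [MOUNTPOINT]
reinstall_grub() {
    md=$1
    disk=$2
    root=${3:-/mnt/target}
    run mkdir -p "$root"
    run mount "$md" "$root"
    run mount --bind /dev "$root/dev"       # device nodes for the chroot
    run mount -t proc proc "$root/proc"     # grub-install wants /proc ...
    run mount -t sysfs sysfs "$root/sys"    # ... and /sys to be present
    run chroot "$root" grub-install "$disk"
}
```

A call would look like "reinstall_grub /dev/md0 /dev/hda" with DRY_RUN=0 as root. If the disk names differ between the rescue system and the installed one, the device.map in the target probably needs adjusting first.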
I have purposefully omitted the many other failures,
most of them my own fault, that made this
endeavour take a total of 11 hours into the night.
I think the steps that I described show that while the
new installer has become very good at its main function
(as an INSTALLER), it still lacks most features as a
RESCUE SYSTEM.
Going through various attempts at unbootable USB-stick-rescuers,
and old Knoppix and Sarge installers, I'm quite convinced that
an effective rescue system MUST be based on the same kernel
series and system setup philosophies as the primary installed
system (what with udev, /sys, /proc, md5 partition autostart
for new superblocks, copyable kernel that allows mounting the
target partition as root etc.).
Therefore I'll conclude with the plea that the fine folks
who did such great work on the new installer might now
turn their attention to its rescue functionality, and I hope
this comment is helpful.
Tired but finally successful
Claus Fischer <email@example.com>