Bug#61065: Adam Di Carlo: Re: Bug#61065: Need /dev/md0 for boot/install RAID support in root.bin



On 2 Apr 2000, Adam Di Carlo wrote:

> Mike Bilow <mikebw@colossus.bilow.com> writes:

> > > Note that the standard, "vanilla" kernel has changed in the following way:
> [...]
> > > Is this enough?  I guess not... it doesn't have md support?
> > 
> > I will have to check into this.  The start of the problem is that there is
> > no proper software RAID support in the kernel source tree.  The software
> > RAID support which is present is years old and is referenced as v0.4x
> > RAID.  The newer code, which completely replaces the earlier code, is
> > currently being distributed as kernel source patches by Ingo Molnar (at
> > http://people.redhat.com/mingo) and is referenced as v0.90 RAID.
> 
> Ok -- you're asking for kernel patching.  This is not for us to
> decide.  This is for the kernel-source maintainer (Herbert Xu) to
> decide.  I would assume it's too late in the freeze process to change
> this in potato. I've CC'd him.

For potato, I don't think there is any serious question: the v0.90 RAID
kernel patches should not go in.  The best approach for v0.90 RAID support
in potato would be to make up an alternate boot set.  Other than the
install issues and kernel patches, v0.90 RAID support is well handled by
potato now, and I have the system running to prove it.

Under no circumstances do I think there is sufficient trouble with RAID
support to warrant any delay to potato.  I have also asked (bug 61165) if
there is going to be an update to Lilo for potato, not only because of the
newly added v0.90 RAID support but also because of LBA32 support, and the
answer was that it is not happening for potato.

We ought to have the /dev/md? device nodes in root.bin for both v0.4x and
v0.90 RAID support, though.  These need to be present both to make
installation easier and to support operation in rescue mode.
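To make the request concrete, here is a minimal sketch in Python of the
nodes I mean.  The md block devices use major number 9, with the minor
number selecting the array.  The count of four nodes, the 0660 mode, and
the function names are my own assumptions for illustration, not anything
taken from the boot-floppies build.

```python
import os
import stat

MD_MAJOR = 9  # block major number assigned to md (software RAID) devices


def md_node_specs(count=4):
    """Return (name, major, minor) for md0 .. md<count-1>.

    The default of four nodes is an assumption for illustration.
    """
    return [("md%d" % minor, MD_MAJOR, minor) for minor in range(count)]


def create_md_nodes(dev_dir, count=4):
    """Create the block device nodes under dev_dir (needs root privileges)."""
    for name, major, minor in md_node_specs(count):
        path = os.path.join(dev_dir, name)
        if not os.path.exists(path):
            # 0660 is an assumed mode, not a statement of Debian policy.
            os.mknod(path, stat.S_IFBLK | 0o660, os.makedev(major, minor))


if __name__ == "__main__":
    # Only list what would be created; call create_md_nodes() as root
    # against a scratch directory to actually make the nodes.
    for name, major, minor in md_node_specs():
        print("/dev/%s  block  %d:%d" % (name, major, minor))
```

The nodes themselves take no real space in root.bin, since a device node
is only an inode.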

> I thought if you are *booting* into a RAID system, you really really
> need a initrd/linuxrc system.  Since we don't support this in the
> boot-floppies, doesn't that mean by definition we don't support
> booting on a RAID fs?

No.  There have been Lilo patches from Doug Ledford of Red Hat which
support booting from RAID for some time, and these were incorporated into
the upstream version as of 21.4.0; see bug 61165.  Technically, each piece
of a v0.90 RAID-1 set looks like an ordinary filesystem on its own, but
has (1) a special superblock addendum that references the other pieces of
the RAID set and (2) an optionally different partition type code that just
triggers kernel autodetection of the RAID set.  The new Lilo simply writes
a copy of the MBR to each of the physical volumes participating in the
RAID set, allowing any of the surviving volumes to be used for boot.

In other words, if /dev/hda1 and /dev/hdc1 are united into a RAID-1 and
called /dev/md0, then Lilo can be made to write an appropriate MBR onto
both /dev/hda and /dev/hdc.  Then the system can be booted normally or, in
degraded mode, from either volume alone.
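As a small illustration of that mapping, the following Python sketch
derives the whole disks that would each need an MBR from the member
partitions of the RAID-1 set.  The helper name is hypothetical and this is
not how Lilo itself does it; it assumes plain hd?/sd? style names where
the partition number is simply appended to the disk name.

```python
def boot_disks(members):
    """Map member partitions such as 'hda1' to whole disks ('/dev/hda')."""
    disks = []
    for part in members:
        # Strip the trailing partition number to get the disk name.
        disk = "/dev/" + part.rstrip("0123456789")
        if disk not in disks:
            disks.append(disk)
    return disks


if __name__ == "__main__":
    # Members of the /dev/md0 example above.
    print(boot_disks(["hda1", "hdc1"]))
```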

> > Well, raidstart would be nice, too, especially if the rescue disk is
> > really expected to be able to serve as a "rescue" disk.  Most of the other
> > RAID components in raidtool2 are actually just symlinks to either mkraid
> > or raidstart, anyway, so they consume essentially no space in root.bin.
> 
> Well, see the above question.

If you cannot boot the RAID-1 system normally, and it will not boot in
degraded mode, then you would need to bring the system up with the
rescue/root floppies.  You should be able to mount a piece of the RAID-1
set as a normal filesystem if you absolutely have to get data from it.  It
is the middle ground, where a disk has failed and is being gracefully
swapped with its replacement, where the tools would be handy on floppy.

> > I am not sure if /dev/md? devices are created in the base file system.  I
> > suspect they are not, and that this is currently the responsibility of the
> > raidtools2 package rather than the base installation.  This would be
> > consistent with Debian Policy, I think, except that it is now possible to
> > boot on software RAID and therefore the responsibility should be moved.
> > 
> > As long as the /proc filesystem is in the kernel, you can do an existence
> > check for /proc/mdstat.  If it exists, then the kernel has RAID support
> > and the /dev/md? devices should probably be created during base install.
> 
> So long as raidtools2 currently handles it and everything works, I'm
> perfectly happy to let it stay that way.  We're trying to release very
> soon and we simply cannot embark on coordinated changes like this at
> this phase.

Agreed, but the issue is installation.  If one can get to the point of
selecting packages and then manually select to install the raidtools2
package, then that would be fine.  It is specifically during the install
or rescue that we need to have /dev/md? device nodes available in root.bin,
but I think people would understand that a RAID installation is going to
fail if they do not elect to install the raidtools2 package.  The failure
mode would be something like "VFS: Panic: cannot mount root fs from 09:00"
or some such message that would probably be reasonably clear to anyone
doing this sort of advanced thing.
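For anyone not used to reading that kind of message, the "09:00" is a
major:minor device number, and major 9 is the md driver, so it refers to
/dev/md0.  A minimal Python sketch of the decoding follows, assuming the
kernel prints both numbers in hexadecimal; the function names are mine and
purely illustrative.

```python
MD_MAJOR = 9  # block major number of the md (software RAID) driver


def decode_root_dev(text):
    """Split a 'MM:mm' device string (assumed hexadecimal) into two ints."""
    major_hex, minor_hex = text.strip().split(":")
    return int(major_hex, 16), int(minor_hex, 16)


def describe(text):
    """Name the device if it is an md array, else report the raw numbers."""
    major, minor = decode_root_dev(text)
    if major == MD_MAJOR:
        return "/dev/md%d" % minor
    return "major %d, minor %d" % (major, minor)


if __name__ == "__main__":
    print(describe("09:00"))
```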

> > > > In general, support for installing to bootable software RAID should
> > > > probably be added to boot-floppies at some point for the 2.4 kernel.
> > > 
> > > There are limitations on what we can add to the kernel.  But I think
> > > this is in the cards for woody.
> > 
> > Once software RAID is in the mainstream 2.4 kernel, combined with the fact
> > that it was added to mainstream Lilo last week, this is going to be an
> > expected part of a Linux distribution and we absolutely must provide it.
> > The fact that I was actually able to do it successfully with potato by
> > replacing the kernel and Lilo is an indication that we are very close.
> 
> Yes, I understand.  I'm happy to do whatever is pretty easy to do
> (such as creating the md devices in root.bin).  I don't think RAID
> will be fully supported in potato, however -- too many changes are
> required and we are too far into the freeze.

I agree again, and I think that the best way to do potato support for RAID
install is just to either have an alternate boot set or at least provide
extensive documentation.  The alternate boot set would be preferable if
someone knowledgeable is willing to do the work of maintaining it.  At
this point, what I am really asking for is exactly the creation of the
/dev/md? devices in root.bin.

Over the long term, it was just announced that the v0.90 RAID is being
merged into the 2.3.x kernel code, so it seems nearly definite that it
will be making it into the 2.4 kernel.  By that time, I think Debian needs
to have some resolution on how to cope with installing on RAID, since it
will be a standard feature of Linux.  The present requirement for kernel
patches is entirely temporary.  (Note that the 2.2.14
kernel with RAID patches is far more stable than the 2.3.x kernels in any
case, and is currently in wide use for production systems.)

The RAID code, although stable in the sense that it works reliably, is not
really stable in that it is undergoing active development.  I have just
proposed some changes to the kernel RAID code dealing with how swapping
gets turned on, and, as a stopgap measure, that startup scripts be modified
to make sure swapping does not get turned on while a RAID resync is taking
place (see CRITICAL bug 61227).  These issues should be cleaned up by the
time the v0.90 RAID code makes it into the upstream 2.4 kernel.

Eventually it will become necessary for the installation program to
understand, first, that /dev/md? devices can be installation targets, and,
second, that /dev/hd? and /dev/sd? devices which are participating in
/dev/md? devices cannot be installation targets.
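A rough Python sketch of that filtering, working from /proc/mdstat, is
below.  The sample text only approximates what a v0.90 kernel reports, and
the function names and the regular expressions are my own assumptions; a
real installer would have to follow whatever format the kernel it ships
actually prints.  A missing /proc/mdstat is treated as "no RAID support".

```python
import os
import re

# Illustrative sample only; the real layout may differ between kernels.
SAMPLE = """Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdc1[1] hda1[0] 4194240 blocks [2/2] [UU]
unused devices: <none>
"""

ARRAY_LINE = re.compile(r"^(md\d+) : (.*)$")
MEMBER = re.compile(r"\b([a-z]+\d+)\[\d+\]")  # e.g. hda1[0]


def read_mdstat(path="/proc/mdstat"):
    """Return the file contents, or '' if the kernel has no md support."""
    if not os.path.exists(path):
        return ""
    with open(path) as handle:
        return handle.read()


def raid_members(mdstat_text):
    """Map each md array name to the list of its member partitions."""
    arrays = {}
    for line in mdstat_text.splitlines():
        match = ARRAY_LINE.match(line)
        if match:
            arrays[match.group(1)] = MEMBER.findall(match.group(2))
    return arrays


def install_targets(candidates, mdstat_text):
    """Offer md arrays as targets and hide partitions that belong to one."""
    arrays = raid_members(mdstat_text)
    members = {dev for devs in arrays.values() for dev in devs}
    usable = [dev for dev in candidates if dev not in members]
    return sorted(arrays) + usable


if __name__ == "__main__":
    print(install_targets(["hda1", "hda2", "hdc1"], SAMPLE))
```

With the sample above, hda1 and hdc1 drop out of the list because they
belong to md0, while md0 itself and the unrelated hda2 remain selectable.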

-- Mike

--
-------------------------------------------------------------------------------
Bilow Computer Science, Inc. | http://www.bilow.com/ | Michael S. Bilow
Cranston, RI 02920-5554, USA | mike@bilow.com        | President
-------------------------------------------------------------------------------
PGP Public Key fingerprint  =  4B 06 23 FB 3E 24 A5 24  14 B5 A2 14 96 73 B4 B2
PGP Public Key fingerprint  =  A5 13 63 7F E3 9F AB 0A  52 62 49 26 BF 0C 01 AD
-------------------------------------------------------------------------------

