
Re: Swapping activated while RAID reconstruction in progress



On 2001-01-29 at 01:16 -0600, Sanjeev Gupta wrote:

> > If you are absolutely desperate, you could try booting from the hard drive
> > using a boot prompt argument which specifies a shell as the grandfather
> > process instead of init as usual; something like "Linux init=/bin/bash"
> > ought to do it.  This would bypass login and init processing, which would
> > therefore bypass /etc/init.d and /etc/rc?.d processing where swapping is
> > ordinarily turned on, allowing your v0.90 RAID resync to complete.
> 
> I am using v0.90.
> 
> I tried this first, but got the same error, on the kernel not being able
> to mount the root partition.
> 
> I then used the Debian install CD to "install" on my /dev/hdb5 which I
> use as a place to store backup .tar files.  No network configured, just
> installed "base".
> 
> Rebooted with old kernel, root=/dev/hdb5 .  The md devices were detected
> and picked up, resync started.
> 
> resync ended 10 minutes later.
> 
> Now I have md0 (hda2 & hdb2)
> md1 (hda3 & hdb3)

Well, if you are getting to the completion of RAID resync, then whatever
problem you are having is not the one that I was reporting in those bugs.  
At the time, the behavior if swapping was started during the RAID resync
was a hard system crash with a kernel panic.  In no case did this ever
result in actual corruption of the data on the filesystem being resynced,
though that is certainly a possibility.  The bugs I reported really were
fixed in the Potato release.

> md1 is swap, and I can now (old kernel, root=/dev/hdb5) use it as swap, so
> I am sure the signature, etc, is OK.  However, I do not have an
> /etc/raidtab file yet; is this OK?  I think the 0.90 persistent superblock
> does not require it?  /proc/mdstat shows md0 as 2 disks (2/2), but md1
> as only 1 (/dev/hdb3).  I am reading up on the raidtools2 package.

Something has gone wrong if /dev/md1 is only made up of one component
partition when it should be made up of two.  This could be a result of
running with RAID in degraded mode, which seems likely if you are able to
access it as /dev/md1 at all.
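
If /dev/md1 really is running degraded on /dev/hdb3 alone, the usual
repair with raidtools2 is raidhotadd, which re-adds the missing component
and kicks off a resync.  A sketch only -- the device names are guesses
from your message, and raidhotadd writes to the array, so this prints the
command rather than running it:

```shell
# Dry-run sketch (device names assumed from the message above).
# raidhotadd, from the raidtools2 package, re-adds a missing component
# to a degraded array and triggers a resync.  It writes to the array,
# so this only prints the command; drop the echo to run it for real,
# after making sure /etc/raidtab matches the actual layout.
MD=/dev/md1          # the degraded logical device
MISSING=/dev/hda3    # the component /proc/mdstat no longer lists
echo raidhotadd "$MD" "$MISSING"
```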

It is correct that /etc/raidtab is only read by the userland tools, not by
the kernel.  If you use kernel autodetection, it is still important that
you maintain a valid /etc/raidtab file which matches the real system, but
only in case you need to use some of the userland tools in the raidtools2
package.  If you do not use kernel autodetection, then you must have a
valid /etc/raidtab file in all cases.
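
For reference, an /etc/raidtab describing a pair of two-disk RAID-1
arrays like yours might look something like this.  This is a sketch only
-- the device names and chunk size are assumptions, so check everything
against your real layout before trusting any userland tool with it:

```
# /etc/raidtab -- sketch for two RAID-1 mirrors; devices are assumed
raiddev /dev/md0
    raid-level            1
    nr-raid-disks         2
    nr-spare-disks        0
    persistent-superblock 1
    chunk-size            4
    device                /dev/hda2
    raid-disk             0
    device                /dev/hdb2
    raid-disk             1

raiddev /dev/md1
    raid-level            1
    nr-raid-disks         2
    nr-spare-disks        0
    persistent-superblock 1
    chunk-size            4
    device                /dev/hda3
    raid-disk             0
    device                /dev/hdb3
    raid-disk             1
```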

Note that kernel autodetection will only occur if the partition type is
marked as type 0xFD, regardless of the real filesystem type which is made
onto the /dev/md? logical RAID devices.  This means that every partition
which participates in an ext2 filesystem gets marked 0xFD, every partition
which participates in swap space gets marked 0xFD, and so on.  Every one
of your logical RAID devices should show all of its participants; if one
does not, something has gone wrong:

$ cat /proc/mdstat
Personalities : [raid1] [raid5]
read_ahead 1024 sectors
md0 : active raid1 hdc1[1] hda1[0] 29768320 blocks [2/2] [UU]
md1 : active raid1 hdc2[1] hda2[0] 264960 blocks [2/2] [UU]
unused devices: <none>
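
In that output, a healthy two-disk set shows [2/2] [UU]; a degraded one
shows something like [2/1] [_U], which is what your md1 amounts to.  As a
small sketch of checking this mechanically (using captured sample text in
place of the live /proc/mdstat, so the pattern matching is the only
assumption):

```shell
# Flag degraded md devices in /proc/mdstat-style status lines.  A
# healthy array reports [N/N]; anything like [2/1] is degraded.  The
# sample text here stands in for a live /proc/mdstat.
mdstat='md0 : active raid1 hdb2[1] hda2[0] 29768320 blocks [2/2] [UU]
md1 : active raid1 hdb3[1] 264960 blocks [2/1] [_U]'

degraded=$(echo "$mdstat" | awk '
    match($0, /\[[0-9]+\/[0-9]+\]/) {
        split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
        if (n[2] + 0 < n[1] + 0)
            print $1 " is degraded (" n[2] "/" n[1] " disks)"
    }')
echo "$degraded"
```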

On this system, /dev/md0 is mounted as the root filesystem and /dev/md1 is
swap space.  Not surprisingly, the actual partition tables of the RAID
component physical disks (/dev/hda and /dev/hdc) are identical:

Disk /dev/hda: 255 heads, 63 sectors, 3739 cylinders
Units = cylinders of 16065 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *        34      3739  29768445   fd  Linux raid autodetect
/dev/hda2             1        33    265041   fd  Linux raid autodetect

Disk /dev/hdc: 255 heads, 63 sectors, 3739 cylinders
Units = cylinders of 16065 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/hdc1   *        34      3739  29768445   fd  Linux raid autodetect
/dev/hdc2             1        33    265041   fd  Linux raid autodetect

When this system boots, the kernel sees the partition types 0xFD and
therefore uses its internal RAID code to read the persistent superblock
information.  From this, the kernel learns which physical partitions are
to be joined up with which other physical partitions, and assembles the
/dev/md? logical devices from them.  The persistent superblocks also keep
track of a lot of housekeeping information such as whether the component
partitions were synced at the last shutdown, how many times resync has
been performed, and so on.

From here, once the kernel has automatically assembled the /dev/md? RAID
devices, we treat them as conventional block devices.  For example, we
reference them in /etc/fstab:

$ grep 'md' /etc/fstab
/dev/md0        /        ext2    defaults,errors=remount-ro    0 1
/dev/md1        none     swap    sw                            0 0

$ mount
/dev/md0 on / type ext2 (rw,errors=remount-ro,errors=remount-ro)

$ cat /proc/swaps
Filename                        Type            Size    Used    Priority
/dev/md1                        partition       264952  11028   -1

(Whether to swap onto a RAID device is a little bit controversial, and my
view in favor of it should be obvious here, but the reasoning is beyond
the scope of this e-mail message.)

> md0 does not seem to be an ext2 fs, however.  e2fsck says bad superblock,
> even with -b 8193.  
> 	dd if=/dev/md0 | strings | less
> shows recognisable stuff, so I am sure that the /dev/md0 is the correct
> disks.
> 
> What other tools, besides e2fsck and dumpe2fs, are available?

It is important to note here that the term "superblock" can be ambiguous.  
The software RAID code uses "persistent superblocks" to record information
about how to assemble the RAID logical devices from component partitions,
and individual filesystems such as ext2 use "superblocks" to record
information about how to interpret the organization of the logical devices
upon which the filesystem has been made.  These are really different kinds
of "superblocks" and have nothing to do with each other, except for some
coarse similarity of function and an extremely unfortunate choice of name.

In other words, what you should see on /dev/md0 is a perfectly standard
ext2 filesystem, which means that e2fsck and so forth should understand
it.  If not, then something has gone wrong somewhere, either in how
/dev/md0 is being assembled or in what data has been written onto it.
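
Since -b 8193 was already tried: the backup superblock locations depend
on the block size the filesystem was made with, and 8193 is only right
for 1k blocks.  As a sketch of where the backups live -- assuming plain
ext2 with 1k blocks and without the sparse-superblock feature, so one
copy at the start of every 8192-block group; "mke2fs -n /dev/md0" will
report the real list without writing anything:

```shell
# Candidate backup-superblock locations for "e2fsck -b", assuming 1k
# blocks and one copy at the start of each 8192-block group (plain
# ext2 without the sparse_super feature).  "mke2fs -n" on the device
# lists the actual locations without modifying it.
blocks_per_group=8192
candidates=$(for group in 1 2 3 4; do
    echo $((group * blocks_per_group + 1))
done)
echo "$candidates"
```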

The main exception to this clean interface is that, in emergencies, the
filesystem tools such as e2fsck can be used as a last resort against the
component partitions with RAID totally disabled.  This is not the proper
way to recover a broken RAID set, and the raidtools2 utilities should be
used for that.  However, if something catastrophic were to happen with the
RAID utilities themselves or the kernel RAID code, provision has been made
for accessing the component partitions directly.  This was done through
the simple expedient of putting the RAID-specific information -- the
"persistent superblocks" -- at the end of the component partitions instead
of at the beginning, so that the first 99.999% or so of each component
partition looks to e2fsck as if it has a meaningful ext2 superblock at the
beginning.  Note that I am not actually advising you to access the data
this way, but simply advising you of the possibility should all else fail.  
(This is also done to allow booting from a RAID component, since clearly
the ROM BIOS has no knowledge of Linux software RAID and Lilo has to get
files loaded within that constraint.)

There are some extensive discussions of all of this in the RAID HOWTO --
http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/ -- and most of what you
are dealing with is not Debian-specific.  As a result, you might be more
likely to get help with a serious problem on the Linux-RAID mailing list.

-- Mike



