
Re: RAID5 (mdadm) array hosed after grow operation



On Monday January 5, jpiszcz@lucidpixels.com wrote:
> cc linux-raid
> 
> On Mon, 5 Jan 2009, whollygoat@letterboxes.org wrote:
> 
> > I think growing my RAID array after replacing all the
> > drives with bigger ones has somehow hosed the array.
> >
> > The system is Etch with a stock 2.6.18 kernel and
> > mdadm v. 2.5.6, running on an Athlon 1700 box.
> > The array is 6 disk (5 active, one spare) RAID 5
> > that has been humming along quite nicely for
> > a few months now.  However, I decided to replace
> > all the drives with larger ones.
> >
> > The RAID reassembled fine at each boot as the drives
> > were replaced one by one.  After the last drive was
> > partitioned and added to the array, I issued the
> > command
> >
> >   "mdadm -G /dev/md/0 -z max"
> >
> > to grow the array to the maximum space available
> > on the smallest drive.  That appeared to work just
> > fine at the time, but booting today the array
> > refused to assemble with the following error:
> >
> >    md: hdg1 has invalid sb, not importing!
> >    md: md_import_device returned -22
> >
> > I tried to force assembly but only two of the remaining
> > 4 active drives appeared to be fault free.  dmesg gives
> >
> >    md: kicking non-fresh hde1 from array!
> >    md: unbind<hde1>
> >    md: export_rdev(hde1)
> >    md: kicking non-fresh hdi1 from array!
> >    md: unbind<hdi1>
> >    md: export_rdev(hdi1)

Please report the output of
   mdadm --examine /dev/whatever
for every device that you think should be a part of the array.
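
A rough sketch of gathering that in one pass, assuming a POSIX shell.
The device names in the usage comment are only the ones that appear in
the log lines quoted above, so the real member list will be longer, and
MDADM is a hypothetical override variable for this sketch, not
something mdadm itself reads:

```shell
# Print the md superblock of each device given as an argument.
examine_all() {
    # MDADM is a hypothetical override, defaulting to the real binary.
    mdadm_bin="${MDADM:-mdadm}"
    for dev in "$@"; do
        echo "=== $dev ==="
        # Keep errors in the output; a failure to read a superblock
        # is itself useful information.
        "$mdadm_bin" --examine "$dev" 2>&1
    done
}

# Example (run as root, substitute the actual members):
#   examine_all /dev/hde1 /dev/hdg1 /dev/hdi1 > examine.txt
```

Sending the combined output as one file makes it easier to compare
event counts and device sizes side by side.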

> >
> > I also noticed that "mdadm -X <drive>" shows
> > the pre-grow device size for 2 of the devices
> > and some discrepancies between event and event cleared
> > counts.

You cannot grow an array with an active bitmap... or at least you
shouldn't be able to.  Maybe 2.6.18 didn't enforce that.  Maybe that
is what caused the problem - not sure.
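
For a healthy array, the usual way to avoid the question is to drop
the bitmap before resizing and add it back afterwards.  A minimal
sketch of that order of operations, assuming an internal bitmap and a
POSIX shell; it is not a recovery procedure for the array described
here.  MDADM is a hypothetical override variable used only by this
sketch:

```shell
# Remove the write-intent bitmap, then grow the component size.
# The second step runs only if the first one succeeded.
grow_without_bitmap() {
    md="$1"
    mdadm_bin="${MDADM:-mdadm}"   # hypothetical override
    "$mdadm_bin" --grow "$md" --bitmap=none &&
    "$mdadm_bin" --grow "$md" --size=max
}

# Put an internal bitmap back.  Assumption: run this only after the
# resync started by the grow has finished (see /proc/mdstat).
readd_bitmap() {
    md="$1"
    mdadm_bin="${MDADM:-mdadm}"   # hypothetical override
    "$mdadm_bin" --grow "$md" --bitmap=internal
}

# Example (run as root):
#   grow_without_bitmap /dev/md/0
#   ... wait for the resync to complete ...
#   readd_bitmap /dev/md/0
```

The re-add is kept separate because the kernel may refuse to change
the bitmap while a resync is still running.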

> >
> > One last thing I found curious---from dmesg:
> >
> >    EXT3-fs error (device hdg1): ext3_check_descriptors: Block
> >    bitmap for group 0 not in group (block 2040936682)!
> >    EXT3-fs: group descriptors corrupted!
> >
> > There is no ext3 directly on hdg1.  LVM sits between the
> > array and the filesystem, so the above message seems suspect.

Seems like something got confused during boot and the wrong device got
mounted.  That is bad.

NeilBrown

