
Re: Debian machine not booting



James Allsopp wrote:
> Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
> md126 : active raid1 sdb3[0] sdc3[1]
>       972550912 blocks [2/2] [UU]

So sdb3 and sdc3 are assembled into /dev/md126.  That seems good.  One
full array is assembled.

Is /dev/md126 your preferred name for that array?  I would guess not.
Usually it is /dev/md0 or some such.  But when that name is not
available because it is already in use then mdadm will rotate up to a
later name like /dev/md126.

You can fix this by using mdadm with --update=super-minor to force it
back to the desired name.  Something like this using your devices:

  mdadm --assemble /dev/md0 --update=super-minor /dev/sdb3 /dev/sdc3

But that can only be done at assembly time.  If it is already
assembled then you would need to stop the array first and then
assemble it again.

> md127 : active raid1 sdd1[0]
>       1953510841 blocks super 1.2 [2/1] [U_]
> 
> md1 : active raid1 sde1[1]
>       1953510841 blocks super 1.2 [2/1] [_U]

I think this array now has a split-brain problem.  At this point the
original single mirrored array has had both halves of the mirror
assembled separately and both are running.  So now you have two
clones of each other, both active, and each thinks it is newer than
the other.  Is that right?  In which case you will eventually need to
pick one and call it the master.  I think sde1 is the natural master
since it is assembled on /dev/md1.
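
To confirm which half is which, comparing the event counters and
update times in the two superblocks should tell you which copy last
saw writes (just a quick check with --examine, assuming both
superblocks are still readable):

  mdadm --examine /dev/sdd1 | grep -E 'Update Time|Events'
  mdadm --examine /dev/sde1 | grep -E 'Update Time|Events'

The half with the higher event count is the one that was written to
most recently.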

> cat /etc/mdadm/mdadm.conf
> ...
> # definitions of existing MD arrays
> ARRAY /dev/md0 UUID=a529cd1b:c055887e:bfe78010:bc810f04

Only one array specified.  That is definitely part of your problem.
You should have at least two arrays specified there.
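
For reference, once both arrays are running the file should end up
with one ARRAY line per array, something like the sketch below.  The
second UUID is only a placeholder; use the real one reported by the
--detail / --examine commands further down.

  ARRAY /dev/md0 UUID=a529cd1b:c055887e:bfe78010:bc810f04
  ARRAY /dev/md1 UUID=<uuid-of-the-second-array>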

> mdadm --detail --scan:
> 
> ARRAY /dev/md/0_0 metadata=0.90 UUID=a529cd1b:c055887e:bfe78010:bc810f04

That mdadm --scan only found one array is odd.
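
It might be worth scanning the component superblocks directly as
well, which sometimes gives a more complete picture than looking only
at the assembled arrays:

  mdadm --examine --scan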

> fdisk -l
> 
> Disk /dev/sda: 120.0 GB, 120033041920 bytes
> 255 heads, 63 sectors/track, 14593 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x0002ae52
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1               1       14593   117218241   83  Linux

I take it that this is your boot disk?  Your boot disk is not RAID?

Note that the Start column there is in cylinders, so sda1 actually
begins at sector 63, the old debian-installer layout that leaves only
the first track for the MBR rather than the newer 2048-sector start.
But that is a different issue.

> Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
> 255 heads, 63 sectors/track, 243201 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
                              ^^^^         ^^^^

That is an Advanced Format 4k sector drive.  Meaning that the
partitions should start on a 4k sector alignment.  The
debian-installer would do this automatically.
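
You can confirm the sector sizes the kernel sees without fdisk too,
assuming a reasonably recent kernel with sysfs mounted as usual:

  cat /sys/block/sdd/queue/logical_block_size
  cat /sys/block/sdd/queue/physical_block_size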

> Disk identifier: 0xe044b9be
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdd1               1      243201  1953512001   fd  Linux raid autodetect
                      ^^^^^
> /dev/sde1               1      243201  1953512001   fd  Linux raid autodetect
                      ^^^^^
> Partition 1 does not start on physical sector boundary.


The Start column in that fdisk output is also in cylinders, so "1"
there really means the partition begins at sector 63, the old DOS
convention.  63 * 512 = 32256 bytes, which is not a multiple of 4096,
and that is exactly what the "does not start on physical sector
boundary" warning is complaining about.  Meaning that writes will
require a lot of read-modify-write, causing performance problems on
those drives.

The newer convention is to start the first partition at sector 2048
(1 MiB), which leaves space for the partition table and boot loader
and is still aligned properly.
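
A quick sanity check of the arithmetic, nothing system-specific here,
just shell math:

  echo $(( 63 * 512 % 4096 ))      # 3584 -> not 4k aligned
  echo $(( 2048 * 512 % 4096 ))    # 0    -> aligned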

> I don't know if this helps or where to go from here, but I think I need to
> get the mdadm up and running properly before I do anything.

Probably a good idea.

> If there's any commands you need me to run, please ask,

How are you booted now?  Are you root on the system through something
like the debian-installer rescue boot?  Or did you use a live cd or
something?

Please run:

  # mdadm --examine /dev/sdd1
  # mdadm --examine /dev/sde1

Those look to be the two halves of the split-brain second array.  The
--examine output will list something at the bottom that looks like:

        Number   Major   Minor   RaidDevice State
  this     1       8       17        1      active sync   /dev/sdb1

     0     0       8        1        0      active sync   /dev/sda1
     1     1       8       17        1      active sync   /dev/sdb1

Except in your case each will list one drive and will probably have
the other drive listed as removed.  But importantly it will list the
UUID of the array in the listing.

            Magic : a914bfec
          Version : 0.90.00
             UUID : b8eb34b1:bcd37664:2d9e4c59:117ab348
    Creation Time : Fri Apr 30 17:21:12 2010
       Raid Level : raid1
    Used Dev Size : 497856 (486.27 MiB 509.80 MB)
       Array Size : 497856 (486.27 MiB 509.80 MB)
     Raid Devices : 2
    Total Devices : 2
  Preferred Minor : 0

Check each component device and confirm from the UUID and the other
stats that it really is the same array forked and running in both
places.  The data in that header should be the same for both halves
of the cloned and split mirror.
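
One quick way to eyeball that, purely as a convenience:

  mdadm --examine /dev/sdd1 > /tmp/sdd1.examine
  mdadm --examine /dev/sde1 > /tmp/sde1.examine
  diff /tmp/sdd1.examine /tmp/sde1.examine

The UUID and Creation Time lines should not show up in the diff; the
Update Time, Events and device state lines almost certainly will.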

Corrective Action:

I _think_ you should stop the array on /dev/md127.  Then add that disk
to the array running on /dev/md1.  Don't do this until you have
confirmed that the two drives are clones of each other.  If they are
split then you need to join them.  I think something like this:

  mdadm --stop /dev/md127
  mdadm --manage /dev/md1 --add /dev/sdd1

Be sure to double check all of my device nodes and agree with them
before you run these commands.  But I think those are what you want
to do.  That will basically destroy whatever is currently on sdd1 and
sync sde1 onto sdd1.
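
Once the add is accepted you can watch the resync progress with the
usual commands:

  cat /proc/mdstat
  mdadm --detail /dev/md1

/proc/mdstat will show a progress bar and an estimated finish time
while the mirror rebuilds.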

At that point you should have both arrays running.  You could stop
there and live with /dev/md126 but I think you want to fix the device
minor numbering on /dev/md126 by stopping the array and assembling it
again with the correct name.

  mdadm --stop /dev/md126
  mdadm --assemble /dev/md0 --update=super-minor /dev/sdb3 /dev/sdc3

At that point you should have two arrays up and running on /dev/md0
and /dev/md1 and both should have the low level lvm physical volumes
needed to assemble the lvm volume groups.  Run the --scan again.

  mdadm --detail --scan

Any errors at this time?  Hopefully it will list two arrays.  If not
then something is still wrong.  Here are some additional commands to
get the same information anyway.

  mdadm --detail /dev/md0
  mdadm --detail /dev/md1

  mdadm --examine /dev/sdb3
  mdadm --examine /dev/sdc3

  mdadm --examine /dev/sdd1
  mdadm --examine /dev/sde1

If that turns out favorable then edit the /etc/mdadm/mdadm.conf file
and update the list of ARRAY lines there.  I don't have the UUID
numbers from your system so can't suggest anything.  But the above
will list out the UUID numbers for the arrays.  Use them to update the
mdadm.conf file.
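
One way that usually works is to append the scan output and then tidy
the file up by hand.  Check the result afterwards and remove any
duplicated ARRAY line for md0:

  mdadm --detail --scan >> /etc/mdadm/mdadm.conf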

Then after updating that file, update the initramfs.  I usually
recommend using dpkg-reconfigure of the current kernel package, but
running 'update-initramfs -u' works too.  The important
concept is that the initrd needs to be rebuilt including the new
arrays as listed in mdadm.conf so that the arrays are assembled at
initramfs time.

  dpkg-reconfigure linux-image-$(uname -r)
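
Or the more direct route mentioned above:

  update-initramfs -u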

At this point, if everything worked, you should be good to go.
Cross your fingers and reboot.  If all is well it should come back up
okay.

Just as additional debugging, once both arrays are up and online you
can activate the LVM manually.  I would probably try letting the
system reboot on its own first, but here are the low-level commands
as hints of where to look next in case they are needed.

  modprobe dm-mod
  vgscan
  vgchange -aly

That should activate the LVM.  You should then have devices under
/dev/mapper/* corresponding to your logical volumes, and you should
be able to list them:

  lvs

Good luck!
Bob
