
Re: Debian installer and raid0



Francesco Pietra wrote:
> Bob Proulx wrote:
> > After installing simply run the grub install script against both
> > disks manually and then you will be assured that it has been
> > installed on both disks.
> 
> I had problems with that methodology and was unable to detect my error.
> From a thread on debian dated Mar 2, 2013:
> ...
> > grub-install /dev/sdb
> >  was reported by complete installation. No error, no warning.
> > On rebooting, GRUB was no more found. Then entering in
> > grub rescue >
> > prefix/root/ were now wrong.

If the command does not work on the command line then it won't work
from the installer either.  The installer is doing the same things
that you can do from the command line.  Therefore asking whether it is
in the installer won't help, because if it doesn't work then it doesn't
work in either place, and if it does work then it will work in either
place.  That is my conjecture at least.  And since I have been using
this feature I believe it does work.  Works for me anyway.

I have been using RAID1 for a long time and have not encountered the
problem you describe.  That doesn't mean that such an error doesn't
occur.  Just that I can't recreate it.  Or rather, after much use, I
have never recreated it.  This applies to both the good grub version 1
as well as the newer and IMNHO buggier grub version 2 rewrite.  They
are completely different from each other.  Statements made about one
do not apply to the other because it was a complete rewrite.  But it
is certainly possible that in your configuration you have a case that
does not work.

I have a workbench with a variety of hardware.  When I want to test
something like this I construct a victim system in which to try the
action.  If you could do the same I think it would help to get to the
root cause of the problem.  I would create a victim machine with two
drives for installation testing.  Then test the installation.  After
the install and first reboot, shut down, unplug one disk, and test
boot.  Do not boot all of the way to the system.  Simply boot to the
grub menu and stop there.  Then power off, switch disks, and test boot
again.  Do not boot all of the way to the system.  Simply boot to the
grub menu and again stop there.  If you can get to the grub menu from
either disk then grub has been installed on both disks.  If not then
plug both disks in, boot the system, run the grub-install script
against the non-booting disk, and then repeat the single disk boot.
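
For the manual step, here is a minimal sketch, assuming the two RAID1
members are /dev/sda and /dev/sdb (assumed names, check yours first).
The install_grub_on_disks helper and the DRY_RUN variable are made up
for this illustration; only grub-install itself is the real command.
With DRY_RUN left at its default of 1 the script only prints what it
would run.

```shell
#!/bin/sh
# Sketch: install the GRUB boot code on every member disk of the RAID1.
# /dev/sda and /dev/sdb below are assumed device names; adjust them.
# DRY_RUN=1 (the default here) only prints the commands.
# Set DRY_RUN=0 and run as root to actually install.

install_grub_on_disks() {
    for disk in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "grub-install $disk"
        else
            # Stop at the first failure so it is clear which disk failed.
            grub-install "$disk" || return 1
        fi
    done
}

install_grub_on_disks /dev/sda /dev/sdb
```

After running it for real, repeat the single disk boot test on each
disk to confirm that the grub menu appears from both.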

The reason to only boot to the grub menu is of course so that the
RAID1 doesn't get split.  If you boot the system fully from one disk
and then from the other, each disk gets written without its partner
and the array ends up with a split brain.  No real problem on a
victim machine.  But it is faster to keep them in sync.  So I only
boot to the grub menu when testing the grub boot code.  Avoiding
booting the system avoids splitting the raid unnecessarily and speeds
up the debugging.
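
If the system does get booted with a disk missing, /proc/mdstat will
show it.  A small sketch, assuming Linux software RAID (md): a healthy
two disk mirror shows [UU] in its status field and a missing member
shows up as an underscore, such as [U_].  The mdstat_degraded function
name is made up for this example.

```shell
#!/bin/sh
# Sketch: report whether any md array in an mdstat-format file is degraded.
# With no argument it reads /proc/mdstat.

mdstat_degraded() {
    # An underscore inside the [UU]-style status field marks a missing member.
    grep -Eq '\[[U_]*_[U_]*\]' "${1:-/proc/mdstat}"
}

if [ -r /proc/mdstat ]; then
    if mdstat_degraded; then
        echo "degraded: a RAID member is missing"
    else
        echo "all RAID members present"
    fi
fi
```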

By testing this way you can verify that you can boot either disk in
isolation after the other disk has failed.  By using a victim machine
you can experiment.  Then if you find a bug you will have a recipe to
recreate it and can file a bug report on it.  Being able to recreate
the problem is the most valuable part.

And here is the challenge.  I think if you do this you will find that
it does actually work.  But feel free to write back here and tell me
that I am wrong and that there is a problem with it. :-) As the great
Mark Twain wrote "There is nothing so annoying as a good example."  If
you can get to a repeatable test case that fails that would be awesome.

> Now I am in the same situation, two servers with mirroring raid, grub on
> /dev/sda only. Identical data on both servers to cope with grub on one disk
> only. Not smart from my side.

Two servers so that you can switch your services from one server to
the other in case one of the servers cannot boot?

If you have two servers and one is the hot spare for the other then
perhaps after doing your own victim machine testing then you can
perform the fix on the spare and test there.  Then apply the fix to
the running server.  I think that should be a safe way to "sneak up"
on the solution.

Bob
