On Sat, 19 Sep 2015 01:01:28 +0100 Steve McIntyre wrote:

> On Fri, Sep 18, 2015 at 07:18:56PM +0200, Francesco Poli wrote:
> >On Tue, 15 Sep 2015 23:54:36 +0200 Francesco Poli wrote:
> >> On Sat, 12 Sep 2015 11:05:28 +0200 Francesco Poli wrote:
> >> >
> >> > Please tell me if my reasoning makes sense to you or, otherwise,
> >> > explain where I am being naive.
> >>
> >> Please clarify whether my reasoning is flawed...
> >
> >I am trying hard to address this issue, but I need some explanations:
> >that's why I would like to discuss my ideas...
> >Please help me to help you!
>
> Apologies for delayed response - the last week has been hellishly
> busy with $dayjob stuff and I've had almost no time at all for
> discussions elsewhere.

That's understandable, I just have my new box sitting there waiting for
me to install Debian stretch on it and I would like to avoid botching
the installation plan and having to start over multiple times...

>
> So:
>
> >The problem is: if one ESP is considered to be the "master" one, and
> >the other ESPs are "slave" ones, kept in sync with the "master", what
> >happens when the drive hosting the "master" ESP breaks? The system
> >should be able to boot from one "slave" ESP (assuming boot priorities
> >are set with efibootmgr), but it won't be able to mount /boot/efi
> >(since the fstab will refer to the inaccessible "master" ESP); at
> >that point, if an upgrade of grub-efi-amd64 has to be performed
> >before the dead drive is replaced, a new "temporarily-master" ESP has
> >to be found and selected, mounted on /boot/efi, its content updated,
> >and any remaining ESPs (if present) have to be synced to this
> >"temporarily-master" ESP...
>
> To be honest, I think you're making it sound more complicated than it
> needs to be there. If the normal "master" disk has failed, simply pick
> by hand the first of the replicas and call that the new master.

What do you mean "pick by hand"?
If I understand correctly, you mean that, after one drive breaks, the
user would have to:

  • learn about it (by, e.g., receiving local mail about degraded
    arrays, the usual stuff) *before* the next grub-efi-amd64 upgrade

  • manually check whether the broken drive is the one hosting the
    "master" ESP

  • in case it is, manually alter /etc/fstab to mount one of the
    "slave" ESPs on /boot/efi

  • manually mount that "slave" ESP on /boot/efi

  • manually instruct grub-efi-amd64 to consider this "slave" ESP as
    the "temporarily-master" one and begin to sync the other remaining
    ESPs (if any) to this one (perhaps this could be automated, by
    having grub-efi-amd64 consider the ESP mounted on /boot/efi as the
    "master" one, but the package needs to find the accessible "slave"
    ESPs anyway and this may be tricky)

I would like to spare every user all this trouble, and I think that
having to mount all the ESPs on parallel mount points (/boot/efi,
/boot/efi2, ...) could be considered a fair cost!

I am still convinced that using multiple mount points is the simplest
way to go...
It's also more similar to what grub-pc does with BIOS-based machines,
if I understand correctly: it repeats the MBR updating process for each
of the configured devices, treating them as independent equals, without
syncing "slave" ones to a "master" one. Is that right?

>
> >Instead, if all ESPs are mounted on distinct mount points (/boot/efi
> >, /boot/efi2 , /boot/efi3 , and so forth) and updated independently,
> >there should be no need for special tricks whenever one of them is
> >inaccessible (and thus not mounted).
> >
> >Please tell me if my reasoning makes sense to you or, otherwise,
> >explain where I am being naive.
>
> The proliferation of mount points looks messy to me, I'll be
> honest. In fundamental terms, there's no real difference between what
> you'll get on all the ESPs but you'll forever have more noise in "df"
> and friends.
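
To make the multiple-mount-point idea concrete, here is a rough Python
sketch of the "independent equals" approach: every ESP that is
currently mounted gets updated on its own, and any ESP that is
inaccessible is simply skipped. It is only an illustration under stated
assumptions, not what grub-efi-amd64 does today: the mount point list,
the update_all_esps function and the update_one callback are all
hypothetical names made up for this example.

```python
import os
from typing import Callable, Dict, Iterable

# Hypothetical list of ESP mount points, following the
# /boot/efi, /boot/efi2, ... convention discussed in this thread.
ESP_MOUNT_POINTS = ["/boot/efi", "/boot/efi2", "/boot/efi3"]


def update_all_esps(
    mount_points: Iterable[str],
    update_one: Callable[[str], None],
    is_mounted: Callable[[str], bool] = os.path.ismount,
) -> Dict[str, str]:
    """Update every accessible ESP independently (no master, no slaves).

    update_one is a placeholder for whatever would refresh a single ESP;
    it is an assumption of this sketch, NOT an existing package hook.
    """
    results: Dict[str, str] = {}
    for mount_point in mount_points:
        if not is_mounted(mount_point):
            # Drive missing or broken: skip it, the others are unaffected.
            results[mount_point] = "skipped"
            continue
        try:
            update_one(mount_point)
            results[mount_point] = "updated"
        except OSError as exc:
            # A failure on one ESP must not prevent updating the others.
            results[mount_point] = f"failed: {exc}"
    return results
```

In this sketch, if the drive holding /boot/efi is gone, the ESPs on
/boot/efi2 and /boot/efi3 would still be updated, with no fstab editing
and no "temporarily-master" selection.
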
As I said, the extra "noise" in the output of df and friends seems to
me a more than acceptable price for avoiding all the additional manual
operations described above!

>
> Hmmm, pondering... Is there any way to hide RAID metadata in some way
> so we really *could* just do RAID1?

I am afraid I don't understand what you mean by this sentence: could
you please elaborate a bit?

[...]
> we're still seeing implementors get things
> wrong, either by incompetence or sheer laziness.
[...]
>
> *However*, don't let my glib warning about broken implementations put
> you off trying to do something clever and useful here.

I'll try to do my best, but I was a little scared by this mess of
broken UEFI implementations: after all, one wants RAID1 (or some more
sophisticated RAID level) to get data redundancy and survive a drive
failure; if the system fails to boot when one drive breaks, then the
usefulness of RAID1 is seriously reduced!

>
> >> > > I'm not sure how well most are likely to
> >> > > deal with actual hardware failure. We'll find out, I guess... :-)
> >> >
> >> > That's not comforting! :-(
> >> >
> >> > What I can do to test the setup, is (at most) try to boot with one
> >> > drive disconnected and see what happens.
> >> > One thing's sure: I will *not* intentionally break one drive [1], just
> >> > to test how the UEFI firmware implementation deals with actual hardware
> >> > failure!
> >> >
> >> >
> >> > [1] how, by the way? with a hammer?!? ;-)
>
> What I'd be tempted to do to start with is simply unplug a drive,
> either physically or logically.

The first test I had in mind was just that: unplug one drive, attempt
to boot the system and see what happens.

> For most UEFI development, using
> qemu/KVM and OVMF as a firmware binary is really useful. You get to
> work directly on your development system, and it's possible to debug
> things much more quickly and easily. If you're not sure how to do
> that, shout.

I have unfortunately zero experience with KVM.
It could be useful to test a modified grub-efi-amd64 package, when we
reach that point of development.
Hence, I'll surely get back to you and ask for help later.
But now I need to install Debian stretch on physical hardware,
instead...

> I'm planning on adding a second UEFI page in the wiki
> when I get some time, detailing how I do this kind of
> development. Hopefully it will help others too...

That would be highly appreciated.
Please remember to license that page in a DFSG-free manner (recommended
choices: the GNU GPL or the Expat license), so that others will be
*legally* allowed to improve it.


-- 
 http://www.inventati.org/frx/
 There's not a second to spare! To the laboratory!
..................................................... Francesco Poli .
 GnuPG key fpr == CA01 1147 9CD2 EFDF FB82 3925 3E1C 27E1 1F69 BFFE