severity 650819 serious
tags 650819 + confirmed patch
retitle 650819 GRUB entries (grub.cfg) sometimes lacking other operating systems, particularly installing 686 or amd64 images (i386)
reassign 650819 os-prober, grub-common
I have to confirm this. I was hit by this when installing from the March 22 i386 wheezy netinst on my laptop, a typical Intel Core i3 (x86-64) laptop with Windows 7. Although d-i detected Windows, after the install Windows was not listed by GRUB.
I reproduced with a later businesscard, and then with a March 27 "flexible way" USB key with an updated netinst. I reproduced this in about 10-20 installs before understanding precisely when and why it happened.
Thanks Brian for reporting. All the information you reported was precious in nailing this one. This is indeed an os-prober bug, or at least a bug of interaction between os-prober and GRUB.
First of all, debian-installer typically calls os-prober 3 times. The last time is during finish-install (clock-setup) and although it nicely fills syslog, it is not relevant at all to this problem. The 2 other times are indeed from grub-installer.
There are 2 os-prober packages, a deb and a udeb. Typically, both are installed. The deb may however not be installed, when automatic installation of recommendations is disabled (os-prober is only installed because it's recommended by grub-common) or when it is not available (for example, when installing from a netinst without using a mirror).
Typically, grub-installer calls os-prober twice. The first is used mainly to verify the list of other operating systems detected, before asking whether GRUB should be installed. The (possible) second time is when grub-installer calls update-grub (line 845). update-grub's 30_os-prober hook calls os-prober if it is installed.
There is an important difference between these calls. The first, direct, call to os-prober happens in d-i's context (it uses os-prober-udeb). The second one happens in-target (it uses the os-prober deb). This problem comes from this second time. Starting from version 1.45, os-prober's 50mounted-tests attempts to mount partitions using grub-mount, rather than using mount, if the former is available: http://packages.qa.debian.org/o/os-prober/news/20110424T183244Z.html
What happens here is that grub-mount fails, but the if's condition still evaluates to true because grub-mount's exit status is 0, and the code above assumes 0 means success. From that point, 50mounted-tests considers the partition mounted, and subtests quietly fail to find anything.
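The failure mode can be reproduced in isolation. The following is a minimal sketch, not the actual 50mounted-tests code: fake_grub_mount is a hypothetical stand-in that prints an error but exits with status 0, as grub-mount appears to do in this situation.

```shell
#!/bin/sh
# Hypothetical stand-in for grub-mount: it reports an error on stderr
# but still exits with status 0, mimicking the behaviour described above.
fake_grub_mount() {
    echo "fuse: device not found, try 'modprobe fuse' first" >&2
    return 0
}

# Same shape as a test that trusts the exit status alone.
if fake_grub_mount /dev/sdb1 /tmp/mnt 2>/dev/null; then
    mounted=1      # reached even though nothing was mounted
else
    mounted=
fi

echo "mounted=${mounted:-0}"
```

Because the branch is chosen on the exit status only, the script takes the "mounted" path and any later test that looks inside the mount point finds an empty directory.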
This issue does not affect the first call to os-prober (which is outside the target) because which(1) is not available in the installer, so the condition is false and the tests fall back to the standard mount, which works. This bug (using which in os-prober-udeb) was fixed in os-prober 1.51: http://anonscm.debian.org/gitweb/?p=d-i/os-prober.git;a=commit;h=94048e4ec7a8896fb2c9c917433fa5e3ba71fbbe
However, that commit also introduced a check for grub-probe, which is not in grub-mount-udeb for now, as indicated in the commit message, so for now there is no functional difference; the first use of os-prober will keep falling back to the standard mount.
Brian's finding about the subtle "fuse init" line was a hint to the reason why grub-mount fails. grub-mount needs fuse, and fuse is not in the installer's 486 Linux. Here is what happens:
# grub-mount /dev/sdb1 /var/lib/os-prober/mount
fuse: device not found, try 'modprobe fuse' first
However, fuse is in stock (non-install) Linux images, so when installing the 486 image, grub-mount succeeds in loading fuse because it is running in-target and it attempts to load the installed Linux's LKM, rather than failing to find a fuse LKM for the installer Linux. Of course, the installed Linux's fuse is compatible with the installer Linux's module ABI when installing the 486 image, but not when installing the 686 image. This is presumably also true on i386 for any non-486 image, such as amd64; however, the 686 image is on netinsts and offered as a choice. It should be noted that at this time, the 486 image is more likely to be installed on 686 machines due to #655437, but this is merely a blessed misfortune.
I do not know other architectures, but I imagine that this doesn't affect amd64, as the only image proposed for installation will be amd64, which matches the installer. So I imagine this problem is largely specific to i386.
Back to the problem in 50mounted-tests's use of grub-mount: grub-mount's exit status is unspecified. It clearly attempts to return non-0 on error in general, but it does not do so in this case. I did not debug grub-mount; this is my understanding of the problem from a cursory examination of the code.
I believe grub_device_open() is failing, but the if still returns 0.
Oddly, GRUB_ERR_UNKNOWN_DEVICE seems to be defined as 0.
If this is intentional, I don't understand why.
So I see 4 ways to fix/workaround this:
if type grub-mount >/dev/null 2>&1 && \
if type grub-mount >/dev/null 2>&1 && \
I verified that this works around the problem. Note that this assumes that grub-mount will write to stderr or stdout if and only if it fails.
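The idea of treating any output as a failure can be sketched as follows. This is only an illustration under the assumption stated above (output if and only if failure); mount_ok, mount_bad and quiet_success are hypothetical names, and neither stub is the real grub-mount.

```shell
#!/bin/sh
# Hypothetical stand-ins; neither is the real grub-mount.
mount_ok() { return 0; }                 # silent success
mount_bad() {                            # noisy "success": exit status 0
    echo "fuse: device not found, try 'modprobe fuse' first" >&2
    return 0
}

# Succeed only if the command exits 0 AND prints nothing on stdout/stderr.
quiet_success() {
    output=$("$@" 2>&1) && [ -z "$output" ]
}

if quiet_success mount_ok; then ok_result=mounted; else ok_result=failed; fi
if quiet_success mount_bad; then bad_result=mounted; else bad_result=failed; fi

echo "ok: $ok_result, bad: $bad_result"
```

The check is deliberately strict: a harmless warning printed by a successful mount would also be treated as a failure, which is why the assumption matters.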
umount's failure towards the end of 50mounted-tests ("warning:
failed to umount /var/lib/os-prober/mount") is therefore an
indication of the problem, but not its cause. It would greatly
help to avoid problems of this kind to give the reason for this
REASON=$(umount "$tmpmnt" 2>&1)
if [ "$mounted" ]; then
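As an illustration of reporting the reason for a umount failure, here is a minimal sketch. It is not the proposed patch itself: fake_umount is a hypothetical stand-in for umount on a path that was never mounted, and its message text is made up for the example.

```shell
#!/bin/sh
# Hypothetical stand-in for umount failing on a path that is not mounted.
fake_umount() {
    echo "umount: $1: not mounted" >&2
    return 1
}

tmpmnt=/var/lib/os-prober/mount

# Capture stderr so the warning can say why unmounting failed.
if ! reason=$(fake_umount "$tmpmnt" 2>&1); then
    warning="failed to umount $tmpmnt: $reason"
    echo "warning: $warning" >&2
fi
```

With the reason included, a "not mounted" message would point directly at the earlier mount step rather than at umount.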
To clarify, this will not happen when os-prober is not installed
to the target. In that case, grub-installer hacks a static
30_otheros file by using the output of the first call to os-prober
(from the udeb). This appears to work fine, so the problem will
not be visible in this rare case, as grub.cfg will end up
containing the necessary entries.
Unfortunately, it's currently only the second call that fails
(due to the which / grub-probe issue(s) explained above), which
causes the installed system to lack the entries even though
grub-installer said they were detected in its prompt, hence