
Bug#650819: Confirmed, serious



severity 650819 serious
tags 650819 + confirmed patch
retitle 650819 GRUB entries (grub.cfg) sometimes lacking other operating systems, particularly installing 686 or amd64 images (i386)
reassign 650819 os-prober, grub-common
thanks

I have to confirm this. I was hit by this when installing from the March 22 i386 wheezy netinst on my laptop, a typical Intel Core i3 (x86-64) laptop with Windows 7. Although d-i detected Windows, after the install Windows was not listed by GRUB.
I reproduced it with a later businesscard image, and then with a March 27 "flexible way" USB key with an updated netinst. I reproduced this in about 10-20 installs before understanding precisely when and why it happens.

Thanks, Brian, for reporting this. All the information you provided was invaluable in tracking it down. This is indeed an os-prober bug, or at least a bug in the interaction between os-prober and GRUB.

First of all, debian-installer typically calls os-prober 3 times. The last call happens during finish-install (clock-setup); although it fills syslog nicely, it is not relevant to this problem. The other 2 calls come from grub-installer.
There are 2 os-prober packages, a deb and a udeb, and typically both are installed. However, the deb may not be installed when automatic installation of recommendations is disabled (os-prober is only installed because grub-common recommends it) or when it is not available (for example, when installing from a netinst without using a mirror).
So grub-installer typically calls os-prober twice. The first call mainly serves to display the list of other operating systems detected, before asking whether GRUB should be installed. The (possible) second call happens when grub-installer runs update-grub (line 845), whose 30_os-prober hook calls os-prober if it is installed.
There is an important difference between these calls. The first, direct call to os-prober happens in d-i's context (it uses os-prober-udeb). The second happens in-target (it uses the os-prober deb), and this second call is where the problem lies. Starting with version 1.45, os-prober's 50mounted-tests attempts to mount partitions using grub-mount rather than mount, if the former is available: http://packages.qa.debian.org/o/os-prober/news/20110424T183244Z.html
http://anonscm.debian.org/gitweb/?p=d-i/os-prober.git;a=commit;h=7ed9dec4d2c65056f211324f8e25a4d913b0f2a1

mounted=
if which grub-mount >/dev/null 2>&1 && \
   grub-mount "$partition" "$tmpmnt" 2>/dev/null; then
    mounted=1
    type="$(grub-probe -d "$partition" -t fs)"
    [ "$type" ] || type=fuseblk
else
    ro_partition "$partition"
    for type in $types; do
        if mount -o ro -t "$type" "$partition" "$tmpmnt" 2>/dev/null; then
            mounted=1
            break
        fi
    done
fi

What happens here is that grub-mount fails, but the if condition still evaluates to true because grub-mount's exit status is 0, and the code above treats 0 as success. From then on, 50mounted-tests considers the partition mounted, and the subtests silently fail to find anything.
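To make the failure mode concrete, here is a minimal sketch reproducing the shape of that condition with a hypothetical stand-in, fake_grub_mount, which, like grub-mount in this case, prints the fuse error but still exits 0. The partition and mount point are only illustrative.

```shell
#!/bin/sh
# Hypothetical stand-in for grub-mount in the failing case: it reports the
# fuse error on stderr but still exits with status 0.
fake_grub_mount() {
    echo "fuse: device not found, try 'modprobe fuse' first" >&2
    return 0
}

# Same shape as the 50mounted-tests condition: only the exit status is
# tested, and stderr is discarded, so the error goes unnoticed.
probe() {
    mounted=
    if fake_grub_mount "$1" "$2" 2>/dev/null; then
        mounted=1
    fi
    echo "mounted=${mounted:-<empty>}"
}

probe /dev/sdb1 /var/lib/os-prober/mount
```

The partition ends up flagged as mounted even though nothing was mounted.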

This issue does not affect the first call to os-prober (which happens outside the target) because which(1) is not available in the installer, so the condition is false and the tests fall back to the standard mount, which works. This bug (using which in os-prober-udeb) was fixed in os-prober 1.51: http://anonscm.debian.org/gitweb/?p=d-i/os-prober.git;a=commit;h=94048e4ec7a8896fb2c9c917433fa5e3ba71fbbe
However, that commit also introduced a check for grub-probe, which is not currently in grub-mount-udeb (as the commit message notes), so there is no functional difference for now: the first call to os-prober keeps falling back to the standard mount.
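The fallback works because the && chain stops at the first command that fails, so grub-mount is never run. A minimal sketch, where choose_mounter is hypothetical and its argument stands for the missing which or grub-probe (the name used below is assumed not to be on $PATH):

```shell
#!/bin/sh
# If any checked helper is missing, "type" fails, the && chain stops, and
# the else branch (the plain mount fallback) is taken, whether or not
# grub-mount itself is installed.
choose_mounter() {
    if type grub-mount >/dev/null 2>&1 && \
       type "$1" >/dev/null 2>&1; then
        echo grub-mount
    else
        echo mount
    fi
}

choose_mounter no-such-grub-probe
```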


Brian's finding about the subtle "fuse init" line hinted at why grub-mount fails. grub-mount needs fuse, and fuse is not in the installer's 486 Linux. Here is what happens:
# grub-mount /dev/sdb1 /var/lib/os-prober/mount
fuse: device not found, try 'modprobe fuse' first
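One could check whether fuse is available before relying on grub-mount. A minimal sketch, assuming the standard Linux /proc/filesystems format (one filesystem per line, optionally prefixed by nodev); fuse_available is a hypothetical helper. It only sees fuse once the module is loaded or built in, so a negative answer does not prove that loading would fail.

```shell
#!/bin/sh
# Check whether the running kernel currently provides fuse by looking for a
# "fuse" entry in the filesystem list (normally /proc/filesystems; the path
# is a parameter only so the function can be tried on a sample file).
fuse_available() {
    fs_list="${1:-/proc/filesystems}"
    # Lines look like "nodev<TAB>fuse"; match the last field exactly so that
    # "fuseblk" or "fusectl" alone do not count.
    awk '$NF == "fuse" { found = 1 } END { exit !found }' "$fs_list"
}

if fuse_available; then
    echo "fuse is available"
else
    echo "fuse is not available; grub-mount is expected to fail"
fi
```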

However, fuse is in stock (non-installer) Linux images, so when installing the 486 image, grub-mount manages to load fuse: it is running in-target, so it attempts to load the installed Linux's LKM rather than failing to find a fuse LKM for the installer's Linux. The installed Linux's fuse module is compatible with the installer Linux's module ABI when installing the 486 image, but not when installing the 686 image. This is presumably also true on i386 for any non-486 image, such as amd64; the 686 image, however, is the one included on netinsts and offered as a choice. Note that at present the 486 image is more likely to be installed on 686 machines due to #655437, but this is merely a fortunate accident.

I am not familiar with other architectures, but I expect this does not affect amd64, since the only image offered for installation there is amd64, which matches the installer's. So this problem is probably largely specific to i386.


Back to 50mounted-tests' use of grub-mount: grub-mount's exit status is unspecified. It clearly tries to return non-zero on error in general, but it does not do so in this case. I did not debug grub-mount; this is my understanding of the problem from a cursory reading of the code.

grub-mount.c:
static grub_err_t
fuse_init (void)
{
  int i;

  for (i = 0; i < num_disks; i++)
    {
      char *argv[2];
      char *host_file;
      char *loop_name;
      loop_name = grub_xasprintf ("loop%d", i);
      if (!loop_name)
        grub_util_error (grub_errmsg);

      host_file = grub_xasprintf ("(host)%s", images[i]);
      if (!host_file)
        grub_util_error (grub_errmsg);

      argv[0] = loop_name;
      argv[1] = host_file;

      if (execute_command ("loopback", 2, argv))
        grub_util_error (_("loopback command fails"));

      grub_free (loop_name);
      grub_free (host_file);
    }

  grub_lvm_fini ();
  grub_mdraid09_fini ();
  grub_mdraid1x_fini ();
  grub_raid_fini ();
  grub_raid_init ();
  grub_mdraid09_init ();
  grub_mdraid1x_init ();
  grub_lvm_init ();

  dev = grub_device_open (0);
  if (! dev)
    return grub_errno;

I believe grub_device_open() is failing, yet this branch still returns 0, i.e. grub_errno is still 0 (GRUB_ERR_NONE) at that point.

disk.c:

grub_disk_t
grub_disk_open (const char *name)
{
  const char *p;
  grub_disk_t disk;
  grub_disk_dev_t dev;
  char *raw = (char *) name;
  grub_uint64_t current_time;

  grub_dprintf ("disk", "Opening `%s'...\n", name);

  disk = (grub_disk_t) grub_zalloc (sizeof (*disk));
  if (! disk)
    return 0;

  p = find_part_sep (name);
  if (p)
    {
      grub_size_t len = p - name;

      raw = grub_malloc (len + 1);
      if (! raw)
        goto fail;

      grub_memcpy (raw, name, len);
      raw[len] = '\0';
      disk->name = grub_strdup (raw);
    }
  else
    disk->name = grub_strdup (name);
  if (! disk->name)
    goto fail;


  for (dev = grub_disk_dev_list; dev; dev = dev->next)
    {
      if ((dev->open) (raw, disk) == GRUB_ERR_NONE)
        break;
      else if (grub_errno == GRUB_ERR_UNKNOWN_DEVICE)
        grub_errno = GRUB_ERR_NONE;
      else
        goto fail;
    }

  if (! dev)
    {
      grub_error (GRUB_ERR_UNKNOWN_DEVICE, "no such disk");
      goto fail;
    }

Oddly, GRUB_ERR_UNKNOWN_DEVICE seems to be defined as 0.

err.h:

typedef enum
  {
    GRUB_ERR_NONE = 0,
    GRUB_ERR_TEST_FAILURE,
    GRUB_ERR_BAD_MODULE,
    GRUB_ERR_OUT_OF_MEMORY,
    GRUB_ERR_BAD_FILE_TYPE,
    GRUB_ERR_FILE_NOT_FOUND,
    GRUB_ERR_FILE_READ_ERROR,
    GRUB_ERR_BAD_FILENAME,
    GRUB_ERR_UNKNOWN_FS,
    GRUB_ERR_BAD_FS,
    GRUB_ERR_BAD_NUMBER,
    GRUB_ERR_OUT_OF_RANGE,
    GRUB_ERR_UNKNOWN_DEVICE,
    GRUB_ERR_BAD_DEVICE,
    GRUB_ERR_READ_ERROR,
    GRUB_ERR_WRITE_ERROR,
    GRUB_ERR_UNKNOWN_COMMAND,
    GRUB_ERR_INVALID_COMMAND,
    GRUB_ERR_BAD_ARGUMENT,
    GRUB_ERR_BAD_PART_TABLE,
    GRUB_ERR_UNKNOWN_OS,
    GRUB_ERR_BAD_OS,
    GRUB_ERR_NO_KERNEL,
    GRUB_ERR_BAD_FONT,
    GRUB_ERR_NOT_IMPLEMENTED_YET,
    GRUB_ERR_SYMLINK_LOOP,
    GRUB_ERR_BAD_COMPRESSED_DATA,
    GRUB_ERR_MENU,
    GRUB_ERR_TIMEOUT,
    GRUB_ERR_IO,
    GRUB_ERR_ACCESS_DENIED,
    GRUB_ERR_EXTRACTOR,
    GRUB_ERR_BUG
  }
grub_err_t;

I don't understand this, if it is intentional.


So I see 4 ways to fix or work around this:
  • Add fuse to the installer's Linux image(s), or add a fuse modules udeb
  • Always use traditional mount instead of grub-mount
  • Make grub-mount return non-0 on failure
  • Check grub-mount's output instead of just checking its exit status.
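Checking the exit status and checking the output are not mutually exclusive. A minimal sketch with a hypothetical quiet_success helper and stand-in commands, treating a mount as successful only when the command both exits 0 and prints nothing; in 50mounted-tests it would be called as quiet_success grub-mount "$partition" "$tmpmnt".

```shell
#!/bin/sh
# Succeed only if the command exits 0 AND writes nothing to stdout/stderr.
quiet_success() {
    # In POSIX sh, an assignment's exit status is that of its command
    # substitution, so a non-zero exit from the command is still seen here.
    output="$("$@" 2>&1)" || return 1
    [ -z "$output" ]
}

# Hypothetical stand-ins for the three cases of interest.
ok_silent()   { return 0; }
ok_noisy()    { echo "fuse: device not found, try 'modprobe fuse' first" >&2; return 0; }
fail_silent() { return 1; }

for cmd in ok_silent ok_noisy fail_silent; do
    if quiet_success "$cmd"; then
        echo "$cmd: mounted"
    else
        echo "$cmd: not mounted"
    fi
done
```

Only ok_silent is reported as mounted; the noisy-but-0 case that bites here is rejected, and so is a silent non-zero exit.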


I used the last approach, changing both 50mounted-tests scripts from

if type grub-mount >/dev/null 2>&1 && \
   type grub-probe >/dev/null 2>&1 && \
   grub-mount "$partition" "$tmpmnt" 2>/dev/null; then

 to

if type grub-mount >/dev/null 2>&1 && \
   type grub-probe >/dev/null 2>&1 && \
   [ -z "$(grub-mount "$partition" "$tmpmnt" 2>&1)" ]; then

I verified that this works around the problem. Note that it assumes grub-mount writes to stderr or stdout if and only if it fails.
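If that assumption is a concern, success could instead be confirmed from the kernel's mount table after grub-mount returns. A minimal sketch, assuming /proc is mounted in the target; is_mounted_at is a hypothetical helper.

```shell
#!/bin/sh
# Look the mount point up in the mount table (normally /proc/mounts; the
# file is a parameter only so the function can be tried on sample data).
# Caveat: /proc/mounts escapes spaces and some other characters as octal
# (e.g. "\040"), so this plain comparison assumes a mount point without
# them, which holds for /var/lib/os-prober/mount.
is_mounted_at() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

if is_mounted_at /var/lib/os-prober/mount; then
    echo "mounted"
else
    echo "not mounted"
fi
```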

umount's failure towards the end of 50mounted-tests ("warning: failed to umount /var/lib/os-prober/mount") is therefore a symptom of the problem, not its cause. Reporting the reason for this failure would greatly help in diagnosing problems of this kind:

    REASON=$(umount "$tmpmnt" 2>&1)
    if ! [ $? = "0" ]; then
        warn "failed to umount $tmpmnt ; $REASON"
    fi
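Since, in POSIX sh, an assignment's exit status is that of its command substitution, the same check can also be written with the test directly in the if. A minimal sketch; warn is os-prober's own helper, replaced here by a stand-in so the snippet runs on its own, and report_umount is hypothetical.

```shell
#!/bin/sh
# Minimal stand-in for os-prober's warn helper.
warn() { echo "warning: $*" >&2; }

# Unmount $1; on failure, include umount's own message in the warning.
report_umount() {
    if ! REASON=$(umount "$1" 2>&1); then
        warn "failed to umount $1 ; $REASON"
        return 1
    fi
}

# Unmounting a path that is not a mount point fails, showing the message.
report_umount /nonexistent-mount-point || true
```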


It also wouldn't hurt to warn when the partition wasn't mounted. These changes would give something like this for the general 50mounted-tests:

if [ "$mounted" ]; then
    for test in /usr/lib/os-probes/mounted/*; do
        debug "running subtest $test"
        if [ -f "$test" ] && [ -x "$test" ]; then
            if "$test" "$partition" "$tmpmnt" "$type"; then
                debug "os found by subtest $test"
                if ! umount "$tmpmnt"; then
                    warn "failed to umount $tmpmnt"
                fi
                rmdir "$tmpmnt" || true
                exit 0
            fi
        fi
    done
    REASON=$(umount "$tmpmnt" 2>&1)
    if ! [ $? = "0" ]; then
        warn "failed to umount $tmpmnt ; $REASON"
    fi
else
    warn "mounted-tests: $partition not mounted"
fi

To clarify, this does not happen when os-prober is not installed in the target. In that case, grub-installer generates a static 30_otheros file from the output of the first call to os-prober (from the udeb). This appears to work fine, so the problem is not visible in this rare case, as grub.cfg ends up containing the necessary entries.

Unfortunately, currently only the second call fails (the first is shielded by the which / grub-probe issue(s) explained above), so the installed system lacks the entries even though grub-installer's prompt said they were detected; hence the serious severity.

