
Bug#989124: marked as done (grub-installer: occasional failure to install grub (when two DEs selected))



Your message dated Wed, 26 May 2021 18:07:07 +0200
with message-id <87v9755vis.fsf@hands.com>
and subject line Re: Bug#989124: grub-installer: occasional failure to install grub (when two DEs selected)
has caused the Debian Bug report #989124,
regarding grub-installer: occasional failure to install grub (when two DEs selected)
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
989124: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989124
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: grub-installer
Version: 1.178
Severity: minor

Dear Maintainer,

While testing under openQA (so in qemu/kvm), I found that when more than one
DE is selected, something like one in ten installs fails to install grub,
resulting in an unbootable system.

Given that this only happens in the unusual circumstance of selecting
multiple desktops, and even then is only an intermittent bug, I've tagged it as
minor.

An example of this can be found here:

  https://openqa.debian.net/tests/4457

which one can see hanging at the initial boot screen, rather than booting to a login prompt.

One of the assets being collected is a dump of the start of the target block
device, which in the failing case looks like this:

  https://openqa.debian.net/tests/4457/file/complete_install-dev_vda_dump.txt

whereas when things are working it looks like this:

  https://openqa.debian.net/tests/4439/file/complete_install-dev_vda_dump.txt

I have tried making it collect data earlier during the install,
but doing so resulted in the bug going away.

[I had it flip to the console while mandb is being installed, since that sits
on the screen for quite a while and so provides a good trigger for the action,
run a few commands to collect state, then flip back to the graphical screen.]

BTW, the syslog from that failing run is here:

  https://openqa.debian.net/tests/4457/file/complete_install-syslog.txt

If there's more information that could usefully be collected, please mention
what you think might help and I'll add it to the openqa scripts.

Cheers, Phil.

--- End Message ---
--- Begin Message ---
Hi Cyril,

I'm going to close the bug for now, and reopen it if I manage to come up
with evidence this isn't just an artefact of the way I wrote the test.

Thanks for the helpful feedback -- in particular for noticing the missing
boolean screenshot, which I'd missed (no need to read the rest of this
unless you're interested in the details).

Cyril Brulebois <kibi@debian.org> writes:

> Philip Hands <phil@hands.com> (2021-05-26):
>> Dear Maintainer,
>
> Dear Bug Reporter,
>
> (:D)

:-)

> I'm not sure I really trust the screenshots that show /dev/vda selected
> in both cases. After all, looking one step before, the boolean regarding
> installing GRUB wasn't captured at all in the failing case, compare the
> screenshots starting here:
>
>  - https://openqa.debian.net/tests/4457#step/grub/45 (ko)
>  - https://openqa.debian.net/tests/4439#step/grub/45 (ok)

Ah, well spotted -- I'd not noticed the missing boolean shot there.

Looking at the video, that would seem to be a real difference:

  https://openqa.debian.net/tests/4457/file/video.ogv#t=41.55,41.60
  
vs.

  https://openqa.debian.net/tests/4439/file/video.ogv#t=41.55,41.60

The openQA code assumes that "Yes" will be selected, and hits <TAB> <RET>,
so if the prompt comes up with "No" selected, that's what gets chosen, and
you get no bootloader.

It really ought to take a screenshot there, though, because of the
assert, so I suspect that it's somehow getting past that prompt
without needing to look for that screen.

If it had hit one return too many earlier, perhaps that keypress is
buffered and gets used to jump past that prompt without a screenshot.

It seems a bit odd that d-i might occasionally present the alternative
default, so I suspect that's not what's happening at all.

It strikes me as rather more likely that the openqa worker might be
running unusually slowly on occasion, and that may provoke one of the:

    wait_screen_change {
        send_key 'ret';
    };

bits to send unneeded RETs, which may be messing things up later. BTW,
that usage was inspired by some of Fedora's tests, but always struck me
as a bit suspect -- I'll probably eliminate it from our tests if it
turns out to be an issue.

> but maybe that's just a side effect of the console switching gymnastics
> you mentioned? (Sending left Ctrl or the like every few minutes avoids
> running into DPMS/blanking issues, I'm using that trick.)
>
> Anyway, any chance you could add `DEBCONF_DEBUG=developer` on the kernel
> command line, so that we have a chance of understanding what's happening
> on the debconf level? Otherwise, we might try and hotpatch
> grub-installer to add some more logging but if we could avoid that…

It's really easy to add the debug setting to the kernel command line, so
if it turns out not to be an openQA issue, I'll try adding a job with
debugging turned on. I'm not certain, but I have a feeling that I tried
that before without it being very informative, though I'm afraid I've
forgotten why (probably I just never saw it fail in this way).

Cheers, Phil.
-- 
|)|  Philip Hands  [+44 (0)20 8530 9560]  HANDS.COM Ltd.
|-|  http://www.hands.com/    http://ftp.uk.debian.org/
|(|  Hugo-Klemm-Strasse 34,   21075 Hamburg,    GERMANY



--- End Message ---
