
Bug#1039710: debian-installer: Grub installation fails and /var/log/syslog is empty



Hi everyone!

Somehow I missed this whole issue; I didn't see it until now.
I will adjust my mail filters.

08.08.2023 00:49, Philip Hands wrote:
Steve McIntyre <steve@einval.com> writes:

On Wed, Jul 12, 2023 at 11:15:57AM +0200, Cyril Brulebois wrote:
Hi Michael,

Cyril Brulebois <kibi@debian.org> (2023-06-28):
Control: reassign -1 busybox-udeb 1:1.36.1-3

[…]

With a local build, confirmed -3 is buggy, and that reverting only
busybox-udeb to -1 is sufficient to restore syslog support in the
installer.

Reassigning there; the GRUB thing can be filed separately once we have
actual information via syslog.

A fix would be appreciated, we've got reports piling up about things we
have no logs for.

After weeks with this breakage, I've just uploaded a minimal NMU to
fix it, reverting the syslog changes since -1. I've built and tested
it successfully locally.

It turns out the whole syslog thing in busybox is quite broken; this is
obvious when you look at my initial patch, which started this whole issue.
Later on, upstream did it in a different way, which broke the whole thing
entirely. I tried to fix that and it seemed to work locally, so I
notified upstream about the breakage and moved on, thinking it was all
set. But obviously it is not.

Thanks -- and I agree, it works :-)

   https://openqa.debian.net/tests/178534/logfile?filename=DI_syslog.txt

As it happens, over the weekend it occurred to me that I might be able
to pave the way to a fix for this bug by coming up with a test for the
failure.

Awkwardly, syslogd wants to open /dev/log and bails out if it's already
in use, so I resorted to (the somewhat disgusting hack of) using podman:

    https://salsa.debian.org/philh/busybox/-/commit/2697f7cce81d1a70de202a30e7062dc9f64a94b1

At least it allows syslogd to run well enough to succeed or fail
similarly to the behaviour seen in the bug.

Gosh..

Here it is going wrong with the -3 code:

   https://salsa.debian.org/philh/busybox/-/jobs/4523822#L3963
   (lines 3969-3975, with the last line showing the entire syslog)

and here is an example of it going right:

   https://salsa.debian.org/philh/busybox/-/jobs/4523808#L3668

   Line 3668 here, saying "PASS: syslogd-works" indicates that we
   succeeded in grepping the test string in /var/log/syslog
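
   To illustrate the kind of round-trip check described above, here is a
   minimal Python sketch. It is not the busybox testsuite and not the code
   from the commit linked earlier; the marker string, the function name and
   the temporary paths (stand-ins for /dev/log and /var/log/syslog) are all
   assumptions made for illustration. A toy receiver binds a Unix datagram
   socket, one message is sent to it, the message is appended to a log file,
   and the result is decided by searching that file for the marker.

```python
import os
import socket
import tempfile

# Hypothetical marker; the real test uses its own test string.
TEST_STRING = "syslogd-works-marker"


def roundtrip(workdir: str, message: str) -> bool:
    """Send one datagram to a toy logger and check that it reaches the log file."""
    sock_path = os.path.join(workdir, "log")    # stand-in for /dev/log
    log_path = os.path.join(workdir, "syslog")  # stand-in for /var/log/syslog

    # Toy "syslogd": bind a Unix datagram socket at the socket path.
    server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    server.bind(sock_path)
    server.settimeout(2.0)

    # Toy "logger": send a single message to that socket.
    client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        client.sendto(message.encode(), sock_path)
        data = server.recv(4096)
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(data.decode() + "\n")
    finally:
        client.close()
        server.close()

    # Pass/fail decision: is the marker present in the log file?
    with open(log_path, encoding="utf-8") as log:
        return any(message in line for line in log)


def main() -> None:
    with tempfile.TemporaryDirectory() as workdir:
        ok = roundtrip(workdir, TEST_STRING)
        print(("PASS" if ok else "FAIL") + ": syslogd-works")


if __name__ == "__main__":
    main()
```

   Using a private temporary directory avoids touching the real /dev/log,
   which is the same concern that motivated running the actual test in a
   container.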

The difference between these two is simply disabling
CONFIG_FEATURE_REMOTE_LOG, as seen here:

   https://salsa.debian.org/philh/busybox/-/commit/89c253f75690dd41487b6fd6f9356a1e890a6ac2

I'm not proposing that as a fix, but it does seem to indicate where the
problem may be located. I'm afraid I've failed to work out what's
actually going wrong here (my C's pretty rusty).

BTW At one point I thought I'd narrowed it down to the while loop here:

   https://salsa.debian.org/philh/busybox/-/commit/328fdfbe43cd8d9e4425c3ee1c68aadfa44ee434

but if that did work, it no longer does. Either I was mistaken about it
having worked earlier (I'm at least 80% sure that's not the case) or
something non-deterministic is going on ... which makes me wonder if the
underlying cause might be something to do with using uninitialised data
somewhere.

Hopefully this will be of some use to those more familiar with the code.

Oh well. So much work for such a minor thing.. :(  I'm sorry for missing the
whole thing; I would have acted right away and fixed it in a few minutes.

The whole thing is.. well, quite bad.  We identified a few issues: upstream
syslogd is entirely broken now, and remote logging isn't that important and
has only just been enabled. So the fix for now is to just disable remote
logging and revert to the pre-breakage situation, as is done in the NMU
(remote logging is a niche thing in this context, though it might certainly
be useful, provided it actually works).  And ping upstream.

The thing is that upstream will most likely fix it in a different way anyway,
as Denys likes to keep it small even if the code becomes barely readable,
and he has a few common practices which he uses when changing anything.

Thank you all for all this huge work.  Adding podman to the tests is.. oh well...

/mjt

