
Bug#991967: Simply ACPI powerdown/reset issue?



Hi Elliott and others,

Also including #994899 for once, since that's the bug number for the Xen
issue now.

On 9/26/21 5:27 AM, Elliott Mitchell wrote:
> On Tue, Sep 21, 2021 at 06:33:20AM -0400, Chuck Zmudzinski wrote:
>> I presume you are suggesting I try booting 4.19.181-1 on the
>> current version of Xen-4.14 for bullseye as a dom0. I am not
>> inclined to try it until an official Debian developer endorses
>> your opinion that the bug I am seeing is distinct
>> from #991967, at which point I will report the bug I am
>> seeing as a new bug.
> 
> Chuck Zmudzinski you are getting rather close to my threshold for calling
> harassment.  You're not /quite/ there, but I'm concerned.
> 
> 
> Since the purpose of the bug reports is to find and diagnose bugs, I did
> a bit of experimentation and made some observations.
> 
> I checked out the Debian Xen source via git.  I got the current
> "master" branch which is presently the candidate 4.14.3-1 version,
> which includes urgent fixes.  The hash is:
> e7a17db0305c8de891b366ad37777528e5a43015
> 
> On top of this I cherry-picked 3 commits from Xen's main branch:
> 5a4087004d1adbbb223925f3306db0e5824a2bdc
> 0f089bbf43ecce6f27576cb548ba4341d0ec46a8
> bc141e8ca56200bdd0a12e04a6ebff3c19d6c27b
> 
> (these can be retrieved via Xen's gitweb at
> https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h=<$hash> which is
> suitable for the `git am` command)
> 
> With these I built 4.14.3-1 and then tried kernels 4.19.181-1 and
> 4.19.194-3 (this system is presently mostly on oldstable).  The results
> were:
> 
> Xen 4.14.3-1 with Linux 4.19.181-1: system reboots were successful
> 
> Xen 4.14.3-1 with Linux 4.19.194-3: system reboots hung

Ok, so it included 0f089bbf43, which is probably the most important of
the 3 fixes that we need indeed. And, it's good that the above
difference is still visible afterwards, since it confirms that we're
looking at two distinct problems.

> Unfortunately I was too quick at installing the rebuilt 4.14.3-1 and I
> missed trying the vanilla Debian 4.14.2+25-gb6a8c4f72d-2 with
> Linux 4.19.181-1.  I believe this combination would have hung during
> reboot.

The Xen related breakage was introduced in 4.14.0+88-g1d1d1f5391-2, so
with that combination, I would expect you would experience both of the
bugs at the same time, yes.

> As such, I believe there are in fact two distinct bugs being observed.
> The presence of EITHER of these is sufficient to cause hangs during
> powerdown or reboot.
> 
> First, some patch originally from Linux's main branch that breaks Xen
> reboots was backported somewhere between 4.19.181-1 and 4.19.194-3.  This may
> either have been introduced before 5.10 diverged from main, or may also
> have been backported to 5.10.  THIS is Debian bug #991967.
> 
> Second, the Xen patch 3c428e9ecb1f290689080c11e0c37b793425bef1 which is
> valuable to ARM devices breaks reboots and powerdowns on x86.  This is
> correctly fixed by 0f089bbf43ecce6f27576cb548ba4341d0ec46a8.  Presently
> this has no Debian bug report.

Correct. Thanks a lot for your help with hunting down and confirming this.

And now we have #994899 for it. So, I would like to kindly ask everyone
to stop hijacking this one, #991967, for discussing the Xen problem.

> The first is presently unidentified, someone enthusiastic either needs to
> read git logs/source code, or bisect and build to find where it got
> broken.
> 
> For the second, we seem to have a fix.  The only question is how many patches
> to cherry pick?  bc141e8ca562 is non-urgent as it is merely superficial
> and not needed for functionality.
> 5a4087004d1a is a workaround for Linux kernel breakage, but how likely
> are we to see that fixed in the Linux kernel packages?  The fix is
> well-contained and needed for some highly popular ARM devices.

Diederik also helped with testing changes, and when combining results,
the best thing we can do is pick the 4 changes that were initially
posted in Nov 2020 as "x86: ACPI and DMI table mapping fixes", and ended
up in Xen 4.15 as well.

---- >8 ----

commit 8b6d55c1261820bb9db8d867ce9ee77397d05203
Author: Jan Beulich <jbeulich@suse.com>
Date:   Tue Nov 24 11:26:02 2020 +0100

    x86/ACPI: fix mapping of FACS

commit f390941a92f102ebbbbce1b54be206a602187fd7
Author: Jan Beulich <jbeulich@suse.com>
Date:   Tue Nov 24 11:26:34 2020 +0100

    x86/DMI: fix table mapping when one lives above 1Mb

commit 0f089bbf43ecce6f27576cb548ba4341d0ec46a8
Author: Jan Beulich <jbeulich@suse.com>
Date:   Tue Jan 5 13:09:55 2021 +0100

    x86/ACPI: fix S3 wakeup vector mapping

commit 16ca5b3f873f17f4fbdaecf46c133e1aa3d623b2
Author: Jan Beulich <jbeulich@suse.com>
Date:   Tue Jan 5 13:11:04 2021 +0100

    x86/ACPI: don't invalidate S5 data when S3 wakeup vector cannot be determined

---- >8 ----
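
For anyone who wants to try these four on a local test build, here is a
minimal sketch that reuses the gitweb patch URL Elliott quoted above. The
helper names are made up for illustration, and I have not checked that every
patch applies cleanly on top of the 4.14 packaging branch, so treat it as a
starting point only.

```python
# Sketch only: build gitweb patch URLs for the four commits listed above
# and print shell commands that feed each patch to `git am`.
# The function names are hypothetical; the URL template is the one quoted
# earlier in this thread.

GITWEB_PATCH_URL = "https://xenbits.xen.org/gitweb/?p=xen.git;a=patch;h={commit}"

COMMITS = [
    "8b6d55c1261820bb9db8d867ce9ee77397d05203",  # x86/ACPI: fix mapping of FACS
    "f390941a92f102ebbbbce1b54be206a602187fd7",  # x86/DMI: fix table mapping ...
    "0f089bbf43ecce6f27576cb548ba4341d0ec46a8",  # x86/ACPI: fix S3 wakeup vector mapping
    "16ca5b3f873f17f4fbdaecf46c133e1aa3d623b2",  # x86/ACPI: don't invalidate S5 data ...
]


def patch_url(commit: str) -> str:
    """Return the gitweb URL that serves one commit as a patch."""
    return GITWEB_PATCH_URL.format(commit=commit)


def fetch_and_apply_commands(commits):
    """Return one shell command per commit, in the order given.

    Each command downloads the patch and pipes it to `git am`, which
    reads a patch from standard input when no file is given.
    """
    return [f"curl -sSf '{patch_url(c)}' | git am" for c in commits]


if __name__ == "__main__":
    # Run the printed commands from inside a checkout of the Xen source.
    for line in fetch_and_apply_commands(COMMITS):
        print(line)
```

The commits are kept in the order listed above, so `git am` applies them in
the order they landed upstream.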

The 4th one is not explicitly tagged with Fixes: 1c4aa69ca1e1, but I
agree with Diederik that we should keep them all together.

I do not know if this is also the thing Chuck tested in the end, but I'm
a bit lost in the walls of text that were produced in these two bugs.

https://salsa.debian.org/xen-team/debian-xen/-/merge_requests/14

These fixes were actually posted before 4.14.0+88-g1d1d1f5391-2
happened. It's unfortunate that we did not notice it, since the above
could have been part of the package that was in the archive when Debian
11 was released. Or if anyone owning the specific type of hardware had run
into it while testing during the freeze, we could also have found them
in time. But yeah, that happens.

Diederik, I think we should omit the 5th one, since it's a cosmetic
commit, which also starts touching (older) code unrelated to this issue.

What I plan to do is include these as regression fixes in the next
package update. The issue is only affecting a subset of hardware types.
There's a workaround (pull the plug), the fixes are known. There is no
security risk, there is no data corruption or unexpected crashes during
normal operation.

Hans

