
Bug#991967: #991967: Simply ACPI powerdown/reset issue?



On 9/19/2021 1:05 AM, Chuck Zmudzinski wrote:

Hello Elliott and Salvatore,

I have seen this bug on bullseye for as long as I have
been running bullseye as a dom0. My testing indicates
that the problem is not in src:linux; it appeared in
src:xen with the 4.14 version of xen on bullseye.

Elliott, are you seeing the problem only on Debian's
xen-4.14 hypervisor? Also, on which architecture, arm or
amd64? I see the problem only on the Debian xen-4.14
hypervisor, and I have tested only on amd64. I have
found a fix for my amd64 system, which is configured as
follows:

Motherboard: ASRock B85M Pro4, BIOS P2.50 12/11/2015,
with a Haswell CPU (core i5-4590S)

xen hypervisor version: 4.14.2+25-gb6a8c4f72d-2, amd64

linux kernel version: 5.10.46-4 (the current amd64 kernel
for bullseye)

Boot system: EFI, not using secure boot, booting xen
hypervisor and dom0 bullseye with grub-efi package for
bullseye, and it boots the xen-4.14-amd64.gz file, not
the xen-4.14-amd64.efi file.

I also tested a buster dom0 with the 4.19 series kernel
on the xen-4.14 hypervisor from bullseye and saw the
problem, but I did not see the problem with either
a buster (linux 4.19) or bullseye (linux 5.10) dom0 on
the xen-4.11 hypervisor, so I think the problem is
with the Debian version of the xen-4.14 hypervisor,
not with src:linux.
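
The reasoning behind that conclusion can be written out as a small
check. This is only a sketch: it encodes the four hypervisor/kernel
combinations described above as data, and the helper name
determines_outcome is made up for illustration.

```python
# Each row: (xen version, dom0 kernel series, did shutdown power off?)
# The values are the four combinations reported in this message.
observations = [
    ("4.14", "4.19", False),  # buster dom0 on bullseye's xen
    ("4.14", "5.10", False),  # bullseye dom0 on bullseye's xen
    ("4.11", "4.19", True),   # buster dom0 on xen-4.11
    ("4.11", "5.10", True),   # bullseye dom0 on xen-4.11
]


def determines_outcome(index):
    """Return True if the column at `index` alone predicts the result."""
    seen = {}
    for row in observations:
        key, ok = row[index], row[2]
        # setdefault records the first result seen for this value;
        # a later, different result means the column does not decide.
        if seen.setdefault(key, ok) != ok:
            return False
    return True


print("xen version decides:", determines_outcome(0))
print("kernel version decides:", determines_outcome(1))
```

With these four observations, the xen version alone predicts the
result and the kernel version does not.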

I also found a fix in src:xen:

I noticed the series of patches in debian/patches of the
4.14.2+25-gb6a8c4f72d-2 version of src:xen (and
earlier versions of xen-4.14 on Debian) have several patches
backported from the unstable branch of xen upstream. By
removing some of these patches from the patches
series of the src:xen package, the dom0 shuts down
as expected on my ASRock Haswell motherboard.

I rebuilt the src:xen package after removing the following
patches from the debian/patches series and the result
was that the computer shuts down as expected if I boot
using the patched hypervisor:

0027-xen-rpi4-implement-watchdog-based-reset.patch
0028-tools-python-Pass-linker-to-Python-build-process.patch
0029-xen-arm-acpi-Don-t-fail-if-SPCR-table-is-absent.patch
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
0031-xen-arm-acpi-The-fixmap-area-should-always-be-cleare.patch
0032-xen-arm-Check-if-the-platform-is-not-using-ACPI-befo.patch
0033-xen-arm-Introduce-fw_unreserved_regions-and-use-it.patch
0034-xen-arm-acpi-add-BAD_MADT_GICC_ENTRY-macro.patch
0035-xen-arm-traps-Don-t-panic-when-receiving-an-unknown-.patch

Most of these patches seem unrelated to the amd64
architecture and instead affect the arm architecture, so
removing all of them is probably more than is needed to
fix this bug. I removed them all because I could not find
them upstream on the 4.14 branch; I only saw them on the
xen unstable branch upstream (I did not check whether they
are on the 4.15 branch upstream). I wanted to test a true
upstream 4.14 version without these seemingly aggressive
patches that Debian added from the unstable branch of xen
upstream, and this more conservative build, without those
patches, fixed the problem!

I suspect the following patch is the culprit for problems
shutting down on the amd64 architecture:

0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch

The commit log for this patch states:

From: Julien Grall <jgrall@amazon.com>
Date: Sat, 26 Sep 2020 17:44:29 +0100
Subject: xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()

The functions acpi_os_{un,}map_memory() are meant to be arch-agnostic
while the __acpi_os_{un,}map_memory() are meant to be arch-specific.

Currently, the former are still containing x86 specific code.

To avoid this rather strange split, the generic helpers are reworked so
they are arch-agnostic. This requires the introduction of a new helper
__acpi_os_unmap_memory() that will undo any mapping done by
__acpi_os_map_memory().

Currently, the arch-helper for unmap is basically a no-op so it only
returns whether the mapping was arch specific. But this will change
in the future.

Note that the x86 version of acpi_os_map_memory() was already able to
able the 1MB region. Hence why there is no addition of new code.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Rahul Singh <rahul.singh@arm.com>
Tested-by: Elliott Mitchell <ehem+xen@m5p.com>
(cherry picked from commit 1c4aa69ca1e1fad20b2158051eb152276d1eb973)
---------------------------------------------------

This patch does affect amd64 acpi code and is probably causing
the problem on my amd64 system, which would explain why my build
of the xen-4.14 hypervisor without this patch fixed the problem.

I think this bug should be re-classified as a bug in src:xen.

I would also ask the Debian Xen Team why they are
backporting patches from the upstream xen unstable
branch into Debian's 4.14 package that is currently shipping
on Debian stable (bullseye). IMHO, the aforementioned
patches that are not in the stable 4.14 branch upstream
should not be included in the xen package for Debian stable.

Regards,

Chuck Zmudzinski

As a follow-up to my last comment on this bug, the
problems I see with my bullseye amd64 dom0 point to
an ACPI powerdown/reset issue, but only on
the Debian version of Xen-4.14. I do not see the problem
with any version of the linux kernel, either on bare metal
or on the Debian version of the Xen-4.11 hypervisor
from buster. For example, the problem manifests itself
on the Debian Xen-4.14 hypervisor with the Debian
dom0 reaching the systemd power off target but the
power does not actually turn off. Moreover, I can only
recover by manually resetting the computer by pressing
the physical reset button on the computer or removing
power by physically unplugging the computer.

One slight difference I see from what Elliott reported -
not only does the power supply remain powered after
shutdown, but also messages on the console about
powering down remain on the display monitor after
reaching the systemd power down target and power
to the display/monitor also persists.

For my amd64 system, this bug would probably be fixed
on Debian stable by having a separate Xen-4.14 package
for Debian stable that removes at least the following
patches from the debian/patches series of the current
Xen-4.14 package for stable:

0027-xen-rpi4-implement-watchdog-based-reset.patch
0029-xen-arm-acpi-Don-t-fail-if-SPCR-table-is-absent.patch
0030-xen-acpi-Rework-acpi_os_map_memory-and-acpi_os_unmap.patch
0031-xen-arm-acpi-The-fixmap-area-should-always-be-cleare.patch
0032-xen-arm-Check-if-the-platform-is-not-using-ACPI-befo.patch
0033-xen-arm-Introduce-fw_unreserved_regions-and-use-it.patch
0034-xen-arm-acpi-add-BAD_MADT_GICC_ENTRY-macro.patch
0035-xen-arm-traps-Don-t-panic-when-receiving-an-unknown-.patch

The 0028-tools-python-Pass-linker-to-Python-build-process.patch
is probably not related to this bug, but I have not verified that
the bug stays fixed when that patch is kept. I would
defer to people who know more about the problems with
building Xen on Debian using various versions of python to decide
whether or not to remove the 0028-tools-python... patch.

I think perhaps the aforementioned patches to xen/arm and
xen/acpi would be suitable for testing a Debian Xen package
targeting bookworm/testing or sid/unstable, but not for
Debian bullseye/stable. As it is now, it appears the Debian Xen
Team is not making any distinction between stable, testing,
and unstable for its current Xen-4.14 package, and IMHO
that is the root cause of this bug on Debian stable.

If the Debian Xen Team wants to experiment with patches
from the unstable branch of upstream Xen on a Debian
version of Xen-4.14, I respectfully ask that it do so only on
bookworm/testing or unstable/sid and ship a separate
more conservative package for bullseye/stable that is
closer to the official upstream Xen 4.14.x version than
the package that is currently shipping on bullseye/stable.

Regards,

Chuck Zmudzinski

