Bug#1103153: linux-image-6.1.0-33-amd64: kernel fails to boot in xen DomU with pci-passthrough
Control: forcemerge 1103153 -1
Hi
On Tue, Apr 15, 2025 at 12:13:44AM +1000, Cameron Davidson wrote:
> Package: src:linux
> Version: 6.1.133-1
> Severity: normal
>
> Dear Maintainer,
>
>
> I recently carried out an upgrade to a home server running Debian.
> It is a VM host with 3 guest VMs.
> One guest is Debian 11, the Dom0 and 2 other guests are Debian 12.10.
> On April 13 I upgraded the various kernel packages on all 3 Debian 12
> systems to version 6.1.0-33-amd64 and rebooted.
> It also upgraded the usbip package to 2.0+6.1.133-1 (but it is not
> currently in use)
>
> On reboot, the Dom0 host system came up as expected, as did the
> Debian 11 guest. Neither VM guest running the new kernel came up;
> however, it is possible that the mail gateway server failed because
> of a dependency on the other. (At that stage "up" was defined as
> responding to ssh login).
> Later tests showed that the mail gateway can boot either kernel 6.1.0-32
> or 6.1.0-33 with equally good results.
> The same cannot be said for the main guest VM. It consistently boots the
> old kernel OK, but invariably fails with 6.1.0-33.
> * I guess the source of the problem might be that I use PCI-passthrough
> to provide the non-working VM with the xHCI USB controller
> * after quite a few minutes it finishes the boot process without
> attaching a root file system. (So of course nothing works and there is
> no saved log) - It only "finishes" because everything times out.
> * I was able to log the terminal text to capture the boot messages,
> which I did for both working and non-working systems. Only the
> non-working version is included below.
> * Comparing the good and bad logs shows only trivial differences such as
> timing and sizes until it reaches the point of starting systemd-udevd
> at (1.37s)
> * At that point you see "BUG", "Oops" and a stack trace.
>
> The part of the xen config file relating to PCI passthrough simply has
> pci = [
> '00:14.0'
> ]
>
>
> There is an initial error that appears in the host system and says:
> "libxl: error: libxl_pci.c:1573:libxl__device_pci_reset: The kernel
> doesn't support reset from sysfs for PCI device 0000:00:14.0"
> and has been present for a long time. It does not seem to have any
> negative consequence.
>
> The capture from the ssh session during the failed boot is as follows
I believe this has the same root cause as #1102889.
Merging the two bugs.
Regards,
Salvatore