Bug#1103153: linux-image-6.1.0-33-amd64: kernel fails to boot in xen DomU with pci-passthrough



Control: forcemerge 1103153 -1 

Hi

On Tue, Apr 15, 2025 at 12:13:44AM +1000, Cameron Davidson wrote:
> Package: src:linux
> Version: 6.1.133-1
> Severity: normal
> 
> Dear Maintainer,
> 
> 
> I recently carried out an upgrade to a home server running Debian.
> It is a VM host with 3 guest VMs.
> One guest is Debian 11, the Dom0 and 2 other guests are Debian 12.10.
> On April 13 I upgraded the various kernel packages on all 3 Debian 12
> systems to version 6.1.0-33-amd64 and rebooted.
> It also upgraded the usbip package to 2.0+6.1.133-1 (but it is not
> currently in use)
> 
> On reboot, the Dom0 host system came up as expected, as did the
> Debian 11 guest. Neither VM guest running the new kernel came up;
> however, it is possible that the mail gateway server failed because
> of a dependency on the other. (At that stage "up" was defined as
> responding to ssh login).
> Later tests showed that the mail gateway can boot either kernel 6.1.0-32
> or 6.1.0-33 with equally good results.
> The same cannot be said for the main guest VM. It consistently boots the
> old kernel OK, but invariably fails with 6.1.0-33.
> * I guess the source of the problem might be that I use PCI-passthrough
>   to provide the non-working VM with the xHCI USB controller
> * after quite a few minutes it finishes the boot process without
>   attaching a root file system. (So of course nothing works and there is
>   no saved log) - It only "finishes" because everything times out.
> * I was able to log the terminal text to capture the boot messages,
>   which I did for both working and non-working systems. Only the
>   non-working version is included below.
> * Comparing the good and bad logs shows only trivial differences such as
>   timing and sizes until it reaches the point of starting systemd-udevd
>   at (1.37s)
> * At that point you see "BUG", "Oops" and a stack trace.
> 
> The part of the xen config file relating to PCI passthrough simply has
> pci = [
> 	'00:14.0'
> 	]
> 
> 
> There is an initial error that appears in the host system and says:
>   "libxl: error: libxl_pci.c:1573:libxl__device_pci_reset: The kernel
>   doesn't support reset from sysfs for PCI device 0000:00:14.0"
> and has been present for a long time. It does not seem to have any
> negative consequence.
> 
> The capture from the ssh session during the failed boot is as follows

I believe this has the same root cause as #1102889.

Merging the two bugs.

Regards,
Salvatore

