
Bug#1098261: linux: kernel panic on boot with certain large NVME configurations



Hi Noah,

On Tue, Feb 18, 2025 at 08:32:25AM -0500, Noah Meyerhans wrote:
> Source: linux
> Version: 5.10.223-1
> Severity: important
> Tags: upstream patch
> X-Debbugs-Cc: debian-cloud@lists.debian.org, jaboutboul@microsoft.com
> 
> Microsoft has observed that the 5.10.y kernels in bullseye are susceptible
> to crashes due to race conditions in the NVME/PCI subsystem.  See below for
> a representative kernel log.  The problem appears most frequently in larger
> systems, e.g. with 4 or more NVME devices and >= 64 CPUs, but it could
> potentially occur on smaller systems as well.
> 
> The issue was fixed with the 5.14 kernel upstream in e4b9852a0 ("nvme-pci:
> fix multiple races in nvme_setup_io_queues"), so this only impacts
> oldstable.  I have provided a backport of this commit upstream in
> https://lore.kernel.org/stable/E1tj8vO-00471h-2H@lore/
> 
> I'm requesting that this commit be included in a bullseye kernel update.

AFAICS, this backport was not accepted for 5.10.y at the time. Can
you re-ping upstream to make sure it gets included in the 5.10.y
series? Since we follow the 5.10.y series, it will be included once
that has happened (or it can be included in advance once it has been
queued).

Regards,
Salvatore

