Control: tag -1 moreinfo

You wrote:
> I've encountered an issue booting up Debian Testing with the latest
> kernel. The system reports something about kernel panic caused by
> aacraid module, then hangs completely during fsck step. Information
> about the installed system is in the attached .txt file. Note that
> hardware is different, because the SSD with Debian Testing has been
> extracted from the server for investigation.
>
> Using the kernel parameters "pci=nocrs single" I was able to boot into
> emergency mode and see the error messages related to kernel panic.
> Photos attached.
>
> Pulling the RAID controller out of the PCI-E slot restores normal boot.
[...]

There are (at least) 2 bugs here:

1. Something is preventing aacraid and ehci-hcd from allocating DMA
   buffers:

   "ehci-pci 0000:00:1a.0: init 0000:00:1a.0 fail, -12"
   "aacraid: unable to create mapping."

2. aacraid then double-frees a chunk of memory while handling the
   failure, causing the panic:

   "kernel BUG at mm/slub.c:448!"

I'm attaching a patch which should fix bug #2, which may help to get
more information about bug #1.

In principle you should be able to test this by following the
instructions at
<https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#id-1.6.6.4>
but currently the test-patches script has not been updated along with
the package and will take a lot more time and space than it should.

So instead I would suggest building a custom kernel based on the Debian
configuration, following the instructions at
<https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#s-common-building>
and applying this patch before you run "make clean".

Let us know if you have any difficulty with this. If the system is able
to boot with a patched kernel, please send the full kernel log.

Ben.

-- 
Ben Hutchings
Knowledge is power. France is bacon.

From 6ef2851f75411b379868119e693ce63440dde869 Mon Sep 17 00:00:00 2001
From: Ben Hutchings <benh@debian.org>
Date: Wed, 10 Jul 2024 18:41:07 +0200
Subject: [PATCH] aacraid: Fix double-free on probe failure

aac_probe_one() calls hardware-specific init functions through the
aac_driver_ident::init pointer, all of which eventually call down to
aac_init_adapter().

If aac_init_adapter() fails after allocating memory for
aac_dev::queues, it frees the memory but does not clear that member.

After the hardware-specific init function returns an error,
aac_probe_one() goes down an error path that frees the memory pointed
to by aac_dev::queues, resulting in a double-free.

Reported-by: Michael Gordon <m.gordon.zelenoborsky@gmail.com>
References: https://bugs.debian.org/1075855
Fixes: 8e0c5ebde82b ("[SCSI] aacraid: Newer adapter communication iterface support")
Signed-off-by: Ben Hutchings <benh@debian.org>
---
drivers/scsi/aacraid/comminit.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
index bd99c5492b7d..0f64b0244303 100644
--- a/drivers/scsi/aacraid/comminit.c
+++ b/drivers/scsi/aacraid/comminit.c
@@ -642,6 +642,7 @@ struct aac_dev *aac_init_adapter(struct aac_dev *dev)
 	if (aac_comm_init(dev)<0){
 		kfree(dev->queues);
+		dev->queues = NULL;
 		return NULL;
 	}
 	/*
@@ -649,6 +650,7 @@ struct aac_dev *aac_init_adapter(struct aac_dev *dev)
 	 */
 	if (aac_fib_setup(dev) < 0) {
 		kfree(dev->queues);
+		dev->queues = NULL;
 		return NULL;
 	}
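
For anyone who wants to see the pattern outside the driver, here is a
minimal userspace sketch of what the patch changes. It is only an
illustration under stated assumptions: mock_dev, mock_init_adapter()
and mock_probe_one() are hypothetical stand-ins for aac_dev,
aac_init_adapter() and the aac_probe_one() error path, and malloc()/
free() stand in for the kernel allocator. It is not the driver code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct aac_dev; only the member discussed here. */
struct mock_dev {
	void *queues;
};

/*
 * Stand-in for aac_init_adapter(). It allocates dev->queues and, when
 * fail_later is non-zero, simulates a later setup step failing.
 * Returns 0 on success, -1 on failure.
 */
static int mock_init_adapter(struct mock_dev *dev, int fail_later)
{
	dev->queues = malloc(64);
	if (!dev->queues)
		return -1;

	if (fail_later) {
		free(dev->queues);
		/* The fix: clear the member so no dangling pointer is left. */
		dev->queues = NULL;
		return -1;
	}
	return 0;
}

/*
 * Stand-in for the caller's error path, which frees dev->queues
 * whenever init fails. Without the NULL assignment above, this second
 * free() would act on already-freed memory.
 */
static int mock_probe_one(struct mock_dev *dev, int fail_later)
{
	if (mock_init_adapter(dev, fail_later) < 0) {
		free(dev->queues);	/* no-op on NULL, as with kfree(NULL) */
		return -1;
	}
	return 0;
}
```

Calling mock_probe_one() with fail_later set to 1 goes through both
free() calls, and the second one is harmless because the pointer was
cleared. With the NULL assignment removed, the same call would free the
same block twice, which is the situation the slub BUG reports.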