
Bug#1021918: debian-installer: Kernel module blacklisting inconsistent



On 22/10/2022 at 03:39, Olaf Meeuwissen wrote:
Pascal Hambourg <pascal@plouf.fr.eu.org> writes:
On 17/10/2022 at 13:13, Olaf Meeuwissen wrote:
I recently tried this version with hardware that triggers loading of the
mt7921e kernel module.  Loading the module fails due to a firmware file
load error but the installer starts okay.  However, the installer later
crashes when probing for network hardware (when it tries to rmmod the
kernel module).

How exactly does the installer crash? Kernel panic? Freeze? Error?

Freeze.  Even after ten minutes the network hardware probe does not
complete.  FWIW, I have seen an error log as well but that may have
been with Devuan's preview installer for daedalus.

I could not reproduce this after tricking the installer into unloading and reloading the mt7921e module. The module unloads and reloads cleanly. But I do not have any hardware matching this module.

I've attached the installer's syslog.  /dev/sdc is the installer ISO.
The other disks, /dev/sda, /dev/sdb and /dev/nvme0n1, are the machine's
internal disks.  I just ran the installer after the machine was already
installed with the workaround I mentioned in the original bug report.

The error starts at

   Oct 19 23:06:13 check-missing-firmware: removing and loading kernel module mt7921e

Oct 19 23:06:13 kernel: [   40.024088] BUG: unable to handle page fault for address: 0000000000006500
Oct 19 23:06:13 kernel: [   40.024092] #PF: supervisor write access in kernel mode
Oct 19 23:06:13 kernel: [   40.024094] #PF: error_code(0x0002) - not-present page
(...)
Oct 19 23:06:13 kernel: [   40.024120] Call Trace:
Oct 19 23:06:13 kernel: [   40.024121]  <TASK>
Oct 19 23:06:13 kernel: [   40.024124]  __cancel_work_timer+0x3c/0x190
Oct 19 23:06:13 kernel: [   40.024128]  ? __kernfs_remove.part.0+0x190/0x2b0
Oct 19 23:06:13 kernel: [   40.024131]  mt7921_pci_remove+0x2c/0x110 [mt7921e]

It looks like a kernel bug when unloading this module. Can you trigger the bug in an installed system? If so, it is not specific to the installer.

The first issue I ran into was that the documented[1] way to blacklist
kernel modules is no longer correct.
   [1]: https://www.debian.org/releases/testing/amd64/ch05s03.en.html#module-blacklist
Instead of
    mt7921e.blacklist=yes
I had to use
    modprobe.blacklist=mt7921e

/lib/debian-installer-startup.d/S02module-params has the following comment:

# Before udev is started, parse kernel command word for module params of
# the form module.param=value and register them so they will be used when
# modules are loaded. Also check for modules to be blacklisted.

But udev is actually started earlier, so the first method does not
work with modules included in initrd.gz (e.g. storage drivers).
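To make the behaviour described in the S02module-params comment concrete, here is a minimal Python sketch of such a parser. It is a hypothetical illustration, not the actual shell script: the function name `parse_cmdline` and the output format are assumptions, and the sketch deliberately does not check the value of `<module>.blacklist=`, which would be consistent with the `blacklist modprobe` entry reported later in this thread.

```python
def parse_cmdline(cmdline: str) -> list[str]:
    """Naively turn module.param=value words into modprobe.d lines.

    Hypothetical illustration only: mirrors the comment in
    S02module-params, not its actual implementation.
    """
    lines = []
    for word in cmdline.split():
        # Only consider words shaped like module.param=value
        if "=" not in word or "." not in word.split("=", 1)[0]:
            continue
        key, value = word.split("=", 1)
        module, param = key.split(".", 1)
        if param == "blacklist":
            # <module>.blacklist=...: the value is ignored and the
            # module name is whatever precedes the first dot
            lines.append(f"blacklist {module}")
        else:
            lines.append(f"options {module} {param}={value}")
    return lines


print(parse_cmdline("quiet mt7921e.blacklist=yes"))       # ['blacklist mt7921e']
print(parse_cmdline("quiet modprobe.blacklist=mt7921e"))  # ['blacklist modprobe']
```

With this naive split on the first dot, the alternative syntax `modprobe.blacklist=mt7921e` is read as "blacklist the module called modprobe".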

In that case, shouldn't that be mentioned in the installation manual?
Actually, a single method that works for *all* modules, whether in the
initrd.gz or installed later is much preferred.

However, it should work with network driver modules, which are installed
much later.

You may want to double-check how the kernel command line parse results
are used then.

I did, and <module>.blacklist works as expected with NIC modules matching my hardware (iwlwifi and e1000e).

Or maybe the mt7921e module is in the initrd.gz?
Just checked, it is not.

Indeed, it is in the package nic-wireless-modules-<kernel-version>-di.

However, upon booting I saw a pile of ATA bus and I/O errors that made
me suspicious.  The disk is brand new and a smartmontools extended test
reports no errors.
I found a /etc/modprobe.d/blacklist.local.conf file with
    blacklist modprobe

This is a minor bug in
/lib/debian-installer-startup.d/S02module-params which can be easily
fixed. However, it should not have any actual impact as "modprobe"
does not match any kernel module name or alias.
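A possible fix, sketched in Python under the same caveats (hypothetical function name, not the installer's code): treat `modprobe.blacklist=` as a special case and expand its comma-separated value into one `blacklist` line per module, which as far as I know matches how kmod itself interprets that parameter, instead of taking `modprobe` as a module name. Honouring only the value `yes` for the documented `<module>.blacklist=` syntax is a design choice of this sketch.

```python
def blacklist_entries(cmdline: str) -> list[str]:
    """Collect modules to blacklist from the kernel command line.

    Hypothetical sketch of a fix: both
      <module>.blacklist=yes
      modprobe.blacklist=<module>[,<module>...]
    produce "blacklist <module>" lines; "modprobe" itself is never
    treated as a module name.
    """
    modules = []
    for word in cmdline.split():
        key, sep, value = word.partition("=")
        module, dot, param = key.partition(".")
        # Skip anything that is not <something>.blacklist=<value>
        if not sep or not dot or param != "blacklist":
            continue
        if module == "modprobe":
            # Alternative syntax: the value lists the modules
            modules.extend(m for m in value.split(",") if m)
        elif value == "yes":
            # Documented syntax: the module name precedes the dot
            modules.append(module)
    return [f"blacklist {m}" for m in modules]


print(blacklist_entries("quiet modprobe.blacklist=mt7921e"))  # ['blacklist mt7921e']
```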

Strange, because removing it made those ATA bus and I/O errors go away,
reproducibly at that.

Yes, really strange. I can neither explain nor reproduce it.

Seeing that the kernel boot argument is added correctly to the GRUB
configuration, there is no need to create a file in /etc/modprobe.d/.

It avoids cluttering the kernel command line with module parameters.

The blacklist.local.conf file is created as documented but using the
alternative syntax I had to use leads to the oxymoronic

   blacklist modprobe

entry, trying to tell modprobe to blacklist itself :-)

As I wrote, "modprobe" is not a module name so this should be a no-op.

You mentioned above that's a minor bug and easily fixed.  If so, then
please fix it.

I am not a Debian developer. The best I can do is submit a patch.

Later, network driver modules are installed and loaded.

Seeing that the module is not in initrd.gz, this is where it would be
loaded according to your understanding.  Does this step happen *before*
the installer screen appears?

No, it should happen at the "Detect network hardware" (or so) step.

If not, and network driver modules are installed and loaded
at the network hardware probe step of the installer, then that does *not*
correspond to what I have seen.  That is to say, unless I blacklist the
module with modprobe.blacklist=mt7921e, I see piles of firmware loading
errors fly by before the installer screen appears, asking me to select a
language.

Which installation image did you use?

