
Bug#1021918: debian-installer: Kernel module blacklisting inconsistent



Sorry for the belated follow-up.

Pascal Hambourg <pascal@plouf.fr.eu.org> writes:

> On 22/10/2022 at 03:39, Olaf Meeuwissen wrote:
>> Pascal Hambourg <pascal@plouf.fr.eu.org> writes:
>>> On 17/10/2022 at 13:13, Olaf Meeuwissen wrote:
>>>> I recently tried this version with hardware that triggers loading of the
>>>> mt7921e kernel module.  Loading the module fails due to a firmware file
>>>> load error but the installer starts okay.  However, the installer later
>>>> crashes when probing for network hardware (when it tries to rmmod the
>>>> kernel module).
>>>
>>> How does the installer crash exactly ? Kernel panic ? Freeze ? Error ?
>> Freeze.  Even after ten minutes the network hardware probe does not
>> complete.  FWIW, I have seen an error log as well but that may have
>> been with Devuan's preview installer for daedalus.
>
> I could not reproduce this after tricking the installer into unloading
> and reloading the mt7921e module. The module unloads and reloads
> cleanly. But I do not have any hardware matching this module.

The freeze probably only occurs when an attempt to load the firmware
file is made.  That is unlikely to happen if the hardware is not found.
Anyway, this issue is not with this particular kernel module but with
the installer's inconsistent module blacklisting behaviour.

FWIW, I've since installed firmware-misc-nonfree and removed all the
blacklisting bits for the module.  WiFi works fine.

>> I've attached the installer's syslog.  /dev/sdc is the installer ISO.
>> The other disks, /dev/sda, /dev/sdb and /dev/nvme0n1 are the machine's
>> internal disks.  I just ran the installer after the machine was already
>> installed with the workaround I mentioned in the original bug report.
>> The error starts at
>>    Oct 19 23:06:13 check-missing-firmware: removing and loading kernel module mt7921e
>
>> Oct 19 23:06:13 kernel: [   40.024088] BUG: unable to handle page fault for address: 0000000000006500
>> Oct 19 23:06:13 kernel: [   40.024092] #PF: supervisor write access in kernel mode
>> Oct 19 23:06:13 kernel: [   40.024094] #PF: error_code(0x0002) - not-present page
> (...)
>> Oct 19 23:06:13 kernel: [   40.024120] Call Trace:
>> Oct 19 23:06:13 kernel: [   40.024121]  <TASK>
>> Oct 19 23:06:13 kernel: [   40.024124]  __cancel_work_timer+0x3c/0x190
>> Oct 19 23:06:13 kernel: [   40.024128]  ? __kernfs_remove.part.0+0x190/0x2b0
>> Oct 19 23:06:13 kernel: [   40.024131]  mt7921_pci_remove+0x2c/0x110 [mt7921e]
>
> It looks like a kernel bug when unloading this module. Can you trigger
> the bug in an installed system ? If yes it means that it is not
> specific to the installer.

Machine's at the office, will see if I can test tomorrow or the day after.
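
For reference, a minimal sketch of such a test on an installed system.
The function names are made up for illustration, it must be run as root,
and if the unload hangs as it did in the installer the script will hang
with it.  Note that dmesg also shows earlier messages, so a BUG from a
previous attempt would be reported again.

```shell
#!/bin/sh
# Sketch only: reload a kernel module and look for a kernel BUG in the log.
# Function names are hypothetical; nothing here is taken from the installer.

# Succeeds if the kernel log text on stdin contains a BUG or Oops line.
log_has_bug() {
    grep -Eq 'BUG: |Oops: '
}

# Unload and reload the given module, then scan the kernel log.
reload_and_check() {
    module="$1"
    modprobe -r "$module" || echo "unloading $module failed" >&2
    modprobe "$module" || echo "loading $module failed" >&2
    if dmesg | log_has_bug; then
        echo "kernel reported a BUG, see dmesg"
        return 1
    fi
    echo "no BUG found in kernel log"
}

# Usage (as root): reload_and_check mt7921e
```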

>>>> The first issue I ran into was that the documented[1] way to blacklist
>>>> kernel modules is no longer correct
>>>>    [1]:
>>>> https://www.debian.org/releases/testing/amd64/ch05s03.en.html#module-blacklist
>>>> Instead of
>>>>     mt7921e.blacklist=yes
>>>> I had to use
>>>>     modprobe.blacklist=mt7921e
>>>
>>> /lib/debian-installer-startup.d/S02module-params has the following comment:
>>>
>>> # Before udev is started, parse kernel command word for module params of
>>> # the form module.param=value and register them so they will be used when
>>> # modules are loaded. Also check for modules to be blacklisted.
>>>
>>> But udev is actually started earlier, so the first method does not
>>> work with modules included in initrd.gz (e.g. storage drivers).
>> In that case, shouldn't that be mentioned in the installation manual?
>> Actually, a single method that works for *all* modules, whether in the
>> initrd.gz or installed later is much preferred.
>>
>>> However it should work with network driver modules which are installed
>>> much later.
>> You may want to double check how the kernel command parse results are
>> used then.
>
> I did, and <module>.blacklist works as expected with NIC modules
> matching my hardware (iwlwifi and e1000e).

I meant double check by reading the source code, not by trying with some
rather commonly used modules.

>> Or maybe the mt7921e module is in the initrd.gz?
>> Just checked, it is not.
>
> Indeed, it is in the package nic-wireless-modules-<kernel-version>-di.
>
>>>> However, upon booting I saw a pile of ATA bus and I/O errors that made
>>>> me suspicious.  The disk is brand new and a smartmontools extended test
>>>> reports no errors.
>>>> I found a /etc/modprobe.d/blacklist.local.conf file with
>>>>     blacklist modprobe
>>>
>>> This is a minor bug in
>>> /lib/debian-installer-startup.d/S02module-params which can be easily
>>> fixed. However, it should not have any actual impact as "modprobe"
>>> does not match any kernel module name or alias.
>> Strange, because removing it made those ATA bus and I/O errors go away,
>> reproducibly at that.
>
> Yes, really strange. I cannot explain nor reproduce it.
>
>>>> Seeing that the kernel boot argument is added correctly to the GRUB
>>>> configuration, there is no need to create a file in /etc/modprobe.d/.
>
> It avoids cluttering the kernel command line with module parameters.
>
>> The blacklist.local.conf file is created as documented but using the
>> alternative syntax I had to use leads to the oxymoronic
>>    blacklist modprobe
>> entry, trying to tell modprobe to blacklist itself :-)
>
> As I wrote, "modprobe" is not a module name so this should be a no-op.

It may be a no-op for modprobe but it's not a no-op for my brain ;-)
Actually, it's more like a WTF for my brain.
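
To illustrate how such an entry could come about, here is a sketch of a
naive parser that treats every module.param=value word on the kernel
command line the same way.  This is an assumption about the behaviour,
not the code from S02module-params, and parse_word is a made-up name.

```shell
#!/bin/sh
# Sketch only, NOT the code from S02module-params: a naive parser that
# splits each word at the first dot and special-cases param "blacklist".

# Print the modprobe.d line such a parser would emit for one word.
parse_word() {
    word="$1"
    case "$word" in
        *.*=*) ;;            # looks like module.param=value
        *) return 0 ;;       # anything else is ignored
    esac
    module="${word%%.*}"     # text before the first dot
    rest="${word#*.}"        # text after the first dot
    param="${rest%%=*}"      # text before the first '='
    value="${rest#*=}"       # text after the first '='
    if [ "$param" = "blacklist" ]; then
        echo "blacklist $module"
    else
        echo "options $module $param=$value"
    fi
}

parse_word "mt7921e.blacklist=yes"
parse_word "modprobe.blacklist=mt7921e"
```

With the documented syntax the module name ends up in the blacklist
line.  With modprobe.blacklist=mt7921e the word before the first dot is
"modprobe", so a parser of this shape writes "blacklist modprobe".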

>> You mentioned above that's a minor bug and easily fixed.  If so, then
>> please fix it.
>
> I am not a Debian developer. The best I can do is submit a patch.

By all means, please do.  Thanks in advance.

>>> Later, network driver modules are installed and loaded.
>> Seeing that the module is not in initrd.gz, this is where it would be
>> loaded according to your understanding.  Does this step happen *before*
>> the installer screen appears?
>
> No, it should happen at the "Detect network hardware" (or so) step.
>
>> If no, and network driver modules are installed and loaded
>> at the network hardware probe step of the installer then that does *not*
>> correspond to what I have seen.  That is to say, unless I blacklist the
>> module with modprobe.blacklist=mt7921e, I see piles of firmware loading
>> errors fly by before the installer screen appears, asking me to select a
>> language.
>
> Which installation image did you use ?

The netinst one, according to the syslog I attached earlier.  Looking at
that log, I don't see any evidence of firmware loading errors flying by
before the installer starts.  Weird.  I'll see if I can reproduce it.
Maybe it is a default vs. advanced install difference.  I do remember
using both while trying to get the blacklisting to work but I think I
captured that syslog using the default install.

Hope this helps,
--
Olaf Meeuwissen

