
Bug#1086638: linux-image-6.11.5: usbguard-daemon invalid opcode: 0000, usb's not usable



I thought kernel 6.11.5 with the NVIDIA open kernel DKMS modules was OK, but in the end I got a kernel RIP message at poweroff.
Log: journal_2024-11-13_02:31:50.log, kernel.6.11.5.lsmod.log

Finally, I booted kernel 6.10.12 to test the NVIDIA open kernel modules 545 and found errors related to the installation: aptitude installed the modules only for linux-image-6.11.5.
find /lib/modules/ -name '*nvidia*'
```
/lib/modules/6.10.12-amd64/kernel/drivers/net/ethernet/nvidia
/lib/modules/6.10.12-amd64/kernel/drivers/usb/typec/altmodes/typec_nvidia.ko.xz
/lib/modules/6.10.12-amd64/kernel/drivers/platform/x86/nvidia-wmi-ec-backlight.ko.xz
/lib/modules/6.10.11-amd64/kernel/drivers/net/ethernet/nvidia
/lib/modules/6.10.11-amd64/kernel/drivers/usb/typec/altmodes/typec_nvidia.ko.xz
/lib/modules/6.10.11-amd64/kernel/drivers/platform/x86/nvidia-wmi-ec-backlight.ko.xz
/lib/modules/6.1.0-23-amd64/kernel/drivers/net/ethernet/nvidia
/lib/modules/6.1.0-23-amd64/kernel/drivers/usb/typec/altmodes/typec_nvidia.ko
/lib/modules/6.1.0-23-amd64/kernel/drivers/platform/x86/nvidia-wmi-ec-backlight.ko
/lib/modules/6.11.5-amd64/kernel/drivers/net/ethernet/nvidia
/lib/modules/6.11.5-amd64/kernel/drivers/usb/typec/altmodes/typec_nvidia.ko.xz
/lib/modules/6.11.5-amd64/kernel/drivers/platform/x86/nvidia-wmi-ec-backlight.ko.xz
/lib/modules/6.11.5-amd64/updates/dkms/nvidia-current-open.ko.xz
/lib/modules/6.11.5-amd64/updates/dkms/nvidia-current-open-uvm.ko.xz
/lib/modules/6.11.5-amd64/updates/dkms/nvidia-current-open-peermem.ko.xz
/lib/modules/6.11.5-amd64/updates/dkms/nvidia-current-open-drm.ko.xz
/lib/modules/6.11.5-amd64/updates/dkms/nvidia-current-open-modeset.ko.xz
/lib/modules/6.1.0-26-amd64/kernel/drivers/net/ethernet/nvidia
/lib/modules/6.1.0-26-amd64/kernel/drivers/usb/typec/altmodes/typec_nvidia.ko
/lib/modules/6.1.0-26-amd64/kernel/drivers/platform/x86/nvidia-wmi-ec-backlight.ko
```
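To make the gap easier to see, the listing above can be reduced to the kernels that actually have DKMS-built NVIDIA modules. This is only a minimal sketch, assuming the `find` output is piped in on stdin; the function name `kernels_with_nvidia_dkms` is made up for illustration.

```shell
# Hypothetical helper: print each kernel version that has at least one
# NVIDIA module under updates/dkms, given `find` output on stdin.
kernels_with_nvidia_dkms() {
    # Path fields split on "/": "", lib, modules, <kernel>, updates, dkms, <module>
    awk -F/ '$2 == "lib" && $3 == "modules" && $5 == "updates" && $6 == "dkms" && $7 ~ /nvidia/ { print $4 }' \
        | sort -u
}

# Usage on a live system:
#   find /lib/modules/ -name '*nvidia*' | kernels_with_nvidia_dkms
```

Fed the listing above, this prints only 6.11.5-amd64, which matches the observation that the modules were built for that kernel alone. `dkms status` should show the same picture from the DKMS side.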

systemctl --failed
```
  UNIT                         LOAD   ACTIVE SUB    DESCRIPTION              
● nvidia-persistenced.service  loaded failed failed NVIDIA Persistence Daemon
● systemd-modules-load.service loaded failed failed Load Kernel Modules
```
Log: journal_2024-11-13_08:59:00.log
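For anyone reproducing this, the failed units can be pulled out of that table and their logs for the current boot collected one by one. A rough sketch, assuming the `systemctl --failed` output arrives on stdin; `failed_units` is a made-up name, and the list of unit suffixes it recognises is not exhaustive.

```shell
# Hypothetical helper: extract unit names from `systemctl --failed` output
# on stdin. It skips the header row and tolerates the leading bullet by
# checking the first two fields for something that looks like a unit name.
failed_units() {
    awk '{
        for (i = 1; i <= 2 && i <= NF; i++)
            if ($i ~ /\.(service|socket|mount|timer|target)$/) { print $i; break }
    }'
}

# Usage on a live system:
#   systemctl --failed | failed_units | while read -r unit; do
#       journalctl -b -u "$unit" --no-pager
#   done
```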

Sorry for the verbosity, but I'm trying to provide as much context as I can to help with the analysis.
Let me know whether I should file a separate Debian bug report for the buggy nvidia-open-kernel-dkms 545 package or whether it is going to be handled internally.

Thank you guys for your time.
Regards

On Wed, Nov 13, 2024 at 2:24 AM Matias Fritz <fritzmatias@gmail.com> wrote:
1) 
I backed up /etc/usbguard/rules.conf and removed some USB devices from it.
I did the same for /etc/udev/rules.d/*.
I restarted the usbguard service successfully via 'systemctl restart usbguard' (I forgot to do the same with the bluetooth service), and 'usbguard list-devices' immediately showed the expected behaviour (unresponsive). When I ran 'poweroff' as root, the system did not respond properly, so I had to force a power off.
Log: journal_2024-11-12_23:41:55.log

2)
After booting again with kernel 6.11.5, systemctl status reported running. But `usbguard list-devices` failed as expected, and again the poweroff command didn't work. A second call to poweroff returned: "Call to PowerOff failed: Action poweroff already in progress, refusing requested poweroff operation." The system was still somewhat usable (I was able to type these lines and store them as a draft in an open Chrome browser).
Log: journal_2024-11-13_00:02:47.log

So I will consider step 1 successfully accomplished.
3) Checking my guess: not OK. Adding back only the udev rules was not enough. Again a forced power off was required. Log: journal_2024-11-13_00:29:03.log
5) I removed all the packages related to nvidia-driver-full, and it failed again and needed a forced power off. Log: dpkgWithoutNvidia journal_2024-11-13_00:50:29.log
After a clean boot without the NVIDIA driver, 'usbguard list-devices' behaved as expected (unresponsive), and at poweroff I got:
DMAR: DRHD: handling fault status reg 3.
DMAR: [DMA Read NO_PASID] Request device [01:00.0] fault addr 0xfe001000 [fault reason 0x06] PTE Read access is not set
Log: journal_2024-11-13_00:57:02.log

Surprisingly, my 2nd external monitor, connected via USB-C, showed a white screen (I had expected it to be completely black).
As part of my validation, I booted kernel 6.10.12 and ran 'usbguard list-devices' without issues. But I got the same DMAR errors on device 01:00.0, which I'm sure is the address of the NVIDIA card.
poweroff worked like a charm.
Log: journal_2024-11-13_01:10:46.log 

A clean boot with kernel 6.11.5, without the USB-C monitor connected, died at poweroff.
Log: journal_2024-11-13_01:16:07.log
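To confirm which card sits behind the address in the DMAR fault lines quoted in this step, the address can be extracted from the kernel log and handed to `lspci`. A small sketch, assuming the kernel log is piped in on stdin and the fault lines keep the "Request device [bb:dd.f]" shape shown here; `dmar_fault_devices` is a hypothetical name.

```shell
# Hypothetical helper: list the unique PCI addresses named in
# "DMAR: ... Request device [bb:dd.f]" lines read from stdin.
dmar_fault_devices() {
    sed -n 's/.*DMAR:.*Request device \[\([0-9a-fA-F:.]*\)\].*/\1/p' | sort -u
}

# Usage on a live system (lspci -s selects a device by its address):
#   journalctl -k -b | dmar_fault_devices | while read -r dev; do
#       lspci -nn -s "$dev"
#   done
```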

6)
I installed nvidia-driver-full 545 again and, after a clean boot without the USB-C monitor and with nvidia-open-kernel-dkms, 'usbguard list-devices' and 'usbguard allow-device' worked properly.
Poweroff worked, but the kernel crashed at some point after the journal was closed.
However, some kernel errors were shown:
usb 3-14: device descriptor read/64, error -110 (mt7921e)
usb 3-14: device not accepting address 7, error -62 (mt7921e)
usb usb3-port14: unable to enumerate USB device (mt7921e)
NVRM rmapiAllocWithSecInfo: RMAPI_GPU_LOCK_INTERNAL alloc requested without holding the RMAPI lock
Log: journal_2024-11-13_01:44:41.log

After a boot with the USB-C monitor connected, the mt7921e was failing heavily, blocking the system intermittently, and some soft lockups happened.
Log: journal_2024-11-13_02:01:06.log
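The negative numbers in the USB messages from this step are Linux errno values. A tiny lookup sketch for decoding them, assuming the generic errno numbering used on amd64; `kernel_errno_name` is a made-up helper and covers only a few codes.

```shell
# Hypothetical helper: name the Linux errno behind a kernel error code
# such as "error -110". A leading minus sign is stripped before lookup.
kernel_errno_name() {
    case "${1#-}" in
        62)  echo "ETIME" ;;      # timer expired
        71)  echo "EPROTO" ;;     # protocol error
        110) echo "ETIMEDOUT" ;;  # connection timed out
        *)   echo "unknown" ;;
    esac
}

# Usage:
#   kernel_errno_name -110
#   kernel_errno_name -62
```

Both codes seen here, -110 and -62, are timeouts.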

7) Finally, I added back the udev/rules/* and usbguard rules to make the system work, and booted with all my monitors.
I ran usbguard list-devices and poweroff without issues,
except that there are now some kernel error messages related to the NVIDIA open kernel driver.
Logs: journal_2024-11-13_02:10:00.log

I hope the data I sent is good and clear enough.
Regards

Attachment: data1.tar.bz2
Description: BZip2 compressed data

