Yes, I also believe it's all caused by the xhci_hcd issue.
You can see in that report that the card worked OK with the newest vanilla kernel, but I don't know yet what fixed it, and I have not had time to do any further investigation.
FYI, I won't update this thread anymore, but if anyone is interested in contributing ideas, information, etc., please do so on the issue report itself.
Thanks for all the help!
Flacusbigotis <flacusbigotis@gmail.com> writes:
> The kernel logs indicating issues in Bullseye include a warning of a "host
> failure" by xhci_hcd, and several write/read errors by the ax88179 ethernet
> driver/module for the card, as follows:
>
> Feb 22 17:22:53 server1 kernel: [ 1.380198] xhci_hcd 0000:1c:00.0: xHCI
> Host Controller
> Feb 22 17:22:53 server1 kernel: [ 1.380205] xhci_hcd 0000:1c:00.0: new
> USB bus registered, assigned bus number 5
> Feb 22 17:22:53 server1 kernel: [ 1.380209] xhci_hcd 0000:1c:00.0: Host
> supports USB 3.0 SuperSpeed
> Feb 22 17:22:53 server1 kernel: [ 1.380260] usb usb5: New USB device
> found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.10
> Feb 22 17:22:53 server1 kernel: [ 1.380261] usb usb5: New USB device
> strings: Mfr=3, Product=2, SerialNumber=1
> Feb 22 17:22:53 server1 kernel: [ 1.380263] usb usb5: Product: xHCI Host
> Controller
> Feb 22 17:22:53 server1 kernel: [ 1.380264] usb usb5: Manufacturer:
> Linux 5.10.0-11-amd64 xhci-hcd
> Feb 22 17:22:53 server1 kernel: [ 1.380265] usb usb5: SerialNumber:
> 0000:1c:00.0
> Feb 22 17:22:53 server1 kernel: [ 1.380396] hub 5-0:1.0: USB hub found
> Feb 22 17:22:53 server1 kernel: [ 1.380411] hub 5-0:1.0: 4 ports detected
> Feb 22 17:22:53 server1 kernel: [ 5.508457] ax88179_178a 5-1:1.0 eth0:
> register 'ax88179_178a' at usb-0000:1c:00.0-1, ASIX AX88179 USB 3.0 Gigabit
> Ethernet, 00:11:22:33:44:55
> Feb 22 17:23:25 server1 kernel: [ 39.576966] xhci_hcd 0000:1c:00.0:
> WARNING: Host System Error
> Feb 22 17:26:00 server1 kernel: [ 194.596335] ax88179_178a 5-1:1.0
> enx001122334455: Failed to read reg index 0x0002: -22
I am guessing that the random MAC address is a symptom caused by a
failure to read the permanent MAC from the USB ethernet
controller, which in turn is probably caused by one or more of these
read errors.
But I believe those are only symptoms, and that the real error is that
unspecified "Host System Error".
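
To check whether the "Host System Error" really precedes the register
read failures on a given boot, something like the following could be run
over a saved kernel log. This is only a rough sketch: the function name
and the SAMPLE text are made up for illustration, and the two patterns
are copied from the log lines quoted above, so other kernel versions may
word these messages differently.

```python
import re

# Patterns modelled on the two error lines quoted in this thread.
HSE = re.compile(
    r"\[\s*(\d+\.\d+)\] xhci_hcd (\S+): WARNING: Host System Error")
REG = re.compile(
    r"\[\s*(\d+\.\d+)\] ax88179_178a \S+ \S+: "
    r"Failed to (read|write) reg index (0x[0-9a-fA-F]+): (-?\d+)")


def find_usb_errors(log_text):
    """Return (timestamp, kind, detail) tuples sorted by kernel timestamp."""
    # Drop mail quote markers and undo line wrapping before matching.
    lines = [re.sub(r"^>\s?", "", ln) for ln in log_text.splitlines()]
    flat = " ".join(" ".join(lines).split())
    events = []
    for m in HSE.finditer(flat):
        events.append((float(m.group(1)), "host_system_error", m.group(2)))
    for m in REG.finditer(flat):
        kind = "reg_%s_failed" % m.group(2)
        detail = "%s code %s" % (m.group(3), m.group(4))
        events.append((float(m.group(1)), kind, detail))
    return sorted(events)


# Illustrative input: two wrapped, quoted lines as they appear in the mail.
SAMPLE = """\
> Feb 22 17:23:25 server1 kernel: [ 39.576966] xhci_hcd 0000:1c:00.0:
> WARNING: Host System Error
> Feb 22 17:26:00 server1 kernel: [ 194.596335] ax88179_178a 5-1:1.0
> enx001122334455: Failed to read reg index 0x0002: -22
"""

if __name__ == "__main__":
    for event in find_usb_errors(SAMPLE):
        print(event)
```

If the host error always shows up first, that would support treating the
read failures as a consequence and not the cause.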
I wonder if this could be related to some of the quirks that have been
added for this xhci controller since v4.19? There have been a few since
the VL805 is used in the RPi4. Some of these might very well be
misunderstood and RPi related only. There is also an odd code path in
drivers/usb/host/pci-quirks.c where we select a different path on RPi
than on other systems because "things are taken care of by the board's
co-processor". I find that very suspicious.
And I must admit that my interest in this bug is because I'm worried
that the quirk I recently pushed could have unexpected side effects...
I have no clue, but the most likely cause is some power management
issue. Test disabling ASPM, e.g. by adding pcie_aspm=off to the kernel
command line, or disabling USB autosuspend, e.g. by adding
usbcore.autosuspend=-1 to the kernel command line.
I do NOT suggest that you run with those settings by default. They are
only for testing, to try to narrow down the problem.
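
After rebooting with one of those options, it is worth confirming that
the option actually reached the kernel before drawing conclusions from
the test. A minimal sketch, assuming a Linux system where the boot
command line is readable from /proc/cmdline; the helper names are
hypothetical, and the simple whitespace split does not handle quoted
parameter values containing spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' map to None.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def power_test_settings(cmdline):
    """Report whether the two test options suggested above are present."""
    params = parse_cmdline(cmdline)
    return {
        "aspm_disabled": params.get("pcie_aspm") == "off",
        "autosuspend_disabled": params.get("usbcore.autosuspend") == "-1",
    }


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    print(power_test_settings(read_cmdline()))
```

Testing one option per boot keeps the results easier to interpret than
enabling both at once.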
It would also be interesting to know if removing the XHCI_LPM_SUPPORT
quirk would make a difference, since this was added to the VL805 between
v4.19 and v5.10 without anyone really knowing if it works. But I can't
figure out how to disable a device specific quirk like that without
patching the kernel. Anyone?
Bjørn