
Bug#1060706: linux-image-6.1.0-17-amd64: intel i225 NIC loses PCIe link, network becomes unusable



Control: tags -1 + moreinfo

On Sat, Jan 13, 2024 at 11:45:29AM +0100, Arno Lehmann wrote:
> Package: src:linux
> Version: 6.1.69-1
> Severity: normal
> Tags: upstream
> 
> Dear Maintainer,
> 
> 
> After the computer has been running for a while, the network loses its connection because
> the NIC detaches from PCIe. I suspect this is related to power management but
> am not really sure.
> 
> As this seemed to be a known problem, I added pcie_aspm=off to the kernel
> command line.
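
Before ruling out power management, it may be worth confirming that the parameter is actually present on the running kernel's command line. The following is only a minimal sketch: the helper name `pcie_aspm_setting` is made up for illustration, and it assumes the usual space-separated format of /proc/cmdline.

```python
def pcie_aspm_setting(cmdline: str):
    """Return the value of pcie_aspm= from a kernel command line, or None.

    Hypothetical helper: it only inspects the command line text and says
    nothing about the ASPM state the kernel actually applied to the link.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("pcie_aspm="):
            # Keep scanning so that a later occurrence overrides an earlier one.
            value = token.split("=", 1)[1]
    return value


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the currently running kernel.
    with open("/proc/cmdline") as fh:
        print(pcie_aspm_setting(fh.read()))
```

If this prints None after a reboot, the boot loader configuration was probably not regenerated and the parameter never took effect.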
> 
> The problem happens more or less randomly, the computer is usually running 24/7:
> 
> # journalctl --grep 'PCIe link lost' --quiet | cat
> Sep 20 14:21:17 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Okt 06 05:44:20 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Okt 07 16:39:10 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device) (uninitialized): PCIe link lost, device now detached
> Okt 23 18:31:25 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Okt 30 11:16:06 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Okt 31 13:50:06 Zwerg kernel: igc 0000:0a:00.0 (unnamed net_device) (uninitialized): PCIe link lost, device now detached
> Nov 22 18:59:11 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Nov 23 15:45:49 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Dez 19 07:33:02 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Jan 01 09:57:40 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Jan 10 16:15:20 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Jan 13 11:16:31 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
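
Some of the entries above name the interface (eno1) while others show "(unnamed net_device) (uninitialized)", which suggests the link was lost before the network device was registered. A rough sketch for telling the two cases apart in saved journal output follows; the function name and the labels "running" and "uninitialized" are illustrative assumptions, and the pattern is written against the message format quoted in this report only.

```python
import re
from collections import Counter

# Matches the igc detach message in the form quoted in this report.
DETACH_RE = re.compile(
    r"kernel: igc (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<iface>.+?): PCIe link lost, device now detached"
)


def classify_detach_events(lines):
    """Count detach events, split by whether the netdev was already named."""
    counts = Counter()
    for line in lines:
        match = DETACH_RE.search(line)
        if match is None:
            continue  # not a detach message, e.g. part of the WARNING trace
        if "(uninitialized)" in match.group("iface"):
            counts["uninitialized"] += 1
        else:
            counts["running"] += 1
    return counts
```

Applied to the twelve lines quoted above, this counts ten events on the running interface and two on an uninitialized device.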
> 
> 
> This is what I find in the kernel or system log:
> 
> Jan 13 11:16:31 Zwerg kernel: igc 0000:0a:00.0 eno1: PCIe link lost, device now detached
> Jan 13 11:16:31 Zwerg kernel: ------------[ cut here ]------------
> Jan 13 11:16:31 Zwerg kernel: igc: Failed to read reg 0xc030!
> Jan 13 11:16:31 Zwerg kernel: WARNING: CPU: 18 PID: 6389 at drivers/net/ethernet/intel/igc/igc_main.c:6482 igc_rd32+0x91/0xa0 [igc]
> Jan 13 11:16:31 Zwerg kernel: Modules linked in: rfcomm cpufreq_userspace cpufreq_powersave cpufreq_ondemand cpufreq_conservative nfsv3 nfs_acl rpcs>
> Jan 13 11:16:31 Zwerg kernel:  configfs efivarfs ip_tables x_tables autofs4 xfs libcrc32c crc32c_generic dm_crypt dm_mod hid_generic amdgpu crc32_pc>
> Jan 13 11:16:31 Zwerg kernel: CPU: 18 PID: 6389 Comm: kworker/18:1 Not tainted 6.1.0-17-amd64 #1  Debian 6.1.69-1
> Jan 13 11:16:31 Zwerg kernel: Hardware name: ASUS System Product Name/ROG STRIX X670E-A GAMING WIFI, BIOS 1410 04/28/2023
> Jan 13 11:16:31 Zwerg kernel: Workqueue: events igc_watchdog_task [igc]
> Jan 13 11:16:31 Zwerg kernel: RIP: 0010:igc_rd32+0x91/0xa0 [igc]
> Jan 13 11:16:31 Zwerg kernel: Code: 48 c7 c6 d0 55 56 c0 e8 0b 7d 6c f8 48 8b bd 28 ff ff ff e8 31 c7 23 f8 84 c0 74 b4 89 de 48 c7 c7 f8 55 56 c0 e>
> Jan 13 11:16:31 Zwerg kernel: RSP: 0018:ffffac56d5f13df0 EFLAGS: 00010286
> Jan 13 11:16:31 Zwerg kernel: RAX: 0000000000000000 RBX: 000000000000c030 RCX: 0000000000000027
> Jan 13 11:16:31 Zwerg kernel: RDX: ffffa046f85a03a8 RSI: 0000000000000001 RDI: ffffa046f85a03a0
> Jan 13 11:16:31 Zwerg kernel: RBP: ffffa03f45710c28 R08: 0000000000000000 R09: ffffac56d5f13c68
> Jan 13 11:16:31 Zwerg kernel: R10: 0000000000000003 R11: ffffa04717f7ffe8 R12: ffffa03f45710000
> Jan 13 11:16:31 Zwerg kernel: R13: 0000000000000000 R14: ffffa03f456efd40 R15: 000000000000c030
> Jan 13 11:16:31 Zwerg kernel: FS:  0000000000000000(0000) GS:ffffa046f8580000(0000) knlGS:0000000000000000
> Jan 13 11:16:31 Zwerg kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jan 13 11:16:31 Zwerg kernel: CR2: 00007f1fc894f000 CR3: 00000008a8538000 CR4: 0000000000750ee0
> Jan 13 11:16:31 Zwerg kernel: PKRU: 55555554
> Jan 13 11:16:31 Zwerg kernel: Call Trace:
> Jan 13 11:16:31 Zwerg kernel:  <TASK>
> 
> 
> Obviously, the kernel parameter to disable PCIe power management did not solve this problem.
> 
> The way to recover is to restart the computer.

Just to be clear, can you confirm whether or not this is a regression from
a previously running 6.1.y kernel? I'm asking because I suspect that
this is similar to
https://lore.kernel.org/intel-wired-lan/20221031170535.77be0eb5@kernel.org/
and that it never worked reliably with your hardware.

Regards,
Salvatore

