Control: reassign -1 kernel
Hi,
Although as you said, it may be something else, as it's the kernel
logging this, I reassign to the kernel package, in the hope they are
more able to pinpoint where the problem lies.
Paul
On 24-08-2021 10:53, hoxp18@noramail.jp wrote:
> Package: upgrade-reports
> Severity: normal
>
> Dear maintainers,
>
> I've upgraded one of my buster 10.10 into bullseye 11.
>
> * Upgrading was mostly successfull.
>
> * As "buster" GNOME desktop it had almost no problem,
> and upgraded "bullseye" GNOME session seems fine for now.
>
> However, after rebooting the issue below stared on ttys.
>
> The machine has one NVMe and one SATA SSD (separated /home).
> The Entire system installed into LUKS2 encrypted LVM volumes.
> Mostly "main" packages except Intel microcode ("non-free").
>
> NVMe continual syslog messsage on terminal
> ==========================================
>
> 1. kernel reports nvme RxErr related messages.
> 2. on any tty (even in rescue.target).
> 3. almost always (I/O access on NVMe seems to be a trigger).
>
> AFAIK, the reported device (WD BLACK NVMe) works fine
> and "nvme smartlog" have not reported any error in years.
>
> After bullseye upgrading kernel started this reporting on ttys.
>
> At the same time, "smartd" seems to have a minor systemd unit problem,
> and it also seems to start tracking NVMe just like below syslog sample
> (at the smartd starting log section).
> "smartd" in buster only monitored SATA SSD.
>
> I can not tell what actual package and/or whether that device caused this,
> posting this bug report to "upgrade-reports".
>
> Actual syslog (partially modified; DATETIME/HOSTNAME/SERIAL_NUMBER)
> ---------------------------------------------------------------------
>
> 1. Cold booted.
> 2. systemctl shows no error.
> 3. target was shifted during this log sampling (graphical, rescue, multiuser, graphical).
>
> syslog and my comments follow:
>
> Aug 23 19:50:27 hostname kernel: [ 0.289634] pci 0000:01:00.0: [15b7:5002] type 00 class 0x010802
> Aug 23 19:50:27 hostname kernel: [ 0.289657] pci 0000:01:00.0: reg 0x10: [mem 0xdf000000-0xdf003fff 64bit]
> Aug 23 19:50:27 hostname kernel: [ 0.289691] pci 0000:01:00.0: reg 0x20: [mem 0xdf004000-0xdf0040ff 64bit]
> Aug 23 19:50:27 hostname kernel: [ 0.718254] pci 0000:01:00.0: Adding to iommu group 11
> Aug 23 19:50:27 hostname kernel: [ 1.020213] nvme nvme0: pci function 0000:01:00.0
> Aug 23 19:50:27 hostname kernel: [ 1.029047] nvme nvme0: 4/0/0 default/read/poll queues
> Aug 23 19:50:27 hostname kernel: [ 1.031074] nvme0n1: p1 p2 p3
>
> This NVMe contains EFI, /boot, and LUKS root filesystem except /home (in SATA SSD).
>
> Aug 23 19:50:27 hostname kernel: [ 15.626412] input: HDA Intel PCH HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input18
> Aug 23 19:50:27 hostname kernel: [ 15.626483] input: HDA Intel PCH HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input19
> Aug 23 19:50:27 hostname kernel: [ 15.626548] input: HDA Intel PCH HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input20
>
> The first NVMe RxErr reporting starts after the kernel dmesg (after sound subsystem).
>
> Aug 23 19:50:27 hostname kernel: [ 17.365057] pcieport 0000:00:1b.0: AER: Corrected error received: 0000:01:00.0
> Aug 23 19:50:27 hostname kernel: [ 17.365089] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
> Aug 23 19:50:27 hostname kernel: [ 17.365117] nvme 0000:01:00.0: device [15b7:5002] error status/mask=00000001/0000e000
> Aug 23 19:50:27 hostname kernel: [ 17.365142] nvme 0000:01:00.0: [ 0] RxErr
>
> Then, on any tty (regardless the login status),
> it keep showing that messages like this.
> DATETIME was modified but kernel timing is the real log.
>
> Aug 23 19:51:16 hostname kernel: [ 70.416916] pcieport 0000:00:1b.0: AER: Corrected error received: 0000:01:00.0
> Aug 23 19:51:16 hostname kernel: [ 70.417029] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
> Aug 23 19:51:16 hostname kernel: [ 70.417145] nvme 0000:01:00.0: device [15b7:5002] error status/mask=00000001/0000e000
> Aug 23 19:51:16 hostname kernel: [ 70.417268] nvme 0000:01:00.0: [ 0] RxErr
> Aug 23 19:51:17 hostname kernel: [ 71.693968] pcieport 0000:00:1b.0: AER: Corrected error received: 0000:01:00.0
> Aug 23 19:51:17 hostname kernel: [ 71.699967] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
> Aug 23 19:51:17 hostname kernel: [ 71.701511] nvme 0000:01:00.0: device [15b7:5002] error status/mask=00000001/0000e000
> Aug 23 19:51:17 hostname kernel: [ 71.703040] nvme 0000:01:00.0: [ 0] RxErr
> Aug 23 19:51:19 hostname kernel: [ 73.739331] pcieport 0000:00:1b.0: AER: Corrected error received: 0000:01:00.0
> Aug 23 19:51:19 hostname kernel: [ 73.745023] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
> Aug 23 19:51:19 hostname kernel: [ 73.746547] nvme 0000:01:00.0: device [15b7:5002] error status/mask=00000001/0000e000
> Aug 23 19:51:19 hostname kernel: [ 73.748053] nvme 0000:01:00.0: [ 0] RxErr
> Aug 23 19:51:20 hostname kernel: [ 74.506475] pcieport 0000:00:1b.0: AER: Corrected error received: 0000:01:00.0
> Aug 23 19:51:20 hostname kernel: [ 74.512092] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
> Aug 23 19:51:20 hostname kernel: [ 74.513594] nvme 0000:01:00.0: device [15b7:5002] error status/mask=00000001/0000e000
> Aug 23 19:51:20 hostname kernel: [ 74.515037] nvme 0000:01:00.0: [ 0] RxErr
> Aug 23 19:51:22 hostname kernel: [ 76.552810] pcieport 0000:00:1b.0: AER: Corrected error received: 0000:01:00.0
> Aug 23 19:51:22 hostname kernel: [ 76.558081] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
> Aug 23 19:51:22 hostname kernel: [ 76.559548] nvme 0000:01:00.0: device [15b7:5002] error status/mask=00000001/0000e000
> Aug 23 19:51:22 hostname kernel: [ 76.560989] nvme 0000:01:00.0: [ 0] RxErr
> Aug 23 19:51:24 hostname kernel: [ 78.599100] pcieport 0000:00:1b.0: AER: Corrected error received: 0000:01:00.0
> Aug 23 19:51:24 hostname kernel: [ 78.604922] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
> Aug 23 19:51:24 hostname kernel: [ 78.606389] nvme 0000:01:00.0: device [15b7:5002] error status/mask=00000001/0000e000
> Aug 23 19:51:24 hostname kernel: [ 78.607832] nvme 0000:01:00.0: [ 0] RxErr
>
> then smartd initialization follows,
>
> Aug 24 11:34:33 hostname smartd[1026]: smartd 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-17-amd64] (local build)
> Aug 24 11:34:33 hostname smartd[1026]: Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
> Aug 24 11:34:33 hostname smartd[1026]: Opened configuration file /etc/smartd.conf
> Aug 24 11:34:33 hostname smartd[1026]: Drive: DEVICESCAN, implied '-a' Directive on line 21 of file /etc/smartd.conf
> Aug 24 11:34:33 hostname smartd[1026]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
> Aug 24 11:34:33 hostname smartd[1026]: Device: /dev/sda, type changed from 'scsi' to 'sat'
> Aug 24 11:34:33 hostname smartd[1026]: Device: /dev/sda [SAT], opened
>
> (minor issue) smartd.service (please see the log).
>
> Aug 23 19:23:18 hostname systemd[1]: /lib/systemd/system/smartd.service:9: Standard output type syslog is obsolete, automatically updating to journal. Please update your unit file, and consider removing the setting altogether.
> Aug 23 19:23:18 hostname systemd[1]: Reloading.
> Aug 23 19:23:18 hostname systemd[1]: /lib/systemd/system/smartd.service:9: Standard output type syslog is obsolete, automatically updating to journal. Please update your unit file, and consider removing the setting altogether.
> Aug 23 19:23:18 hostname systemd[1]: Starting Network Time Synchronization...
> Aug 23 19:23:18 hostname systemd[1]: Started Network Time Synchronization.
> Aug 23 19:23:18 hostname systemd[1]: Reached target System Time Set.
>
> FYI: buster smartd did not track NVMe device, only SATA SSD.
> bullseye smartd shows it monitor both.
> This might be a potential cause.
> (info: S/N: section is not the real one)
>
> Note that smartd 7.2 shows 4.19 kernel version,
> but this session is booted with bullseye 5.10
> (4.19 image was purged, and "apt autoremove --purge" had done).
>
> Aug 23 19:26:12 hostname smartd[28840]: smartd 7.2 2020-12-30 r5155 [x86_64-linux-4.19.0-17-amd64] (local build)
> Aug 23 19:26:12 hostname smartd[28840]: Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> Aug 23 19:26:12 hostname smartd[28840]: Opened configuration file /etc/smartd.conf
> Aug 23 19:26:12 hostname smartd[28840]: Drive: DEVICESCAN, implied '-a' Directive on line 21 of file /etc/smartd.conf
> Aug 23 19:26:12 hostname smartd[28840]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
> Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda, type changed from 'scsi' to 'sat'
> Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], opened
> Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], CT500MX500SSD1, S/N:SDASN_HEX, WWN:5-00a075-1e1e6e854, FW:M3CR023, 500 GB
> Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], found in smartd database: Crucial/Micron Client SSDs
> Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
> Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], state read from /var/lib/smartmontools/smartd.CT500MX500SSD1-SDASN_HEX.ata.state
> Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/nvme0, opened
> Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/nvme0, WDS250G3X0C-00SJG0, S/N:NVME0SN_HEX, FW:102000WD, 250 GB
> Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
> Aug 23 19:26:12 hostname smartd[28840]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
> Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.CT500MX500SSD1-SDASN_HEX.ata.state
> Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.WDS250G3X0C_00SJG0-NVME0SN_HEX.nvme.state
> Aug 23 19:26:12 hostname systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
> Aug 23 19:26:15 hostname systemd[1]: Reloading.
>
> "lspci" is here (as bullseye output).
>
> I'm not sure about 01:00.1 NVMe product name is correct.
> Probably this one is WD Black NVMe 2nd gen. or 3rd gen.
>
> 00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:591f] (rev 05)
> 00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 630 [8086:5912] (rev 04)
> 00:14.0 USB controller [0c03]: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [8086:a2af]
> 00:16.0 Communication controller [0780]: Intel Corporation 200 Series PCH CSME HECI #1 [8086:a2ba]
> 00:17.0 SATA controller [0106]: Intel Corporation 200 Series PCH SATA controller [AHCI mode] [8086:a282]
> 00:1b.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #17 [8086:a2e7] (rev f0)
> 00:1c.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #1 [8086:a290] (rev f0)
> 00:1c.6 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #7 [8086:a296] (rev f0)
> 00:1d.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #9 [8086:a298] (rev f0)
> 00:1f.0 ISA bridge [0601]: Intel Corporation 200 Series PCH LPC Controller (Z270) [8086:a2c5]
> 00:1f.2 Memory controller [0580]: Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller [8086:a2a1]
> 00:1f.3 Audio device [0403]: Intel Corporation 200 Series PCH HD Audio [8086:a2f0]
> 00:1f.4 SMBus [0c05]: Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller [8086:a2a3]
> 00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-V [8086:15b8]
> 01:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Black 2018/SN750 / PC SN720 NVMe SSD [15b7:5002]
>
> Please tell me if there are something I can do for this.
>
> Thanks for all maintainers.
>
> Regards,
>
Attachment:
OpenPGP_signature
Description: OpenPGP digital signature