Bug#992855: upgrade-reports: buster to bullseye, nvme logs make ttys messy.
Package: upgrade-reports
Severity: normal
Dear maintainers,
I've upgraded one of my buster 10.10 into bullseye 11.
* Upgrading was mostly successfull.
* As "buster" GNOME desktop it had almost no problem,
and upgraded "bullseye" GNOME session seems fine for now.
However, after rebooting the issue below stared on ttys.
The machine has one NVMe and one SATA SSD (separated /home).
The Entire system installed into LUKS2 encrypted LVM volumes.
Mostly "main" packages except Intel microcode ("non-free").
NVMe continual syslog messsage on terminal
==========================================
1. kernel reports nvme RxErr related messages.
2. on any tty (even in rescue.target).
3. almost always (I/O access on NVMe seems to be a trigger).
AFAIK, the reported device (WD BLACK NVMe) works fine
and "nvme smartlog" have not reported any error in years.
After bullseye upgrading kernel started this reporting on ttys.
At the same time, "smartd" seems to have a minor systemd unit problem,
and it also seems to start tracking NVMe just like below syslog sample
(at the smartd starting log section).
"smartd" in buster only monitored SATA SSD.
I can not tell what actual package and/or whether that device caused this,
posting this bug report to "upgrade-reports".
Actual syslog (partially modified; DATETIME/HOSTNAME/SERIAL_NUMBER)
---------------------------------------------------------------------
1. Cold booted.
2. systemctl shows no error.
3. target was shifted during this log sampling (graphical, rescue, multiuser, graphical).
syslog and my comments follow:
Aug 23 19:50:27 hostname kernel: [ 0.289634] pci 0000:01:00.0: [15b7:5002] type 00 class 0x010802
Aug 23 19:50:27 hostname kernel: [ 0.289657] pci 0000:01:00.0: reg 0x10: [mem 0xdf000000-0xdf003fff 64bit]
Aug 23 19:50:27 hostname kernel: [ 0.289691] pci 0000:01:00.0: reg 0x20: [mem 0xdf004000-0xdf0040ff 64bit]
Aug 23 19:50:27 hostname kernel: [ 0.718254] pci 0000:01:00.0: Adding to iommu group 11
Aug 23 19:50:27 hostname kernel: [ 1.020213] nvme nvme0: pci function 0000:01:00.0
Aug 23 19:50:27 hostname kernel: [ 1.029047] nvme nvme0: 4/0/0 default/read/poll queues
Aug 23 19:50:27 hostname kernel: [ 1.031074] nvme0n1: p1 p2 p3
This NVMe contains EFI, /boot, and LUKS root filesystem except /home (in SATA SSD).
Aug 23 19:50:27 hostname kernel: [ 15.626412] input: HDA Intel PCH HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input18
Aug 23 19:50:27 hostname kernel: [ 15.626483] input: HDA Intel PCH HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input19
Aug 23 19:50:27 hostname kernel: [ 15.626548] input: HDA Intel PCH HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:1f.3/sound/card0/input20
The first NVMe RxErr reporting starts after the kernel dmesg (after sound subsystem).
Aug 23 19:50:27 hostname kernel: [ 17.365057] pcieport 0000:00:1b.0: AER: Corrected error received: 0000:01:00.0
Aug 23 19:50:27 hostname kernel: [ 17.365089] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 23 19:50:27 hostname kernel: [ 17.365117] nvme 0000:01:00.0: device [15b7:5002] error status/mask=00000001/0000e000
Aug 23 19:50:27 hostname kernel: [ 17.365142] nvme 0000:01:00.0: [ 0] RxErr
Then, on any tty (regardless the login status),
it keep showing that messages like this.
DATETIME was modified but kernel timing is the real log.
Aug 23 19:51:16 hostname kernel: [ 70.416916] pcieport 0000:00:1b.0: AER: Corrected error received: 0000:01:00.0
Aug 23 19:51:16 hostname kernel: [ 70.417029] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 23 19:51:16 hostname kernel: [ 70.417145] nvme 0000:01:00.0: device [15b7:5002] error status/mask=00000001/0000e000
Aug 23 19:51:16 hostname kernel: [ 70.417268] nvme 0000:01:00.0: [ 0] RxErr
Aug 23 19:51:17 hostname kernel: [ 71.693968] pcieport 0000:00:1b.0: AER: Corrected error received: 0000:01:00.0
Aug 23 19:51:17 hostname kernel: [ 71.699967] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 23 19:51:17 hostname kernel: [ 71.701511] nvme 0000:01:00.0: device [15b7:5002] error status/mask=00000001/0000e000
Aug 23 19:51:17 hostname kernel: [ 71.703040] nvme 0000:01:00.0: [ 0] RxErr
Aug 23 19:51:19 hostname kernel: [ 73.739331] pcieport 0000:00:1b.0: AER: Corrected error received: 0000:01:00.0
Aug 23 19:51:19 hostname kernel: [ 73.745023] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 23 19:51:19 hostname kernel: [ 73.746547] nvme 0000:01:00.0: device [15b7:5002] error status/mask=00000001/0000e000
Aug 23 19:51:19 hostname kernel: [ 73.748053] nvme 0000:01:00.0: [ 0] RxErr
Aug 23 19:51:20 hostname kernel: [ 74.506475] pcieport 0000:00:1b.0: AER: Corrected error received: 0000:01:00.0
Aug 23 19:51:20 hostname kernel: [ 74.512092] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 23 19:51:20 hostname kernel: [ 74.513594] nvme 0000:01:00.0: device [15b7:5002] error status/mask=00000001/0000e000
Aug 23 19:51:20 hostname kernel: [ 74.515037] nvme 0000:01:00.0: [ 0] RxErr
Aug 23 19:51:22 hostname kernel: [ 76.552810] pcieport 0000:00:1b.0: AER: Corrected error received: 0000:01:00.0
Aug 23 19:51:22 hostname kernel: [ 76.558081] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 23 19:51:22 hostname kernel: [ 76.559548] nvme 0000:01:00.0: device [15b7:5002] error status/mask=00000001/0000e000
Aug 23 19:51:22 hostname kernel: [ 76.560989] nvme 0000:01:00.0: [ 0] RxErr
Aug 23 19:51:24 hostname kernel: [ 78.599100] pcieport 0000:00:1b.0: AER: Corrected error received: 0000:01:00.0
Aug 23 19:51:24 hostname kernel: [ 78.604922] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
Aug 23 19:51:24 hostname kernel: [ 78.606389] nvme 0000:01:00.0: device [15b7:5002] error status/mask=00000001/0000e000
Aug 23 19:51:24 hostname kernel: [ 78.607832] nvme 0000:01:00.0: [ 0] RxErr
then smartd initialization follows,
Aug 24 11:34:33 hostname smartd[1026]: smartd 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-17-amd64] (local build)
Aug 24 11:34:33 hostname smartd[1026]: Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
Aug 24 11:34:33 hostname smartd[1026]: Opened configuration file /etc/smartd.conf
Aug 24 11:34:33 hostname smartd[1026]: Drive: DEVICESCAN, implied '-a' Directive on line 21 of file /etc/smartd.conf
Aug 24 11:34:33 hostname smartd[1026]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Aug 24 11:34:33 hostname smartd[1026]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Aug 24 11:34:33 hostname smartd[1026]: Device: /dev/sda [SAT], opened
(minor issue) smartd.service (please see the log).
Aug 23 19:23:18 hostname systemd[1]: /lib/systemd/system/smartd.service:9: Standard output type syslog is obsolete, automatically updating to journal. Please update your unit file, and consider removing the setting altogether.
Aug 23 19:23:18 hostname systemd[1]: Reloading.
Aug 23 19:23:18 hostname systemd[1]: /lib/systemd/system/smartd.service:9: Standard output type syslog is obsolete, automatically updating to journal. Please update your unit file, and consider removing the setting altogether.
Aug 23 19:23:18 hostname systemd[1]: Starting Network Time Synchronization...
Aug 23 19:23:18 hostname systemd[1]: Started Network Time Synchronization.
Aug 23 19:23:18 hostname systemd[1]: Reached target System Time Set.
FYI: buster smartd did not track NVMe device, only SATA SSD.
bullseye smartd shows it monitor both.
This might be a potential cause.
(info: S/N: section is not the real one)
Note that smartd 7.2 shows 4.19 kernel version,
but this session is booted with bullseye 5.10
(4.19 image was purged, and "apt autoremove --purge" had done).
Aug 23 19:26:12 hostname smartd[28840]: smartd 7.2 2020-12-30 r5155 [x86_64-linux-4.19.0-17-amd64] (local build)
Aug 23 19:26:12 hostname smartd[28840]: Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
Aug 23 19:26:12 hostname smartd[28840]: Opened configuration file /etc/smartd.conf
Aug 23 19:26:12 hostname smartd[28840]: Drive: DEVICESCAN, implied '-a' Directive on line 21 of file /etc/smartd.conf
Aug 23 19:26:12 hostname smartd[28840]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], opened
Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], CT500MX500SSD1, S/N:SDASN_HEX, WWN:5-00a075-1e1e6e854, FW:M3CR023, 500 GB
Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], found in smartd database: Crucial/Micron Client SSDs
Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], state read from /var/lib/smartmontools/smartd.CT500MX500SSD1-SDASN_HEX.ata.state
Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/nvme0, opened
Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/nvme0, WDS250G3X0C-00SJG0, S/N:NVME0SN_HEX, FW:102000WD, 250 GB
Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Aug 23 19:26:12 hostname smartd[28840]: Monitoring 1 ATA/SATA, 0 SCSI/SAS and 1 NVMe devices
Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.CT500MX500SSD1-SDASN_HEX.ata.state
Aug 23 19:26:12 hostname smartd[28840]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.WDS250G3X0C_00SJG0-NVME0SN_HEX.nvme.state
Aug 23 19:26:12 hostname systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
Aug 23 19:26:15 hostname systemd[1]: Reloading.
"lspci" is here (as bullseye output).
I'm not sure about 01:00.1 NVMe product name is correct.
Probably this one is WD Black NVMe 2nd gen. or 3rd gen.
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:591f] (rev 05)
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 630 [8086:5912] (rev 04)
00:14.0 USB controller [0c03]: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [8086:a2af]
00:16.0 Communication controller [0780]: Intel Corporation 200 Series PCH CSME HECI #1 [8086:a2ba]
00:17.0 SATA controller [0106]: Intel Corporation 200 Series PCH SATA controller [AHCI mode] [8086:a282]
00:1b.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #17 [8086:a2e7] (rev f0)
00:1c.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #1 [8086:a290] (rev f0)
00:1c.6 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #7 [8086:a296] (rev f0)
00:1d.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #9 [8086:a298] (rev f0)
00:1f.0 ISA bridge [0601]: Intel Corporation 200 Series PCH LPC Controller (Z270) [8086:a2c5]
00:1f.2 Memory controller [0580]: Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller [8086:a2a1]
00:1f.3 Audio device [0403]: Intel Corporation 200 Series PCH HD Audio [8086:a2f0]
00:1f.4 SMBus [0c05]: Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller [8086:a2a3]
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-V [8086:15b8]
01:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Black 2018/SN750 / PC SN720 NVMe SSD [15b7:5002]
Please tell me if there are something I can do for this.
Thanks for all maintainers.
Regards,
Reply to: