Bug#1076372: Re.: linux-image-6.5.0-0.deb12.4-amd64: ext4 file corruption with newer kernels
- To: Stefan <debian@simg.de>
- Cc: 1076372@bugs.debian.org
- From: Salvatore Bonaccorso <carnil@debian.org>
- Date: Thu, 24 Oct 2024 06:01:20 +0200
- Message-id: <ZxnGkOyBcz5aYdrF@eldamar.lan>
- Reply-to: Salvatore Bonaccorso <carnil@debian.org>, 1076372@bugs.debian.org
- In-reply-to: <180ae177-d321-4594-8688-b796df304888@simg.de>
- References: <172104075060.7102.3621600478475051128.reportbug@ws7> <bf152ceb-9e44-44e1-be5d-d5a12f5b22ab@simg.de> <172104075060.7102.3621600478475051128.reportbug@ws7> <ZrugTv7d5j8sZqzq@eldamar.lan> <172104075060.7102.3621600478475051128.reportbug@ws7> <64a82ea9-6b85-4863-b44a-ff71e5e9d4f8@simg.de> <172104075060.7102.3621600478475051128.reportbug@ws7> <ZvRdJOMpdiSpCeex@eldamar.lan> <ZxlYZemC1e5B01p-@eldamar.lan> <180ae177-d321-4594-8688-b796df304888@simg.de> <172104075060.7102.3621600478475051128.reportbug@ws7>
Control: reopen -1
On Wed, Oct 23, 2024 at 11:46:16PM +0200, Stefan wrote:
> Hi
>
> sorry, I already tested it last week, but did not find the time to
> report the results:
>
> I moved the Lexar NM790 NVMe to the 2nd M2 socket and installed a newly
> purchased SSD (Kingston FURY Renegade) in the 1st M2 socket, see lspci
> outputs below.
>
> I only tested two kernels:
>
> 6.1:
> * Lexar in 2nd M2 socket works
> * Kingston in 1st M2 socket generates read errors with the f3 test, i.e.
> if I run f3read multiple times, different files are damaged
> (* Lexar in 1st M2 socket works)
>
> 6.10:
> * Lexar in 2nd M2 socket works
> * Kingston in 1st M2 socket works.
> (* Lexar in 1st M2 socket generates write errors)
>
> Thus, the error(s) depend on the kernel version and occur with two
> different NVMe drives ...
>
> Regards Stefan
>
> root@ws7:~# lspci -vv -s 02:00
> 02:00.0 Non-Volatile memory controller: Kingston Technology Company,
> Inc. FURY Renegade NVMe SSD with heatsink (rev 01) (prog-if 02 [NVM
> Express])
> Subsystem: Kingston Technology Company, Inc. FURY Renegade NVMe SSD
> with heatsink
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
> <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 40
> IOMMU group: 15
> Region 0: Memory at f6d00000 (64-bit, non-prefetchable) [size=16K]
> Capabilities: [80] Express (v2) Endpoint, MSI 00
> DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1
> unlimited
> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75W
> DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
> MaxPayload 256 bytes, MaxReadReq 512 bytes
> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
> LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <64us
> ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
> LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 16GT/s, Width x4
> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
> 10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
> EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
> FRS- TPHComp- ExtTPHComp-
> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+
> 10BitTagReq- OBFF Disabled,
> AtomicOpsCtl: ReqEn-
> LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+
> 2Retimers+ DRS-
> LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
> Transmit Margin: Normal Operating Range, EnterModifiedCompliance-
> ComplianceSOS-
> Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
> LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+
> EqualizationPhase1+
> EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
> Retimer- 2Retimers- CrosslinkRes: Upstream Port
> Capabilities: [d0] MSI-X: Enable+ Count=33 Masked-
> Vector table: BAR=0 offset=00002000
> PBA: BAR=0 offset=00003000
> Capabilities: [e0] MSI: Enable- Count=1/8 Maskable- 64bit+
> Address: 0000000000000000 Data: 0000
> Capabilities: [f8] Power Management version 3
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [100 v1] Latency Tolerance Reporting
> Max snoop latency: 1048576ns
> Max no snoop latency: 1048576ns
> Capabilities: [110 v1] L1 PM Substates
> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> PortCommonModeRestoreTime=10us PortTPowerOnTime=220us
> L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> T_CommonMode=0us LTR1.2_Threshold=32768ns
> L1SubCtl2: T_PwrOn=220us
> Capabilities: [128 v1] Alternative Routing-ID Interpretation (ARI)
> ARICap: MFVC- ACS-, Next Function: 0
> ARICtl: MFVC- ACS-, Function Group: 0
> Capabilities: [1e0 v1] Data Link Feature <?>
> Capabilities: [200 v2] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
> MalfTLP- ECRC- UnsupReq- ACSViol-
> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
> MalfTLP- ECRC- UnsupReq- ACSViol-
> UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+
> MalfTLP+ ECRC- UnsupReq- ACSViol-
> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap+
> ECRCChkEn-
> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> HeaderLog: 04080001 0000000f 02070000 0f5913d0
> Capabilities: [290 v1] Device Serial Number 00-00-00-00-00-00-00-00
> Capabilities: [2a0 v1] Power Budgeting <?>
> Capabilities: [300 v1] Secondary PCI Express
> LnkCtl3: LnkEquIntrruptEn- PerformEqu-
> LaneErrStat: 0
> Capabilities: [340 v1] Physical Layer 16.0 GT/s <?>
> Capabilities: [378 v1] Lane Margining at the Receiver <?>
> Kernel driver in use: nvme
> Kernel modules: nvme
>
> root@ws7:~# lspci -vv -s 03:00
> 03:00.0 Non-Volatile memory controller: Shenzhen Longsys Electronics
> Co., Ltd. Lexar NM790 NVME SSD (DRAM-less) (rev 01) (prog-if 02 [NVM
> Express])
> Subsystem: Shenzhen Longsys Electronics Co., Ltd. Lexar NM790 NVME SSD
> (DRAM-less)
> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
> <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 39
> IOMMU group: 16
> Region 0: Memory at f6c00000 (64-bit, non-prefetchable) [size=16K]
> Capabilities: [40] Power Management version 3
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [50] MSI: Enable- Count=1/32 Maskable+ 64bit+
> Address: 0000000000000000 Data: 0000
> Masking: 00000000 Pending: 00000000
> Capabilities: [70] Express (v2) Endpoint, MSI 1f
> DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1
> unlimited
> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75W
> DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
> MaxPayload 256 bytes, MaxReadReq 512 bytes
> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
> LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <64us
> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 16GT/s, Width x4
> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
> 10BitTagComp+ 10BitTagReq- OBFF Via message, ExtFmt- EETLPPrefix-
> EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
> FRS- TPHComp- ExtTPHComp-
> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+
> 10BitTagReq- OBFF Disabled,
> AtomicOpsCtl: ReqEn-
> LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+
> 2Retimers+ DRS-
> LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
> Transmit Margin: Normal Operating Range, EnterModifiedCompliance-
> ComplianceSOS-
> Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
> LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+
> EqualizationPhase1+
> EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
> Retimer- 2Retimers- CrosslinkRes: Upstream Port
> Capabilities: [b0] MSI-X: Enable+ Count=17 Masked-
> Vector table: BAR=0 offset=00003000
> PBA: BAR=0 offset=00002000
> Capabilities: [100 v2] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
> MalfTLP- ECRC- UnsupReq- ACSViol-
> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
> MalfTLP- ECRC- UnsupReq- ACSViol-
> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+
> MalfTLP+ ECRC- UnsupReq- ACSViol-
> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+
> ECRCChkEn-
> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> HeaderLog: 00000000 00000000 00000000 00000000
> Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
> Capabilities: [158 v1] Power Budgeting <?>
> Capabilities: [168 v1] Alternative Routing-ID Interpretation (ARI)
> ARICap: MFVC- ACS+, Next Function: 0
> ARICtl: MFVC- ACS-, Function Group: 0
> Capabilities: [178 v1] Secondary PCI Express
> LnkCtl3: LnkEquIntrruptEn- PerformEqu-
> LaneErrStat: 0
> Capabilities: [198 v1] Physical Layer 16.0 GT/s <?>
> Capabilities: [1bc v1] Lane Margining at the Receiver <?>
> Capabilities: [220 v1] Latency Tolerance Reporting
> Max snoop latency: 1048576ns
> Max no snoop latency: 1048576ns
> Capabilities: [228 v1] L1 PM Substates
> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> PortCommonModeRestoreTime=10us PortTPowerOnTime=1000us
> L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> T_CommonMode=0us LTR1.2_Threshold=32768ns
> L1SubCtl2: T_PwrOn=1000us
> Capabilities: [238 v1] Vendor Specific Information: ID=0002 Rev=4
> Len=100 <?>
> Capabilities: [338 v1] Vendor Specific Information: ID=0001 Rev=1
> Len=038 <?>
> Capabilities: [370 v1] Data Link Feature <?>
> Kernel driver in use: nvme
> Kernel modules: nvme
Thanks!
In this case let's reopen the bug for now.
Regards,
Salvatore
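
For anyone who wants to cross-check the "different files are damaged on
each f3read run" observation without f3, the following is a minimal
sketch, not part of the original report. It hashes every file in a
directory several times and lists the files whose digest changed between
passes. The function names (file_digest, unstable_files) and the
directory layout are hypothetical. The posix_fadvise call is only an
advisory hint to drop cached pages, so repeated reads may still be served
from the page cache unless caches are dropped separately.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Advisory hint asking the kernel to drop cached pages for this
        # file, so the read is more likely to reach the device.
        # Not a guarantee; only available on Unix-like systems.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def unstable_files(directory: str, passes: int = 3) -> dict[str, set[str]]:
    """Hash each regular file `passes` times; return those whose digest varied."""
    seen: dict[str, set[str]] = {}
    for _ in range(passes):
        for path in sorted(Path(directory).iterdir()):
            if path.is_file():
                seen.setdefault(path.name, set()).add(file_digest(path))
    # A file with more than one distinct digest returned different data
    # on different passes, which points at read errors rather than
    # damage that was written once and then read back consistently.
    return {name: digests for name, digests in seen.items() if len(digests) > 1}
```

An empty result only means the reads were consistent with each other; it
does not show that the data on disk matches what was originally written.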