
Re: Error Messages



Mick Ab wrote: 
> I run a desktop PC with Debian 11, Ryzen 5 5600x CPU and
> MSI-B550 A Pro motherboard.
> 
> Recently, Hardware error messages such as the following have
> appeared every few weeks :-
> 
> And around the time of the error messages on the 22nd May, the
> syslog extracts say:
> 
> May 22 23:11:51 piglit kernel: [1684438.730488] mce: [Hardware Error]:
> Machine check events logged
> May 22 23:11:51 piglit kernel: [1684438.730489] [Hardware Error]:
> Corrected error, no action required.
> May 22 23:11:51 piglit kernel: [1684438.730493] [Hardware Error]: CPU:0
> (19:21:0) MC13_STATUS[Over|CE|-|AddrV|-|-|-|Poison|Scrub]:
> 0xc5048b48cbb60f00
> May 22 23:11:51 piglit kernel: [1684438.730497] [Hardware Error]: Error
> Addr: 0x0000000000000000
> May 22 23:11:51 piglit kernel: [1684438.730498] [Hardware Error]: IPID:
> 0x0000000000000000
> May 22 23:11:51 piglit kernel: [1684438.730500] [Hardware Error]: Bank
> 13 is reserved.
> May 22 23:11:51 piglit kernel: [1684438.730500] [Hardware Error]:
> internal: RESV

As the message says, the error was corrected. It is typically an
error in the CPU cache or RAM, and "corrected" probably means
that the error was detected and the request was then re-run
successfully. If this is not frequent, I would not worry.
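
To judge "frequent", it can help to count how many machine check
events were logged per day. Below is a rough sketch in Python, not
a polished tool. It assumes the usual syslog line layout seen in
the quoted extract (unwrapped, one event per line) and that the
log lives at /var/log/syslog; the function name count_mce_per_day
is just something I made up for illustration.

```python
import re
from collections import Counter

# Matches the syslog prefix ("May 22 23:11:51 host kernel:") of a line that
# announces a machine check event. The layout is assumed from the quoted log.
MCE_LINE = re.compile(
    r"^(\w{3}\s+\d{1,2}) \d{2}:\d{2}:\d{2} \S+ kernel: "
    r".*Machine check events logged"
)


def count_mce_per_day(lines):
    """Return a Counter mapping 'Mon DD' to the number of MCE announcements."""
    counts = Counter()
    for line in lines:
        match = MCE_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Assumed log location; adjust if your system logs elsewhere.
    with open("/var/log/syslog", errors="replace") as log:
        for day, count in count_mce_per_day(log).items():
            print(day, count)
```

One event every few weeks, as described, would show up as a
handful of days with a count of 1. A count that keeps climbing
would be a reason to look more closely.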


> Then last Saturday, there appeared Data Error messages such as the
> following :-
> 
> May 27 13:24:29 piglit kernel: [2081199.553917] ata5.00: failed command:
> READ FPDMA QUEUED
> May 27 13:24:29 piglit kernel: [2081199.553919] ata5.00: cmd
> 60/90:b8:10:14:6c/00:00:03:00:00/40 tag 23 ncq dma 73728 in
> May 27 13:24:29 piglit kernel: [2081199.553919]          res
> 40/00:b8:10:14:6c/00:00:03:00:00/40 Emask 0x50 (ATA bus error)
> May 27 13:24:29 piglit kernel: [2081199.553920] ata5.00: status: { DRDY
> }
> May 27 13:24:29 piglit kernel: [2081199.553923] ata5: hard resetting
> link
> May 27 13:24:30 piglit kernel: [2081200.273917] ata5: SATA link down
> (SStatus 0 SControl 310)
> May 27 13:24:34 piglit kernel: [2081203.967459] ata5.00: configured for
> UDMA/33

This is a failure somewhere in the disk path: the disk itself,
the cable or the controller.
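
To see which port keeps misbehaving, the link events can be
tallied per ATA port. This is only a sketch under assumptions: it
looks for the three message types that appear in the quoted
extract (unwrapped, one per line), and summarize_ata_events is a
hypothetical helper name, not part of any existing tool. It does
not tell you whether the disk, cable or controller is at fault,
only where the errors cluster.

```python
import re
from collections import defaultdict

# Kernel messages of interest, taken from the quoted log. The optional
# ".00" suffix (device on the port) is folded into the port name.
ATA_EVENT = re.compile(
    r"kernel: \[\s*\d+\.\d+\] (ata\d+)(?:\.\d+)?: "
    r"(failed command|hard resetting link|SATA link down)"
)


def summarize_ata_events(lines):
    """Return {port: {event: count}} for the ATA events found in lines."""
    summary = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = ATA_EVENT.search(line)
        if match:
            port, event = match.groups()
            summary[port][event] += 1
    return {port: dict(events) for port, events in summary.items()}


if __name__ == "__main__":
    # Assumed log location; adjust if your system logs elsewhere.
    with open("/var/log/syslog", errors="replace") as log:
        for port, events in summarize_ata_events(log).items():
            print(port, events)
```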


> More error messages and then :-
> 
> processes of 1 users.
> May 27 13:58:08 piglit kernel: [2083218.337088] EXT4-fs warning (device
> dm-0): ext4_dirblock_csum_verify:400: inode #53: comm opera: No space
> for directory leaf checksum. Please run e2fsck -D.
> May 27 13:58:08 piglit kernel: [2083218.337091] EXT4-fs error (device
> dm-0): __ext4_find_entry:1591: inode #53: comm opera: checksumming
> Message from syslogd@piglit at May 27 13:58:09 ...
>   kernel:[2083218.760570] EXT4-fs (dm-0): failed to convert unwritten
> extents to written extents -- potential data loss!  (inode 394119, error
> -30)

Make backups.

Run fsck.

Prepare to replace disks (and/or cables, but usually disks).

-dsr-

