
Re: using ddrescue on the root partition - boot with / as read-only



On 2023-09-14 22:24:59 -0700, David Christensen wrote:
> On 9/14/23 03:17, Vincent Lefevre wrote:
> > I get UNC errors like
> > 
> > 2023-09-10T11:50:59.858670+0200 zira kernel: ata1.00: exception Emask 0x0 SAct 0xc00 SErr 0x40000 action 0x0
> > 2023-09-10T11:51:00.117366+0200 zira kernel: ata1.00: irq_stat 0x40000008
> > 2023-09-10T11:51:00.117431+0200 zira kernel: ata1: SError: { CommWake }
> > 2023-09-10T11:51:00.117474+0200 zira kernel: ata1.00: failed command: READ FPDMA QUEUED
> > 2023-09-10T11:51:00.117511+0200 zira kernel: ata1.00: cmd 60/00:50:b8:12:c5/02:00:1f:00:00/40 tag 10 ncq dma 262144 in
> >                                                        res 41/40:00:90:13:c5/00:02:1f:00:00/00 Emask 0x409 (media error) <F>
> > 2023-09-10T11:51:00.117537+0200 zira kernel: ata1.00: status: { DRDY ERR }
> > 2023-09-10T11:51:00.117560+0200 zira kernel: ata1.00: error: { UNC }
> > 2023-09-10T11:51:00.117583+0200 zira kernel: ata1.00: supports DRM functions and may not be fully accessible
> > 2023-09-10T11:51:00.117614+0200 zira kernel: ata1.00: supports DRM functions and may not be fully accessible
> > 2023-09-10T11:51:00.117651+0200 zira kernel: ata1.00: configured for UDMA/133
> > 2023-09-10T11:51:00.117681+0200 zira kernel: sd 0:0:0:0: [sda] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> > 2023-09-10T11:51:00.117953+0200 zira kernel: sd 0:0:0:0: [sda] tag#10 Sense Key : Medium Error [current]
> > 2023-09-10T11:51:00.118165+0200 zira kernel: sd 0:0:0:0: [sda] tag#10 Add. Sense: Unrecovered read error - auto reallocate failed
> > 2023-09-10T11:51:00.118366+0200 zira kernel: sd 0:0:0:0: [sda] tag#10 CDB: Read(10) 28 00 1f c5 12 b8 00 02 00 00
> > 2023-09-10T11:51:00.118557+0200 zira kernel: I/O error, dev sda, sector 533009296 op 0x0:(READ) flags 0x80700 phys_seg 37 prio class 2
> > 2023-09-10T11:51:00.118582+0200 zira kernel: ata1: EH complete
> > 2023-09-10T11:51:00.118608+0200 zira kernel: ata1.00: Enabling discard_zeroes_data
> 
> What is the make and model of the laptop?

HP ZBook 15 G2 (2015)

> What is the make and model of the disk drive?

Samsung 870 EVO 1TB SATA (since January 2022)

> When and where do you see the above error messages?

This seems to happen whenever bad sectors are read, either when I read
files that use these sectors or when I run the badblocks utility (so far
only in its default read-only mode, i.e. with no options). The messages
appear in the journalctl output.
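
For reference, the numbers in the quoted log can be cross-checked against
each other. The following is a minimal Python sketch, not part of the
original diagnosis: it decodes the Read(10) CDB from the log and checks
that the sector reported in the "I/O error" line falls inside the
requested range. The function name decode_read10 is made up for this
example, and 512-byte logical sectors are assumed.

```python
# Decode the 10-byte READ(10) CDB copied from the kernel log above.
cdb = bytes.fromhex("28 00 1f c5 12 b8 00 02 00 00")

def decode_read10(cdb):
    """Return (first_lba, sector_count) for a READ(10) command block."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, count

first_lba, count = decode_read10(cdb)
failed_sector = 533009296  # from the "I/O error, dev sda, sector ..." line

print(first_lba, count)                                # 533009080 512
print(first_lba <= failed_sector < first_lba + count)  # True
print(count * 512)                                     # 262144, assuming 512-byte sectors
```

The 262144 bytes match the "ncq dma 262144 in" field of the failed
command, so a single 256 KiB read request covers the failing sector.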

> > and after these errors, the kernel remounts the root partition as
> > read-only.
> 
> That sounds like a reasonable boot loader response to an OS drive error
> during boot.

There are no errors during boot. The errors occur only when I read the
affected files or run badblocks, and the remount happens only after a
certain number of errors.

> > Due to these errors, some files are unreadable.
> > 
> > badblocks says that there are 25252 bad blocks.
> > 
> > I'm using ddrescue before doing anything else (mainly in case things
> > get worse), but I would essentially be interested in knowing
> > which files are affected.
> 
> Was the computer working correctly in the past?

Yes, except during the few days before the first disk errors on
6 December 2022, when I got occasional crashes (which had never happened
before). About 2 hours before the first errors, I upgraded the kernel and
the NVIDIA drivers from 390.154 to 390.157. From the changelog of 390.157-1:

nvidia-graphics-drivers-legacy-390xx (390.157-1) unstable; urgency=medium

  * New upstream legacy branch release 390.157 (2022-11-22).
    * Fixed CVE-2022-34670, CVE-2022-34674, CVE-2022-34675, CVE-2022-34677,
      CVE-2022-34680, CVE-2022-42257, CVE-2022-42258, CVE-2022-42259.
      https://nvidia.custhelp.com/app/answers/detail/a_id/5415
      (Closes: #1025281)
    * Improved compatibility with recent Linux kernels.

  [ Andreas Beckmann ]
  * Refresh patches.
  * Rename the internally used ARCH variable which might clash on externally
    set values.
  * Use substitutions for ${nvidia-kernel} and friends (510.108.03-1).
  * Try to compile a kernel module at package build time (510.108.03-1).

 -- Andreas Beckmann <anbe@debian.org>  Sat, 03 Dec 2022 22:17:01 +0100

I'm wondering whether the crashes were due to a compatibility issue
with the kernel (which was the latest one from Debian/unstable).

> When did you first notice the error messages?  What was the computer doing
> at the time?

I first got errors on 6 December 2022 when reading some files.
At that time, I identified 5 unreadable files, which I put in a
private/unreadable-files directory. Then everything was OK
until a few days ago, when I wanted to duplicate a big directory
(to try to reproduce a bug).

> Did you make any changes to the computer (hardware, software, configuration,
> apps, other) immediately prior to the start of the error messages?

See above (and no hardware change).

> Does the computer now generate error messages?  Consistently?  What is it
> doing when the error messages are generated?

I get errors only when I read some particular files.
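
Since the errors show up only when particular files are read, one rough
way to list the affected files is to try reading every file in a tree and
record those that fail with EIO. The following Python sketch only
illustrates that idea. The function name find_unreadable and its
parameters are hypothetical. Each failing read can trigger the kernel
errors described above, including the read-only remount after several
of them, and data already in the page cache may read back fine without
touching the disk.

```python
import errno
import os

def find_unreadable(root, chunk_size=1 << 20):
    """Yield each regular file under root whose content cannot be read (EIO)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files (devices, FIFOs, sockets).
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    # Read the whole file in chunks and discard the data.
                    while f.read(chunk_size):
                        pass
            except OSError as e:
                if e.errno == errno.EIO:
                    yield path
                # Other errors (e.g. permission denied) are ignored here.
```

For example, list(find_unreadable("some/directory")) returns the paths
whose read hit an I/O error, where the directory name is a placeholder.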

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

