Re: fsck error on boot: /dev/sda1: UNEXPECTED INCONSISTENCY and Partition 1 does not start on physical sector boundary
Hello,
On Fri, Mar 19, 2021 at 10:36:37PM +0500, Alexander V. Makartsev wrote:
> Personally, I don't think it is wise to throw away any HDD as soon as it
> gets a few pending bad blocks for whatever reason.
It really depends upon your risk stance.
At home, my fileserver has RAID and backups, so if an HDD picks up
a few remapped sectors I'm not going to throw it out. When the
number of remapped sectors starts climbing steadily then yes, it's
being replaced. But indeed it can be many years between picking up
a few remapped sectors and complete meltdown.
https://gist.github.com/grifferz/64808f61079fe610c6f21f03ac7fd1aa
$ sudo ./blkleaderboard.sh
sdd 100418 hours (11.45 years) 0.29TiB ST3320620AS
sdb 95783 hours (10.92 years) 0.29TiB ST3320620AS
sda 94252 hours (10.75 years) 0.29TiB ST3320620AS
sdi 66276 hours ( 7.56 years) 0.45TiB ST500DM002-1BD14
sdk 55418 hours ( 6.32 years) 2.73TiB WDC WD30EZRX-00D
sdh 44511 hours ( 5.07 years) 0.91TiB Hitachi HUA72201
sde 24239 hours ( 2.76 years) 0.91TiB SanDisk SDSSDH31
sdc 17672 hours ( 2.01 years) 0.29TiB ST3320418AS
sdf 7252 hours ( 0.82 years) 1.82TiB Samsung SSD 860
sdj 7130 hours ( 0.81 years) 1.75TiB KINGSTON SUV5001
sdg 1560 hours ( 0.17 years) 1.75TiB KINGSTON SUV5001
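As an aside on the years column: the figures above are consistent
with dividing power-on hours by 8766 (24 * 365.25) and truncating to
two decimals. That is inferred from the numbers shown, not taken
from the script itself. A minimal Python sketch, where the function
name is made up for illustration:

```python
# Assumption: 8766 hours per year (24 * 365.25), inferred from the
# output above rather than read from blkleaderboard.sh.
HOURS_PER_YEAR = 8766


def hours_to_years(hours: int) -> str:
    """Return power-on hours as years, truncated (not rounded) to 2 decimals."""
    # Integer arithmetic avoids float rounding surprises.
    hundredths = hours * 100 // HOURS_PER_YEAR
    return f"{hundredths // 100}.{hundredths % 100:02d}"


# A few of the drives listed above.
for dev, hours in [("sdd", 100418), ("sdh", 44511), ("sdg", 1560)]:
    print(f"{dev} {hours} hours ({hours_to_years(hours)} years)")
```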
I've replaced three drives in the last 2 years. Once those started
gaining reallocated sectors they didn't survive long, even though I
gave them the chance. sdc and sdd are hanging on:
$ for d in /dev/sd?; do echo -n "$d: "; sudo smartctl -A $d | grep '^ 5'; done
/dev/sda: 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
/dev/sdb: 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
/dev/sdc: 5 Reallocated_Sector_Ct 0x0033 097 097 036 Pre-fail Always - 151
/dev/sdd: 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 5
/dev/sde: 5 Reallocated_Sector_Ct 0x0032 100 100 --- Old_age Always - 0
/dev/sdf: 5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
/dev/sdg: 5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
/dev/sdh: 5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
/dev/sdi: 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
/dev/sdj: 5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
/dev/sdk: 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
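If you want to pick out the non-zero counts from output like the
above automatically, here is a rough Python sketch. It assumes each
line is the device name followed by the usual ten-column smartctl -A
attribute row, with the raw value as a plain integer in the last
column, as it is in the listing above. The function names are
hypothetical.

```python
# Three lines copied from the listing above.
SAMPLE = """\
/dev/sda: 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
/dev/sdc: 5 Reallocated_Sector_Ct 0x0033 097 097 036 Pre-fail Always - 151
/dev/sdd: 5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 5
"""


def reallocated_counts(text: str) -> dict:
    """Map device name to the raw value of SMART attribute 5."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected: device, ID, name, flag, value, worst, thresh,
        # type, updated, when_failed, raw_value (11 fields).
        if len(fields) < 11 or fields[1] != "5":
            continue
        counts[fields[0].rstrip(":")] = int(fields[10])
    return counts


def drives_with_reallocations(text: str) -> dict:
    """Keep only the drives whose reallocated sector count is non-zero."""
    return {dev: n for dev, n in reallocated_counts(text).items() if n > 0}


print(drives_with_reallocations(SAMPLE))
```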
At work, where it's other people's data on the line, drives get
replaced as soon as they show any defect like that, because when it
does escalate it tends to do so very quickly.
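To make the difference between the two stances concrete, here is a
small Python sketch. The function name, the "strict" flag and the
growth limit of 10 sectors are all made up for illustration; they
are not numbers I actually use, just one way to write the idea down.

```python
def replacement_advice(history, strict=False, growth_limit=10):
    """Suggest an action from reallocated-sector counts, oldest first.

    strict=True models the "other people's data" stance: any
    reallocated sector means replace. Otherwise a stable non-zero
    count is only watched, and growth across the recorded history of
    growth_limit sectors or more (a hypothetical threshold) means
    replace.
    """
    if not history:
        raise ValueError("need at least one reading")
    latest = history[-1]
    if latest == 0:
        return "keep"
    if strict:
        return "replace"
    if latest - history[0] >= growth_limit:
        return "replace"
    return "watch"


print(replacement_advice([5, 5, 5]))               # stable, home stance
print(replacement_advice([5, 5, 5], strict=True))  # same drive, work stance
print(replacement_advice([5, 40, 151]))            # count is climbing
```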
My own risk stance doesn't even permit running without redundancy
(unless inherently impossible due to the machine in question not
supporting that), because once you encounter Offline_Uncorrectable
in normal daily use it means that without redundancy, data loss has
occurred.
The drive couldn't read one or more of its sectors. If it's just
file data you can get it from backup, but if, like OP here, it's
filesystem metadata then your actual filesystem is damaged and needs
fsck. And if you're unluckier still, the whole filesystem can be
broken. I'd really rather not have to spend time on fixing that sort
of thing.
> Even brand new drives are shipped with information about factory remapped
> sectors in special section inside their firmware, to cover up platter
> imperfections.
That's true, and to some extent, with the densities in use today,
all reading from a drive is probabilistic and corrected by checksums. But
when they arrive like that they are supposed to be in a stable
state, without such errors increasing, so when they do start to
appear it is a cause for serious concern.
> This is why performing regular backups and validating them is better, I mean
> you do it all anyway, than replacing drives as soon as they get a few bad
> sectors.
I would say the two strategies are orthogonal because backups and
self-tests are advisable for everyone. Once a drive gets some
Offline_Uncorrectable the data is gone from it; backups and
self-tests didn't stop that from happening, they just helped you
recover from it (backups) or spot it early by testing even unused
areas of the drive (self-tests).
Anyway, in OP's position, they have lost data which they need to
restore, and while they could wait and see if the errors are
increasing in number, they probably just want to get the drive
replaced ASAP.
Cheers,
Andy
--
https://bitfolk.com/ -- No-nonsense VPS hosting