
Re: smartd



Hello,

On Sat, Jan 22, 2022 at 09:16:53PM -0800, peter@easthope.ca wrote:
>     From: Andy Smith <andy@strugglers.net>
>     Date: Sat, 22 Jan 2022 19:07:23 +0000
> > You are better off finding the damaged sectors and causing the drive
> > to remap them by writing new content in there. Then you don't have
> > to keep track yourself of which sections of the disk are unusable.
> 
> I don't understand how bad sectors are "remapped".  The process is 
> internal to the drive?

Yes. When a drive sector goes bad, the drive cannot read from it, so
you get an error in Linux when a read is attempted.

But if you are *writing* to it and a modern drive can't complete the
write, it writes the data to a spare sector instead and remaps the bad
sector's address to point at the formerly spare one.

The operating system is unaware that this has happened, though it is
recorded in SMART attributes (the reallocated sector count).
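
To check that counter yourself, here is a minimal Python sketch that
pulls the raw reallocated sector count out of the attribute table
printed by "smartctl -A". The function name and the sample text are
made up for illustration, and it assumes the usual ten-column ATA
attribute layout; NVMe and SAS drives report this differently.

```python
import re


def reallocated_sector_count(smartctl_output: str):
    """Return the raw Reallocated_Sector_Ct value, or None if not found."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Assumed row layout:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


# Illustrative sample, not output from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
"""

print(reallocated_sector_count(SAMPLE))
```

In practice you would feed it the text captured from running smartctl
against the real device.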

So overwriting bad sectors will make the problem go away until there
are no more spare sectors.

> Depends on Linux software?

No, anything that can write to the drive will work, which is why I
suggested dd over the whole drive if you aren't currently using it.

hdparm makes it easy to write a specific sector but it's also
possible with dd and its "seek" and "count" arguments, if you are
careful.
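
As a rough sketch of what those commands look like, the Python below
only builds the argument lists and prints them; it does not run
anything. Note that dd's seek= positions the output, whereas skip=
would position the input. The helper names, the /dev/sdX placeholder,
the example LBA and the 512-byte sector size are all assumptions;
check your drive's logical sector size and double-check the device
name, because either command destroys whatever is in that sector.

```python
import shlex


def dd_overwrite_command(device: str, lba: int, sector_size: int = 512, count: int = 1):
    """Build (not run) a dd command that zeroes `count` sectors starting at `lba`."""
    if lba < 0 or count < 1:
        raise ValueError("lba must be >= 0 and count must be >= 1")
    return [
        "dd",
        "if=/dev/zero",
        f"of={device}",
        f"bs={sector_size}",   # one block per logical sector (assumed 512 bytes)
        f"seek={lba}",         # seek= moves the *output* position, in units of bs
        f"count={count}",
        "oflag=direct",        # GNU dd: bypass the page cache so the write hits the drive
    ]


def hdparm_overwrite_command(device: str, lba: int):
    """Build (not run) the hdparm equivalent for a single sector."""
    if lba < 0:
        raise ValueError("lba must be >= 0")
    return ["hdparm", "--write-sector", str(lba), "--yes-i-know-what-i-am-doing", device]


# /dev/sdX and the LBA are placeholders, not values from a real drive.
print(shlex.join(dd_overwrite_command("/dev/sdX", 123456789)))
print(shlex.join(hdparm_overwrite_command("/dev/sdX", 123456789)))
```

Printing the command first and pasting it by hand is deliberate: it
gives you a chance to look at the device name and sector number before
anything is written.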

> What about connecting the drive to another system and applying
> fsck to each part?

What would be the goal? A SMART long self-test should tell you which
sectors are unreadable.
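
For example, after a long test ("smartctl -t long") has finished, the
self-test log from "smartctl -l selftest" lists the LBA of the first
error for each failed run. Here is a minimal Python sketch that
extracts those LBAs; the function name and sample log are
illustrative, and the column layout is assumed to be the usual ATA
one.

```python
def first_error_lbas(selftest_log: str):
    """Return the LBA_of_first_error values found in a SMART self-test log."""
    lbas = []
    for line in selftest_log.splitlines():
        # Result rows are assumed to start with "# <num>"; skip everything else.
        if not line.startswith("#"):
            continue
        last_field = line.split()[-1]
        # A "-" in the last column means that run found no error.
        if last_field.isdigit():
            lbas.append(int(last_field))
    return lbas


# Illustrative sample, not output from a real drive.
SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     20512         123456789
# 2  Short offline       Completed without error       00%     20480         -
"""

print(first_error_lbas(SAMPLE_LOG))
```

Each failed run only reports its first bad LBA, so you may need to
overwrite that sector and run the test again to find any others.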

> > Consumer HDDs usually have a few hundred spare sectors for
> > remapping.
> 
> What happens when all spare sectors are allocated?

The next time a sector goes bad it would not be fixable by writing
to it and there would be a part of the drive that is permanently
unusable. In the old days the "badblocks" tool would be used to find
these areas so that the filesystem could avoid using them. These days
we let drives remap bad areas, and replace the drive either
proactively or when it can't remap any more.

Drives often encounter severe problems before they get as far as
using all their spare sectors. They send so many errors up to Linux
that Linux disconnects the whole device.

> Any indication to prevent silent loss of data?

When a sector goes bad, whatever data was in there is now lost.
Since you cannot prevent drives from failing, appropriate
countermeasures include:

- Introducing redundancy with RAID or filesystems that have it built
  in, like btrfs or zfs

- Having good backups

Both are generally considered a good idea. With redundancy no data
would be lost and a tedious recovery process involving your backups
is turned into a more mundane process of replacing a failed drive.

You also need to monitor both of those to make sure they are
functioning properly.

Cheers,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting

