Re: smartd

To: debian-user@lists.debian.org
Subject: Re: smartd
From: Andy Smith <andy@strugglers.net>
Date: Sat, 22 Jan 2022 19:07:23 +0000
Message-id: <20220122190723.mjhf5sdao6ixq4t6@bitfolk.com>
In-reply-to: <E1nBK1z-0000x0-AQ@joule>
References: <E1nBK1z-0000x0-AQ@joule>

Hello,

On Sat, Jan 22, 2022 at 09:18:27AM -0800, peter@easthope.ca wrote:
> smartd reports to syslog.
> 
> Jan 22 08:49:17 joule smartd[563]: Device: /dev/sda [SAT], 155 Currently unreadable (pending) sectors
> Jan 22 08:49:17 joule smartd[563]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
> Jan 22 08:49:18 joule smartd[563]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
> Jan 22 08:49:18 joule smartd[563]: Device: /dev/sda [SAT], 132 Offline uncorrectable sectors
> 
> Two parts are available to mount /root; /root can be on /dev/sda1 or 
> /dev/sda2.

I don't understand what you mean by this statement. Either the disk
is already partitioned and / (you did mean "/", right, not "/root"?)
is on a known partition, or the disk isn't yet partitioned and / can
be on any partition you set it to be on.

> If the errors are clustered, the bad area might be avoided easily
> in partitioning.

You are better off finding the damaged sectors and causing the drive
to remap them by writing new content in there. Then you don't have
to keep track yourself of which sections of the disk are unusable.

> Feasible?  Can the locations of the errors be found?

Sure. Usually.

If the drive is currently not in use then it may be simpler to just
write over the entire drive with a simple

# dd if=/dev/zero of=/dev/sda

That should force a remap of any damaged sectors.

If you need to preserve what's currently on the drive then you can
use a SMART long self-test to try reading the whole drive. It should
report which LBA (sector) it got to when the test failed.

To start the test:

# smartctl -t long /dev/sda

To see the status of the test:

# smartctl -l selftest /dev/sda
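
If you want to pull the failing sector numbers out of that log without
reading it by eye, something like the following might do. It is only a
sketch: the function name first_error_lbas is made up for illustration,
and it assumes the usual ATA self-test log layout where the last column
is LBA_of_first_error and is "-" for tests that found nothing.

```shell
# Hypothetical helper: read "smartctl -l selftest" output on stdin and
# print the LBA_of_first_error column for every test that reported one.
first_error_lbas() {
    # Log entries start with "# 1", "# 2", ... "#10"; the last field is
    # either a sector number or "-" when no error was found.
    awk '/^# *[0-9]+/ && $NF ~ /^[0-9]+$/ { print $NF }'
}
```

Usage would be along the lines of
"smartctl -l selftest /dev/sda | first_error_lbas".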

You can instead do a "selective" test, to only test between certain
sector numbers.
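
As a rough illustration of the selective test, smartctl takes the range
as "-t select,START-END". The helper below only prints the command for a
window around a sector you already suspect, it doesn't run anything. The
function name and the default window of 1000 sectors are my own
assumptions, not anything smartctl defines.

```shell
# Hypothetical helper: print (do not run) a selective self-test command
# covering WINDOW sectors either side of a suspect LBA.
selective_test_cmd() {
    lba=$1
    window=${2:-1000}      # assumed default, pick whatever suits you
    dev=${3:-/dev/sda}
    # Clamp the start of the range at sector 0.
    start=$(( lba > window ? lba - window : 0 ))
    end=$(( lba + window ))
    printf 'smartctl -t select,%s-%s %s\n' "$start" "$end" "$dev"
}
```

For example, "selective_test_cmd 9519790" prints a command testing
sectors 9518790 to 9520790 on /dev/sda, which you can then run as root.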

Once you know the sector number you can verify that there are issues
by trying to read it with hdparm:

# hdparm --read-sector 9519790 /dev/sda

If that sector is truly damaged then this will show an error and
complaints in syslog.

You can force that sector to be written over with zeros, obviously
losing anything that was in it, again with hdparm:

# hdparm --yes-i-know-what-i-am-doing --write-sector 9519790 /dev/sda

This should force a remap, after which the write completes
successfully. If it doesn't, then the drive might be out of spare
sectors, or is more severely damaged, and it's done for.

If this drive is in use already then you possibly want to know which
files are affected by these bad sectors. I hope none, because you
use RAID. But if you need to know, I have done that before and can
dig out the scripts…
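
In the meantime, the general idea (this is not those scripts, just a
sketch of the arithmetic) is to turn the drive's sector number into a
filesystem block number. It assumes an ext2/3/4 filesystem, 512-byte
logical sectors and a 4096-byte filesystem block size unless you say
otherwise. The function name is made up. You would get the partition's
start sector from "fdisk -l" and the block size from "tune2fs -l".

```shell
# Hypothetical helper: convert a whole-disk LBA into a filesystem block
# number within the partition that contains it.
#   $1 = bad sector (LBA) on the disk
#   $2 = first sector of the partition
#   $3 = filesystem block size in bytes (assumed 4096 if omitted)
#   $4 = logical sector size in bytes (assumed 512 if omitted)
lba_to_fs_block() {
    lba=$1
    part_start=$2
    fs_block_size=${3:-4096}
    sector_size=${4:-512}
    # Byte offset into the partition, divided by block size (integer).
    echo $(( (lba - part_start) * sector_size / fs_block_size ))
}
```

With the block number in hand, debugfs can map it to an inode with
"icheck BLOCK" and the inode to a path with "ncheck INODE".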

> Better to replace the drive?

Consumer HDDs usually have a few hundred spare sectors for
remapping. If I have a less important machine with a couple of bad
sectors I'll often be willing to force a remap like this. Seeing 155
bad sectors in a SMART report would worry me for any machine. But
it's your call.
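
To keep an eye on whether things are getting worse, you could watch the
relevant counters from "smartctl -A". A small sketch, assuming the usual
ATA attribute table where the name is the second column and the raw
value is the tenth. Attribute names vary a bit between vendors, so the
three names here are an assumption that may not match every drive, and
the function name is made up.

```shell
# Hypothetical helper: read "smartctl -A" output on stdin and print the
# raw values of the attributes that track bad and remapped sectors.
bad_sector_counts() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ {
        print $2, $10
    }'
}
```

Run it as "smartctl -A /dev/sda | bad_sector_counts". If the pending
count drops and the reallocated count rises after you rewrite a sector,
the remap worked. If the numbers keep climbing, replace the drive.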

Cheers,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting
