
Re: Failing Hard Drive, or False Alarms?



On Wed, 10 Sept 2025 at 17:51, Bruce Halco <bruce@halcomp.com> wrote:

> A couple of weeks ago I upgraded to trixie.  Since then I've gotten
> a number of messages like

>    Device: /dev/sda [SAT], 8 Currently unreadable (pending) sectors
> and
>    Device: /dev/sda [SAT], 30 Offline uncorrectable sectors

>    These seem to come within a day or so of a reboot, but it hasn't been
>    long enough to know if that's a red herring.

> I ran "smartctl -t offline /dev/sda",

Hi, a careful reading of 'man smartctl' gives me low confidence in the
"-t offline" option, and I would not trust its results.

I suggest you instead use
  smartctl -t long <yourdevice>
and, once the test has finished, follow it with
  smartctl -l selftest <yourdevice>
to see the results.
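
As a rough sketch of that sequence (my own wrapper, not something from
the man page): the snippet below only prints the commands unless RUN=1 is
set, so it is safe to try first. The names DEV, RUN, run,
start_long_selftest and show_selftest_log are placeholders I made up. The
"-c" call is included because its output normally lists the drive's
estimated duration for the extended self-test, which tells you roughly how
long to wait before looking at the log.

```shell
#!/bin/sh
# Sketch only. DEV is an assumption; substitute your own device.
DEV="${DEV:-/dev/sda}"

# Unless RUN=1 is set, just print the command instead of executing it.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

start_long_selftest() {
    run smartctl -c "$DEV"        # capabilities, incl. estimated test duration
    run smartctl -t long "$DEV"   # start the extended ("long") self-test
}

show_selftest_log() {
    run smartctl -l selftest "$DEV"   # self-test log, one row per self-test
}
```

With RUN=1 (and as root), call start_long_selftest, wait for roughly the
duration reported by "-c", then call show_selftest_log.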

> and the eventual result of
> "smartctl -a /dev/sda" shows

[...]

> SMART Error Log Version: 1
> No Errors Logged

I do not trust that "No errors logged". It seems inconsistent when errors
are being reported elsewhere.

This message is an attempt to explore possible explanations for that
inconsistency, and to suggest how to resolve it.

> I admit I'm not a smartctl wizard, but to me it seems that smartctl is
> contradicting itself.  Can anyone help me out?

I am not an expert, but below I have cited a few selected quotes from 'man
smartctl' (the version in Bookworm) that cause me to have low trust in the
"-t offline" option.

Note that "SAT" in your output indicates "SCSI to ATA Translation"; there
is a section about that in 'man smartctl' under the heading "ATA, SCSI
command sets and SAT". I dunno for sure, but that section makes me doubt
whether SCSI or ATA commands are in use.

Note that if SCSI commands are in use, then 'man smartctl' says about "-t
offline":

  offline - [SCSI] runs the default self test in foreground.
  No entry is placed in the self test log.

  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  Note the above is a possible explanation for "No errors logged".

'man smartctl' also says:

  Note that the SMART automatic offline test command is listed as
  "Obsolete" in every version of the ATA and ATA/ATAPI Specifications.

'man smartctl' also says:

  [ATA] Note that the ATA command SMART EXECUTE OFF-LINE IMMEDIATE (the
  command to start a test) was declared obsolete in ATA ACS-4 Revision 10
  (Nov 2015).

and that is what "-t offline" activates. It may be unwise to trust an
obsolete test.

'man smartctl' also says:

  The third category of testing (and the only category for which the word
  'testing' is really an appropriate choice) is "self" testing.  This
  third type of test is only performed (immediately) when a command to
  run it is issued.  The '-t' and '-X' options can be used to carry
  out and abort such self-tests; please see below for further details.

  Any errors detected in the self testing will be shown in the SMART
  self-test log, which can be examined using the '-l selftest' option.

  Note: in this manual page, the word "Test" is used in connection with the
  second category just described, e.g. for the "offline" testing.  The
  words "Self-test" are used in connection with the third category.

So according to that, for effective tests it seems we should prefer the
"third category", which is described as "self-test", and avoid the "second
category", which is described as "offline" and "test".

The "-t long" test is in the "third category", and the "-t offline" test
is in the "second category".

This is why, at the top of this message, I suggest using "-t long" rather
than "-t offline", and inspecting the result with "-l selftest".

I hope that will confirm that the test has actually run, and show its
results, which I would trust more than what you showed above.
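
Once the long self-test has finished, it may also be worth comparing its
result with the two counters from the messages you quoted. The filter below
is only a sketch: it assumes your drive exposes them as attribute IDs 197
and 198 in the "smartctl -A" table (most ATA drives do, usually labelled
Current_Pending_Sector and Offline_Uncorrectable) and that the raw value is
the tenth column. The function name is my own invention.

```shell
#!/bin/sh
# Reads 'smartctl -A' output on stdin and prints the attribute name and
# raw value for IDs 197 and 198 (assumed IDs, see above).
pending_and_uncorrectable() {
    awk '$1 == 197 || $1 == 198 { print $2, $10 }'
}

# Usage (needs root; /dev/sda is the device from the quoted messages):
#   smartctl -A /dev/sda | pending_and_uncorrectable
```

If those raw values stay the same or fall back to zero over time, that
points in a different direction than if they keep growing.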

