Re: HDD problems that do not follow SMART results

To: debian-user@lists.debian.org
Subject: Re: HDD problems that do not follow SMART results
From: Camaleón <noelamac@gmail.com>
Date: Tue, 28 Aug 2012 17:24:02 +0000 (UTC)
Message-id: <k1iuri$ik9$21@ger.gmane.org>
References: <87ligz5chm.fsf@merciadriluca-station.MERCIADRILUCA>

On Tue, 28 Aug 2012 16:15:33 +0200, Merciadri Luca wrote:

> I'm repeatedly getting freezes because of HDD problems. During these
> freezes, which generally last until I shut down the computer, I get
> messages such as:
> 
> ==
> smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen,
> http://smartmontools.sourceforge.net
> 
> === START OF INFORMATION SECTION ===
> Model Family:     Maxtor DiamondMax Plus 9 family 
> Device Model:     Maxtor 6Y160M0

(...)

Do you hear any "clicking" sound coming from the hard disk?

Anyway, if memory serves, that hard disk model must be at least 8 years 
old...


> Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.000030] ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen 
> Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.000035] ata6: SError: { UnrecovData Handshk } 
> Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.000038] ata6.00: failed command: WRITE DMA EXT 

(...)


> After restarting, I got messages such as
> 
> ==
> Aug 28 11:01:35 merciadriluca-station kernel: [  233.816026] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen 
> Aug 28 11:01:35 merciadriluca-station kernel: [  233.816031] ata4: SError: { UnrecovData Handshk } 
> Aug 28 11:01:35 merciadriluca-station kernel: [  233.816035] ata4.00: failed command: WRITE DMA 
> Aug 28 11:01:35 merciadriluca-station kernel: [  233.816040] ata4.00: cmd ca/00:90:08:71:05/00:00:00:00:00/e0 tag 0 dma 73728 out 

(...)

> and also
> 
> ==
> Aug 28 11:04:49 merciadriluca-station kernel: [  427.572574] sd 3:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE 
> Aug 28 11:04:49 merciadriluca-station kernel: [  427.572578] sd 3:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor] 
> Aug 28 11:04:49 merciadriluca-station kernel: [  427.572582] Descriptor sense data with sense descriptors (in hex): 
> Aug 28 11:04:49 merciadriluca-station kernel: [  427.572584]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
> Aug 28 11:04:49 merciadriluca-station kernel: [  427.572592]         00 00 00 00 
> Aug 28 11:04:49 merciadriluca-station kernel: [  427.572596] sd 3:0:0:0: [sdc] Add. Sense: No additional sense information 
> Aug 28 11:04:49 merciadriluca-station kernel: [  427.572600] sd 3:0:0:0: [sdc] CDB: Write(10): 2a 00 00 05 83 00 00 03 90 00 
> Aug 28 11:04:49 merciadriluca-station kernel: [  427.572608] end_request: I/O error, dev sdc, sector 361216 
> Aug 28 11:04:49 merciadriluca-station kernel: [  427.572613] Buffer I/O error on device sdc5, logical block 43136 
> Aug 28 11:04:49 merciadriluca-station kernel: [  427.572615] lost page write due to I/O error on sdc5 

(...)

> It looks like the HDD associated with sdc is encountering some issues.

More specifically, the "/dev/sdc5" partition.

> But is sdc linked to ata4 or ata6? Are these two problems (before and
> after restarting) the same or not?

It seems two hard disks are affected. To find out which disk sits on each 
ATA port, run:

dmesg | grep -i 'ata[0-6]'
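Another way to answer "is sdc on ata4 or ata6?" is to read the sysfs path of each disk. The "sd 3:0:0:0" prefix in the sdc messages names SCSI host 3, and on many libata systems host numbers start at 0 while ata port numbers start at 1, so host3 often corresponds to ata4. That correspondence is not guaranteed, though. The sketch below assumes a kernel that exposes libata ports in sysfs, so that `readlink -f /sys/block/sdc` contains a component such as `/ata4/`; older kernels may not, in which case it prints "unknown". `ata_port_of` is a hypothetical helper name.

```shell
#!/bin/sh
# Print which libata port (ataN) each sd* disk is attached to, using sysfs paths.
# ASSUMPTION: the resolved sysfs path of the disk contains an /ataN/ component.
ata_port_of() {
    # $1: a resolved sysfs device path; prints its first ataN component, if any
    printf '%s\n' "$1" | grep -o '/ata[0-9][0-9]*/' | head -n 1 | tr -d '/'
}

for dev in /sys/block/sd*; do
    [ -e "$dev" ] || continue              # the glob matched nothing: no sd* disks
    port=$(ata_port_of "$(readlink -f "$dev")")
    echo "${dev##*/} -> ${port:-unknown}"  # e.g. "sdc -> ata4"
done
```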

> After running several short and long tests with S.M.A.R.T. on each of my
> 3 HDDs, I got these results:
> 
> 1) HDD associated with /dev/sda looks in some pre-failure state:

(...)

> SMART Error Log Version: 1
> Warning: ATA error count 454 inconsistent with error log pointer 5

I would run the manufacturer's disk test utility here, but this disk already 
looks a bit tired. You can keep monitoring the values tagged "Pre-fail" and 
replace the hard disk as soon as they start increasing quickly.
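Monitoring only helps if you can compare runs, so one option is to log a dated snapshot of the Pre-fail rows of `smartctl -A` (for example from cron) and look at how fast they move. The sketch assumes the usual ten-column attribute table of `smartctl -A` (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); the function names and the log path are hypothetical. `smartd`, from the same smartmontools package, can also watch the attributes and warn about changes on its own.

```shell
#!/bin/sh
# Keep a dated record of the Pre-fail SMART attributes of one disk.
prefail_attrs() {
    # stdin: output of `smartctl -A`
    # prints ID, name, normalized VALUE, THRESH and the first RAW_VALUE token
    awk '$7 == "Pre-fail" { print $1, $2, $4, $6, $10 }'
}

snapshot() {
    # $1: device, e.g. /dev/sda; $2: log file (hypothetical path of your choice)
    # Prefixes each row with today's date and appends it to the log.
    smartctl -A "$1" | prefail_attrs | sed "s/^/$(date +%F) /" >> "$2"
}

# Usage (as root): snapshot /dev/sda /root/sda-prefail.log
```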

> 2) HDD associated with /dev/sdb verifies

(...)

> (this is the one that looks the healthiest, actually).

Agreed.
 
> 3) The HDD associated with /dev/sdc, which should be broken in some way
> (given the messages from /var/log/syslog that I quoted above), does
> not look so according to SMART:

(...)

Oh my... consider also running the manufacturer's test utility on this 
one... and make a full backup _now_.

> What can I deduce from this? It looks like /dev/sdc is broken, but SMART
> suggests that /dev/sda is closer to failing than /dev/sdc.

What I deduce is that these Maxtor hard disks are very old and deserve 
retirement, even though they are still up and (somehow) running.

> Note that I tried exchanging SATA cables, to no avail.

In your case there are logged sector and I/O errors, and that is 
dangerous.
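To see which disk and partition those logged errors cluster on, you can count the two kinds of kernel messages quoted above ("end_request: I/O error, dev ..." and "Buffer I/O error on device ...") per device. This is a minimal sketch that assumes the message wording shown in the quoted logs; other kernel versions may phrase them differently. `count_io_errors` is a hypothetical name.

```shell
#!/bin/sh
# Count kernel I/O error messages per device (disk or partition) in syslog text.
count_io_errors() {
    # stdin: syslog text; extracts the device name from lines such as
    #   end_request: I/O error, dev sdc, sector 361216
    #   Buffer I/O error on device sdc5, logical block 43136
    sed -n \
        -e 's/.*end_request: I\/O error, dev \([a-z0-9]*\),.*/\1/p' \
        -e 's/.*Buffer I\/O error on device \([a-z0-9]*\),.*/\1/p' |
    sort | uniq -c | sort -rn   # most affected device first
}

# Usage (as root): count_io_errors < /var/log/syslog
```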

Greetings,

-- 
Camaleón

Follow-Ups:
- Re: HDD problems that do not follow SMART results
  - From: hvw59601 <hvw59601@care2.com>

References:
- HDD problems that do not follow SMART results
  - From: Merciadri Luca <Luca.Merciadri@student.ulg.ac.be>
