Re: HDD problems that do not follow SMART results
On Tue, 28 Aug 2012 16:15:33 +0200, Merciadri Luca wrote:
> I'm recurrently getting freezes because of HDD problems. During these
> freezes, that generally last until I shut down the computer, I get such
> messages:
>
> ==
> smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen,
> http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family: Maxtor DiamondMax Plus 9 family
> Device Model: Maxtor 6Y160M0
(...)
Do you hear any "clicking" sound coming from the hard disk?
Anyway, if my memory serves me well, that hard disk model must be at
least 8 years old...
> Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.000030] ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
> Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.000035] ata6: SError: { UnrecovData Handshk }
> Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.000038] ata6.00: failed command: WRITE DMA EXT
(...)
> After restarting, I got messages such as
>
> ==
> Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816026] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
> Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816031] ata4: SError: { UnrecovData Handshk }
> Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816035] ata4.00: failed command: WRITE DMA
> Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816040] ata4.00: cmd ca/00:90:08:71:05/00:00:00:00:00/e0 tag 0 dma 73728 out
(...)
> and also
>
> ==
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572574] sd 3:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572578] sd 3:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor]
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572582] Descriptor sense data with sense descriptors (in hex):
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572584] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572592] 00 00 00 00
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572596] sd 3:0:0:0: [sdc] Add. Sense: No additional sense information
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572600] sd 3:0:0:0: [sdc] CDB: Write(10): 2a 00 00 05 83 00 00 03 90 00
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572608] end_request: I/O error, dev sdc, sector 361216
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572613] Buffer I/O error on device sdc5, logical block 43136
> Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572615] lost page write due to I/O error on sdc5
(...)
> It looks like the HDD associated with sdc is encountering some issues.
And more specifically, the "/dev/sdc5" partition.
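The CDB in that log can be decoded by hand to cross-check the sector number. A minimal Python sketch, assuming the standard 10-byte WRITE(10) layout (opcode 0x2a, big-endian 4-byte LBA in bytes 2-5, 2-byte transfer length in bytes 7-8); the function name is only illustrative:

```python
def decode_write10(cdb_hex):
    """Return (start LBA, sector count) from a 10-byte SCSI WRITE(10) CDB."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, count


# CDB copied from the "[sdc] CDB: Write(10)" line in the quoted syslog
lba, count = decode_write10("2a 00 00 05 83 00 00 03 90 00")
print(lba, count)  # 361216 912
```

The start LBA matches the "sector 361216" in the end_request line, so the reported sector is where the aborted write began.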
> But is sdc linked to ata4 or ata6? Do these two problems (before and
> after restarting) are the same ones or not?
Yes, it seems two hard disks are affected. To see which disk is
attached to each ATA port, run:
dmesg | grep -i 'ata[0-6]'
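If the output is long, a small script can tally which ATA ports and which sdX devices actually logged errors. This is only a rough sketch, assuming the kernel message wording shown in the quoted syslog ("ataN.NN: exception", "ataN.NN: failed command", "I/O error, dev sdX"); the function name and the sample text are made up for illustration, and it does not by itself tell you which sdX sits on which port:

```python
import re
from collections import Counter

# Patterns based on the message wording in the quoted syslog (an assumption;
# other kernel versions may phrase these lines differently).
ATA_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: (?:exception|failed command)")
SD_RE = re.compile(r"I/O error, dev (sd[a-z]+)")


def summarize(log_text):
    """Count error lines per ATA port and per sdX device."""
    ata_errors, disk_errors = Counter(), Counter()
    for line in log_text.splitlines():
        m = ATA_RE.search(line)
        if m:
            ata_errors[m.group(1)] += 1
        m = SD_RE.search(line)
        if m:
            disk_errors[m.group(1)] += 1
    return ata_errors, disk_errors


# Sample lines taken from the quoted syslog (timestamps trimmed)
SAMPLE = """\
ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
ata6: SError: { UnrecovData Handshk }
ata6.00: failed command: WRITE DMA EXT
ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
ata4.00: failed command: WRITE DMA
end_request: I/O error, dev sdc, sector 361216
"""

ata_errors, disk_errors = summarize(SAMPLE)
print(dict(ata_errors))   # {'ata6': 2, 'ata4': 2}
print(dict(disk_errors))  # {'sdc': 1}
```

To use it on a real machine, feed it the text of /var/log/syslog or the output of dmesg instead of SAMPLE.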
> After running several short and long tests with S.M.A.R.T. on each of my
> 3 HDDs, I got these results:
>
> 1) HDD associated with /dev/sda looks in some pre-failure state:
(...)
> SMART Error Log Version: 1
> Warning: ATA error count 454 inconsistent with error log pointer 5
I would run the manufacturer's diagnostic tool here, but this disk looks a bit
tired. Keep monitoring the attributes tagged "pre-fail" and replace the hard
disk as soon as those values start increasing quickly.
> 2) HDD associated with /dev/sdb verifies
(...)
> (this is the one that looks the healthiest, actually).
Agreed.
> 3) The HDD associated with /dev/sdc, which should be in some way broken
> (being given the messages that I wrote above from /var/log/syslog), does
> not look so through SMART:
(...)
Oh my... consider also running the manufacturer's SMART test utility on this
one... and make a full backup _now_.
> What can I deduce from this? It looks like /dev/sdc is broken but SMART
> tells /dev/sda would have more chance being on the verge to broke than
> /dev/sdc.
I can deduce that these Maxtor hard disks are very old and deserve
retirement, even though they are still up and (somehow) running.
> Note that I tried exchanging SATA cables, to no avail.
In your case the logs show sector-related errors and I/O errors, and that
is dangerous.
Greetings,
--
Camaleón