
Re: HDD problems that do not follow SMART results



Camaleón wrote:
On Tue, 28 Aug 2012 16:15:33 +0200, Merciadri Luca wrote:

I'm recurrently getting freezes because of HDD problems. During these
freezes, that generally last until I shut down the computer, I get such
messages:

==
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen,
http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Maxtor DiamondMax Plus 9 family
Device Model: Maxtor 6Y160M0

(...)

Do you hear any "clicking" sound coming from the hard disk?

Anyway, if my memory serves me well, that hard disk model must be at least 8 years old...


Good memory. I just replaced a Model 6Y080P0 of that family with an SSD830. I can't find out when I installed that disk; it must have been about 8 years ago. And smartctl never reported anything wrong with it.

Hugo


Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.000030] ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.000035] ata6: SError: { UnrecovData Handshk }
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.000038] ata6.00: failed command: WRITE DMA EXT

(...)


After restarting, I got messages such as

==
Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816026] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816031] ata4: SError: { UnrecovData Handshk }
Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816035] ata4.00: failed command: WRITE DMA
Aug 28 11:01:35 merciadriluca-station kernel: [ 233.816040] ata4.00: cmd ca/00:90:08:71:05/00:00:00:00:00/e0 tag 0 dma 73728 out

(...)

and also

==
Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572574] sd 3:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572578] sd 3:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor]
Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572582] Descriptor sense data with sense descriptors (in hex):
Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572584] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572592] 00 00 00 00
Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572596] sd 3:0:0:0: [sdc] Add. Sense: No additional sense information
Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572600] sd 3:0:0:0: [sdc] CDB: Write(10): 2a 00 00 05 83 00 00 03 90 00
Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572608] end_request: I/O error, dev sdc, sector 361216
Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572613] Buffer I/O error on device sdc5, logical block 43136
Aug 28 11:04:49 merciadriluca-station kernel: [ 427.572615] lost page write due to I/O error on sdc5

(...)

It looks like the HDD associated with sdc is encountering some issues.

More specifically, the "/dev/sdc5" partition.
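As a cross-check on the log above, the failing write can be decoded by hand from the CDB line. This is only a sketch, assuming the standard 10-byte WRITE(10) layout (opcode 0x2A, big-endian LBA in bytes 2-5, transfer length in bytes 7-8); the function name decode_write10 is made up for illustration.

```python
def decode_write10(cdb_hex: str) -> dict:
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    return {
        # Bytes 2..5: logical block address, big-endian
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7..8: number of blocks to transfer, big-endian
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# CDB copied from the "[sdc] CDB: Write(10)" line in the syslog excerpt
print(decode_write10("2a 00 00 05 83 00 00 03 90 00"))
```

The decoded LBA is 361216, which is the same sector the kernel reports in "end_request: I/O error, dev sdc, sector 361216", so both lines describe the same aborted write of 912 blocks.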

But is sdc linked to ata4 or ata6? Are these two problems (before and
after restarting) the same or not?

Yes, it seems there are two hard disks affected. Run:

dmesg | grep -i 'ata[0-6]'
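Besides reading dmesg, the sdX-to-ataN link can sometimes be read from sysfs. The following is a rough sketch, assuming a kernel whose resolved /sys/block/sdX path contains an "ataN" component; older kernels may not expose it, in which case dmesg remains the way to go. The helper names are invented for this example.

```python
import os
import re


def ata_port_from_path(sysfs_path: str):
    """Return the 'ataN' component of a resolved sysfs path, or None."""
    match = re.search(r"/(ata\d+)/", sysfs_path)
    return match.group(1) if match else None


def ata_port_of(disk: str):
    """Resolve /sys/block/<disk> and look for an ataN component in it."""
    return ata_port_from_path(os.path.realpath(f"/sys/block/{disk}"))


if __name__ == "__main__":
    # Disk names taken from the thread; adjust to the local machine.
    for disk in ("sda", "sdb", "sdc"):
        print(disk, ata_port_of(disk))
```

A result of None only means the path has no ataN component on that kernel, not that the disk is missing.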

After running several short and long tests with S.M.A.R.T. on each of my
3 HDDs, I got these results:

1) HDD associated with /dev/sda looks to be in some pre-failure state:

(...)

SMART Error Log Version: 1
Warning: ATA error count 454 inconsistent with error log pointer 5

I would run the manufacturer's disk test here, but this disk looks a bit tired. You can keep monitoring the values tagged "pre-fail" and replace the hard disk as soon as they start changing quickly.
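Keeping an eye on the "pre-fail" values can be scripted. Here is a minimal sketch, assuming the usual ten-column attribute table printed by "smartctl -A" (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); the function name and the "margin" figure (VALUE minus THRESH) are choices made for this example, not something smartctl itself reports.

```python
def prefail_attributes(smartctl_a_output: str):
    """Return (name, value, worst, thresh, margin) for each Pre-fail
    attribute, sorted so the smallest margin (VALUE - THRESH) comes first."""
    rows = []
    for line in smartctl_a_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if fields[6] != "Pre-fail":
            continue
        name = fields[1]
        value, worst, thresh = int(fields[3]), int(fields[4]), int(fields[5])
        rows.append((name, value, worst, thresh, value - thresh))
    return sorted(rows, key=lambda row: row[4])
```

Feeding it the saved output of "smartctl -A /dev/sda" from time to time and comparing the margins would show whether a normalized value is drifting towards its threshold.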

2) HDD associated with /dev/sdb gives

(...)

(this is the one that looks the healthiest, actually).

Agreed.

3) The HDD associated with /dev/sdc, which should be broken in some way
(given the messages that I quoted above from /var/log/syslog), does
not look so according to SMART:

(...)

Oh my... also consider running the manufacturer's SMART test utility on this one... and make a full backup _now_.

What can I deduce from this? It looks like /dev/sdc is broken, but SMART
suggests that /dev/sda is more likely to be on the verge of failing than
/dev/sdc.

I can deduce that these Maxtor hard disks are very old and deserve retirement, even though they are still up and (somehow) running.

Note that I tried exchanging SATA cables, to no avail.

In your case the logs show sector-level I/O errors, and that is dangerous.

Greetings,


