HDD problems that do not follow SMART results
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi,
I keep getting freezes because of HDD problems. During these freezes,
which generally last until I shut down the computer, I get messages
such as these:
==
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family:     Maxtor DiamondMax Plus 9 family
Device Model:     Maxtor 6Y160M0
Serial Number:    Y44NQSTE
Firmware Version: YAR51HW0
User Capacity:    163,928,604,672 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Tue Aug 28 16:09:09 2012 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[...]
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.000030] ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.000035] ata6: SError: { UnrecovData Handshk }
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.000038] ata6.00: failed command: WRITE DMA EXT
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.000044] ata6.00: cmd 35/00:80:00:4f:f5/00:01:12:00:00/e0 tag 0 dma 196608 out
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.000046]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.000049] ata6.00: status: { DRDY }
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.000056] ata6: hard resetting link
Aug 28 10:21:39 merciadriluca-station kernel: [ 2160.476042] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 10:21:40 merciadriluca-station kernel: [ 2160.597999] ata6.00: configured for UDMA/133
Aug 28 10:21:40 merciadriluca-station kernel: [ 2160.598003] ata6.00: device reported invalid CHS sector 0
Aug 28 10:21:40 merciadriluca-station kernel: [ 2160.598008] ata6: EH complete
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965242] ata6.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965247] ata6: SError: { UnrecovData Handshk }
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965251] ata6.00: failed command: WRITE DMA EXT
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965257] ata6.00: cmd 35/00:80:00:4f:f5/00:01:12:00:00/e0 tag 0 dma 196608 out
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965258]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965261] ata6.00: status: { DRDY }
Aug 28 10:22:10 merciadriluca-station kernel: [ 2190.965269] ata6: hard resetting link
Aug 28 10:22:10 merciadriluca-station kernel: [ 2191.440043] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 10:22:11 merciadriluca-station kernel: [ 2191.546566] ata6.00: configured for UDMA/133
Aug 28 10:22:11 merciadriluca-station kernel: [ 2191.546571] ata6.00: device reported invalid CHS sector 0
Aug 28 10:22:11 merciadriluca-station kernel: [ 2191.546578] ata6: EH complete
==
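For reference, the SErr value in these lines can be decoded by hand. The
sketch below assumes the bit layout of the SATA SError register and the
short names that the Linux libata driver prints; the helper name
decode_serr is made up for illustration.

```python
# Bit positions follow the SATA SError register layout; the short names
# are the ones libata prints in its "SError: { ... }" lines.
SERR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serr(value):
    """Return the names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


print(decode_serr(0x400100))  # ['UnrecovData', 'Handshk']
```

Both flags describe errors on the SATA link itself (data integrity and
handshake), which is consistent with the "ATA bus error" text on the
res line.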
After restarting, I got messages such as
==
Aug 28 11:01:35 merciadriluca-station kernel: [  233.816026] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Aug 28 11:01:35 merciadriluca-station kernel: [  233.816031] ata4: SError: { UnrecovData Handshk }
Aug 28 11:01:35 merciadriluca-station kernel: [  233.816035] ata4.00: failed command: WRITE DMA
Aug 28 11:01:35 merciadriluca-station kernel: [  233.816040] ata4.00: cmd ca/00:90:08:71:05/00:00:00:00:00/e0 tag 0 dma 73728 out
Aug 28 11:01:35 merciadriluca-station kernel: [  233.816042]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Aug 28 11:01:35 merciadriluca-station kernel: [  233.816045] ata4.00: status: { DRDY }
Aug 28 11:01:35 merciadriluca-station kernel: [  233.816053] ata4: hard resetting link
Aug 28 11:01:35 merciadriluca-station kernel: [  234.292041] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 11:01:35 merciadriluca-station kernel: [  234.411821] ata4.00: configured for UDMA/133
Aug 28 11:01:35 merciadriluca-station kernel: [  234.411826] ata4.00: device reported invalid CHS sector 0
Aug 28 11:01:35 merciadriluca-station kernel: [  234.411831] ata4: EH complete
Aug 28 11:02:14 merciadriluca-station kernel: [  272.780026] ata4: limiting SATA link speed to 1.5 Gbps
Aug 28 11:02:14 merciadriluca-station kernel: [  272.780030] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
Aug 28 11:02:14 merciadriluca-station kernel: [  272.780034] ata4: SError: { UnrecovData Handshk }
Aug 28 11:02:14 merciadriluca-station kernel: [  272.780038] ata4.00: failed command: WRITE DMA EXT
Aug 28 11:02:14 merciadriluca-station kernel: [  272.780044] ata4.00: cmd 35/00:90:00:83:05/00:03:00:00:00/e0 tag 0 dma 466944 out
Aug 28 11:02:14 merciadriluca-station kernel: [  272.780045]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Aug 28 11:02:14 merciadriluca-station kernel: [  272.780048] ata4.00: status: { DRDY }
Aug 28 11:02:14 merciadriluca-station kernel: [  272.780056] ata4: hard resetting link
Aug 28 11:02:14 merciadriluca-station kernel: [  273.256538] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 28 11:02:14 merciadriluca-station kernel: [  273.382089] ata4.00: configured for UDMA/133
Aug 28 11:02:14 merciadriluca-station kernel: [  273.382093] ata4.00: device reported invalid CHS sector 0
Aug 28 11:02:14 merciadriluca-station kernel: [  273.382098] ata4: EH complete
Aug 28 11:02:44 merciadriluca-station kernel: [  303.380023] ata4.00: exception Emask 0x10 SAct 0x0 SErr 0x400101 action 0x6 frozen
Aug 28 11:02:44 merciadriluca-station kernel: [  303.380028] ata4: SError: { RecovData UnrecovData Handshk }
Aug 28 11:02:44 merciadriluca-station kernel: [  303.380032] ata4.00: failed command: WRITE DMA EXT
Aug 28 11:02:44 merciadriluca-station kernel: [  303.380038] ata4.00: cmd 35/00:90:00:83:05/00:03:00:00:00/e0 tag 0 dma 466944 out
Aug 28 11:02:44 merciadriluca-station kernel: [  303.380039]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Aug 28 11:02:44 merciadriluca-station kernel: [  303.380042] ata4.00: status: { DRDY }
==
and also
==
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572574] sd 3:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572578] sd 3:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor]
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572582] Descriptor sense data with sense descriptors (in hex):
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572584]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572592]         00 00 00 00 
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572596] sd 3:0:0:0: [sdc] Add. Sense: No additional sense information
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572600] sd 3:0:0:0: [sdc] CDB: Write(10): 2a 00 00 05 83 00 00 03 90 00
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572608] end_request: I/O error, dev sdc, sector 361216
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572613] Buffer I/O error on device sdc5, logical block 43136
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572615] lost page write due to I/O error on sdc5
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572622] Buffer I/O error on device sdc5, logical block 43137
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572625] lost page write due to I/O error on sdc5
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572629] Buffer I/O error on device sdc5, logical block 43138
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572631] lost page write due to I/O error on sdc5
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572636] Buffer I/O error on device sdc5, logical block 43139
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572638] lost page write due to I/O error on sdc5
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572642] Buffer I/O error on device sdc5, logical block 43140
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572644] lost page write due to I/O error on sdc5
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572648] Buffer I/O error on device sdc5, logical block 43141
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572651] lost page write due to I/O error on sdc5
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572655] Buffer I/O error on device sdc5, logical block 43142
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572657] lost page write due to I/O error on sdc5
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572661] Buffer I/O error on device sdc5, logical block 43143
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572663] lost page write due to I/O error on sdc5
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572667] Buffer I/O error on device sdc5, logical block 43144
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572669] lost page write due to I/O error on sdc5
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572674] Buffer I/O error on device sdc5, logical block 43145
Aug 28 11:04:49 merciadriluca-station kernel: [  427.572676] lost page write due to I/O error on sdc5
==
It looks like the HDD associated with sdc is encountering some
issues. But is sdc linked to ata4 or ata6? Are these two problems (before
and after restarting) the same one or not?
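One way to approach the first question from the logs alone is to compare
the LBA and length of the failed ATA command with those of the failed
SCSI write. The sketch below assumes that libata prints the taskfile as
command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
and only handles 48-bit (EXT) commands; both function names are made up
for illustration.

```python
def parse_taskfile(cmd):
    """Return (LBA, sector count) from a libata 'cmd' string.

    Only valid for 48-bit commands such as WRITE DMA EXT (0x35); a
    28-bit command keeps LBA bits 27:24 in the device field instead.
    """
    _command, low, high, _device = cmd.split("/")
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":")
    )
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    count = (hob_nsect << 8) | nsect
    return lba, count


def parse_write10(cdb):
    """Return (LBA, sector count) from a SCSI WRITE(10) CDB in hex."""
    raw = bytes.fromhex(cdb.replace(" ", ""))
    if len(raw) != 10 or raw[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    return int.from_bytes(raw[2:6], "big"), int.from_bytes(raw[7:9], "big")


# ata4 command from the 11:02:14 and 11:02:44 entries
print(parse_taskfile("35/00:90:00:83:05/00:03:00:00:00/e0"))  # (361216, 912)
# sdc CDB from the 11:04:49 entry
print(parse_write10("2a 00 00 05 83 00 00 03 90 00"))         # (361216, 912)
```

Both decode to LBA 361216 (the sector named in the end_request line) and
912 sectors, i.e. 466944 bytes, the size given after "dma". This suggests
that sdc was the disk behind ata4 during that boot. It does not by itself
say which disk was ata6 before the restart.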
After running several short and long S.M.A.R.T. self-tests on each of my
3 HDDs, I got these results:
1) The HDD associated with /dev/sda looks to be in some pre-failure state:
==
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   203   202   063    Pre-fail  Always       -       19440
  4 Start_Stop_Count        0x0032   252   252   000    Old_age   Always       -       3294
  5 Reallocated_Sector_Ct   0x0033   252   252   063    Pre-fail  Always       -       17
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   252   237   187    Pre-fail  Always       -       46578
  9 Power_On_Minutes        0x0032   172   172   000    Old_age   Always       -       1007h+24m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   245   245   000    Old_age   Always       -       3314
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       56
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       8324
196 Reallocated_Event_Count 0x0008   238   238   000    Old_age   Offline      -       15
197 Current_Pending_Sector  0x0008   252   252   000    Old_age   Offline      -       15
198 Offline_Uncorrectable   0x0008   237   001   000    Old_age   Offline      -       16
199 UDMA_CRC_Error_Count    0x0008   195   194   000    Old_age   Offline      -       5
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       0
202 Data_Address_Mark_Errs  0x000a   253   226   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       8
204 Soft_ECC_Correction     0x000a   253   251   000    Old_age   Always       -       0
205 Thermal_Asperity_Rate   0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   194   189   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
SMART Error Log Version: 1
Warning: ATA error count 454 inconsistent with error log pointer 5
ATA Error Count: 454 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 454 occurred at disk power-on lifetime: 14837 hours (618 days + 5 hours)
  When the command that caused the error occurred, the device was in an unknown state.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 24 81 02 32 e0  Error: UNC 36 sectors at LBA = 0x00320281 = 3277441
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 d0 00 81 02 32 e0 00      02:36:40.624  READ DMA EXT
  25 d0 d2 af 01 32 e0 00      02:36:40.624  READ DMA EXT
  25 d0 2e 81 e0 31 e0 00      02:36:40.624  READ DMA EXT
  25 d0 00 81 df 31 e0 00      02:36:40.608  READ DMA EXT
  25 d0 d2 af de 31 e0 00      02:36:40.608  READ DMA EXT
Error 453 occurred at disk power-on lifetime: 12776 hours (532 days + 8 hours)
  When the command that caused the error occurred, the device was in an unknown state.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 52 27 0f e0  Error: UNC at LBA = 0x000f2752 = 993106
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 d0 01 52 27 0f e0 00      03:46:51.472  READ VERIFY SECTOR(S) EXT
  25 d0 01 00 00 00 e0 00      03:46:51.472  READ DMA EXT
  42 d0 01 51 27 0f e0 00      03:46:50.464  READ VERIFY SECTOR(S) EXT
  25 d0 01 00 00 00 e0 00      03:46:50.448  READ DMA EXT
  42 d0 02 51 27 0f e0 00      03:46:49.440  READ VERIFY SECTOR(S) EXT
Error 452 occurred at disk power-on lifetime: 12776 hours (532 days + 8 hours)
  When the command that caused the error occurred, the device was in an unknown state.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 51 27 0f e0  Error: UNC at LBA = 0x000f2751 = 993105
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 d0 01 51 27 0f e0 00      03:46:50.464  READ VERIFY SECTOR(S) EXT
  25 d0 01 00 00 00 e0 00      03:46:50.448  READ DMA EXT
  42 d0 02 51 27 0f e0 00      03:46:49.440  READ VERIFY SECTOR(S) EXT
  42 d0 02 4f 27 0f e0 00      03:46:49.440  READ VERIFY SECTOR(S) EXT
  42 d0 04 53 27 0f e0 00      03:46:48.640  READ VERIFY SECTOR(S) EXT
Error 451 occurred at disk power-on lifetime: 12776 hours (532 days + 8 hours)
  When the command that caused the error occurred, the device was in an unknown state.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 02 51 27 0f e0  Error: UNC at LBA = 0x000f2751 = 993105
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 d0 02 51 27 0f e0 00      03:46:49.440  READ VERIFY SECTOR(S) EXT
  42 d0 02 4f 27 0f e0 00      03:46:49.440  READ VERIFY SECTOR(S) EXT
  42 d0 04 53 27 0f e0 00      03:46:48.640  READ VERIFY SECTOR(S) EXT
  25 d0 01 00 00 00 e0 00      03:46:48.624  READ DMA EXT
  42 d0 04 4f 27 0f e0 00      03:46:47.616  READ VERIFY SECTOR(S) EXT
Error 450 occurred at disk power-on lifetime: 12776 hours (532 days + 8 hours)
  When the command that caused the error occurred, the device was in an unknown state.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 02 4f 27 0f e0  Error: UNC at LBA = 0x000f274f = 993103
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 d0 04 4f 27 0f e0 00      03:46:47.616  READ VERIFY SECTOR(S) EXT
  25 d0 01 00 00 00 e0 00      03:46:47.616  READ DMA EXT
  42 d0 08 57 27 0f e0 00      03:46:47.600  READ VERIFY SECTOR(S) EXT
  25 d0 01 00 00 00 e0 00      03:46:47.600  READ DMA EXT
  42 d0 08 4f 27 0f e0 00      03:46:46.576  READ VERIFY SECTOR(S) EXT
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       10%     26543         319759751
# 2  Short offline       Completed: read failure       60%     26542         319759751
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
==
The short offline test stops at 40% completion and the extended offline
one at 90% completion, the LBA of the first error being 319759751 in
both cases.
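The attributes behind this impression can be pulled out of the table
mechanically. The sketch below assumes the ten-column layout that
smartctl prints above; the choice of attributes to watch (5, 197, 198
and 199) is an assumption for illustration, not a rule taken from
smartmontools.

```python
# Attribute IDs whose non-zero raw value usually deserves a closer look.
# This selection is an assumption made for this sketch.
WATCHED = {5, 197, 198, 199}

SDA_SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   252   252   063    Pre-fail  Always       -       17
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       56
197 Current_Pending_Sector  0x0008   252   252   000    Old_age   Offline      -       15
198 Offline_Uncorrectable   0x0008   237   001   000    Old_age   Offline      -       16
199 UDMA_CRC_Error_Count    0x0008   195   194   000    Old_age   Offline      -       5
"""


def flag_attributes(table_text):
    """Return {attribute name: raw value} for watched, non-zero attributes."""
    flagged = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        if int(fields[0]) in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged[fields[1]] = int(raw)
    return flagged


print(flag_attributes(SDA_SAMPLE))
```

For the /dev/sda lines quoted in SDA_SAMPLE this reports 17 reallocated
sectors, 15 pending sectors, 16 offline-uncorrectable sectors and 5 UDMA
CRC errors.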
2) The HDD associated with /dev/sdb gives:
==
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3320620AS
Serial Number:    9QFAYRCP
Firmware Version: 3.AAG
User Capacity:    320,072,933,376 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Tue Aug 28 16:11:54 2012 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[...]
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   253   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   096   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1753
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   085   060   030    Pre-fail  Always       -       355938474
  9 Power_On_Hours          0x0032   083   083   000    Old_age   Always       -       15739
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       1745
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   053   048   045    Old_age   Always       -       47 (Lifetime Min/Max 47/48)
194 Temperature_Celsius     0x0022   047   052   000    Old_age   Always       -       47 (0 20 0 0)
195 Hardware_ECC_Recovered  0x001a   065   055   000    Old_age   Always       -       1306602
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
==
(This is the one that looks the healthiest, actually.)
3) The HDD associated with /dev/sdc, which should be broken in some way
(given the /var/log/syslog messages quoted above), does not look so
according to SMART:
==
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Maxtor DiamondMax 21
Device Model:     MAXTOR STM3320820AS
Serial Number:    5QF2T6W6
Firmware Version: 3.AAE
User Capacity:    320,072,933,376 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Tue Aug 28 16:12:32 2012 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[...]
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   092   085   006    Pre-fail  Always       -       63613073
  3 Spin_Up_Time            0x0003   096   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2362
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       574383816
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       18552
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   020    Old_age   Always       -       2386
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   054   046   045    Old_age   Always       -       46 (Lifetime Min/Max 45/47)
194 Temperature_Celsius     0x0022   046   054   000    Old_age   Always       -       46 (0 12 0 0)
195 Hardware_ECC_Recovered  0x001a   065   052   000    Old_age   Always       -       222324542
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       2
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     18551         -
# 2  Extended offline    Completed without error       00%     18493         -
# 3  Short offline       Completed without error       00%     18492         -
# 4  Short offline       Completed without error       00%     13106         -
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
==
What can I deduce from this? It looks like /dev/sdc is broken, but SMART
suggests that /dev/sda is more likely to be on the verge of failing than
/dev/sdc.
Note that I tried exchanging SATA cables, to no avail.
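When swapping cables or ports, it helps to know which ataN port each sdX
disk currently sits on. A minimal sketch, assuming a kernel whose sysfs
device path for a disk contains an "ataN" component (older kernels may
not expose it, in which case the function returns None); the function
names are made up for illustration.

```python
import os
import re


def ata_port_from_path(sysfs_path):
    """Return the 'ataN' component of a resolved sysfs path, or None."""
    match = re.search(r"/(ata\d+)/", sysfs_path)
    return match.group(1) if match else None


def ata_port_for_disk(disk, sys_block="/sys/block"):
    """Resolve /sys/block/<disk> and look for an 'ataN' path component."""
    resolved = os.path.realpath(os.path.join(sys_block, disk))
    return ata_port_from_path(resolved)


if __name__ == "__main__":
    for disk in ("sda", "sdb", "sdc"):
        print(disk, ata_port_for_disk(disk))
```

Running it before and after moving a cable would show whether the errors
follow the disk or stay with the port.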
All the best,
- -- 
Merciadri Luca
See http://www.student.montefiore.ulg.ac.be/~merciadri/
- -- 
It's the early bird that gets the worm.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>
iEYEARECAAYFAlA80oQACgkQM0LLzLt8MhwUGgCbB9WOOBb3vHlorBnymavWCvmY
aBkAnRbCcc2WZK+AXQTcwqKTGyt0ph/b
=OzHm
-----END PGP SIGNATURE-----