
Re: Disk heads won't park [pat II]



Reviving this thread since I tried turning the machine on again (and
maybe another thread will bump this one).

And again (well, I wasn't expecting it to go away), as soon as the
machine starts - right after POST, even before GRUB - the drive starts
making "reading noise" (like when an antivirus is scanning or the
system is thrashing). The only way to stop it is with hdparm -y (no
wonder there).

This is a Seagate Barracuda ST31000528AS drive with a CC49 firmware
upgrade. Here are a few other commands I tried:

~# hdparm -Z /dev/sdd
/dev/sdd:
 disabling Seagate auto powersaving mode
 HDIO_DRIVE_CMD(seagatepwrsave) failed: Input/output error

~# hdparm -B /dev/sdd
APM_level      = not supported

The message "Incorrect metadata area header checksum on /dev/sdd1 at
offset 4096" shows up in dmesg and on LVM operations. Here's some
SMART fun:

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   113   099   006    -    50643073
  3 Spin_Up_Time            PO----   095   095   000    -    0
  4 Start_Stop_Count        -O--CK   099   099   020    -    1274
  5 Reallocated_Sector_Ct   PO--CK   047   047   036    -    2181
  7 Seek_Error_Rate         POSR--   075   060   030    -    35143152
  9 Power_On_Hours          -O--CK   079   079   000    -    18750
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    638
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   099   000    -    1
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   071   051   045    -    29 (Min/Max 22/29)
194 Temperature_Celsius     -O---K   029   049   000    -    29 (0 11 0 0)
195 Hardware_ECC_Recovered  -O-RC-   026   018   000    -    50643073
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    178838143258596
241 Total_LBAs_Written      ------   100   253   000    -    1457922426
242 Total_LBAs_Read         ------   100   253   000    -    1552877542
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
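
The six flag letters in this legend are the same information as the
hexadecimal FLAG column that plain `smartctl -A` prints. A minimal
sketch of the mapping, assuming one bit per letter with P as the lowest
bit (an assumption, but it reproduces every FLAGS/FLAG pair in this
post); the function names are made up for illustration:

```python
# Legend order, left to right: P O S R C K. Assumed to be bits 0..5.
FLAG_LETTERS = "POSRCK"


def flags_to_mask(flags):
    """Convert a six-character flag string such as 'PO--CK' to a bit mask."""
    mask = 0
    for bit, letter in enumerate(FLAG_LETTERS):
        if flags[bit] == letter:
            mask |= 1 << bit
    return mask


def mask_to_flags(mask):
    """Convert a bit mask such as 0x0033 back to the six-character form."""
    return "".join(
        letter if mask & (1 << bit) else "-"
        for bit, letter in enumerate(FLAG_LETTERS)
    )


if __name__ == "__main__":
    # Reallocated_Sector_Ct is shown as PO--CK in one listing, 0x0033 in the other.
    print(f"0x{flags_to_mask('PO--CK'):04x}")
    # Seek_Error_Rate is shown as 0x000f in one listing, POSR-- in the other.
    print(mask_to_flags(0x000F))
```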

I've also run a few self-tests, but they show as Aborted even when I
let them run for hours:

=== START OF READ SMART DATA SECTION ===
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Self-test routine in progress 90%     18752         -
# 2  Short offline       Aborted by host               90%     18752         -
# 3  Short offline       Aborted by host               90%     18751         -
# 4  Short offline       Aborted by host               90%     18751         -
# 5  Short offline       Aborted by host               90%     18751         -
# 6  Extended offline    Aborted by host               90%     18750         -
# 7  Extended offline    Completed without error       00%     18746         -
# 8  Extended offline    Aborted by host               90%     18742         -
# 9  Extended offline    Interrupted (host reset)      90%     18742         -
#10  Short offline       Interrupted (host reset)      00%     18741         -
#11  Short offline       Completed without error       00%     18653         -

# smartctl -A /dev/sdd
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   099   006    Pre-fail  Always       -       50652210
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1283
  5 Reallocated_Sector_Ct   0x0033   047   047   036    Pre-fail  Always       -       2181
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       35273246
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       18754
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       638
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       1
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   054   051   045    Old_age   Always       -       46 (Min/Max 22/47)
194 Temperature_Celsius     0x0022   046   049   000    Old_age   Always       -       46 (0 11 0 0)
195 Hardware_ECC_Recovered  0x001a   026   018   000    Old_age   Always       -       50652210
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       219034742181864
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1457922426
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1552877978

Then I ran ~# smartctl -l error /dev/sdd every minute or so; these two
attributes change a lot:


  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       35274663
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       35274722
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       35274773
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       35274811
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       35274850
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       35274971
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       35275013
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       239070764617704
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       251835407421416
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       262632955203560
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       271287314305000
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       279546536415208
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       23424751652840
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       32220844675048
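
Part of why these two raw values jump around may simply be how they are
packed. A minimal sketch, assuming the commonly reported Seagate layout
in which Seek_Error_Rate is a 48-bit number (upper 16 bits an error
count, lower 32 bits a count of seeks) and Head_Flying_Hours keeps an
hour counter in its lower 32 bits with a fast-changing field above it.
That layout is an assumption, not something smartctl or Seagate
confirms in the output above, and the function names are hypothetical:

```python
def split_seek_error_rate(raw):
    """Return (errors, seeks) under the assumed 16-bit/32-bit layout."""
    errors = (raw >> 32) & 0xFFFF
    seeks = raw & 0xFFFFFFFF
    return errors, seeks


def split_head_flying_hours(raw):
    """Return (upper_field, hours) under the assumed layout."""
    upper_field = raw >> 32      # changes quickly between samples
    hours = raw & 0xFFFFFFFF     # assumed to be the actual hour counter
    return upper_field, hours


if __name__ == "__main__":
    # Raw values copied from the listings in this post.
    for raw in (35143152, 35275013):
        errors, seeks = split_seek_error_rate(raw)
        print(f"Seek_Error_Rate   raw={raw}: errors={errors} seeks={seeks}")

    for raw in (178838143258596, 219034742181864, 32220844675048):
        upper, hours = split_head_flying_hours(raw)
        print(f"Head_Flying_Hours raw={raw}: upper={upper} hours={hours}")
```

Under that assumption, the Seek_Error_Rate samples all decode to zero
errors with a steadily growing seek count, and the lower word of
Head_Flying_Hours goes from 20452 to 20456 between the two full
listings, the same 4-hour step as Power_On_Hours (18750 to 18754).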

I do assume it is failing, but I'd like to know why, and which values
are really tell-tale (for instance, the WHEN_FAILED column above is
empty, so I can't really draw any conclusions).
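
On the empty WHEN_FAILED column: a rough sketch of the rule as the
smartctl documentation describes it, where an attribute is only flagged
once its normalized VALUE (or WORST) drops to or below THRESH. The
`when_failed` helper is hypothetical, written here for illustration,
and only attributes with a non-zero threshold from the listing above
are included:

```python
# (name, VALUE, WORST, THRESH) copied from the smartctl -A listing above.
ROWS = [
    ("Raw_Read_Error_Rate", 113, 99, 6),
    ("Start_Stop_Count", 99, 99, 20),
    ("Reallocated_Sector_Ct", 47, 47, 36),
    ("Seek_Error_Rate", 75, 60, 30),
    ("Spin_Retry_Count", 100, 100, 97),
    ("Airflow_Temperature_Cel", 54, 51, 45),
]


def when_failed(value, worst, thresh):
    """Approximate the WHEN_FAILED column for one attribute."""
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"


if __name__ == "__main__":
    for name, value, worst, thresh in ROWS:
        state = when_failed(value, worst, thresh)
        margin = value - thresh
        print(f"{name:<24} VALUE={value:03d} THRESH={thresh:03d} "
              f"margin={margin:3d} WHEN_FAILED={state}")
```

With the numbers above, every row still comes out as "-", including
Reallocated_Sector_Ct at 047 against a threshold of 036, which would be
consistent with the column being empty even though 2181 sectors have
been reallocated.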

This is a recently installed, headless system with almost nothing installed.

Thanks,
Nuno

