
Re: Hard disk failure?



On Thu, 2006-03-30 at 13:38 +0200, Ramiro Aceves wrote:
> Hello Debian friends,
> 
> In September 2005 I bought a new Seagate 160 GB hard disk, type
> ST3160021A, UDMA (not SATA). After working well for some time, it has
> started giving errors, mainly at Debian Sarge startup.
> 
> Sometimes my system does not boot because it says something like
> "readonly filesystem".
> 
> The errors occur frequently now, and they often happen on a "cold"
> boot, that is, the first time I switch the machine on.
> 
> I cannot tell you the exact messages because I am not the regular user
> of this computer. My mother, who uses the computer, wrote down
> the following message, so it may not be accurate:
> 
> "ext3 error device hda1 in start transation: readonly filesystem."
> 
> I also have some errors in /var/log/messages:
> 
> 
> Mar 26 10:49:23 debian-remix kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> Mar 26 10:49:23 debian-remix kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=43778543, high=2, low=10224111, sector=43778543
> Mar 26 10:49:23 debian-remix kernel: end_request: I/O error, dev hda, sector 43778543
> 
> 
> I have also run smartctl tests, with the following results:
> 
> 
> # smartctl -a /dev/hda
> 
> From which I have captured the last 5 errors:

<smart stuff>

> 
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   058   056   006    Pre-fail  Always       -       129227943
>   3 Spin_Up_Time            0x0003   097   096   000    Pre-fail  Always       -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       1
>   5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       80
>   7 Seek_Error_Rate         0x000f   073   060   030    Pre-fail  Always       -       22255207
>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       795
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       559
> 194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33
> 195 Hardware_ECC_Recovered  0x001a   058   056   000    Old_age   Always       -       129227943
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
> 202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

<smart error stuff>

> 
> 
> What do you think I should do?
> 
> 1- Does it make sense to check the disk cable? Or is it an "internal"
> disk drive error?
> 2- Should I return the disk to my seller?
> 
> 
> Normally, restarting the computer solves the problem after a fsck.
> Sometimes I have also run a "manual" fsck, with no apparent data loss. I
> am concerned about a more serious hard disk failure with real data
> loss. (I have made backups, no problem ;-) )
> 
> Many thanks in advance:
> 
> Ramiro
> 

Those errors are bad indeed! I saw the same kernel messages on one of
my machines a few years ago, and there the cause was a faulty cable.
I ran into some hard drive issues of my own last week (still dealing
with them, actually :)) and started looking into SMART. I found this
article, which explains it quite well
(http://www.linuxjournal.com/article/6983 ). According to the article,
high values in the attribute table are good. You have some pretty low
values (058 for Raw_Read_Error_Rate, 073 for Seek_Error_Rate), which is
not good, although in the table you posted none of them is at or below
its threshold yet.
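
The comparison of VALUE and WORST against THRESH can also be done
mechanically instead of by eye. Below is a rough sketch, not a tested
tool: the function name check_smart_thresholds is made up for this
example, it assumes the unwrapped ten-column layout that smartctl -A
prints (as in the tables in this thread), and it treats a threshold of
000 as "no threshold defined".

```shell
# Hypothetical helper: reads `smartctl -A` output on stdin and prints every
# attribute whose normalized VALUE or WORST is at or below its THRESH.
# Assumes one attribute per line with at least 10 whitespace-separated fields.
check_smart_thresholds() {
    awk '
        # Attribute rows start with a numeric ID; header lines do not.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            value = $4 + 0; worst = $5 + 0; thresh = $6 + 0
            if (thresh == 0)
                next    # assumption: 000 means the vendor set no threshold
            if (value <= thresh)
                printf "FAILING NOW: %s value=%d thresh=%d\n", $2, value, thresh
            else if (worst <= thresh)
                printf "FAILED IN PAST: %s worst=%d thresh=%d\n", $2, worst, thresh
        }
    '
}

# Usage (as root), assuming the disk is /dev/hda as in this thread:
#   smartctl -A /dev/hda | check_smart_thresholds
```

No output means no attribute has reached its threshold. That does not
by itself mean the drive is healthy; the raw values and the kernel log
still need a look.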

You can also boot Knoppix or another live distro and run badblocks on the
drive. This will scan the entire drive for bad blocks.
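
For illustration, such a scan from the live system might look roughly
like the sketch below. The wrapper name scan_for_bad_blocks and the
output path /tmp/badblocks.txt are only placeholders for this example,
and /dev/hda is assumed from the kernel messages above. By default
badblocks runs a read-only test, so this should not write to the disk.

```shell
# Hypothetical wrapper around badblocks' default read-only surface scan.
#   -s  show progress
#   -v  verbose output
#   -o  write the list of bad blocks found to the given file
scan_for_bad_blocks() {
    device="$1"
    badblocks -sv -o /tmp/badblocks.txt "$device"
}

# Usage (as root, with no filesystem on the disk mounted):
#   scan_for_bad_blocks /dev/hda
```

On a 160 GB drive a full scan can take quite a while.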

Maybe Seagate provides a tool (on their website, for instance) to examine
the drive. That could give some more specific information.

I don't know if you can return a drive by saying that it is dying. I
think they will send you home with the message: "come back when it's
dead".

For information, my attribute table from my Maxtor 6Y120M0 (SATA):
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   138   128   063    Pre-fail  Always       -       24509
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       455
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   252   251   187    Pre-fail  Always       -       64510
  9 Power_On_Minutes        0x0032   218   218   000    Old_age   Always       -       103h+20m
 10 Spin_Retry_Count        0x002b   213   205   157    Pre-fail  Always       -       21
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       308
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       30
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       1435
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   164   010   000    Old_age   Offline      -       190
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       2
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   213   205   000    Old_age   Always       -       21
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   193   192   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

Good luck

Hope I helped (a little)

Philippe De Ryck


