
Understanding my SMART errors



Hi,

In the last couple of days, I've begun to see both kernel errors and
SMART warnings about my laptop's two-and-a-half-year-old hard drive.

An excerpt from the current output of 'dmesg | grep hda' (these errors
occurred upon resuming from suspend to disk):

[34074.459505] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34074.459685] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34074.459886] hda: possibly failed opcode: 0x25
[34079.744751] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34079.744931] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34079.745135] hda: possibly failed opcode: 0x25
[34079.750086] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34079.750263] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34079.750466] hda: possibly failed opcode: 0x25
[34079.789002] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34079.789192] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34079.789411] hda: possibly failed opcode: 0x25
[34079.794851] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34079.795043] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34079.795261] hda: possibly failed opcode: 0x25
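For reference, the status and error bytes in these messages are bit masks,
and the names in braces are just the set bits spelled out. Below is a
minimal Python sketch of that decoding. The bit-to-name tables are my
reading of what the old Linux IDE driver prints and are an assumption, not
something taken from the drive's documentation; the helper name decode is
made up for illustration.

```python
# Bit names as printed by the old Linux IDE driver (assumed mapping).
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

ERROR_BITS = [
    (0x04, "DriveStatusError"),    # ABRT: command aborted
    (0x80, "BadCRC"),              # ICRC: interface CRC error on a DMA transfer
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode(value, table):
    """Return the names of the bits set in value, in table order.

    Bits that are set but missing from the table are reported as hex
    so that nothing is silently dropped.
    """
    names = [name for mask, name in table if value & mask]
    known = 0
    for mask, _ in table:
        known |= mask
    leftover = value & ~known & 0xFF
    if leftover:
        names.append("unknown(0x%02x)" % leftover)
    return names


print("status=0x51", decode(0x51, STATUS_BITS))
print("error=0x84 ", decode(0x84, ERROR_BITS))
```

Under these assumed tables, 0x51 and 0x84 decode to the same names the
kernel printed above. Opcode 0x25 is the ATA READ DMA EXT command, so the
failed operations were DMA reads.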

I ran the short and long SMART self-tests, and they seem clean:

smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      5880         -
# 2  Short offline       Completed without error       00%      5879         -
# 3  Short offline       Completed without error       00%      1435         -

[#1 and #2 are the ones I ran yesterday, IIUC.]

I've attached the output of '# smartctl -a /dev/hda' to this mail.

Here's an excerpt of syslog ('grep smartd /var/log/syslog', with a bunch
of 'Temperature_Celsius changed' lines removed, since I think they're
normal):

Jun  9 15:12:29 lizzie smartd[3474]: Device: /dev/hda, SMART Usage Attribute: 191 G-Sense_Error_Rate changed from 100 to 99 
Jun  9 15:12:29 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 12 to 17 
Jun  9 15:12:29 lizzie smartd[3474]: Sending warning via mail to root@localhost ... 
Jun  9 15:12:29 lizzie smartd[3474]: Warning via mail to root@localhost: successful 
Jun  9 19:09:49 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 17 to 28 
Jun  9 20:42:29 lizzie smartd[3474]: Device: /dev/hda, SMART Usage Attribute: 191 G-Sense_Error_Rate changed from 99 to 100 
Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, SMART Prefailure Attribute: 2 Throughput_Performance changed from 100 to 105 
Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 151 to 152 
Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 100 to 126 
Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 28 to 34 
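To see how fast the ATA error count is growing without reading the log by
eye, the 'ATA error count increased' lines can be pulled out with a short
script. This is only a sketch: it assumes the smartd message format shown
in the excerpt above, and the function name ata_error_steps is
hypothetical. The sample lines are copied from that excerpt.

```python
import re

# Matches smartd lines of the form shown in the syslog excerpt above.
ATA_RE = re.compile(
    r"Device: (\S+), ATA error count increased from (\d+) to (\d+)"
)


def ata_error_steps(lines):
    """Return (device, old_count, new_count) for each matching log line."""
    steps = []
    for line in lines:
        m = ATA_RE.search(line)
        if m:
            steps.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return steps


sample = [
    "Jun  9 15:12:29 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 12 to 17",
    "Jun  9 15:12:29 lizzie smartd[3474]: Sending warning via mail to root@localhost ...",
    "Jun  9 19:09:49 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 17 to 28",
    "Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 28 to 34",
]

steps = ata_error_steps(sample)
for dev, old, new in steps:
    print("%s: %d -> %d (+%d)" % (dev, old, new, new - old))
print("total new errors:", sum(new - old for _, old, new in steps))
```

To run it against the real log, replace the sample list with the lines of
/var/log/syslog, for example by iterating over open('/var/log/syslog').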

So far, the only actual problem that I've noticed is a (single) failure to
resume from disk yesterday, with some message (I neglected to save it)
about a checksum failure, which I believe was accompanied by some
kernel errors similar to the ones that I've reproduced above.

Is this drive going?  What further tests / diagnostics can I do?  [Yes,
I have backups, and I'm going to redouble my attention to keeping them
current and making sure that they're comprehensive.]

Celejar
--
mailmin.sourceforge.net - remote access via secure (OpenPGP) email
ssuds.sourceforge.net - A Simple Sudoku Solver and Generator

Attachment: smart-info
Description: Binary data

