Hi,
In the last couple of days, I've begun to see both kernel errors and
SMART warnings about my laptop's two-and-a-half-year-old hard drive.
An excerpt of a current 'dmesg | grep hda' (these errors occurred upon
resuming from suspend to disk):
[34074.459505] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34074.459685] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34074.459886] hda: possibly failed opcode: 0x25
[34079.744751] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34079.744931] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34079.745135] hda: possibly failed opcode: 0x25
[34079.750086] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34079.750263] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34079.750466] hda: possibly failed opcode: 0x25
[34079.789002] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34079.789192] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34079.789411] hda: possibly failed opcode: 0x25
[34079.794851] hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
[34079.795043] hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
[34079.795261] hda: possibly failed opcode: 0x25
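For reference, the status and error bytes in those lines can be decoded by hand. The following is only a minimal sketch: it assumes the standard ATA status/error register bit layout and reuses the labels that the old Linux IDE driver prints. The table and function names are hypothetical, not taken from any tool.

```python
# Assumed layout: standard ATA status register bits, labelled the way the
# old Linux IDE driver prints them. This table is an assumption, not
# something read from the log itself.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}


def decode_status(value):
    """Return the names of the status bits set in value, highest bit first."""
    return [name for mask, name in STATUS_BITS.items() if value & mask]


def decode_error(value):
    """Return the names of the error bits set in value.

    Bit 0x80 is reported as BadCRC when the abort bit (0x04) is also set,
    and as BadSector otherwise (assumed to mirror the IDE driver's labels).
    """
    names = []
    if value & 0x04:
        names.append("DriveStatusError")
    if value & 0x80:
        names.append("BadCRC" if value & 0x04 else "BadSector")
    if value & 0x40:
        names.append("UncorrectableError")
    if value & 0x10:
        names.append("SectorIdNotFound")
    if value & 0x02:
        names.append("TrackZeroNotFound")
    if value & 0x01:
        names.append("AddrMarkNotFound")
    return names


print(decode_status(0x51))
print(decode_error(0x84))
```

Under those assumptions, 0x51 and 0x84 decode to the same flag names shown in braces in the kernel messages, so the braces add no information beyond the raw bytes. As far as I can tell, opcode 0x25 is the ATA READ DMA EXT command.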
I ran the short and long SMART self-tests, and they seem clean:
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error        00%             5880  -
# 2  Short offline       Completed without error        00%             5879  -
# 3  Short offline       Completed without error        00%             1435  -
[#1 and #2 are the ones I ran yesterday, IIUC.]
I've attached the output of '# smartctl -a /dev/hda' to this mail.
Here's an excerpt of syslog ('grep smartd /var/log/syslog', with a bunch
of 'Temperature_Celsius changed' lines removed, since I think they're
normal):
Jun 9 15:12:29 lizzie smartd[3474]: Device: /dev/hda, SMART Usage Attribute: 191 G-Sense_Error_Rate changed from 100 to 99
Jun 9 15:12:29 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 12 to 17
Jun 9 15:12:29 lizzie smartd[3474]: Sending warning via mail to root@localhost ...
Jun 9 15:12:29 lizzie smartd[3474]: Warning via mail to root@localhost: successful
Jun 9 19:09:49 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 17 to 28
Jun 9 20:42:29 lizzie smartd[3474]: Device: /dev/hda, SMART Usage Attribute: 191 G-Sense_Error_Rate changed from 99 to 100
Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, SMART Prefailure Attribute: 2 Throughput_Performance changed from 100 to 105
Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, SMART Prefailure Attribute: 3 Spin_Up_Time changed from 151 to 152
Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 100 to 126
Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 28 to 34
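To keep an eye on how fast the ATA error count is climbing, the relevant smartd lines can be pulled out of syslog with a few lines of Python. This is a rough sketch that assumes the message wording stays exactly as in the excerpt above; the function name and the regular expression are my own.

```python
import re

# Matches the smartd message wording seen in the excerpt; a different
# smartd version might phrase it differently (assumption).
ATA_COUNT_RE = re.compile(r"ATA error count increased from (\d+) to (\d+)")


def ata_error_increases(lines):
    """Return (old, new) pairs for each 'ATA error count increased' line."""
    pairs = []
    for line in lines:
        match = ATA_COUNT_RE.search(line)
        if match:
            pairs.append((int(match.group(1)), int(match.group(2))))
    return pairs


# Three lines copied from the syslog excerpt.
sample = [
    "Jun 9 15:12:29 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 12 to 17",
    "Jun 9 19:09:49 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 17 to 28",
    "Jun 10 14:09:30 lizzie smartd[3474]: Device: /dev/hda, ATA error count increased from 28 to 34",
]

pairs = ata_error_increases(sample)
# Net growth between the first "from" value and the last "to" value.
total_new = pairs[-1][1] - pairs[0][0]
print(pairs, total_new)
```

On the three lines above this reports 22 new ATA errors (12 to 34) over roughly a day. To run it on the real log, pass the lines of an opened /var/log/syslog instead of the sample list.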
So far, the only actual problem that I've noticed is a (single) failure to
resume from disk yesterday, with some message (I neglected to save it)
about a checksum failure, which I believe was accompanied by some
kernel errors similar to the ones that I've reproduced above.
Is this drive going? What further tests / diagnostics can I do? [Yes,
I have backups, and I'm going to redouble my attention to keeping them
current and making sure that they're comprehensive.]
Celejar
--
mailmin.sourceforge.net - remote access via secure (OpenPGP) email
ssuds.sourceforge.net - A Simple Sudoku Solver and Generator
Attachment: smart-info (binary data)