Disk problems or worse?
A Lenny install on a newly acquired used Dell hangs and throws errors to
syslog. Do I have two bad disks or a more serious hardware problem?
Short of buying a new disk, how would I know? What would you recommend?
Or is this simply a BIOS setting problem?
(My last post to debian-user was in 2008. Etch has continued to be rock
solid on two desktops. Now I felt it was time to upgrade.)
First, I obtained an old Dell GX240 and installed Lenny/Xfce on it: P4,
1 GB RAM, 120 GB WDC disk.
Syslog showed all kinds of errors, and the system would hang at times:
May 24 21:53:39 spike kernel: [ 5034.952013] hda: status timeout:
status=0x80 { Busy }
May 24 21:53:39 spike kernel: [ 5034.952021] ide: failed opcode was: unknown
May 24 21:53:39 spike kernel: [ 5034.952030] hda: DMA disabled
May 24 21:53:39 spike kernel: [ 5034.952066] hda: drive not ready for
command
May 24 21:54:14 spike kernel: [ 5064.952021] ide0: reset timed-out,
status=0x80
May 24 21:54:14 spike kernel: [ 5065.393331] hda: status timeout:
status=0x80 { Busy }
May 24 21:54:14 spike kernel: [ 5065.393331] ide: failed opcode was: unknown
May 24 21:54:14 spike kernel: [ 5065.393331] hda: drive not ready for
command
May 24 21:54:14 spike kernel: [ 5065.393331] Clocksource tsc unstable
(delta = 4686898152 ns)
May 24 21:54:44 spike kernel: [ 5099.964023] ide0: reset timed-out,
status=0x80
May 24 21:54:44 spike kernel: [ 5099.964040] end_request: I/O error, dev
hda, sector 10867375
May 24 21:54:44 spike kernel: [ 5099.964104] end_request: I/O error, dev
hda, sector 13826839
May 24 21:54:44 spike kernel: [ 5099.964115] Buffer I/O error on device
dm-2, logical block 360455
[snipped 20 KB of I/O errors]
May 24 21:54:44 spike kernel: [ 5099.967007] end_request: I/O error, dev
hda, sector 208223535
May 24 21:54:44 spike kernel: [ 5099.967024] EXT3-fs error (device
dm-5): ext3_get_inode_loc: unable to read inode block - inode=5792911,
block=23167050
May 24 21:54:44 spike kernel: [ 5099.967128] Aborting journal on device
dm-5.
May 24 21:54:44 spike kernel: [ 5099.968575] ext3_abort called.
May 24 21:54:44 spike kernel: [ 5099.968587] EXT3-fs error (device
dm-5): ext3_journal_start_sb: Detected aborted journal
May 24 21:54:44 spike kernel: [ 5099.968594] Remounting filesystem read-only
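One way to see whether the I/O errors cluster in one region of the disk or span most of it is to tally the failing sectors per device. The following is only a minimal sketch: the function name summarize_io_errors is made up for illustration, and it assumes each log entry sits on a single line as in the real /var/log/syslog (not wrapped as in this mail).

```python
import re

# Matches kernel lines such as:
#   "... end_request: I/O error, dev hda, sector 10867375"
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def summarize_io_errors(lines):
    """Return {device: (error_count, lowest_sector, highest_sector)}.

    `lines` is any iterable of unwrapped syslog lines (hypothetical helper,
    not part of any Debian tool).
    """
    sectors = {}
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            device, sector = match.group(1), int(match.group(2))
            sectors.setdefault(device, []).append(sector)
    return {dev: (len(s), min(s), max(s)) for dev, s in sectors.items()}
```

It could be fed an open file object for the syslog. Errors spread across the whole disk at a single timestamp may point more toward the cable, controller or bus than toward a localized surface defect, but that is a rule of thumb, not a diagnosis.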
I concluded the disk was dead (even though its SMART tests PASSED) and
replaced it with another used 120 GB WDC. I re-installed Lenny, and soon
the system would hang again, typically at start-up.
Syslog entries of note with the second disk installed:
/var/log/syslog:Jun 2 08:52:40 spike smartd[2346]: Device: /dev/hda,
SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 100 to 198
/var/log/syslog.1:Jun 1 08:13:56 spike kernel: [ 936.000023] hda:
dma_timer_expiry: dma status == 0x21
/var/log/syslog.1:Jun 1 08:28:44 spike smartd[2357]: Device: /dev/hda,
SMART Usage Attribute: 196 Reallocated_Event_Count changed from 196 to 195
May 31 09:54:09 spike kernel: [ 620.084022] hda: dma_timer_expiry: dma
status == 0x20
May 31 09:54:09 spike kernel: [ 620.084031] hda: DMA timeout retry
May 31 09:54:09 spike kernel: [ 620.084034] hda: timeout waiting for DMA
May 31 09:54:09 spike kernel: [ 624.232267] Clocksource tsc unstable
(delta = 4686697657 ns)
May 31 10:14:07 spike smartd[2331]: Device: /dev/hda, SMART Prefailure
Attribute: 5 Reallocated_Sector_Ct changed from 200 to 199
May 31 10:14:07 spike smartd[2331]: Device: /dev/hda, SMART Usage
Attribute: 196 Reallocated_Event_Count changed from 200 to 196
Meanwhile, SMART self-tests short and long passed. No errors were
reported by smartctl -a /dev/hda.
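To keep track of which SMART attributes smartd reports as changing, the log lines can be pulled into a small table. This is a rough sketch only: smart_changes is a hypothetical helper name, the pattern is based solely on the smartd lines quoted above, and it again assumes unwrapped log lines.

```python
import re

# Based on smartd lines such as:
#   "Device: /dev/hda, SMART Usage Attribute: 196 Reallocated_Event_Count
#    changed from 200 to 196"  (one line in the real log)
SMART_CHANGE = re.compile(
    r"Device: ([^,\s]+), SMART (Prefailure|Usage) Attribute: "
    r"(\d+) (\S+) changed from (\d+) to (\d+)"
)


def smart_changes(lines):
    """Return a list of (device, kind, attr_id, name, old, new) tuples."""
    changes = []
    for line in lines:
        match = SMART_CHANGE.search(line)
        if match:
            device, kind, attr_id, name, old, new = match.groups()
            changes.append((device, kind, int(attr_id), name, int(old), int(new)))
    return changes
```

As far as I understand, the values smartd logs here are the normalized ones, where a falling number is usually the worse direction; treat that reading as an assumption to check against the smartd documentation.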
This morning I had to reboot a hung system with Alt+SysRq+b because X,
an ssh connection, VT1 and Ctrl+Alt+Del all failed to respond.
Searching the net for "Clocksource tsc unstable" suggested disabling
ACPI in the BIOS. Hey, I'm just a desktop user, and this is getting
beyond what my 7 years of experience let me understand of the magic.
Suggestions welcomed, thanks!
Ralph