Disk problems or worse?

To: debian-user <debian-user@lists.debian.org>
Subject: Disk problems or worse?
From: Ralph Katz <ralph.katz@rcn.com>
Date: Wed, 02 Jun 2010 21:21:04 -0400
Message-id: <4C070380.700@rcn.com>

A Lenny install on a newly acquired used Dell hangs and throws errors to
syslog.  Do I have two bad disks or a more serious hardware problem?
Short of buying a new disk, how would I know?  What would you recommend?
Or do I have a simple BIOS setting problem?

(My last post to debian-user was in 2008.  Etch has continued to be rock
solid on two desktops.  Now I felt it was time to upgrade.)

First, an old Dell GX240 was obtained and Lenny/Xfce installed: P4, 1 GB,
120 GB WDC disk.

Syslog showed all kinds of errors while system would hang at times:

May 24 21:53:39 spike kernel: [ 5034.952013] hda: status timeout:
status=0x80 { Busy }
May 24 21:53:39 spike kernel: [ 5034.952021] ide: failed opcode was: unknown
May 24 21:53:39 spike kernel: [ 5034.952030] hda: DMA disabled
May 24 21:53:39 spike kernel: [ 5034.952066] hda: drive not ready for
command
May 24 21:54:14 spike kernel: [ 5064.952021] ide0: reset timed-out,
status=0x80
May 24 21:54:14 spike kernel: [ 5065.393331] hda: status timeout:
status=0x80 { Busy }
May 24 21:54:14 spike kernel: [ 5065.393331] ide: failed opcode was: unknown
May 24 21:54:14 spike kernel: [ 5065.393331] hda: drive not ready for
command
May 24 21:54:14 spike kernel: [ 5065.393331] Clocksource tsc unstable
(delta = 4686898152 ns)
May 24 21:54:44 spike kernel: [ 5099.964023] ide0: reset timed-out,
status=0x80
May 24 21:54:44 spike kernel: [ 5099.964040] end_request: I/O error, dev
hda, sector 10867375
May 24 21:54:44 spike kernel: [ 5099.964104] end_request: I/O error, dev
hda, sector 13826839
May 24 21:54:44 spike kernel: [ 5099.964115] Buffer I/O error on device
dm-2, logical block 360455

[snipped 20 Kb of I/O errors]

May 24 21:54:44 spike kernel: [ 5099.967007] end_request: I/O error, dev
hda, sector 208223535
May 24 21:54:44 spike kernel: [ 5099.967024] EXT3-fs error (device
dm-5): ext3_get_inode_loc: unable to read inode block - inode=5792911,
block=23167050
May 24 21:54:44 spike kernel: [ 5099.967128] Aborting journal on device
dm-5.
May 24 21:54:44 spike kernel: [ 5099.968575] ext3_abort called.
May 24 21:54:44 spike kernel: [ 5099.968587] EXT3-fs error (device
dm-5): ext3_journal_start_sb: Detected aborted journal
May 24 21:54:44 spike kernel: [ 5099.968594] Remounting filesystem read-only
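
For anyone sifting through a log like this, a minimal Python sketch for
tallying the "end_request: I/O error" lines per device might look like the
following.  The function name count_io_errors and the SAMPLE text are
illustrative only; the pattern assumes the kernel message format shown
above and tolerates the line wrapping in this message.

```python
import re
from collections import defaultdict

# Matches "end_request: I/O error, dev hda, sector 10867375", allowing any
# whitespace (including the newlines added by mail wrapping) between words.
IO_ERROR = re.compile(
    r"end_request:\s+I/O\s+error,\s+dev\s+(\w+),\s+sector\s+(\d+)"
)

def count_io_errors(log_text):
    """Return {device: [sector, ...]} for every I/O error in log_text."""
    sectors = defaultdict(list)
    for dev, sector in IO_ERROR.findall(log_text):
        sectors[dev].append(int(sector))
    return dict(sectors)

# Three entries copied from the excerpt above (hypothetical test input).
SAMPLE = """\
May 24 21:54:44 spike kernel: [ 5099.964040] end_request: I/O error, dev
hda, sector 10867375
May 24 21:54:44 spike kernel: [ 5099.964104] end_request: I/O error, dev
hda, sector 13826839
May 24 21:54:44 spike kernel: [ 5099.967007] end_request: I/O error, dev
hda, sector 208223535
"""

if __name__ == "__main__":
    for dev, found in count_io_errors(SAMPLE).items():
        print(dev, len(found), "errors, sectors", min(found), "to", max(found))
```

Errors on widely separated sectors within the same second would seem more
consistent with the drive or bus not responding than with one patch of bad
sectors, though that is only a guess.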

I concluded the disk was dead (although SMART tests PASSED) and replaced it
with another used 120 GB WDC.  I re-installed Lenny, and soon the system
would again hang, typically at startup.

Syslog entries of note with the second disk installed:

/var/log/syslog:Jun  2 08:52:40 spike smartd[2346]: Device: /dev/hda,
SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 100 to 198
/var/log/syslog.1:Jun  1 08:13:56 spike kernel: [  936.000023] hda:
dma_timer_expiry: dma status == 0x21
/var/log/syslog.1:Jun  1 08:28:44 spike smartd[2357]: Device: /dev/hda,
SMART Usage Attribute: 196 Reallocated_Event_Count changed from 196 to 195

May 31 09:54:09 spike kernel: [  620.084022] hda: dma_timer_expiry: dma
status == 0x20
May 31 09:54:09 spike kernel: [  620.084031] hda: DMA timeout retry
May 31 09:54:09 spike kernel: [  620.084034] hda: timeout waiting for DMA
May 31 09:54:09 spike kernel: [  624.232267] Clocksource tsc unstable
(delta = 4686697657 ns)
May 31 10:14:07 spike smartd[2331]: Device: /dev/hda, SMART Prefailure
Attribute: 5 Reallocated_Sector_Ct changed from 200 to 199
May 31 10:14:07 spike smartd[2331]: Device: /dev/hda, SMART Usage
Attribute: 196 Reallocated_Event_Count changed from 200 to 196
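
To keep track of which attributes are moving, a small Python sketch that
pulls the smartd "changed from X to Y" messages out of a log could look
like this.  The name smart_changes and the SAMPLE text are illustrative
only, and the pattern assumes the smartd message format seen above.  As far
as I understand, these are normalized values, where lower is generally
worse, not raw counts.

```python
import re

# Matches e.g. "SMART Usage Attribute: 196 Reallocated_Event_Count changed
# from 200 to 196", allowing wrapped lines between any two words.
SMART_CHANGE = re.compile(
    r"SMART\s+(Prefailure|Usage)\s+Attribute:\s+(\d+)\s+(\w+)"
    r"\s+changed\s+from\s+(\d+)\s+to\s+(\d+)"
)

def smart_changes(log_text):
    """Return a list of (attr_id, name, kind, old, new) tuples."""
    return [
        (int(attr_id), name, kind, int(old), int(new))
        for kind, attr_id, name, old, new in SMART_CHANGE.findall(log_text)
    ]

# Two entries copied from the excerpt above (hypothetical test input).
SAMPLE = """\
May 31 10:14:07 spike smartd[2331]: Device: /dev/hda, SMART Prefailure
Attribute: 5 Reallocated_Sector_Ct changed from 200 to 199
May 31 10:14:07 spike smartd[2331]: Device: /dev/hda, SMART Usage
Attribute: 196 Reallocated_Event_Count changed from 200 to 196
"""

if __name__ == "__main__":
    for attr_id, name, kind, old, new in smart_changes(SAMPLE):
        trend = "dropped" if new < old else "rose"
        print(f"{attr_id:>3} {name} ({kind}) {trend} from {old} to {new}")
```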

Meanwhile, SMART self-tests short and long passed.  No errors were
reported by smartctl -a /dev/hda.

This morning I had to reboot a hung system with Alt-SysRq-b because X,
an ssh connection, VT1 and Ctrl-Alt-Del all failed to respond.

Searching the net for "Clocksource tsc unstable" suggested disabling
ACPI in the BIOS.  Hey, I'm just a desktop user, and this is beginning to
get beyond what my 7 years of experience let me understand of the magic.

Suggestions welcomed, thanks!

Ralph