
Dying hard drive?

Good morning all,

I have a Dell Precision desktop with two disks and a built-in RAID 1 controller
that I am not using. I originally tried to install Wheezy on disk0 but ran
into issues with my video card, so I installed Squeeze on disk1 so I could
be up and running. I eventually went back and installed Squeeze on disk0
and did something I don't have a lot of experience with, but want to learn.
I created logical volumes on that disk.

All had been running fine on the disk0 system for about four weeks. Yesterday
I ran into issues that I think are probably an indication that disk0 is on its
way to the bit bucket in the sky, but since I'm dealing with unfamiliar
territory here I figured I would seek help from anyone with more knowledge
than I have.

I was working in a terminal window, running some complex queries on my
PostgreSQL server. I usually edit things like that in vi in a separate
terminal, and was doing so, saving that file on my system. Out of the
blue I got errors that the file could not be written. In fact, the entire
file system seemed to suddenly become read-only, but I was still able to do some
things, such as saving my file to a previously mounted thumb drive. Mount showed nothing
unusual, but I couldn't unmount the thumb drive and received an error about
/etc/mtab being mounted on a read-only file system. In addition, "shutdown -h
now" failed to shut the system down so I had no choice but to manually power
off. The system failed to boot after that, with a kernel panic because it was
unable to mount the root file system.
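
In hindsight, one likely reason mount showed nothing unusual is that it
reports what is recorded in /etc/mtab, and that file cannot be updated once
the root file system has gone read-only, whereas /proc/mounts reflects what
the kernel actually sees. Below is a minimal sketch in Python for checking
this next time. The function name read_only_mounts is my own invention for
illustration, and it assumes the usual whitespace-separated /proc/mounts
layout (device, mount point, type, options, ...).

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains the 'ro' flag.

    mounts_text is the content of /proc/mounts (or any text in the same
    whitespace-separated layout: device, mount point, fs type, options).
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # Compare whole options so that "errors=remount-ro" is not
        # mistaken for the "ro" flag itself.
        if "ro" in options:
            result.append(mount_point)
    return result
```

To use it on a live system, pass it the text of /proc/mounts, for example
read_only_mounts(open("/proc/mounts").read()). If "/" shows up in the result
while mount still claims rw, the kernel has most likely remounted the file
system read-only after an error.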

I booted into the old system on disk1 (BTW, lilo is in the MBR of disk0),
and was able to finish my work. Then I installed lvm2, ran vgscan, and ran
e2fsck on the volume. It found all kinds of issues, but I was able to repair
everything. I then mounted the volume and copied everything to an external
drive for grins (I always have backups of the important stuff, but figured
a full backup couldn't hurt).

However, whenever I run e2fsck on the volume, numerous errors consistently
appear during pass 2, and I'm not sure what they mean:

ata1.00: exception Emask 0x0 SAct 0xf SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000008
ata1.00: failed command: READ FPDMA QUEUED
ata1.00: cmd 60/00:00:a7:16:01/01:00:24:00:00/40 tag 0 ncq 131072 in
         res 41/40:00:06:17:01/00:00:24:00:00/40 Emask 0x409 (media error) <F>
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
ata1: EH complete

These messages are repeated about six times. I also saw similar messages
while copying the files from the volume.
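
To make some sense of the cmd and res lines, here is a rough Python sketch
that decodes them. It rests on an assumption I have not verified against the
kernel source: that the fields are printed in the order
command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
(with status and error taking the place of command and feature on the res
line). The function names and the bit tables are mine; the bit values are the
standard ATA status and error register bits.

```python
import re

# ASSUMED field layout of libata "cmd"/"res" lines (see lead-in):
#   XX/FF:NN:L0:L1:L2/HF:HN:L3:L4:L5/DD
TASKFILE = re.compile(
    r"(?:cmd|res)\s+([0-9a-f]{2})/"
    r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"([0-9a-f]{2})"
)

# Standard ATA register bits, listed from high bit to low bit.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"),
               (0x08, "DRQ"), (0x01, "ERR")]
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT")]


def decode_taskfile(line):
    """Decode one cmd/res line into its first byte, feature bytes and LBA."""
    m = TASKFILE.search(line)
    if m is None:
        raise ValueError("no cmd/res taskfile found in line")
    low = [int(x, 16) for x in m.group(2).split(":")]   # feature..lbah
    high = [int(x, 16) for x in m.group(3).split(":")]  # hob_feature..hob_lbah
    lba = 0
    # 48-bit LBA, most significant byte first.
    for byte in (high[4], high[3], high[2], low[4], low[3], low[2]):
        lba = (lba << 8) | byte
    return {
        "first": int(m.group(1), 16),  # command (cmd) or status (res)
        "feature": low[0],             # feature (cmd) or error (res)
        "hob_feature": high[0],
        "lba": lba,
    }


def bit_names(value, table):
    """Return the names of the bits from table that are set in value."""
    return [name for mask, name in table if value & mask]
```

Under that assumption, the request above started at LBA 0x240116a7 and the
drive reported the error at LBA 0x24011706, which is 95 sectors into the
request. Status 0x41 decodes to DRDY and ERR and error 0x40 to UNC, matching
the status and error lines in the log. In other words, the drive itself
appears to be reporting an uncorrectable read at a specific sector.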

e2fsck must have been successful, because I was able to boot into the OS on
disk0 afterwards. I left it running all night with no issues. I am currently
working off of it, trying to replicate the problem, but not having any luck. I
will probably go to Dell seeking a replacement hard drive, but I am curious for
input here in case this is something that can be repaired instead.

