--- Begin Message ---
- To: Debian Bug Tracking System <submit@bugs.debian.org>
- Subject: linux-image-2.6.18-3-vserver-amd64: strange error messages and sporadic md-raid failures with sata_promise
- From: Fabian Fagerholm <fabbe@sirius.neoisis.net>
- Date: Mon, 25 Dec 2006 15:14:19 +0200
- Message-id: <20061225131419.5243.84316.reportbug@sirius.neoisis.net>
Package: linux-image-2.6.18-3-vserver-amd64
Version: 2.6.18-7
Severity: normal
I run an AMD64 box with two WDC WD1600JS-55NCB1 SATA drives. Every now
and then, the machine logs lines like the following in the syslog:
Dec 25 09:15:11 xyz kernel: ata2: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Dec 25 09:15:11 xyz kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
Dec 25 09:15:11 xyz kernel: ata2: error=0x04 { DriveStatusError }
Also, lines like the following have appeared, though not as often:
Dec 25 09:15:07 xyz kernel: ATA: abnormal status 0x80 on port 0xFFFFC2000001C29C
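For reference, the status and error bytes in these messages can be decoded bit by bit. The sketch below assumes the conventional ATA status and error register layout; the table and function names are made up for illustration and are not taken from the kernel source. The kernel prints its own labels for the same bits (for example "DriveStatusError" for the abort bit).

```python
# Conventional ATA register bit names (an assumption of this sketch;
# the kernel messages above use their own labels for the same bits).
STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # seek complete
    0x08: "DRQ",   # data request
    0x04: "CORR",  # corrected data
    0x02: "IDX",   # index
    0x01: "ERR",   # error, details are in the error register
}

ERROR_BITS = {
    0x80: "ICRC/BBK",
    0x40: "UNC",    # uncorrectable data
    0x20: "MC",
    0x10: "IDNF",   # sector ID not found
    0x08: "MCR",
    0x04: "ABRT",   # command aborted
    0x02: "TK0NF",
    0x01: "AMNF",
}


def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in sorted(table.items(), reverse=True)
            if value & mask]


print(decode(0x51, STATUS_BITS))  # status from the ata2 lines
print(decode(0x04, ERROR_BITS))   # error from the ata2 lines
print(decode(0x80, STATUS_BITS))  # the "abnormal status" value
```

Read this way, 0x51/04 is a ready drive reporting an aborted command, and 0x80 is a drive that was still busy when the driver checked it.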
The two drives are partitioned into four partitions, each set up as
RAID-1 mirrors using MD (softraid). There are several LVM volumes
on top of those mirrors. Every now and then, MD kicks out a partition
from one of the mirrors. Which one gets kicked out seems to be random.
This has been going on for a few days, and seems to be related to
increased activity.
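A degraded mirror shows up in /proc/mdstat as a member count such as [2/1] and a status string such as [U_]. The following is a minimal sketch of how that could be checked from a script. The sample text, device names, and block counts are invented for illustration and are not output from this machine, and degraded_arrays is a hypothetical helper.

```python
import re

# Invented sample in the usual /proc/mdstat layout; NOT from this machine.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      497856 blocks [2/2] [UU]

md1 : active raid1 sdb2[1](F) sda2[0]
      9767424 blocks [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Map array name to its status string for arrays missing a member."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+) : ", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[(\d+)/(\d+)\] \[([U_]+)\]", line)
        if current and status:
            expected = int(status.group(1))
            active = int(status.group(2))
            if active < expected:
                result[current] = status.group(3)
            current = None
    return result


print(degraded_arrays(SAMPLE))
```

On a live system the text would come from reading /proc/mdstat instead of SAMPLE.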
S.M.A.R.T. claims the disks are fine (the overall test result is PASSED)
except for one attribute, which I believe is related to temperature:
190 Unknown_Attribute 0x0022 049 010 045 Old_age Always In_the_past 51
The failure time is either In_the_past or FAILING_NOW, depending on the
situation.
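The two failure states can be reproduced from the numbers in that line. The sketch below assumes the usual smartctl column order (ID, name, flag, value, worst, threshold, type, updated, when failed, raw) and assumes the rule that an attribute is FAILING_NOW when its current value is at or below the threshold and In_the_past when only its worst value is. The function names are hypothetical.

```python
FIELDS = ("id", "name", "flag", "value", "worst", "thresh",
          "type", "updated", "when_failed", "raw")


def parse_attribute(line):
    """Split one smartctl attribute row into a dict of named columns."""
    parts = line.split(None, len(FIELDS) - 1)
    attr = dict(zip(FIELDS, parts))
    for key in ("id", "value", "worst", "thresh"):
        attr[key] = int(attr[key])
    return attr


def failure_state(attr):
    """Assumed rule: compare normalized value and worst to the threshold."""
    if attr["value"] <= attr["thresh"]:
        return "FAILING_NOW"
    if attr["worst"] <= attr["thresh"]:
        return "In_the_past"
    return "-"


line = ("190 Unknown_Attribute 0x0022 049 010 045 "
        "Old_age Always In_the_past 51")
attr = parse_attribute(line)
print(attr["value"], attr["worst"], attr["thresh"], failure_state(attr))
```

With value 49, worst 10 and threshold 45, the current value sits only four points above the threshold, so a small swing would be enough to flip the state between In_the_past and FAILING_NOW. If attribute 190 is a temperature reported as 100 minus degrees Celsius, as it is on some drives (an assumption for this model), a raw value of 51 corresponds to the normalized value of 49.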
I've run badblocks in read-only mode on the drives, and it reported
no errors. (I can't risk a write test on this machine.) I've
re-added the failed partition and the RAID resync went fine.
The disk that gets kicked out of the RAID array is scheduled for
replacement, but these symptoms seem to indicate more than just a failed
disk -- heat problems, a disk controller failure, a kernel bug, or
something. Other machines have had failed disks with similar symptoms,
but those had S.M.A.R.T. status FAILED and were clear-cut cases.
The disk controller chipset is a PDC20319, and the machine runs
sata_promise. Kernel etc. info below.
Please let me know what additional details are needed!
Thanks,
--
Fabian Fagerholm <fabbe@paniq.net>
-- System Information:
Debian Release: 4.0
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-3-vserver-amd64
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Versions of packages linux-image-2.6.18-3-vserver-amd64 depends on:
ii coreutil 5.97-5 The GNU core utilities
ii debconf 1.5.8 Debian configuration management sy
ii e2fsprog 1.39+1.40-WIP-2006.11.14+dfsg-1 ext2 file system utilities and lib
ii initramf 0.85c tools for generating an initramfs
ii module-i 3.3-pre3-1 tools for managing Linux kernel mo
Versions of packages linux-image-2.6.18-3-vserver-amd64 recommends:
ii util-vserver 0.30.211-6 user-space tools for Linux-VServer
-- debconf information excluded
--- End Message ---