Re: ATA abnormal status
On Thursday 24 August 2006 11:07, Erik Mouw wrote:
> On Wed, Aug 23, 2006 at 10:16:30PM +0200, Francesco Pietra wrote:
> > While computing with mpqc2.3.1 (debian etch amd64; dual opterons; 8GB ram
> > ECC; raid 1; filesystem ext3; grub on its own partition):
> > HD LED permanently lit.
> > Messages on screen:
> > ATA: abnormal status 0x58 on port 0x1C5F
> > ata3: command 0x35 timeout, stat 0x50 host_stat 0x24
> > ata4: same as above for ata3
> > Trying:
> > $ df -h
> > sd 3:0:0:0 SCSI error return code 0x8000002
> > Additional sense : SCSI parity error
> > end request: I/O error, dev sda, sector 47748992
> Hmm, SATA drive, probably old kernel, sense code not yet mapped to ATA
> sense codes. SCSI parity error means cable problem, IIRC this is mapped
> to the ATA "CRC error", which also means bad cable.
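For reference, a minimal sketch of how the two status bytes quoted above (0x58
and 0x50) break down, assuming the classic ATA status register bit layout. The
table and the function name decode_ata_status are only illustrative, and
several of these bits are obsolete or command-dependent in ATA/ATAPI-7, so this
is a reading aid rather than a diagnosis:

```python
# Classic ATA status register bits (assumed layout; bits 1, 2 and 4 are
# obsolete or command-dependent in later ATA revisions).
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data
    (0x02, "IDX"),   # index
    (0x01, "ERR"),   # error
]


def decode_ata_status(status):
    """Return the names of the bits set in an ATA status byte, high bit first."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


if __name__ == "__main__":
    # The two values reported in the messages above.
    for value in (0x58, 0x50):
        print(f"0x{value:02x}: {' '.join(decode_ata_status(value))}")
```

Under that assumed layout, neither value has BSY or ERR set.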
> > Later:
> > raid1 Disk failure on sd6, disabling device
> > raid1 :sdb3: redirecting another mirror
> > RAID1 conf printout
> > --- wd:1 rd:2
> > disk1, wo:0, o:1, dev: sdb8
> > _______
> > I could only switch the power off because it did not respond to shutdown
> > commands.
> > _______
> > Rebooting, the $ prompt was obtained without warnings.
> > Then I looked at
> > /etc/fstab
> > and issued:
> > #fdisk /dev/sda
> > # p
> > #fdisk /dev/sdb
> > # p
> > #df -h
> > there was nothing wrong: both disks identical to before.
> > ______________________________
> > Similar hanging already occurred on 3 August (it was already ext3
> > filesystem) during similar computation with mpqc. There was nothing wrong
> > after rebooting and up to now there was no anomaly. I checked disks and
> > ram.
> > Before that, when using the reiserfs 3.6 filesystem, I had many problems
> > with debian while carrying out mpqc computations. Therefore, I changed to
> > ext3.
> > Threaded computations with mpqc running without interruption for many days
> > put great stress on the system (mostly on memory, because mpqc writes
> > sparingly to HD).
> > _______
> > Any guess at what that means? My naive understanding is that it was a
> > failure of the OS, not of the hardware.
> I guess hardware failure. Replace cables and see if that fixes your
> problem. Would be nice to know some more details: kernel version,
> hardware (what sata controller, what drives).
Thanks for your attention.
Main board: Tyan K8WE S2895
SATA II controllers nForce Pro 2200.
Added graphic card Pixel view 6600 256M PCI.
Added SCSI controller LSI PCI for external scsi HD (old IBM for backup).
CPU1 and CPU2: Opteron Dual Core 265.
ram: 8 x KingstonKVR 400D43a/1GB DDR2 CL3 Ecc Reg.
HD: 2 x Maxtor 6V300F0; ATA version 7; ATA standard ATA/ATAPI-7 T13 1532 D.
OS: debian etch amd64, kernel 2.6.15-1-amd64-k8-smp, filesystem ext3, grub on
boot partition, partitions for proc home tmp usr var swap, software raid1, no
X system running when the incident occurred.
#smartctl -a -d ata /dev/sda (or sdb) reported PASSED (run after the incident
described above). I was unable to see the result of the short self-test (I
don't know where it is written, if at all; the disks are not in the database).
While I plan to replace the HD cables as soon as this computation has
converged, I wonder whether the lack of a power protection unit may have been
responsible for the failure of the disks. I plan to buy one anyway; I am only
uncertain about the rating needed for this machine plus an Athlon k7 pc. Is
800VA enough? I do not need a long supply time because calculations can be
resumed from the last result written to HD; perhaps one minute of supply?
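To make the 800VA question concrete, here is a rough sizing sketch. Everything
numeric in it is an assumption: the two load figures are hypothetical
placeholders (I have not measured either machine), the 0.6 power factor and
the 20% headroom are guesses to be replaced with the figures from the unit's
data sheet, and the function names are only illustrative. It checks the power
rating only and says nothing about how long the battery lasts.

```python
def ups_capacity_watts(va_rating, power_factor):
    """Real power (W) a UPS can deliver for a given VA rating and power factor."""
    if not 0 < power_factor <= 1:
        raise ValueError("power_factor must be in (0, 1]")
    return va_rating * power_factor


def is_sufficient(va_rating, loads_watts, power_factor=0.6, headroom=1.2):
    """True if the UPS covers the summed loads plus a safety margin.

    power_factor and headroom are assumed defaults, not vendor figures.
    """
    needed = sum(loads_watts) * headroom
    return ups_capacity_watts(va_rating, power_factor) >= needed


if __name__ == "__main__":
    # Hypothetical loads in watts: dual-Opteron machine and Athlon k7 pc.
    loads = [400, 150]
    print("800 VA sufficient:", is_sufficient(800, loads))
```

With measured wattages in place of the placeholders, the same check can be
repeated for any candidate VA rating.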