Re: ATA abnormal status

On Wednesday 30 August 2006 23:49, Erik Mouw wrote:
> On Fri, Aug 25, 2006 at 08:37:03PM +0200, Francesco Pietra wrote:
> > Sent again: The external scsi HD was not connected when the accident
> > occurred
> >
> >
> > Hi Erik:
> > Thanks for your attention.
> >
> > Main board: Tyan K8WE S2895
> > SATA II controllers nForce Pro 2200.
> > Added graphic card Pixel view 6600 256M PCI.
> > Added SCSI controller LSI PCI for external scsi HD (old IBM for backup).
> > CPU1 and CPU2: Opteron Dual Core 265.
> > ram: 8 x KingstonKVR 400D43a/1GB DDR2 CL3 Ecc Reg.
> > HD: 2 x Maxtor 6V300F0; ATA version 7; ATA standard ATA/ATAPI-7 T13 1532
> > D.
> There appear to be problems with Nvidia Nforce chipsets with certain
> Maxtor drives that result in data corruption. From what I could figure
> out it appears to be a problem in the nforce SATA engine that show up
> with certain Maxtor drives, though sometimes also with other brands.
> Maxtor has a firmware update that works around the Nvidia bug, you
> might want to ask their support department.
> > OS: debian etch amd6a, kernel 2.6.15-1-amd64-k8-smp, filesystem ext3,
> > grub on boot partition, partitions for proc home tmp usr var swap, raid1
> > software, no Xsystem when the accident occurred.
> Make sure that you don't have the proprietary Nvidia kernel module
> loaded for the graphics card. Because it's a proprietary module it's
> not properly reviewed so it might silently corrupt memory.
> > #smartctl -a -d ata /dev/sda (or sdb) reported PASSED (run after the
> > accident described above). Unable to see the result of short self test
> > (don't know where it is written, if at all; disks are not in database).
> You could try to get smartmontools from debian-unstable and see if it
> has support for your drives.
> > While I plan to replace the HD cables as soon as this computation has
> > attained convergence, I wonder whether lack of a power protection unit
> > may have been responsible for the failure of disks.
> Possible, though drives usually tend to die completely when they get
> damaged by overvoltage.
> > I plan anyway to buy one; only
> > uncertain about the power for this machine and an Athlon k7 pc. 800VA
> > enough? I do not need long energy supply because calculations can be
> > resumed from last HD written result; perhaps one minute energy supply?
> APC has a nice product selector on their website, see
> http://www.apc.com/tools/ups_selector/ . There's good support for APC
> UPSes by nut, and knutclient even gives you a nice graphical monitor
> application (and of course nut can be monitored by nagios).
> Anyway, back to your problem:
> - Make sure you don't use the proprietary nvidia kernel module
> - Replace the cables
> - If it still persists, check Maxtor support

Thank you for all the extremely useful information.

Last night, while computing with mpqc 2.3.1 (no X system loaded, as always),
the same disk problem occurred. The HD LED stayed lit continuously. The screen showed:
---SCSI error
---<0> kernel panic - not syncing: Aiee, killing interrupt handler!

All the checks I could carry out, the same ones as before, showed the disks
and my data to be in order.

I have now replaced the data cables on both disks with brand-new ones and
restarted mpqc.

I'll do nothing else today, because I have received the two new disks I had
recently ordered:
---WD Raptor WD1500ADFD
Date 13 Jun 2006
5VDC 0.90A
12VDC 0.75A

I plan to replace the Maxtors with these. My only concern is ventilation.
Although the Enermax CS-721 has 4 fans (in addition to the two on the dual
Opterons), there is no direct airflow over the disks.
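
To give a rough idea of the heat involved, the label ratings quoted above can
be turned into a power figure with P = V x I per rail. This is only a sketch:
it assumes the "0.75" on the 12VDC line means 0.75 A, and it treats the label
currents as steady draw, which the label itself does not state.

```python
# Rough heat estimate for one WD Raptor from its label ratings.
# Assumption: label currents are treated as steady draw on each rail,
# and the 12 V figure (printed as 0.75) is taken to be in amperes.
RAILS = [(5.0, 0.90), (12.0, 0.75)]  # (volts, amps) from the drive label


def disk_power_watts(rails):
    """Sum V * I over the supply rails."""
    return sum(volts * amps for volts, amps in rails)


per_disk = disk_power_watts(RAILS)
both_disks = 2 * per_disk
print(f"per disk: {per_disk:.1f} W, both disks: {both_disks:.1f} W")
```

Under those assumptions this comes to 13.5 W per disk, 27 W for the pair.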

I wonder whether an aluminum Cooldrive 4002 (with its own small-diameter fan)
on each disk would dissipate the heat adequately, or whether a large-diameter
external fan directed at the disks would be better. Your advice?

Next Monday I'll receive a 1500 VA UPS.

Thanks again 

> Erik

