
Re: 2.4.18 File System Corruption



On Mon, 2002-05-06 at 04:11, elf@buici.com wrote: 
> On Mon, May 06, 2002 at 12:37:31AM +1200, Adam Warner wrote:
> > Hi all,
> > 
> > I have experienced massive file system corruption on my main computer
> > after installing Woody and upgrading the kernel to 2.4.18-686-smp
> > (2.4.18-5). The filesystems were ext3. Hardware is an ABit BP6.
> > 
> > The install disks were also the 2.4.18-bf series.
> > 
> > This is an example of the kernel error messages I received when
> > transferring files to the box over NFS:
> > 
> > kernel: hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> > kernel: hde: drive not ready for command
> > kernel: hde: status error: status=0x50 { DriveReady SeekComplete }
> > kernel: hde: no DRQ after issuing WRITE
> 
> [deletia]
> 
> I had a similar problem recently.  My drives were brand new and so I
> believed that the problem had to be in the drivers.  In my case, there
> was no corruption.  There was much patching and tweaking until I found
> that some of my drives wouldn't transfer at the advertised udma5 rate.
> Using hdparm, I set the udma mode to 4 and they worked OK though at
> reduced throughput.  Before I was done, I switched to the hpt37x2
> driver and found that the drives ran fine at the advertised udma5
> rate.
> 
> IMHO, there is a strong indication that there be some problem with the
> IDE driver in 2.4.18 and DMA.

Thanks for the info, elf. The hard disk is fine, Steve. I have been
thrashing it for a couple of hours without any problems--including
perfectly passing fscks.

I installed Woody again (using the 2.4.17-bf rescue and root disks) and
compiled my own 2.4.17 SMP kernel because the 2.4.17 kernel images have
disappeared from Woody and unstable. Everything was fine. So I decided
to see if I could cause everything to break again by upgrading to
kernel-image-2.4.18-686-smp.

In the short time I have tested it, everything is OK. And I have fully
filled the partitions with data. Whatever series of events caused the
0x58 status errors is elusive. Of course it would still be wise for me
to now stick with 2.4.17 at least until 2.4.19 is released.
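
For reference, the status bytes in the kernel messages quoted above can
be decoded bit by bit. This is only a minimal sketch in Python, assuming
the usual ATA status register layout. The function name
decode_ide_status is made up for illustration, and the names for bits
that do not appear in my logs (DeviceFault, CorrectedError, Index) are
my assumption about what the driver would print.

```python
# ATA status register bits, most significant first.
# Names for 0x40, 0x10 and 0x08 match the log lines quoted in this thread;
# the remaining names are assumptions for illustration.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ide_status(status):
    """Return the names of the bits set in an 8-bit IDE status value."""
    return [name for mask, name in STATUS_BITS if status & mask]


# Reproduce the two status values seen in the logs.
for value in (0x58, 0x50):
    names = " ".join(decode_ide_status(value))
    print("status=0x%02x { %s }" % (value, names))
```

Read this way, 0x58 has DataRequest still set when the driver wanted to
issue a new command ("drive not ready for command"), while 0x50 has it
clear when the driver expected it after a WRITE ("no DRQ after issuing
WRITE"). That says nothing about the root cause, only what the two
values mean.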

Neil Conway wrote today on the Linux Kernel Mailing List, "Also, does
anyone understand why screwing up a DMA transfer results in the trashing
of inodes? Even better, how come this hasn't bitten many more people?
Surely there are lots of people out there with disks and CDs on the same
IDE cable..." The reply was: "That seems to be a separate problem with
the block layer and locked buffers or pages (don't remember which). I
think a patch was submitted and integrated sometime in 2.4.19-pre.
Andrew Morton would know more.":
http://www.ussg.iu.edu/hypermail/linux/kernel/0205.0/0945.html

Perhaps this is the patch (2.4.19-pre6):
[PATCH] block/IDE/interrupt lockup fix
http://lkml.org/archive/2002/4/1/61/index.html

And new IDE code was merged into 2.4.19-pre3:
 -ac merge (including new IDE)                         (Alan Cox)

And there is an SMP fix in 2.4.19-pre4:
[PATCH] boot_cpu_data corruption on SMP x86

Anyway, thanks for your responses. I'm glad the problem is not as
widespread as I had first suspected.

Regards,
Adam





