Strange events... (after a weekend of attempts)
Following your kind suggestions, these are the steps I took to track
down the "random single-character file corruption" I have been
experiencing for about three months on my Ultra 5 running Debian stable.
1. David S. Miller suggested upgrading the kernel from 2.4.19 to 2.4.22
to fix some ext3 bugs -> done, with no effect. I then also converted the
filesystem back from ext3 to ext2. David asked which IDE controller I
have; where should I check? Below is the part of my boot log concerning
the disk:
.....
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
CMD646: IDE controller at PCI slot 01:03.0
CMD646: chipset revision 3
CMD646: chipset revision 0x03, MultiWord DMA Force Limited
CMD646: 100% native mode on irq 4,7e0
ide0: BM-DMA at 0x1fe02c00020-0x1fe02c00027, BIOS settings: hda:pio, hdb:pio
ide1: BM-DMA at 0x1fe02c00028-0x1fe02c0002f, BIOS settings: hdc:pio, hdd:pio
hda: ST38420A, ATA DISK drive
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
hdc: CRD-8322B, ATAPI CD/DVD-ROM drive
ide0 at 0x1fe02c00000-0x1fe02c00007,0x1fe02c0000a on irq 4,7e0
ide1 at 0x1fe02c00010-0x1fe02c00017,0x1fe02c0001a on irq 4,7e0 (shared with ide0)
hda: attached ide-disk driver.
hda: task_no_data_intr: status=0x51 { DriveReady SeekComplete Error }
hda: task_no_data_intr: error=0x04 { DriveStatusError }
hda: 16841664 sectors (8623 MB) w/512KiB Cache, CHS=16708/16/63
.....
Regarding the two error messages above, I found that Alan Cox said:
"These are ok - its trying to set options the drive doesn't
support and we dont yet do that quietly."
2. Ben Collins suggested a complete fsck -> done -> disk clean. He then
suggested disabling DMA -> done in the kernel and verified with hdparm.
The problem remains.
3. Frank Van Damme suggested badblocks -> done -> no bad blocks on the disk.
4. Andreas Pommer and Frank Gevaerts thought it could be a RAM
problem -> I replaced the two 64 MB modules with two others -> the
problem remains. By the way, memtest didn't find anything, and neither
did the PROM memory test.
5. In the past I had a hardware/software problem with a certain kind of
Ethernet card, so I moved the second Ethernet card (used for the second
network connection, since the machine works as a firewall) to a
different PCI slot -> the problem remains. In the next few days I will
replace the card, though without much hope...
Some details:
-typically, if I use ftp to get a large file (such as a bzip2-compressed
kernel tarball) and then put it back, the two copies differ. Moreover,
if I get it and then run tar -xvjf, the process aborts because of
corruption. If the bzip2 file is not corrupted and I can compile the
kernel, some of the extracted files very likely contain a single wrong
character at random. Sometimes I have a good bzip2 file, unpack it,
use it, and everything is OK; the same file used a second time turns
out to be corrupted!
-sometimes I also see corrupted commands/packages; e.g. yesterday
evening I had to purge and reinstall the "less" package because
/usr/bin/pager, or some file connected to it, was corrupted;
-the machine acts as a firewall, and the kernel is patched with the
PPP MPPE extension because I have to connect to a Microsoft VPN server;
I don't know whether this is relevant.
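To narrow the ftp round-trip test down, a small script could report exactly which bytes differ between the original and the returned copy, and whether the same file hashes identically when read several times. This is only a sketch in Python, my own assumption rather than anything suggested in the thread, and the function names are hypothetical. If every difference turns out to be a single flipped bit, that would suggest a memory, cache, or bus problem more than a filesystem bug, though it proves nothing by itself. Note that repeated reads are probably served from the kernel's page cache rather than from the disk.

```python
import hashlib
import sys
from typing import List, Tuple

CHUNK = 1 << 20  # read 1 MiB at a time to keep memory use modest


def differing_bytes(path_a: str, path_b: str) -> List[Tuple[int, int, int]]:
    """Return (offset, byte_a, byte_b) for every position where the files differ.

    Only the common prefix is compared; compare file sizes separately.
    """
    diffs: List[Tuple[int, int, int]] = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(CHUNK)
            b = fb.read(CHUNK)
            if not a or not b:
                break  # one file (or both) reached EOF
            for i, (x, y) in enumerate(zip(a, b)):
                if x != y:
                    diffs.append((offset + i, x, y))
            offset += min(len(a), len(b))
    return diffs


def is_single_bit_flip(x: int, y: int) -> bool:
    """True if bytes x and y differ in exactly one bit."""
    d = x ^ y
    return d != 0 and (d & (d - 1)) == 0


def read_hashes(path: str, rounds: int = 5) -> List[str]:
    """Hash the same file several times; differing digests mean unstable reads."""
    digests = []
    for _ in range(rounds):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(CHUNK), b""):
                h.update(chunk)
        digests.append(h.hexdigest())
    return digests


if __name__ == "__main__" and len(sys.argv) == 3:
    first, second = sys.argv[1], sys.argv[2]
    for off, x, y in differing_bytes(first, second):
        kind = "single-bit" if is_single_bit_flip(x, y) else "multi-bit"
        print(f"offset {off}: {x:#04x} vs {y:#04x} ({kind})")
    digests = read_hashes(first)
    print("stable reads" if len(set(digests)) == 1 else f"UNSTABLE reads: {digests}")
```

It would be run as, e.g., `python3 compare.py original.tar.bz2 returned.tar.bz2` (the script and file names are placeholders).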
I don't want to give up!
Thank you all.
Roberto Giorgetti
Milan - Italy