
RE: Latest ext2fs Failure



> 
> I take it from your account that the heavy disk activity was 
> on hd0s3, but you saw corruption on hd0s2.  Is that right?  

Yes.

> Did you see any signs of corruption on hd0s3 as well?  

No.  I actually re-installed the damaged partition (sorry)
so that I could continue working on gdb, and was able to
work on hd0s3 without incident.  There do not seem to be
any errors on that partition.

> 
> > I booted into Linux, and mounted my hurd partition to have a look.
> 
> Make sure you mount read-only, and that e2fsck doesn't run 
> automatically at boot.  Also, what Linux kernel version are you using?  
> 

The Linux boot does not know anything about my Hurd partitions.  I
have to mount and scan them manually.  The Linux installation is an
old one (Debian Slink), so it has a 2.0.34 kernel, and the ext2fs and
e2fsprogs are around 1.12, I believe.

> > ext2-fs error (device 03:02):  ext2_readdir: bad entry in directory
> > #122881: rec_len is too small for name_len -- offset = 140, inode = 122887,
> > rec_len = 20, name_len = 28169.
> > 
> > ext2-fs error (device 03:02):  ext2_readdir: bad entry in directory
> > #122881: rec_len is too small for name_len -- offset = 140, inode =
> > 1970238055, rec_len = 11632, name_len = 13106.
> 
> ?? What's this about?  It's reporting two different sets of
> bogus contents at the same spot in the same directory (140 bytes into
> #122881).  I'm pretty confused about how this could be, unless Linux
> rewrote its own different bogons into the directory between the first
> and second messages.

Warning:  I transcribed this information by hand.  I was very careful about
offset, inode, etc., but it is *possible* that I accidentally copied the
directory number twice.

> However, I am somewhat bewildered in this case, because it appears to me
> that e2fsck's Pass 2 checks should catch these very problems (I am looking
> at e2fsprogs-1.15): specifically, it checks for offset+rec_len > blocksize,
> and 140+11632 is more than 1024 last I checked (but that rec_len is from
> the curious second error message above, and the first error message for the
> same inode shows a rec_len/name_len that would not trip e2fsck).  (It is also
> the case that e2fsck doesn't really check this as thoroughly as it could.

Also note that my e2fsprogs is probably 1.12, in line with Debian Slink.
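
To make the arithmetic in the quoted paragraph concrete, here is a minimal
sketch of the two checks under discussion, applied to the values from both
error messages.  It is not e2fsck's or the kernel's actual code: the function
names are hypothetical, the 1024-byte block size is taken from the quoted
text, and the "needed length" rule (8-byte header plus name, rounded up to a
multiple of 4) is my understanding of the ext2 directory entry layout.

```python
def rec_len_needed(name_len):
    # Assumed layout: 8-byte fixed header plus the name,
    # rounded up to the next multiple of 4 bytes.
    return (name_len + 8 + 3) & ~3


def dirent_problems(offset, rec_len, name_len, blocksize=1024):
    """Return every sanity check this entry fails (hypothetical helper)."""
    problems = []
    if rec_len < rec_len_needed(1):
        problems.append("rec_len is smaller than minimal")
    if rec_len % 4 != 0:
        problems.append("rec_len % 4 != 0")
    if rec_len < rec_len_needed(name_len):
        problems.append("rec_len is too small for name_len")
    if offset + rec_len > blocksize:
        problems.append("entry crosses block boundary")
    return problems


# Values transcribed from the two error messages above.
first = dirent_problems(offset=140, rec_len=20, name_len=28169)
second = dirent_problems(offset=140, rec_len=11632, name_len=13106)
print(first)
print(second)
```

Under these assumptions the first message's values fail only the name_len
check (140+20 stays inside the block), while the second message's values also
fail the offset+rec_len check, which matches the reasoning quoted above.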

> > I also fired up debugfs and looked at the various directories.  The
> > strange thing is that debugfs could read the directory entries just
> > fine.
> 
> debugfs can be more useful in trying to figure it out.
> (Also, I don't know what your cpu/connectivity situation is or how big your
> partition is, but if you can dd the whole partition off, bzip2, and put that on
> the net for me to fetch, then I can take a look at your damaged
> filesystem directly.)

Er.  I foolishly destroyed the evidence so I can't upload it.  At any rate,
my connectivity is only over a 32.2 modem, and my partition is ~ 1 Gig 
(but mostly unused).  This might be more than I can do.

> It would be helpful to show me what debugfs's `ls' shows for the
> directories in question (i.e. "ls <122881>", "ls <192513>");
> it shows the rec_len values.  You can use debugfs's `dump' to fetch the
> raw contents of the corrupted directories into a file, and send me that; also
> use it to fetch the raw contents of the referenced inodes (122887,
> 192538) and look at them or use `file' or whatever to figure out what
> they are and if they are intact.
> 

I will attempt to recreate the problem by performing the same builds as
before.  This may or may not reproduce the corruption.  If it does, I
will follow all of the above steps (except perhaps dd'ing the partition and
uploading it).
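
Once a directory has been dumped to a file, something like the following
sketch could walk one raw block and list each entry's offset, inode, rec_len
and name_len for comparison with the kernel messages.  It rests on assumptions
that are mine, not from this thread: the older entry layout with a 16-bit
name_len (filesystems with the filetype feature split that field into an 8-bit
length and an 8-bit type), little-endian on-disk fields, and hypothetical
function names.  The block here is synthetic, not real data.

```python
import struct


def walk_dir_block(block):
    """Yield (offset, inode, rec_len, name_len, name) for each entry."""
    offset = 0
    while offset + 8 <= len(block):
        # Assumed header: 32-bit inode, 16-bit rec_len, 16-bit name_len.
        inode, rec_len, name_len = struct.unpack_from("<IHH", block, offset)
        name = block[offset + 8 : offset + 8 + name_len]
        yield offset, inode, rec_len, name_len, name
        if rec_len < 8 or offset + rec_len > len(block):
            break  # rec_len cannot be trusted; stop rather than loop or overrun
        offset += rec_len


def make_entry(inode, rec_len, name):
    # Helper for building a synthetic block; pads the name out to rec_len.
    header = struct.pack("<IHH", inode, rec_len, len(name))
    return header + name.ljust(rec_len - 8, b"\0")


# A synthetic 1024-byte block holding only "." and "..".
block = make_entry(2, 12, b".") + make_entry(2, 1012, b"..")
entries = list(walk_dir_block(block))
for entry in entries:
    print(entry)
```

A dumped directory larger than one block would need to be split into
block-sized pieces first, since each block's entries are walked from offset 0.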

Sorry,

-Brent

