RE: Latest ext2fs Failure
>
> I take it from your account that the heavy disk activity was
> on hd0s3, but you saw corruption on hd0s2. Is that right?
Yes.
> Did you see any signs of corruption on hd0s3 as well?
No. I actually re-installed the damaged partition (sorry)
so that I could continue working on gdb, and was able to
work on hd0s3 without incident. There do not seem to be
any errors on that partition.
>
> > I booted into Linux, and mounted my hurd partition to have a look.
>
> Make sure you mount read-only, and that e2fsck doesn't run
> automatically at boot. Also, what Linux kernel version are you using?
>
The Linux boot does not know anything about my Hurd partitions; I
have to mount and scan them manually. The Linux installation is an
old release (Debian Slink), so it has a 2.0.34 kernel, and the
e2fsprogs are around 1.12, I believe.
> > ext2-fs error (device 03:02): ext2_readdir: bad entry in directory
> > #122881: rec_len is too small for name_len -- offset = 140,
> > inode = 122887, rec_len = 20, name_len = 28169.
> >
> > ext2-fs error (device 03:02): ext2_readdir: bad entry in directory
> > #122881: rec_len is too small for name_len -- offset = 140, inode =
> > 1970238055, rec_len = 11632, name_len = 13106.
>
> ?? What's this about? It's reporting two different sets of
> bogus contents at the same spot in the same directory (140 bytes into
> directory #122881). I'm pretty confused about how this could be, unless
> Linux rewrote its own different bogons into the directory between the
> first and second messages.
Warning: I transcribed this information. I was very careful about offset,
inode, etc., but it is *possible* that I accidentally copied the directory
number twice.
> However, I am somewhat bewildered in this case, because it appears to me
> that e2fsck's Pass 2 checks should catch these very problems (I am looking
> at e2fsprogs-1.15): specifically, it checks for offset+rec_len > blocksize,
> and 140+11632 is more than 1024 last I checked (but that rec_len is from
> the curious second error message above, and the first error message for
> the same inode shows a rec_len/name_len that would not trip e2fsck). (It
> is also the case that e2fsck doesn't really check this as thoroughly as
> it could.)
Also note that my e2fsprogs is probably 1.12, in line with Debian Slink.
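
To make the arithmetic concrete, here is a rough Python sketch of the
directory-entry sanity checks as I understand them (an 8-byte entry header,
records padded to a multiple of 4 bytes). The function names are my own, the
1024-byte block size is taken from the quoted text above, and this is only an
illustration, not what e2fsck or the kernel literally runs.

```python
HEADER_LEN = 8  # inode (4 bytes) + rec_len (2 bytes) + name_len (2 bytes)


def min_rec_len(name_len):
    """Smallest record that can hold a name of name_len bytes, rounded up to 4."""
    return (name_len + HEADER_LEN + 3) & ~3


def check_entry(offset, rec_len, name_len, blocksize=1024):
    """Return a list of problems found in one directory entry (empty if none)."""
    problems = []
    if rec_len < min_rec_len(1):
        problems.append("rec_len is smaller than minimal")
    if rec_len % 4 != 0:
        problems.append("rec_len is not a multiple of 4")
    if rec_len < min_rec_len(name_len):
        problems.append("rec_len is too small for name_len")
    if offset + rec_len > blocksize:
        problems.append("entry crosses block boundary")
    return problems


# The two entries as transcribed from the kernel messages.
first = check_entry(offset=140, rec_len=20, name_len=28169)
second = check_entry(offset=140, rec_len=11632, name_len=13106)
print(first)
print(second)
```

With the transcribed values, only the second entry fails the
offset+rec_len test; the first fails only the name_len comparison.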
> > I also fired up debugfs and looked at the various directories. The
> > strange thing is the debugfs could read the directory entries just
> > fine.
>
> debugfs can be more useful in trying to figure it out.
> (Also, I don't know what your cpu/connectivity situation is or how big
> your partition is, but if you can dd the whole partition off, bzip2, and
> put that on the net for me to fetch, then I can take a look at your
> damaged filesystem directly.)
Er. I foolishly destroyed the evidence so I can't upload it. At any rate,
my connectivity is only over a 32.2 modem, and my partition is ~ 1 Gig
(but mostly unused). This might be more than I can do.
> It would be helpful to show me what debugfs's `ls' shows for the
> directories in question (i.e. "ls <122881>", "ls <192513>");
> it shows the rec_len values. You can use debugfs's `dump' to fetch the
> raw contents of the corrupted directories into a file, and send me that;
> also use it to fetch the raw contents of the referenced inodes (122887,
> 192538) and look at them or use `file' or whatever to figure out what
> they are and if they are intact.
>
I will attempt to recreate the problem by performing the same builds as
before. This may or may not trigger the corruption again. If it does, I
will follow all of the above steps (except perhaps dd'ing the partition and
uploading it).
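
For looking at a raw directory dump, a small Python sketch along these lines
might help. It walks one directory block and records each entry's offset,
inode, rec_len and name_len. It assumes the old on-disk layout with a 16-bit
little-endian name_len (which is how the 2.0 kernel messages report it); the
function name is hypothetical and the dump file is whatever debugfs's `dump'
produced.

```python
import struct

HEADER_LEN = 8  # inode (4 bytes) + rec_len (2 bytes) + name_len (2 bytes)


def walk_dir_block(block):
    """Walk one raw directory block, returning a list of
    (offset, inode, rec_len, name_len, name_bytes) tuples.

    Stops after the first entry whose rec_len cannot be followed
    (too small, misaligned, or running past the end of the block).
    """
    entries = []
    offset = 0
    while offset + HEADER_LEN <= len(block):
        inode, rec_len, name_len = struct.unpack_from("<IHH", block, offset)
        # Never read more name bytes than the record claims to hold.
        usable = max(rec_len - HEADER_LEN, 0)
        start = offset + HEADER_LEN
        name = block[start:start + min(name_len, usable)]
        entries.append((offset, inode, rec_len, name_len, name))
        if rec_len < 12 or rec_len % 4 != 0 or offset + rec_len > len(block):
            break  # cannot trust rec_len to find the next entry
        offset += rec_len
    return entries


def print_entries(block):
    """Print each entry in the block on one line."""
    for offset, inode, rec_len, name_len, name in walk_dir_block(block):
        print(offset, inode, rec_len, name_len, name)
```

Feeding it the first 1024 bytes of a dumped directory and comparing the
output against what debugfs's `ls' shows should make any disagreement about
the entry at offset 140 easy to spot.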
Sorry,
-Brent