
Reproducible filesystem corruption in lenny



We use Debian for some embedded devices that use off-the-shelf flash
drives for their primary storage.  Since upgrading from etch to lenny
and tweaking our partition layout, we've started seeing filesystem
corruption occur very rapidly after we clone the filesystem (via
partimage and resize2fs).  While investigating, I've been able to
reproduce the corruption with both etch's and lenny's partimage, with
both etch's and lenny's e2fsprogs, with both the realtime-patched
kernel we used under etch and lenny's stock amd64 kernel, with flash
drives of different sizes, with different flash drive partition
layouts, and with one of our embedded devices, an off-the-shelf lenny
server, and an off-the-shelf etch server.  This doesn't make any sense
to me.

While trying to figure all of this out, I've found that I can
reproduce filesystem corruption 100% of the time simply by executing
these commands:

mke2fs -O has_journal,resize_inode,dir_index,filetype,sparse_super,large_file \
    /dev/sdb2
tune2fs -c 29 /dev/sdb2   # /dev/sdb is an external flash drive
mount /dev/sdb2 /mnt/image
cd /mnt/image
tar xf ~/data.tar        # data.tar is a 71MB archive of the /var partition
cd
umount /mnt/image
e2fsck -f /dev/sdb2

At this point, e2fsck starts complaining with errors like this:
Symlink /lib/python-support/python2.5/_dbus_glib_bindings.so (inode
#113416) is invalid.
Clear<y>?
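
For anyone who wants to try this, the command sequence above could be
wrapped in a script along the following lines.  This is only a sketch:
DEV, MNT, and TARBALL default to the values used above, DRY_RUN is a
hypothetical safety switch that is not part of the original procedure,
tar's -C option stands in for the cd commands, and e2fsck is run with
-n so that it reports problems without trying to fix them.

```shell
#!/bin/sh
# Reproduction steps wrapped in a script (a sketch, not a tested tool).
# DRY_RUN is a hypothetical switch: 1 (default) only prints the commands,
# 0 runs them and DESTROYS all data on $DEV.
DEV=${DEV:-/dev/sdb2}
MNT=${MNT:-/mnt/image}
TARBALL=${TARBALL:-$HOME/data.tar}
DRY_RUN=${DRY_RUN:-1}
FEATURES=has_journal,resize_inode,dir_index,filetype,sparse_super,large_file

# Print a command, and execute it unless this is a dry run.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

reproduce() {
    run mke2fs -O "$FEATURES" "$DEV" || return 1
    run tune2fs -c 29 "$DEV" || return 1
    run mount "$DEV" "$MNT" || return 1
    # -C extracts into $MNT, replacing the cd / tar / cd sequence.
    run tar xf "$TARBALL" -C "$MNT" || return 1
    run umount "$MNT" || return 1
    # -n opens the filesystem read-only and answers "no" to every prompt.
    run e2fsck -f -n "$DEV"
    status=$?
    [ "$DRY_RUN" = 1 ] || echo "e2fsck exit status: $status"
    return "$status"
}

reproduce
```

With the defaults this only prints the six commands.  Running it as
root with DRY_RUN=0 executes them; according to the e2fsck man page, an
exit status of 0 means no errors were found and 4 means errors were
left uncorrected.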

Creating the filesystem without has_journal, or mounting it with
-o data=journal, makes these e2fsck errors go away.  (I haven't tested
either change with our cloning procedure.)  However, I don't want to
go back to ext2, and data=journal seems to be barely documented.
(What exactly does it do?)
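
Concretely, the two workarounds amount to something like the commands
below.  Again this is a sketch: the device and mount point are the
ones used above, and DRY_RUN and the function names are hypothetical
additions so that the commands can be reviewed before being run.

```shell
#!/bin/sh
# Sketch of the two workarounds. DRY_RUN is a hypothetical switch:
# 1 (default) only prints the commands, 0 executes them.
DEV=${DEV:-/dev/sdb2}
MNT=${MNT:-/mnt/image}
DRY_RUN=${DRY_RUN:-1}

# Print a command, and execute it unless this is a dry run.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

# Workaround 1: keep the journal, but mount with data=journal.
mount_data_journal() {
    run mount -o data=journal "$DEV" "$MNT"
}

# Workaround 2: create the filesystem without has_journal (plain ext2).
# This erases whatever is on $DEV.
mkfs_without_journal() {
    run mke2fs -O resize_inode,dir_index,filetype,sparse_super,large_file "$DEV"
}

mount_data_journal
```

Either function takes the place of the corresponding mount or mke2fs
step in the sequence above; the remaining steps stay the same.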

We've seen other errors after cloning (subdirectories that point to
their parents, "resize inode not valid", etc.), but these particular
errors are completely reproducible.  The corruption occurs on more
than one flash drive.  badblocks -w /dev/sdb reports no errors
(although I seem to remember one of the disks being bigger before
running badblocks - do flash drives remap bad sectors?).

I can't imagine that Linux or Debian would be released with this sort
of potentially severe reproducible bug but am at a loss to figure out
what I might be doing wrong or what's specific to my setup.  And I
can't figure out why we're only seeing it since upgrading to lenny
when I can currently reproduce the problem under etch.

Any help would be greatly appreciated.  Thanks.

Josh Kelley

