Re: Severe EXT3 bug(s) in vanilla kernel 2.6.18?
On Fri, Dec 15, 2006 at 01:32:51PM +0100, Bas van Schaik wrote:
> Hi all,
>
> After browsing through the debian-kernel mailing list archive, I found
> out that no one has reported the latest EXT3 problems in the
> vanilla kernel. The last report of EXT3 problems on the debian-kernel
> list had to do with JBD; the current problems (as posted on the Linux
> Kernel mailing list) are much worse, I think.
> You might want to check these URLs/discussion subjects on LKML:
>
> "2.6.18-mm2: ext3 BUG?"
> http://lkml.org/lkml/2006/10/5/353
> Seems unresolved
Fixed in one of the 2.6.18.x releases,
and it only affects filesystems with a 1k block size.
>
> "2.6.19 file content corruption on ext3"
> http://lkml.org/lkml/2006/12/7/163
> Has to do with 2.6.19, but might have its roots in 2.6.18
No, that one is in code that is new in 2.6.19.
> "Debugging I/O errors?"
> http://lkml.org/lkml/2006/10/20/93
> Source unknown, but more people seem to have the same problem.
>
>
> These issues got my attention, because I'm having those (or similar)
> problems myself, on two different machines (clusters, actually) with
> completely different hardware and disks. I'll explain.
>
> I'm maintaining two clusters, with machines running a mix of Debian
> Stable with Etch kernels to get AoE (ATA over Ethernet) support.
> Machines in these clusters "export" their hard disks using AoE (check out
> the "vblade" package), and one machine imports those using the kernel
> "aoe"-module. On top of those imported devices, multiple RAID5-arrays
> are created, and LVM is running on top of RAID, ext3 on the LVM LV.
>
> After a few days, I get EXT3 errors like this:
> > EXT3-fs: mounted filesystem with ordered data mode.
> > EXT3-fs error (device loop0): ext3_free_blocks_sb: bit already cleared for block 412186
> > Aborting journal on device loop0.
> > EXT3-fs error (device loop0) in ext3_free_blocks_sb: Journal has aborted
> > EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has aborted
> > EXT3-fs error (device loop0) in ext3_truncate: Journal has aborted
> > EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has aborted
> > EXT3-fs error (device loop0) in ext3_orphan_del: Journal has aborted
> > EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has aborted
> > EXT3-fs error (device loop0) in ext3_delete_inode: Journal has aborted
> > __journal_remove_journal_head: freeing b_committed_data
> > __journal_remove_journal_head: freeing b_committed_data
> (...)
> > __journal_remove_journal_head: freeing b_committed_data
> > ext3_abort called.
> > EXT3-fs error (device loop0): ext3_journal_start_sb: Detected aborted journal
> > Remounting filesystem read-only
> > __journal_remove_journal_head: freeing b_committed_data
>
> Running fsck on the filesystem fixes those errors, but after a few days (or
> weeks, depending on the fs load) the corruption appears again. It might
> be worth mentioning that there are no other suspicious messages in my logs.
Please report this to the ext3 developers: linux-ext4@vger.kernel.org
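For such a report it may help to summarise which ext3 functions flagged errors and on which device. Below is a minimal sketch in Python, assuming the kernel messages have been saved as plain text in the format quoted above. The function name summarize_ext3_errors and the regular expression are illustrative only and not part of any kernel tooling.

```python
import re
import sys
from collections import Counter

# Matches both forms seen in the quoted log:
#   EXT3-fs error (device loop0): ext3_free_blocks_sb: bit already cleared ...
#   EXT3-fs error (device loop0) in ext3_truncate: Journal has aborted
EXT3_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<dev>[^)]+)\)(?::| in) (?P<func>\w+):"
)


def summarize_ext3_errors(lines):
    """Count EXT3-fs error lines per (device, reporting function)."""
    counts = Counter()
    for line in lines:
        match = EXT3_ERROR.search(line)
        if match:
            counts[(match.group("dev"), match.group("func"))] += 1
    return counts


if __name__ == "__main__":
    # Reads log text from standard input and prints the most frequent first.
    for (dev, func), n in summarize_ext3_errors(sys.stdin).most_common():
        print(f"{n:4d}  {dev}  {func}")
```

Saved under a name of your choosing, the script can be fed the output of dmesg or a saved log file on standard input. It only counts lines; it says nothing about the cause of the corruption.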
> This seems to be related to the problem described here:
> http://myrddin.org/2006/02/14/ext3-nastiness/
>
> and here:
> http://www.debian-administration.org/users/Utumno/weblog/16
>
>
> I don't know whether I need to file a bug on this; for now I just want to
> hear your thoughts. FYI:
>
> Kernel information for cluster 1:
> > root@infinity:~# uname -a
> > Linux infinity 2.6.17-2-686 #1 SMP Wed Sep 13 16:34:10 UTC 2006 i686 GNU/Linux
>
> And cluster 2:
> > dust:~# uname -a
> > Linux dust 2.6.18-3-686 #1 SMP Thu Nov 23 20:49:23 UTC 2006 i686 GNU/Linux
>
> Thanks for your replies!
>
> Best regards,
>
> -- Bas van Schaik
best regards
--
maks