
Re: Severe EXT3 bug(s) in vanilla kernel 2.6.18?



On Fri, Dec 15, 2006 at 01:32:51PM +0100, Bas van Schaik wrote:
> Hi all,
> 
> After browsing through the debian-kernel mailing list archive, I found
> that no one has reported the latest EXT3 problems in the
> vanilla kernel. The last report of EXT3 problems on the debian-kernel
> list had to do with JBD; the current problems (as posted on the Linux
> Kernel mailing list) are much worse, I think.
> You might want to check these URLs/subjects of discussion on LKML:
> 
>   "2.6.18-mm2: ext3 BUG?"
>   http://lkml.org/lkml/2006/10/5/353
> Seems unresolved

Fixed in some 2.6.18.x release,
and it only affects filesystems with a 1k block size.
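
To see whether a given filesystem uses 1k blocks, the "Block size" field
printed by `tune2fs -l` can be read. A minimal sketch, assuming e2fsprogs is
installed and the script runs as root; the helper names and the device path
passed in are hypothetical and only for illustration:

```python
import re
import subprocess


def parse_block_size(tune2fs_output):
    """Return the block size in bytes from `tune2fs -l` output, or None."""
    match = re.search(r"^Block size:\s*(\d+)", tune2fs_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def block_size_of(device):
    """Run `tune2fs -l` on a device (placeholder path) and parse the result."""
    result = subprocess.run(
        ["tune2fs", "-l", device],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_block_size(result.stdout)
```

A return value of 1024 from `block_size_of()` would mean the filesystem falls into the affected 1k case.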
 
> 
>   "2.6.19 file content corruption on ext3"
>   http://lkml.org/lkml/2006/12/7/163
> Has to do with 2.6.19, but might have its roots in 2.6.18

That one is in code new to 2.6.19.
 
 
>   "Debugging I/O errors?"
>   http://lkml.org/lkml/2006/10/20/93
> Source unknown, but more people seem to have the same problem.
> 
> 
> These issues got my attention, because I'm having those (or similar)
> problems myself, on two different machines (clusters, actually) with
> completely different hardware and disks. I'll explain.
> 
> I'm maintaining two clusters, with machines running a mix of Debian
> Stable with Etch kernels to have AoE (ATA over Ethernet) support.
> Machines in these clusters "export" their hard disks using AoE (check out
> the "vblade" package), and one machine imports those using the kernel
> "aoe" module. On top of those imported devices, multiple RAID5 arrays
> are created, with LVM running on top of RAID and ext3 on the LVM LV.
> 
> After a few days, I get EXT3 errors like this:
> > EXT3-fs: mounted filesystem with ordered data mode.
> > EXT3-fs error (device loop0): ext3_free_blocks_sb: bit already cleared for block 412186
> > Aborting journal on device loop0.
> > EXT3-fs error (device loop0) in ext3_free_blocks_sb: Journal has aborted
> > EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has aborted
> > EXT3-fs error (device loop0) in ext3_truncate: Journal has aborted
> > EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has aborted
> > EXT3-fs error (device loop0) in ext3_orphan_del: Journal has aborted
> > EXT3-fs error (device loop0) in ext3_reserve_inode_write: Journal has aborted
> > EXT3-fs error (device loop0) in ext3_delete_inode: Journal has aborted
> > __journal_remove_journal_head: freeing b_committed_data
> > __journal_remove_journal_head: freeing b_committed_data
> (...)
> > __journal_remove_journal_head: freeing b_committed_data
> > ext3_abort called.
> > EXT3-fs error (device loop0): ext3_journal_start_sb: Detected aborted journal
> > Remounting filesystem read-only
> > __journal_remove_journal_head: freeing b_committed_data
> 
> FSCK'ing the filesystem fixes those errors, but after a few days (or
> weeks, depending on the fs load) the corruptions appear again. It might
> be worth telling you that there are no other suspicious messages in my logs.

Please inform ext3-devel: linux-ext4@vger.kernel.org
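
For such a report, the first "EXT3-fs error" line is the one that precedes
the journal abort; in the log quoted above, the "Journal has aborted"
lines all come after it. A rough sketch for pulling that first line out of
a saved kernel log, assuming the two message layouts shown in the quoted
log; the function name is made up for this example:

```python
import re

# Matches both layouts seen in the quoted log:
#   EXT3-fs error (device loop0): ext3_free_blocks_sb: bit already cleared ...
#   EXT3-fs error (device loop0) in ext3_truncate: Journal has aborted
EXT3_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<device>[^)]+)\)(?::| in) "
    r"(?P<function>\w+): (?P<message>.*)"
)


def first_ext3_error(log_lines):
    """Return (device, function, message) for the first EXT3-fs error, or None."""
    for line in log_lines:
        match = EXT3_ERROR.search(line)
        if match:
            return (
                match.group("device"),
                match.group("function"),
                match.group("message").strip(),
            )
    return None
```

Feeding it the lines of a saved `dmesg` dump gives the device, the reporting function and the message to quote in the mail.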
 
> This seems to be related to the problem described here:
>   http://myrddin.org/2006/02/14/ext3-nastiness/
> 
> and here:
>   http://www.debian-administration.org/users/Utumno/weblog/16
> 
> 
> I don't know if I need to file a bug on this; for now I just want to
> hear your thoughts. FYI:
> 
> Kernel information for cluster 1:
> > root@infinity:~# uname -a
> > Linux infinity 2.6.17-2-686 #1 SMP Wed Sep 13 16:34:10 UTC 2006 i686 GNU/Linux
> 
> And cluster 2:
> > dust:~# uname -a
> > Linux dust 2.6.18-3-686 #1 SMP Thu Nov 23 20:49:23 UTC 2006 i686 GNU/Linux
> 
> Thanks for your replies!
> 
> Best regards,
> 
>   -- Bas van Schaik

best regards

--
maks


