
Bug#677655: 3.4-trunk-486: kernel oops: EIP is at __destroy_inode+0x56/0x8d



Martin-Éric Racine wrote:

> Here's the dmesg output at bootup, right after the first few oopses
> have started to appear.

Thanks, nice and quick.

Let's see:

[...]
> <6>[   28.167997] EXT4-fs (sda1): re-mounted. Opts: (null)
> <6>[   28.721401] EXT4-fs (sda1): re-mounted. Opts: (null)
> <1>[   29.595342] BUG: unable to handle kernel paging request at ffffb4ff
> <1>[   29.595373] IP: [<c10b698e>] __destroy_inode+0x56/0x8d

Bad pointer.

[...]
> <4>[   29.595737] EIP is at __destroy_inode+0x56/0x8d
> <4>[   29.595756] EAX: ffffb4ff EBX: f54f1d38 ECX: f6871ed8 EDX: ffffb4fe
> <4>[   29.595777] ESI: f55475d8 EDI: f54f1d38 EBP: 00000000 ESP: f6871ef0
> <4>[   29.595798]  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> <4>[   29.595818] CR0: 8005003b CR2: ffffb4ff CR3: 36877000 CR4: 00000090
> <4>[   29.595839] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> <4>[   29.595856] DR6: ffff0ff0 DR7: 00000400
> <0>[   29.595876] Process mount (pid: 693, ti=f6870000 task=f37f1810 task.ti=f6870000)

Call chain: mount -> do_mount -> do_remount_sb -> shrink_dcache_sb -> ...

[...]
> <0>[   29.596036] Code: 85 c0 75 0f ba ee 00 00 00 b8 58 a4 33 c1 e8 87 6e f6 ff 8b 43 1c ff 88 c4 01 00 00 8b 43 10 8d 50 ff 83 fa fd 77 14 85 c0 74 10 <ff> 08 0f 94 c2 84 d2 74 07 31 d2 e8 ee be fa ff 8b 43 14 8d 50 

Decoding, this is:

  1c:	8b 43 10             	mov    0x10(%ebx),%eax
  1f:	8d 50 ff             	lea    -0x1(%eax),%edx
  22:	83 fa fd             	cmp    $0xfffffffd,%edx
  25:	77 14                	ja     0x3b
  27:	85 c0                	test   %eax,%eax
  29:	74 10                	je     0x3b
  2b:*	ff 08                	decl   (%eax)     <-- trapping instruction
  2d:	0f 94 c2             	sete   %dl

which corresponds to

     77d:	8b 43 10             	mov    0x10(%ebx),%eax
     780:	8d 50 ff             	lea    -0x1(%eax),%edx
     783:	83 fa fd             	cmp    $0xfffffffd,%edx
     786:	77 05                	ja     78d <__destroy_inode+0x57>
		posix_acl_release(inode->i_acl);
     788:	e8 94 ff ff ff       	call   721 <posix_acl_release>

from fs/inode.c.  (Your call to posix_acl_release is inlined while mine
is not because your kernel was built with an older GCC and I'm too
lazy to downgrade.)  Here's posix_acl_release:

	static inline void
	posix_acl_release(struct posix_acl *acl)
	{
		if (acl && atomic_dec_and_test(&acl->a_refcount))
			kfree_rcu(acl, a_rcu);
	}

a_refcount is at offset 0 in struct posix_acl, so the trapping
"decl (%eax)" is the atomic_dec_and_test.  It faults because acl
(EAX = CR2 = ffffb4ff) is a bad pointer.

So inode is incompletely initialized, I guess.  Climbing the call
chain:

> <0>[   29.595892] Stack:
> <4>[   29.595903]  f54f1d38 c10b6cdd f5538898 c10b5104 f5538898 f6871f20 f6871f20 c10b5149
> <4>[   29.595946]  f55388f8 f5817c00 f5817c80 c10b5362 f55b8a78 f55389f8 f5817c00 00000000
> <4>[   29.595988]  fffffff3 c10a9dd3 00000000 00000000 00000000 0000002e 00000027 f5812090
> <0>[   29.596030] Call Trace:
> <4>[   29.596036]  [<c10b6cdd>] ? destroy_inode+0x1a/0x3e
> <4>[   29.596036]  [<c10b5104>] ? dentry_kill+0x7f/0x8c
> <4>[   29.596036]  [<c10b5149>] ? shrink_dentry_list+0x38/0x62
> <4>[   29.596036]  [<c10b5362>] ? shrink_dcache_sb+0x40/0x51
> <4>[   29.596036]  [<c10a9dd3>] ? do_remount_sb+0x5b/0x11c
> <4>[   29.596036]  [<c10b9acc>] ? do_mount+0x1de/0x5ca
> <4>[   29.596036]  [<c113b3f0>] ? _copy_from_user+0x28/0x47
> <4>[   29.596036]  [<c108b524>] ? memdup_user+0x26/0x43
> <4>[   29.596036]  [<c10b9f21>] ? sys_mount+0x67/0x96
> <4>[   29.596036]  [<c128e6ec>] ? syscall_call+0x7/0xb
> <0>[   29.596036] EIP: [<c10b698e>] __destroy_inode+0x56/0x8d SS:ESP 0068:f6871ef0

Probably:

	dentry_kill -> d_kill -> dentry_iput -> iput -> ...

Meaning dentry->d_inode has invalid ->i_acl.  Walking further:

	sys_mount -> do_mount -> do_remount -> do_remount_sb ->
		-> shrink_dcache_sb -> shrink_dentry_list ->
		-> try_prune_one_dentry -> dentry_kill

I got nothin'.  Could you try 3.5-rc2 or newer so we can pester
upstream?  Like this:

 0. prerequisites:

	apt-get install git build-essential

 1. grab the kernel history if you don't already have it:

	git clone \
	  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

 2. check out the latest, configure, build:

	cd linux
	git fetch --all
	git checkout origin/master
	cp /boot/config-$(uname -r) .config; # current configuration
	scripts/config --disable DEBUG_INFO
	make localmodconfig; # optional: minimize configuration
	make deb-pkg; # optionally with -j<num> for parallel build
	dpkg -i ../<name of package>; # as root
	reboot
	... test test test ...

 3. celebrate or complain

If it fails, please send a summary of symptoms to
linux-fsdevel@vger.kernel.org, cc-ing either me or this bug log so we
can track it.  Be sure to mention:

 - steps to reproduce, expected result, actual result, and how the
   difference indicates a bug (should be simple enough)
 - which kernel versions you've tested and result with each
 - full "dmesg" output from a broken kernel, as an attachment
 - any other weird observations
 - a pointer to http://bugs.debian.org/677655 for the backstory

If it succeeds, I think we should just celebrate and leave it at that. :)
Or if someone gets a moment to update experimental to 3.5-rc2 or
newer, that would be useful for other reasons.

Thanks for your patience and thanks again for testing --- it's very
nice to see this kind of thing caught early.

Hope that helps,
Jonathan


