Bug#677655: 3.4-trunk-486: kernel oops: EIP is at __destroy_inode+0x56/0x8d
Martin-Éric Racine wrote:
> Here's the dmesg output at bootup, right after the first few oopses
> have started to appear.
Thanks, nice and quick.
Let's see:
[...]
> <6>[ 28.167997] EXT4-fs (sda1): re-mounted. Opts: (null)
> <6>[ 28.721401] EXT4-fs (sda1): re-mounted. Opts: (null)
> <1>[ 29.595342] BUG: unable to handle kernel paging request at ffffb4ff
> <1>[ 29.595373] IP: [<c10b698e>] __destroy_inode+0x56/0x8d
Bad pointer.
[...]
> <4>[ 29.595737] EIP is at __destroy_inode+0x56/0x8d
> <4>[ 29.595756] EAX: ffffb4ff EBX: f54f1d38 ECX: f6871ed8 EDX: ffffb4fe
> <4>[ 29.595777] ESI: f55475d8 EDI: f54f1d38 EBP: 00000000 ESP: f6871ef0
> <4>[ 29.595798] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> <4>[ 29.595818] CR0: 8005003b CR2: ffffb4ff CR3: 36877000 CR4: 00000090
> <4>[ 29.595839] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> <4>[ 29.595856] DR6: ffff0ff0 DR7: 00000400
> <0>[ 29.595876] Process mount (pid: 693, ti=f6870000 task=f37f1810 task.ti=f6870000)
Call chain: mount -> do_mount -> do_remount_sb -> shrink_dcache_sb -> ...
[...]
> <0>[ 29.596036] Code: 85 c0 75 0f ba ee 00 00 00 b8 58 a4 33 c1 e8 87 6e f6 ff 8b 43 1c ff 88 c4 01 00 00 8b 43 10 8d 50 ff 83 fa fd 77 14 85 c0 74 10 <ff> 08 0f 94 c2 84 d2 74 07 31 d2 e8 ee be fa ff 8b 43 14 8d 50
Decoding, this is:
1c: 8b 43 10 mov 0x10(%ebx),%eax
1f: 8d 50 ff lea -0x1(%eax),%edx
22: 83 fa fd cmp $0xfffffffd,%edx
25: 77 14 ja 0x3b
27: 85 c0 test %eax,%eax
29: 74 10 je 0x3b
2b:* ff 08 decl (%eax) <-- trapping instruction
2d: 0f 94 c2 sete %dl
which corresponds to
77d: 8b 43 10 mov 0x10(%ebx),%eax
780: 8d 50 ff lea -0x1(%eax),%edx
783: 83 fa fd cmp $0xfffffffd,%edx
786: 77 05 ja 78d <__destroy_inode+0x57>
posix_acl_release(inode->i_acl);
788: e8 94 ff ff ff call 721 <posix_acl_release>
from fs/inode.c. (Your call to posix_acl_release is inlined while mine
is not because your kernel was built with an older GCC and I'm too
lazy to downgrade.) Here's posix_acl_release:
	static inline void
	posix_acl_release(struct posix_acl *acl)
	{
		if (acl && atomic_dec_and_test(&acl->a_refcount))
			kfree_rcu(acl, a_rcu);
	}
a_refcount is at offset 0 in struct posix_acl, so the trapping
decl (%eax) is the atomic_dec_and_test on acl->a_refcount. It faults
because acl (EAX, which matches CR2: ffffb4ff) is a bad pointer.
So inode is incompletely initialized, I guess. Climbing the call
chain:
> <0>[ 29.595892] Stack:
> <4>[ 29.595903] f54f1d38 c10b6cdd f5538898 c10b5104 f5538898 f6871f20 f6871f20 c10b5149
> <4>[ 29.595946] f55388f8 f5817c00 f5817c80 c10b5362 f55b8a78 f55389f8 f5817c00 00000000
> <4>[ 29.595988] fffffff3 c10a9dd3 00000000 00000000 00000000 0000002e 00000027 f5812090
> <0>[ 29.596030] Call Trace:
> <4>[ 29.596036] [<c10b6cdd>] ? destroy_inode+0x1a/0x3e
> <4>[ 29.596036] [<c10b5104>] ? dentry_kill+0x7f/0x8c
> <4>[ 29.596036] [<c10b5149>] ? shrink_dentry_list+0x38/0x62
> <4>[ 29.596036] [<c10b5362>] ? shrink_dcache_sb+0x40/0x51
> <4>[ 29.596036] [<c10a9dd3>] ? do_remount_sb+0x5b/0x11c
> <4>[ 29.596036] [<c10b9acc>] ? do_mount+0x1de/0x5ca
> <4>[ 29.596036] [<c113b3f0>] ? _copy_from_user+0x28/0x47
> <4>[ 29.596036] [<c108b524>] ? memdup_user+0x26/0x43
> <4>[ 29.596036] [<c10b9f21>] ? sys_mount+0x67/0x96
> <4>[ 29.596036] [<c128e6ec>] ? syscall_call+0x7/0xb
> <0>[ 29.596036] EIP: [<c10b698e>] __destroy_inode+0x56/0x8d SS:ESP 0068:f6871ef0
Probably:
dentry_kill -> d_kill -> dentry_iput -> iput -> ...
Meaning dentry->d_inode has invalid ->i_acl. Walking further:
sys_mount -> do_mount -> do_remount -> do_remount_sb ->
-> shrink_dcache_sb -> shrink_dentry_list ->
-> try_prune_one_dentry -> dentry_kill
I got nothin'. Could you try 3.5-rc2 or newer so we can pester
upstream? Like this:
 0. prerequisites:

	apt-get install git build-essential

 1. grab the kernel history if you don't already have it:

	git clone \
	  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

 2. checkout latest, configure, build:

	cd linux
	git fetch --all
	git checkout origin/master
	cp /boot/config-$(uname -r) .config; # current configuration
	scripts/config --disable DEBUG_INFO
	make localmodconfig; # optional: minimize configuration
	make deb-pkg; # optionally with -j<num> for parallel build
	dpkg -i ../<name of package>; # as root
	reboot
	... test test test ...

 3. celebrate or complain
If it fails, please send a summary of symptoms to
linux-fsdevel@vger.kernel.org, cc-ing either me or this bug log so we
can track it. Be sure to mention:
- steps to reproduce, expected result, actual result, and how the
difference indicates a bug (should be simple enough)
- which kernel versions you've tested and result with each
- full "dmesg" output from a broken kernel, as an attachment
- any other weird observations
- a pointer to http://bugs.debian.org/677655 for the backstory
If it succeeds, I think we should just celebrate and leave it at that. :)
Or if someone gets a moment to update experimental to 3.5-rc2 or
newer, that would be useful for other reasons.
Thanks for your patience and thanks again for testing --- it's very
nice to see this kind of thing caught early.
Hope that helps,
Jonathan