
Bug#982459: mdadm examine corrupts host ext4



Control: retitle -1 mdadm --examine in chroot without /dev mounted corrupts host's filesystem
Control: found -1 5.10.127-2
Control: fixed -1 5.18.2-1~bpo11+1

On Tuesday, 2 August 2022 11:03:09 CET Chris Hofstaedtler wrote:
> Control: reassign -1 src:linux

On 10 Feb 2021 14:29:52 +0100 Patrick Cernko <pcernko@mpi-klsb.mpg.de> wrote:
> $MDADM --examine --scan --config=partitions
> 
> If I run this command in a chroot on a machine with md0 as host's root 
> filesystem WITHOUT mounting /proc, /sys and /dev in the chroot, mdadm 
> CORRUPTS the host's root filesystem (/dev/md0 with ext4 filesystem 
> format). I can reproduce this problem every time I do this. 
> 
> Kernel: Linux 5.4.78.1.amd64-smp (SMP w/4 CPU cores)
> Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_USER, 
> TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE

Patrick: AFAICT, that is not a Debian-provided kernel.
Are or were you able to reproduce this issue with a Debian kernel?
If so, which (exact) version?

> * Håkan T Johansson <f96hajo@chalmers.se> [220801 19:31]:
> > On Sun, 31 Jul 2022, Chris Hofstaedtler wrote:
> > > I can't see a difference that should matter from userspace.
> > > 
> > > I have stared a bit at the kernel code... there have been quite a few
> > > changes and fixes in this area. Which kernel version were you
> > > running when testing this?
> > > 
> > > Could you retry on something >= 5.9? I.e. some version with patch
> > > 08fc1ab6d748ab1a690fd483f41e2938984ce353.
> > 
> > I believe that I was running 5.10 (bullseye).

Håkan: IIUC, the bug occurs with the 5.10.127-2 kernel.
If you try it with the most recent 5.10 kernel, does the issue still occur?
If we have a 'good' and a 'bad' 5.10 kernel, that would make it easier to
narrow down in which commit it was fixed.

> > It looks like 5.18 (from backports) does not show the issue! (i.e. it works)
> > 
> > host:
> > linux-image-5.18.0-0.bpo.1-amd64      5.18.2-1~bpo11+1
> > 
> > [bug still occurs with]
> > host:
> >    linux-image-5.10.0-16-amd64           5.10.127-2

Updated the bug accordingly.

> > This time I did get some dmesg BUG output as well (attached).

For reference [dmesg 1]:
[mån aug  1 15:53:08 2022] BUG: kernel NULL pointer dereference, address: 0000000000000010
[mån aug  1 15:53:08 2022] #PF: supervisor read access in kernel mode
[mån aug  1 15:53:08 2022] #PF: error_code(0x0000) - not-present page
[mån aug  1 15:53:08 2022] PGD 0 P4D 0 
[mån aug  1 15:53:08 2022] Oops: 0000 [#1] SMP PTI
[mån aug  1 15:53:08 2022] CPU: 2 PID: 284256 Comm: cron Tainted: P           OE     5.10.0-16-amd64 #1 Debian 5.10.127-2
[mån aug  1 15:53:08 2022] Hardware name: Dell Computer Corporation PowerEdge 2850/0T7971, BIOS A04 09/22/2005
[mån aug  1 15:53:08 2022] RIP: 0010:__ext4_journal_get_write_access+0x29/0x120 [ext4]
[mån aug  1 15:53:08 2022] Code: 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 55 48 89 cd 53 48 83 ec 10 48 89 3c 24 e8 ab d7 bb e1 48 8b 45 30 <4c> 8b 78 10 4d 85 ff 74 2f 49 8b 87 e0 00 00 00 49 8b 9f 88 03 00
[mån aug  1 15:53:08 2022] RSP: 0018:ffffae27c059fd60 EFLAGS: 00010246
[mån aug  1 15:53:08 2022] RAX: 0000000000000000 RBX: ffff9d1b94505480 RCX: ffff9d1bc52e5e38
[mån aug  1 15:53:08 2022] RDX: ffff9d1bc13782d8 RSI: 0000000000000c14 RDI: ffffffffc096feb0
[mån aug  1 15:53:08 2022] RBP: ffff9d1bc52e5e38 R08: ffff9d1be04d5230 R09: 0000000000000001
[mån aug  1 15:53:08 2022] R10: ffff9d1bc985f000 R11: 000000000000001d R12: ffff9d1bc13782d8
[mån aug  1 15:53:08 2022] R13: ffff9d1be04d5000 R14: 0000000000000c14 R15: ffff9d1bc13782d8
[mån aug  1 15:53:08 2022] FS:  00007fed5ecb1840(0000) GS:ffff9d1cd7c80000(0000) knlGS:0000000000000000
[mån aug  1 15:53:08 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[mån aug  1 15:53:08 2022] CR2: 0000000000000010 CR3: 00000001a46d8000 CR4: 00000000000006e0
[mån aug  1 15:53:08 2022] Call Trace:
[mån aug  1 15:53:08 2022]  ext4_orphan_del+0x23f/0x290 [ext4]
[mån aug  1 15:53:08 2022]  ext4_evict_inode+0x31f/0x630 [ext4]
[mån aug  1 15:53:08 2022]  evict+0xd1/0x1a0
[mån aug  1 15:53:08 2022]  __dentry_kill+0xe4/0x180
[mån aug  1 15:53:08 2022]  dput+0x149/0x2f0
[mån aug  1 15:53:08 2022]  __fput+0xe4/0x240
[mån aug  1 15:53:08 2022]  task_work_run+0x65/0xa0
[mån aug  1 15:53:08 2022]  exit_to_user_mode_prepare+0x111/0x120
[mån aug  1 15:53:08 2022]  syscall_exit_to_user_mode+0x28/0x140
[mån aug  1 15:53:08 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[mån aug  1 15:53:08 2022] RIP: 0033:0x7fed5eea2d77

> > I also noticed that the BUG: report in dmesg does not happen directly
> > when doing 'mdadm --examine --scan --config=partitions'.  It rather
> > occurs when some activity happens on the host filesystem, e.g.
> > a 'touch /root/a' command.
> > 
> > I have tried with both kernels several times, and it was repeatable that
> > 5.10 got stuck while 5.18 did not show any issues.

Repeatable is good :-)
If you have a minimal set of steps to reproduce the issue, can you share that?

> If you have the time, maybe trying the various kernel versions
> between 5.10 and 5.18 would be a good start.

If it's already fixed in a later 5.10 kernel, that would dramatically cut down
the search space, so let's try that first.

> > Reminder: to get the issue, /dev/ should not be mounted in the chroot.
> > With /dev/ mounted, 5.10 also works.

I've retitled the bug to reflect that (also given Patrick's later replies).

> I'll see if I can repro this on 5.10, but need to find a box first.

Chris: found a box in the meantime?

Cheers,
  Diederik
