
Bug#982459: mdadm examine corrupts host ext4



Control: reassign -1 src:linux

Dear Håkan,

thanks for reporting back and testing!

* Håkan T Johansson <f96hajo@chalmers.se> [220801 19:31]:
> On Sun, 31 Jul 2022, Chris Hofstaedtler wrote:
> 
> > I can't see a difference that should matter from userspace.
> > 
> > I have stared a bit at the kernel code... there have been quite a
> > few changes and fixes in this area. Which kernel version were you
> > running when testing this?
> > 
> > Could you retry on something >= 5.9? I.e. some version with patch
> >    08fc1ab6d748ab1a690fd483f41e2938984ce353.
> 
> I believe that I was running 5.10 (bullseye).
> 
> It looks like 5.18 (from backports) does not show the issue!  (i.e. works)

Okay, I think we are now clearly in "this is not an mdadm bug per
se" territory (-> reassigning to src:linux).

[..]
>   This time I did get some dmesg BUG output as well (attached).
>   It does not seem to be the same backtrace on the two occurrences.
> 
>   I also noticed that the BUG: report in dmesg does not happen directly
>   when doing 'mdadm --examine --scan --config=partitions'.  It rather
>   occurs when some activity happens on the host filesystem, e.g.
>   a 'touch /root/a' command.
> 
> host:
>   linux-image-5.18.0-0.bpo.1-amd64      5.18.2-1~bpo11+1
> 
>   (did not re-install anything else, except upgraded zfs, also from
>   backports (since pure bullseye would not compile with 5.18))
> 
>   Does not exhibit the problem.
> 
> I have tried with both kernels several times, and it was repeatable that
> 5.10 got stuck while 5.18 does not show issues.

It's good that this now works in 5.18. However, I'm not sure how we
should find the commit fixing this: in 5.14, lots of block layer
code was shuffled around/refactored.

If you have the time, maybe trying the various kernel versions
between 5.10 and 5.18 would be a good start. If they are not in
backports anymore, they should still be at
  http://snapshot.debian.org/package/linux/
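
To keep the number of test boots small, the versions can be searched
by halving the range each time. A minimal sketch, assuming the
behaviour flips exactly once between 5.10 (bad) and 5.18 (good). The
version list and the is_good callback are placeholders for
illustration; which builds are actually installable from snapshot has
to be checked by hand.

```python
from typing import Callable, Sequence

# Placeholder list of upstream release series, oldest to newest.
# Which of these exist as installable Debian packages is an assumption.
CANDIDATES = ["5.10", "5.11", "5.12", "5.13", "5.14",
              "5.15", "5.16", "5.17", "5.18"]


def first_good(versions: Sequence[str],
               is_good: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_good() returns True.

    Assumes versions[0] is known bad, versions[-1] is known good, and
    the result changes from bad to good exactly once in between.
    """
    lo, hi = 0, len(versions) - 1  # versions[lo] bad, versions[hi] good
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            hi = mid   # fix is at mid or earlier
        else:
            lo = mid   # fix is after mid
    return versions[hi]
```

Here is_good would be a manual step: boot the given kernel, run the
reproducer in the chroot without /dev/ mounted, and report whether the
host filesystem survived. With nine candidates this needs about three
boots instead of seven.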

> Reminder: to get the issue, /dev/ should not be mounted in the chroot.
> With /dev/ mounted, 5.10 also works.

I'll see if I can repro this on 5.10, but need to find a box first.

Best,
Chris

> [mån aug  1 15:53:08 2022] BUG: kernel NULL pointer dereference, address: 0000000000000010
> [mån aug  1 15:53:08 2022] #PF: supervisor read access in kernel mode
> [mån aug  1 15:53:08 2022] #PF: error_code(0x0000) - not-present page
> [mån aug  1 15:53:08 2022] PGD 0 P4D 0 
> [mån aug  1 15:53:08 2022] Oops: 0000 [#1] SMP PTI
> [mån aug  1 15:53:08 2022] CPU: 2 PID: 284256 Comm: cron Tainted: P           OE     5.10.0-16-amd64 #1 Debian 5.10.127-2
> [mån aug  1 15:53:08 2022] Hardware name: Dell Computer Corporation PowerEdge 2850/0T7971, BIOS A04 09/22/2005
> [mån aug  1 15:53:08 2022] RIP: 0010:__ext4_journal_get_write_access+0x29/0x120 [ext4]
> [mån aug  1 15:53:08 2022] Code: 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 55 48 89 cd 53 48 83 ec 10 48 89 3c 24 e8 ab d7 bb e1 48 8b 45 30 <4c> 8b 78 10 4d 85 ff 74 2f 49 8b 87 e0 00 00 00 49 8b 9f 88 03 00
> [mån aug  1 15:53:08 2022] RSP: 0018:ffffae27c059fd60 EFLAGS: 00010246
> [mån aug  1 15:53:08 2022] RAX: 0000000000000000 RBX: ffff9d1b94505480 RCX: ffff9d1bc52e5e38
> [mån aug  1 15:53:08 2022] RDX: ffff9d1bc13782d8 RSI: 0000000000000c14 RDI: ffffffffc096feb0
> [mån aug  1 15:53:08 2022] RBP: ffff9d1bc52e5e38 R08: ffff9d1be04d5230 R09: 0000000000000001
> [mån aug  1 15:53:08 2022] R10: ffff9d1bc985f000 R11: 000000000000001d R12: ffff9d1bc13782d8
> [mån aug  1 15:53:08 2022] R13: ffff9d1be04d5000 R14: 0000000000000c14 R15: ffff9d1bc13782d8
> [mån aug  1 15:53:08 2022] FS:  00007fed5ecb1840(0000) GS:ffff9d1cd7c80000(0000) knlGS:0000000000000000
> [mån aug  1 15:53:08 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [mån aug  1 15:53:08 2022] CR2: 0000000000000010 CR3: 00000001a46d8000 CR4: 00000000000006e0
> [mån aug  1 15:53:08 2022] Call Trace:
> [mån aug  1 15:53:08 2022]  ext4_orphan_del+0x23f/0x290 [ext4]
> [mån aug  1 15:53:08 2022]  ext4_evict_inode+0x31f/0x630 [ext4]
> [mån aug  1 15:53:08 2022]  evict+0xd1/0x1a0
> [mån aug  1 15:53:08 2022]  __dentry_kill+0xe4/0x180
> [mån aug  1 15:53:08 2022]  dput+0x149/0x2f0
> [mån aug  1 15:53:08 2022]  __fput+0xe4/0x240
> [mån aug  1 15:53:08 2022]  task_work_run+0x65/0xa0
> [mån aug  1 15:53:08 2022]  exit_to_user_mode_prepare+0x111/0x120
> [mån aug  1 15:53:08 2022]  syscall_exit_to_user_mode+0x28/0x140
> [mån aug  1 15:53:08 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [mån aug  1 15:53:08 2022] RIP: 0033:0x7fed5eea2d77
> [mån aug  1 15:53:08 2022] Code: 44 00 00 48 8b 15 19 a1 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 e9 a0 0c 00 f7 d8 64 89 02 b8
> [mån aug  1 15:53:08 2022] RSP: 002b:00007ffd50452818 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
> [mån aug  1 15:53:08 2022] RAX: 0000000000000000 RBX: 000055dab4578910 RCX: 00007fed5eea2d77
> [mån aug  1 15:53:08 2022] RDX: 00007fed5ef6e8a0 RSI: 0000000000000000 RDI: 0000000000000006
> [mån aug  1 15:53:08 2022] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fed5ef6dbe0
> [mån aug  1 15:53:08 2022] R10: 000000000000006f R11: 0000000000000202 R12: 00007fed5ef6f4a0
> [mån aug  1 15:53:08 2022] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
> [mån aug  1 15:53:08 2022] Modules linked in: msr autofs4 nfsd auth_rpcgss nfsv3 nfs_acl nfs lockd grace sunrpc nfs_ssc fscache xt_mac xt_length xt_recent xt_multiport xt_tcpudp xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables loop dcdbas radeon zfs(POE) zunicode(POE) zzstd(OE) ttm zlua(OE) zavl(POE) icp(POE) drm_kms_helper iTCO_wdt intel_pmc_bxt cec iTCO_vendor_support zcommon(POE) watchdog znvpair(POE) intel_powerclamp ipmi_si drm pcspkr spl(OE) ipmi_devintf serio_raw ipmi_msghandler rng_core i2c_algo_bit sg evdev e752x_edac button overlay ext4 crc16 mbcache jbd2 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear raid1 sd_mod sr_mod cdrom ata_generic md_mod mptspi mptscsih ata_piix libata mptbase scsi_transport_spi nvme ehci_pci uhci_hcd nvme_core ehci_hcd t10_pi scsi_mod lpc_ich crc_t10dif crct10dif_generic psmouse usbcore e1000 crct10dif_common
> [mån aug  1 15:53:08 2022]  usb_common video
> [mån aug  1 15:53:08 2022] CR2: 0000000000000010
> [mån aug  1 15:53:08 2022] ---[ end trace 4fd9ed73d190bc2a ]---
> [mån aug  1 15:53:08 2022] RIP: 0010:__ext4_journal_get_write_access+0x29/0x120 [ext4]
> [mån aug  1 15:53:08 2022] Code: 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 55 48 89 cd 53 48 83 ec 10 48 89 3c 24 e8 ab d7 bb e1 48 8b 45 30 <4c> 8b 78 10 4d 85 ff 74 2f 49 8b 87 e0 00 00 00 49 8b 9f 88 03 00
> [mån aug  1 15:53:08 2022] RSP: 0018:ffffae27c059fd60 EFLAGS: 00010246
> [mån aug  1 15:53:08 2022] RAX: 0000000000000000 RBX: ffff9d1b94505480 RCX: ffff9d1bc52e5e38
> [mån aug  1 15:53:08 2022] RDX: ffff9d1bc13782d8 RSI: 0000000000000c14 RDI: ffffffffc096feb0
> [mån aug  1 15:53:08 2022] RBP: ffff9d1bc52e5e38 R08: ffff9d1be04d5230 R09: 0000000000000001
> [mån aug  1 15:53:08 2022] R10: ffff9d1bc985f000 R11: 000000000000001d R12: ffff9d1bc13782d8
> [mån aug  1 15:53:08 2022] R13: ffff9d1be04d5000 R14: 0000000000000c14 R15: ffff9d1bc13782d8
> [mån aug  1 15:53:08 2022] FS:  00007fed5ecb1840(0000) GS:ffff9d1cd7c80000(0000) knlGS:0000000000000000
> [mån aug  1 15:53:08 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [mån aug  1 15:53:08 2022] CR2: 0000000000000010 CR3: 00000001a46d8000 CR4: 00000000000006e0

> [mån aug  1 18:57:57 2022] BUG: kernel NULL pointer dereference, address: 0000000000000010
> [mån aug  1 18:57:57 2022] #PF: supervisor read access in kernel mode
> [mån aug  1 18:57:57 2022] #PF: error_code(0x0000) - not-present page
> [mån aug  1 18:57:57 2022] PGD 0 P4D 0 
> [mån aug  1 18:57:57 2022] Oops: 0000 [#1] SMP PTI
> [mån aug  1 18:57:57 2022] CPU: 2 PID: 4427 Comm: touch Tainted: P           OE     5.10.0-16-amd64 #1 Debian 5.10.127-2
> [mån aug  1 18:57:57 2022] Hardware name: Dell Computer Corporation PowerEdge 2850/0T7971, BIOS A04 09/22/2005
> [mån aug  1 18:57:57 2022] RIP: 0010:__ext4_journal_get_write_access+0x29/0x120 [ext4]
> [mån aug  1 18:57:57 2022] Code: 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 55 48 89 cd 53 48 83 ec 10 48 89 3c 24 e8 ab 57 e9 e5 48 8b 45 30 <4c> 8b 78 10 4d 85 ff 74 2f 49 8b 87 e0 00 00 00 49 8b 9f 88 03 00
> [mån aug  1 18:57:57 2022] RSP: 0018:ffffc2b08062fb78 EFLAGS: 00010246
> [mån aug  1 18:57:57 2022] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9daed0440068
> [mån aug  1 18:57:57 2022] RDX: ffff9daec0fb53b8 RSI: 0000000000000469 RDI: ffffffffc0896c80
> [mån aug  1 18:57:57 2022] RBP: ffff9daed0440068 R08: ffff9daed07f7138 R09: 0000000000000000
> [mån aug  1 18:57:57 2022] R10: ffff9daec4c2ef08 R11: 0000000000000000 R12: ffff9daec0fb53b8
> [mån aug  1 18:57:57 2022] R13: ffff9daee013d800 R14: 0000000000000469 R15: ffff9daee013d800
> [mån aug  1 18:57:57 2022] FS:  00007febc0a915c0(0000) GS:ffff9dafd7c80000(0000) knlGS:0000000000000000
> [mån aug  1 18:57:57 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [mån aug  1 18:57:57 2022] CR2: 0000000000000010 CR3: 0000000106616000 CR4: 00000000000006e0
> [mån aug  1 18:57:57 2022] Call Trace:
> [mån aug  1 18:57:57 2022]  ? __ext4_handle_dirty_metadata+0x51/0x1a0 [ext4]
> [mån aug  1 18:57:57 2022]  __ext4_new_inode+0x925/0x1690 [ext4]
> [mån aug  1 18:57:57 2022]  ext4_create+0x106/0x1b0 [ext4]
> [mån aug  1 18:57:57 2022]  path_openat+0xde1/0x1080
> [mån aug  1 18:57:57 2022]  do_filp_open+0x88/0x130
> [mån aug  1 18:57:57 2022]  ? getname_flags.part.0+0x29/0x1a0
> [mån aug  1 18:57:57 2022]  ? __check_object_size+0x136/0x150
> [mån aug  1 18:57:57 2022]  do_sys_openat2+0x97/0x150
> [mån aug  1 18:57:57 2022]  __x64_sys_openat+0x54/0x90
> [mån aug  1 18:57:57 2022]  do_syscall_64+0x33/0x80
> [mån aug  1 18:57:57 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [mån aug  1 18:57:57 2022] RIP: 0033:0x7febc09b9be7
> [mån aug  1 18:57:57 2022] Code: 25 00 00 41 00 3d 00 00 41 00 74 47 64 8b 04 25 18 00 00 00 85 c0 75 6b 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 95 00 00 00 48 8b 4c 24 28 64 48 2b 0c 25
> [mån aug  1 18:57:57 2022] RSP: 002b:00007ffedb21a7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
> [mån aug  1 18:57:57 2022] RAX: ffffffffffffffda RBX: 00007ffedb21aaa8 RCX: 00007febc09b9be7
> [mån aug  1 18:57:57 2022] RDX: 0000000000000941 RSI: 00007ffedb21ae94 RDI: 00000000ffffff9c
> [mån aug  1 18:57:57 2022] RBP: 00007ffedb21ae94 R08: 0000000000000000 R09: 0000000000000000
> [mån aug  1 18:57:57 2022] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941
> [mån aug  1 18:57:57 2022] R13: 00007ffedb21ae94 R14: 0000000000000000 R15: 0000000000000000
> [mån aug  1 18:57:57 2022] Modules linked in: msr autofs4 nfsd auth_rpcgss nfsv3 nfs_acl nfs lockd grace sunrpc nfs_ssc fscache xt_mac xt_length xt_recent xt_multiport xt_tcpudp xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables loop radeon zfs(POE) ttm zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) drm_kms_helper iTCO_wdt cec icp(POE) intel_pmc_bxt dcdbas iTCO_vendor_support ipmi_si watchdog zcommon(POE) znvpair(POE) intel_powerclamp drm spl(OE) ipmi_devintf pcspkr ipmi_msghandler i2c_algo_bit sg serio_raw rng_core e752x_edac evdev button overlay ext4 crc16 mbcache jbd2 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear raid1 sd_mod sr_mod cdrom ata_generic md_mod ata_piix libata nvme mptspi mptscsih nvme_core uhci_hcd ehci_pci e1000 ehci_hcd t10_pi crc_t10dif psmouse mptbase usbcore crct10dif_generic scsi_transport_spi scsi_mod lpc_ich crct10dif_common
> [mån aug  1 18:57:57 2022]  usb_common video
> [mån aug  1 18:57:57 2022] CR2: 0000000000000010
> [mån aug  1 18:57:57 2022] ---[ end trace 284590a68ce9a232 ]---
> [mån aug  1 18:57:57 2022] RIP: 0010:__ext4_journal_get_write_access+0x29/0x120 [ext4]
> [mån aug  1 18:57:57 2022] Code: 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 55 48 89 cd 53 48 83 ec 10 48 89 3c 24 e8 ab 57 e9 e5 48 8b 45 30 <4c> 8b 78 10 4d 85 ff 74 2f 49 8b 87 e0 00 00 00 49 8b 9f 88 03 00
> [mån aug  1 18:57:57 2022] RSP: 0018:ffffc2b08062fb78 EFLAGS: 00010246
> [mån aug  1 18:57:57 2022] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9daed0440068
> [mån aug  1 18:57:57 2022] RDX: ffff9daec0fb53b8 RSI: 0000000000000469 RDI: ffffffffc0896c80
> [mån aug  1 18:57:57 2022] RBP: ffff9daed0440068 R08: ffff9daed07f7138 R09: 0000000000000000
> [mån aug  1 18:57:57 2022] R10: ffff9daec4c2ef08 R11: 0000000000000000 R12: ffff9daec0fb53b8
> [mån aug  1 18:57:57 2022] R13: ffff9daee013d800 R14: 0000000000000469 R15: ffff9daee013d800
> [mån aug  1 18:57:57 2022] FS:  00007febc0a915c0(0000) GS:ffff9dafd7c80000(0000) knlGS:0000000000000000
> [mån aug  1 18:57:57 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [mån aug  1 18:57:57 2022] CR2: 0000000000000010 CR3: 0000000106616000 CR4: 00000000000006e0

> [mån aug  1 19:24:19 2022] EXT4-fs error (device md127): ext4_validate_inode_bitmap:105: comm touch: Corrupt inode bitmap - block_group = 0, inode_bitmap = 494
> [mån aug  1 19:24:19 2022] Aborting journal on device md127-8.
> [mån aug  1 19:24:19 2022] EXT4-fs (md127): Remounting filesystem read-only
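
Both oopses above fault at the same place
(__ext4_journal_get_write_access+0x29) but get there from different
callers: ext4_orphan_del via ext4_evict_inode in the first, and
__ext4_new_inode via ext4_create in the second. To compare further
occurrences, the call trace can be pulled out of saved dmesg text. A
minimal sketch, assuming the log format shown above; the function name
and regular expression are made up for illustration.

```python
import re
from typing import List

# A frame looks like "[timestamp]  symbol+0xoff/0xlen [module]",
# optionally with a leading "? " marking an unreliable frame.
FRAME = re.compile(r"\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def call_trace(dmesg_text: str) -> List[str]:
    """Return the reliable frames of the first Call Trace in the text."""
    frames: List[str] = []
    in_trace = False
    for line in dmesg_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        m = FRAME.search(line)
        if m is None:
            break  # first non-frame line (e.g. "RIP: 0033:...") ends the trace
        if m.group(1) is None:  # skip frames marked with "?"
            frames.append(m.group(2))
    return frames
```

Running this over each saved oops and comparing the resulting lists
shows quickly whether a new occurrence took one of the two paths above
or a third one.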

