
Bug#982459: mdadm examine corrupts host ext4




On Sun, 31 Jul 2022, Chris Hofstaedtler wrote:

> I can't see a difference that should matter from userspace.
>
> I have stared a bit at the kernel code... there have been quite some
> changes and fixes in this area. Which kernel version were you
> running when testing this?
>
> Could you retry on something >= 5.9? I.e. some version with patch
>    08fc1ab6d748ab1a690fd483f41e2938984ce353.

Dear Chris,

I believe that I was running 5.10 (bullseye).

It looks like 5.18 (from backports) does not show the issue, i.e. it works.

Some more details:

I have now tried again:

host:
  linux-image-5.10.0-16-amd64           5.10.127-2
  mdadm                                 4.2-1~bpo11+1
chroot:
  mdadm                                         4.1-11

  This time I did get some dmesg BUG output as well (attached).
  The backtrace does not seem to be the same in the two occurrences.

  I also noticed that the BUG: report in dmesg does not appear directly
  when running 'mdadm --examine --scan --config=partitions'.  Rather, it
  occurs when some activity happens on the host filesystem, e.g.
  a 'touch /root/a' command.

host:
  linux-image-5.18.0-0.bpo.1-amd64      5.18.2-1~bpo11+1

  (I did not reinstall anything else, except for upgrading zfs, also
  from backports, since the plain bullseye version would not compile
  against 5.18.)

  Does not exhibit the problem.

I have tried both kernels several times, and the result was repeatable:
5.10 gets stuck, while 5.18 shows no issues.

Reminder: to trigger the issue, /dev/ must not be mounted in the chroot.
With /dev/ mounted, 5.10 also works.
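
Since the outcome depends on whether /dev/ is mounted inside the chroot, it may help to record that precondition before each test run. The following is only a minimal sketch: the function name dev_is_mounted and the chroot path in the usage note are hypothetical, not part of the original setup.

```python
import os


def dev_is_mounted(chroot_root):
    """Return True if <chroot_root>/dev is a mount point.

    chroot_root is the path of the chroot as seen from the host
    (a hypothetical value such as "/srv/chroot/test").
    """
    return os.path.ismount(os.path.join(chroot_root, "dev"))
```

Calling dev_is_mounted("/srv/chroot/test") from the host before running mdadm in the chroot would show which of the two cases (issue expected vs. not expected on 5.10) a given run belongs to.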

Best regards,
Håkan
[mån aug  1 15:53:08 2022] BUG: kernel NULL pointer dereference, address: 0000000000000010
[mån aug  1 15:53:08 2022] #PF: supervisor read access in kernel mode
[mån aug  1 15:53:08 2022] #PF: error_code(0x0000) - not-present page
[mån aug  1 15:53:08 2022] PGD 0 P4D 0 
[mån aug  1 15:53:08 2022] Oops: 0000 [#1] SMP PTI
[mån aug  1 15:53:08 2022] CPU: 2 PID: 284256 Comm: cron Tainted: P           OE     5.10.0-16-amd64 #1 Debian 5.10.127-2
[mån aug  1 15:53:08 2022] Hardware name: Dell Computer Corporation PowerEdge 2850/0T7971, BIOS A04 09/22/2005
[mån aug  1 15:53:08 2022] RIP: 0010:__ext4_journal_get_write_access+0x29/0x120 [ext4]
[mån aug  1 15:53:08 2022] Code: 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 55 48 89 cd 53 48 83 ec 10 48 89 3c 24 e8 ab d7 bb e1 48 8b 45 30 <4c> 8b 78 10 4d 85 ff 74 2f 49 8b 87 e0 00 00 00 49 8b 9f 88 03 00
[mån aug  1 15:53:08 2022] RSP: 0018:ffffae27c059fd60 EFLAGS: 00010246
[mån aug  1 15:53:08 2022] RAX: 0000000000000000 RBX: ffff9d1b94505480 RCX: ffff9d1bc52e5e38
[mån aug  1 15:53:08 2022] RDX: ffff9d1bc13782d8 RSI: 0000000000000c14 RDI: ffffffffc096feb0
[mån aug  1 15:53:08 2022] RBP: ffff9d1bc52e5e38 R08: ffff9d1be04d5230 R09: 0000000000000001
[mån aug  1 15:53:08 2022] R10: ffff9d1bc985f000 R11: 000000000000001d R12: ffff9d1bc13782d8
[mån aug  1 15:53:08 2022] R13: ffff9d1be04d5000 R14: 0000000000000c14 R15: ffff9d1bc13782d8
[mån aug  1 15:53:08 2022] FS:  00007fed5ecb1840(0000) GS:ffff9d1cd7c80000(0000) knlGS:0000000000000000
[mån aug  1 15:53:08 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[mån aug  1 15:53:08 2022] CR2: 0000000000000010 CR3: 00000001a46d8000 CR4: 00000000000006e0
[mån aug  1 15:53:08 2022] Call Trace:
[mån aug  1 15:53:08 2022]  ext4_orphan_del+0x23f/0x290 [ext4]
[mån aug  1 15:53:08 2022]  ext4_evict_inode+0x31f/0x630 [ext4]
[mån aug  1 15:53:08 2022]  evict+0xd1/0x1a0
[mån aug  1 15:53:08 2022]  __dentry_kill+0xe4/0x180
[mån aug  1 15:53:08 2022]  dput+0x149/0x2f0
[mån aug  1 15:53:08 2022]  __fput+0xe4/0x240
[mån aug  1 15:53:08 2022]  task_work_run+0x65/0xa0
[mån aug  1 15:53:08 2022]  exit_to_user_mode_prepare+0x111/0x120
[mån aug  1 15:53:08 2022]  syscall_exit_to_user_mode+0x28/0x140
[mån aug  1 15:53:08 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[mån aug  1 15:53:08 2022] RIP: 0033:0x7fed5eea2d77
[mån aug  1 15:53:08 2022] Code: 44 00 00 48 8b 15 19 a1 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 e9 a0 0c 00 f7 d8 64 89 02 b8
[mån aug  1 15:53:08 2022] RSP: 002b:00007ffd50452818 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
[mån aug  1 15:53:08 2022] RAX: 0000000000000000 RBX: 000055dab4578910 RCX: 00007fed5eea2d77
[mån aug  1 15:53:08 2022] RDX: 00007fed5ef6e8a0 RSI: 0000000000000000 RDI: 0000000000000006
[mån aug  1 15:53:08 2022] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007fed5ef6dbe0
[mån aug  1 15:53:08 2022] R10: 000000000000006f R11: 0000000000000202 R12: 00007fed5ef6f4a0
[mån aug  1 15:53:08 2022] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
[mån aug  1 15:53:08 2022] Modules linked in: msr autofs4 nfsd auth_rpcgss nfsv3 nfs_acl nfs lockd grace sunrpc nfs_ssc fscache xt_mac xt_length xt_recent xt_multiport xt_tcpudp xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables loop dcdbas radeon zfs(POE) zunicode(POE) zzstd(OE) ttm zlua(OE) zavl(POE) icp(POE) drm_kms_helper iTCO_wdt intel_pmc_bxt cec iTCO_vendor_support zcommon(POE) watchdog znvpair(POE) intel_powerclamp ipmi_si drm pcspkr spl(OE) ipmi_devintf serio_raw ipmi_msghandler rng_core i2c_algo_bit sg evdev e752x_edac button overlay ext4 crc16 mbcache jbd2 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear raid1 sd_mod sr_mod cdrom ata_generic md_mod mptspi mptscsih ata_piix libata mptbase scsi_transport_spi nvme ehci_pci uhci_hcd nvme_core ehci_hcd t10_pi scsi_mod lpc_ich crc_t10dif crct10dif_generic psmouse usbcore e1000 crct10dif_common
[mån aug  1 15:53:08 2022]  usb_common video
[mån aug  1 15:53:08 2022] CR2: 0000000000000010
[mån aug  1 15:53:08 2022] ---[ end trace 4fd9ed73d190bc2a ]---
[mån aug  1 15:53:08 2022] RIP: 0010:__ext4_journal_get_write_access+0x29/0x120 [ext4]
[mån aug  1 15:53:08 2022] Code: 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 55 48 89 cd 53 48 83 ec 10 48 89 3c 24 e8 ab d7 bb e1 48 8b 45 30 <4c> 8b 78 10 4d 85 ff 74 2f 49 8b 87 e0 00 00 00 49 8b 9f 88 03 00
[mån aug  1 15:53:08 2022] RSP: 0018:ffffae27c059fd60 EFLAGS: 00010246
[mån aug  1 15:53:08 2022] RAX: 0000000000000000 RBX: ffff9d1b94505480 RCX: ffff9d1bc52e5e38
[mån aug  1 15:53:08 2022] RDX: ffff9d1bc13782d8 RSI: 0000000000000c14 RDI: ffffffffc096feb0
[mån aug  1 15:53:08 2022] RBP: ffff9d1bc52e5e38 R08: ffff9d1be04d5230 R09: 0000000000000001
[mån aug  1 15:53:08 2022] R10: ffff9d1bc985f000 R11: 000000000000001d R12: ffff9d1bc13782d8
[mån aug  1 15:53:08 2022] R13: ffff9d1be04d5000 R14: 0000000000000c14 R15: ffff9d1bc13782d8
[mån aug  1 15:53:08 2022] FS:  00007fed5ecb1840(0000) GS:ffff9d1cd7c80000(0000) knlGS:0000000000000000
[mån aug  1 15:53:08 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[mån aug  1 15:53:08 2022] CR2: 0000000000000010 CR3: 00000001a46d8000 CR4: 00000000000006e0
[mån aug  1 18:57:57 2022] BUG: kernel NULL pointer dereference, address: 0000000000000010
[mån aug  1 18:57:57 2022] #PF: supervisor read access in kernel mode
[mån aug  1 18:57:57 2022] #PF: error_code(0x0000) - not-present page
[mån aug  1 18:57:57 2022] PGD 0 P4D 0 
[mån aug  1 18:57:57 2022] Oops: 0000 [#1] SMP PTI
[mån aug  1 18:57:57 2022] CPU: 2 PID: 4427 Comm: touch Tainted: P           OE     5.10.0-16-amd64 #1 Debian 5.10.127-2
[mån aug  1 18:57:57 2022] Hardware name: Dell Computer Corporation PowerEdge 2850/0T7971, BIOS A04 09/22/2005
[mån aug  1 18:57:57 2022] RIP: 0010:__ext4_journal_get_write_access+0x29/0x120 [ext4]
[mån aug  1 18:57:57 2022] Code: 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 55 48 89 cd 53 48 83 ec 10 48 89 3c 24 e8 ab 57 e9 e5 48 8b 45 30 <4c> 8b 78 10 4d 85 ff 74 2f 49 8b 87 e0 00 00 00 49 8b 9f 88 03 00
[mån aug  1 18:57:57 2022] RSP: 0018:ffffc2b08062fb78 EFLAGS: 00010246
[mån aug  1 18:57:57 2022] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9daed0440068
[mån aug  1 18:57:57 2022] RDX: ffff9daec0fb53b8 RSI: 0000000000000469 RDI: ffffffffc0896c80
[mån aug  1 18:57:57 2022] RBP: ffff9daed0440068 R08: ffff9daed07f7138 R09: 0000000000000000
[mån aug  1 18:57:57 2022] R10: ffff9daec4c2ef08 R11: 0000000000000000 R12: ffff9daec0fb53b8
[mån aug  1 18:57:57 2022] R13: ffff9daee013d800 R14: 0000000000000469 R15: ffff9daee013d800
[mån aug  1 18:57:57 2022] FS:  00007febc0a915c0(0000) GS:ffff9dafd7c80000(0000) knlGS:0000000000000000
[mån aug  1 18:57:57 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[mån aug  1 18:57:57 2022] CR2: 0000000000000010 CR3: 0000000106616000 CR4: 00000000000006e0
[mån aug  1 18:57:57 2022] Call Trace:
[mån aug  1 18:57:57 2022]  ? __ext4_handle_dirty_metadata+0x51/0x1a0 [ext4]
[mån aug  1 18:57:57 2022]  __ext4_new_inode+0x925/0x1690 [ext4]
[mån aug  1 18:57:57 2022]  ext4_create+0x106/0x1b0 [ext4]
[mån aug  1 18:57:57 2022]  path_openat+0xde1/0x1080
[mån aug  1 18:57:57 2022]  do_filp_open+0x88/0x130
[mån aug  1 18:57:57 2022]  ? getname_flags.part.0+0x29/0x1a0
[mån aug  1 18:57:57 2022]  ? __check_object_size+0x136/0x150
[mån aug  1 18:57:57 2022]  do_sys_openat2+0x97/0x150
[mån aug  1 18:57:57 2022]  __x64_sys_openat+0x54/0x90
[mån aug  1 18:57:57 2022]  do_syscall_64+0x33/0x80
[mån aug  1 18:57:57 2022]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[mån aug  1 18:57:57 2022] RIP: 0033:0x7febc09b9be7
[mån aug  1 18:57:57 2022] Code: 25 00 00 41 00 3d 00 00 41 00 74 47 64 8b 04 25 18 00 00 00 85 c0 75 6b 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 95 00 00 00 48 8b 4c 24 28 64 48 2b 0c 25
[mån aug  1 18:57:57 2022] RSP: 002b:00007ffedb21a7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[mån aug  1 18:57:57 2022] RAX: ffffffffffffffda RBX: 00007ffedb21aaa8 RCX: 00007febc09b9be7
[mån aug  1 18:57:57 2022] RDX: 0000000000000941 RSI: 00007ffedb21ae94 RDI: 00000000ffffff9c
[mån aug  1 18:57:57 2022] RBP: 00007ffedb21ae94 R08: 0000000000000000 R09: 0000000000000000
[mån aug  1 18:57:57 2022] R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941
[mån aug  1 18:57:57 2022] R13: 00007ffedb21ae94 R14: 0000000000000000 R15: 0000000000000000
[mån aug  1 18:57:57 2022] Modules linked in: msr autofs4 nfsd auth_rpcgss nfsv3 nfs_acl nfs lockd grace sunrpc nfs_ssc fscache xt_mac xt_length xt_recent xt_multiport xt_tcpudp xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables loop radeon zfs(POE) ttm zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) drm_kms_helper iTCO_wdt cec icp(POE) intel_pmc_bxt dcdbas iTCO_vendor_support ipmi_si watchdog zcommon(POE) znvpair(POE) intel_powerclamp drm spl(OE) ipmi_devintf pcspkr ipmi_msghandler i2c_algo_bit sg serio_raw rng_core e752x_edac evdev button overlay ext4 crc16 mbcache jbd2 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear raid1 sd_mod sr_mod cdrom ata_generic md_mod ata_piix libata nvme mptspi mptscsih nvme_core uhci_hcd ehci_pci e1000 ehci_hcd t10_pi crc_t10dif psmouse mptbase usbcore crct10dif_generic scsi_transport_spi scsi_mod lpc_ich crct10dif_common
[mån aug  1 18:57:57 2022]  usb_common video
[mån aug  1 18:57:57 2022] CR2: 0000000000000010
[mån aug  1 18:57:57 2022] ---[ end trace 284590a68ce9a232 ]---
[mån aug  1 18:57:57 2022] RIP: 0010:__ext4_journal_get_write_access+0x29/0x120 [ext4]
[mån aug  1 18:57:57 2022] Code: 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 55 48 89 cd 53 48 83 ec 10 48 89 3c 24 e8 ab 57 e9 e5 48 8b 45 30 <4c> 8b 78 10 4d 85 ff 74 2f 49 8b 87 e0 00 00 00 49 8b 9f 88 03 00
[mån aug  1 18:57:57 2022] RSP: 0018:ffffc2b08062fb78 EFLAGS: 00010246
[mån aug  1 18:57:57 2022] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9daed0440068
[mån aug  1 18:57:57 2022] RDX: ffff9daec0fb53b8 RSI: 0000000000000469 RDI: ffffffffc0896c80
[mån aug  1 18:57:57 2022] RBP: ffff9daed0440068 R08: ffff9daed07f7138 R09: 0000000000000000
[mån aug  1 18:57:57 2022] R10: ffff9daec4c2ef08 R11: 0000000000000000 R12: ffff9daec0fb53b8
[mån aug  1 18:57:57 2022] R13: ffff9daee013d800 R14: 0000000000000469 R15: ffff9daee013d800
[mån aug  1 18:57:57 2022] FS:  00007febc0a915c0(0000) GS:ffff9dafd7c80000(0000) knlGS:0000000000000000
[mån aug  1 18:57:57 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[mån aug  1 18:57:57 2022] CR2: 0000000000000010 CR3: 0000000106616000 CR4: 00000000000006e0
[mån aug  1 19:24:19 2022] EXT4-fs error (device md127): ext4_validate_inode_bitmap:105: comm touch: Corrupt inode bitmap - block_group = 0, inode_bitmap = 494
[mån aug  1 19:24:19 2022] Aborting journal on device md127-8.
[mån aug  1 19:24:19 2022] EXT4-fs (md127): Remounting filesystem read-only
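
To compare the backtraces of the two oopses in the log above, the call traces can be extracted mechanically. This is a rough sketch that assumes the dmesg format shown here (a bracketed timestamp followed by "function+0xoff/0xsize"); the helper name call_traces is made up for illustration, and frames marked "?" are skipped because the kernel flags them as unreliable.

```python
import re

# Matches frame lines such as
#   "[...]  ext4_orphan_del+0x23f/0x290 [ext4]"
#   "[...]  ? getname_flags.part.0+0x29/0x1a0"
FRAME_RE = re.compile(
    r"^\[[^\]]*\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def call_traces(dmesg_text):
    """Return one list of function names per 'Call Trace:' section."""
    traces = []
    current = None
    for line in dmesg_text.splitlines():
        if line.rstrip().endswith("Call Trace:"):
            current = []
            traces.append(current)
            continue
        if current is None:
            continue
        match = FRAME_RE.match(line)
        if match is None:
            current = None  # first non-frame line ends the trace
        elif not match.group(1):
            current.append(match.group(2))  # keep only reliable frames
    return traces
```

Feeding the saved dmesg output to call_traces() gives one list per oops, which makes it easy to see that both faults are reported at the same RIP (__ext4_journal_get_write_access) but are reached through different ext4 paths.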
