
Bug#1043078: linux-image-6.3.0-2-amd64: kernel NULL pointer dereference with MD write-back journal



Package: src:linux
Version: 6.3.11-1
Severity: normal

Dear Maintainer,

I was testing the RAID-5 write-back journal (AKA cache) for the first time.

https://docs.kernel.org/driver-api/md/raid5-cache.html

I experienced a NULL pointer dereference early in the process.

-------------------- steps leading up to the crash -------------------

Make a RAID-1 from a pair of SSDs.

$ sudo mdadm --create /dev/md101 -l 1 -n 2 /dev/disk/by-id/ata-Samsung_SSD_850_PRO_256GB_S251NX0H60631*

Make a RAID-5 containing the journal and three block devices (one of
which is, in turn, a RAID). Size is restricted to 10 GB for testing
purposes.

$ sudo mdadm --create /dev/md5 -n 3 -l 5 -z 10G --write-journal /dev/md101 -c 128K /dev/disk/by-id/ata-TOSHIBA_HDWG21C_* /dev/md3

I waited for the RAID to re-sync (this is necessary in order to enable
write-back journal mode, though not documented).

Enable write-back mode:

$ echo write-back | sudo tee /sys/block/md5/md/journal_mode
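
The wait-then-enable sequence above can be sketched as a small shell
helper. This is only an illustration, not part of the original test: the
function name enable_write_back and the POLL_SECONDS variable are
hypothetical, and it assumes that sync_action reading "idle" is a
sufficient sign that the re-sync has finished. It must run as root to
write to sysfs.

```shell
#!/bin/sh
# Hypothetical helper: wait until the array is idle, then switch the
# journal to write-back. The md sysfs directory is passed as $1 and
# defaults to the md5 array used in this report.
enable_write_back() {
    md_dir=${1:-/sys/block/md5/md}
    # sync_action reads "idle" when no resync or recovery is running
    # (assumed here to mean the initial re-sync is complete).
    while [ "$(cat "$md_dir/sync_action")" != "idle" ]; do
        sleep "${POLL_SECONDS:-5}"
    done
    echo write-back > "$md_dir/journal_mode"
}
```

Calling "enable_write_back /sys/block/md5/md" from a root shell would
then replace the manual wait and the echo command.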

Test writes to the RAID-5 via dd:

$ sudo dd if=/dev/zero of=/dev/md5 iflag=fullblock oflag=direct bs=1M count=10240

------------------------ observed behavior --------------------------
Writes proceeded at 100 +/- 4 MB/sec to the journal disks.
Writes proceeded at 29 +/- 1 MB/sec to the RAID-5 member devices.
This lasted for 50 +/- 1 seconds, at which point writes stopped and the
kernel printed an error:
BUG: kernel NULL pointer dereference, address: 0000000000000157
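
As a cross-check, the reported address can be reproduced from the
register dump in the log below. This assumes the marked bytes in the
"Code:" line (<80> bc 28 56 01 00 00 00) decode to a byte compare at
0x156(%rax,%rbp,1); that decoding is a hand reading of the opcode bytes,
not disassembler output. With RAX = 0 and RBP = 1, the effective address
works out as follows:

```shell
#!/bin/sh
# Values copied from the oops: RAX and RBP from the register dump, and the
# 0x156 displacement from the faulting instruction bytes (assumed decoding).
rax=0x0
rbp=0x1
disp=0x156
# Effective address = base (RAX) + index (RBP) * 1 + displacement.
fault_addr=$(printf '0x%016x' $(( $rax + $rbp + $disp )))
echo "$fault_addr"   # prints 0x0000000000000157, matching CR2
```

Under that assumption, the base pointer in RAX is NULL and the access is
to a field a few hundred bytes into the structure it should have pointed
to, which is consistent with the small non-zero fault address.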


-- System Information:
Debian Release: trixie/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.3.0-2-amd64 (SMP w/16 CPU threads; PREEMPT)
Kernel taint flags: TAINT_DIE, TAINT_WARN, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages linux-image-6.3.0-2-amd64 depends on:
ii  initramfs-tools [linux-initramfs-tool]  0.142
ii  kmod                                    30+20230519-1
ii  linux-base                              4.9

Versions of packages linux-image-6.3.0-2-amd64 recommends:
pn  apparmor             <none>
ii  firmware-linux-free  20200122-1

Versions of packages linux-image-6.3.0-2-amd64 suggests:
pn  debian-kernel-handbook  <none>
ii  grub-efi-amd64          2.06-13
ii  linux-doc-6.3           6.3.11-1

Versions of packages linux-image-6.3.0-2-amd64 is related to:
ii  firmware-amd-graphics     20230515-3
pn  firmware-atheros          <none>
pn  firmware-bnx2             <none>
pn  firmware-bnx2x            <none>
pn  firmware-brcm80211        <none>
pn  firmware-cavium           <none>
pn  firmware-intel-sound      <none>
pn  firmware-intelwimax       <none>
pn  firmware-ipw2x00          <none>
pn  firmware-ivtv             <none>
pn  firmware-iwlwifi          <none>
pn  firmware-libertas         <none>
pn  firmware-linux-nonfree    <none>
pn  firmware-misc-nonfree     <none>
pn  firmware-myricom          <none>
pn  firmware-netxen           <none>
pn  firmware-qlogic           <none>
pn  firmware-realtek          <none>
pn  firmware-samsung          <none>
pn  firmware-siano            <none>
pn  firmware-ti-connectivity  <none>
pn  xen-hypervisor            <none>

-- no debconf information
[433679.573742] BUG: kernel NULL pointer dereference, address: 0000000000000157
[433679.573747] #PF: supervisor read access in kernel mode
[433679.573749] #PF: error_code(0x0000) - not-present page
[433679.573750] PGD 0 P4D 0 
[433679.573753] Oops: 0000 [#1] PREEMPT SMP NOPTI
[433679.573755] CPU: 1 PID: 227438 Comm: md5_raid5 Tainted: G           OE      6.3.0-2-amd64 #1  Debian 6.3.11-1
[433679.573758] Hardware name: ASUS System Product Name/ROG CROSSHAIR VII HERO (WI-FI), BIOS 4603 09/13/2021
[433679.573760] RIP: 0010:submit_bio_noacct+0x10d/0x4c0
[433679.573764] Code: 81 01 00 00 41 8b 85 d8 01 00 00 85 c0 74 6d 48 8b 43 48 48 85 c0 74 0f 48 63 15 de b0 42 01 48 8b 84 d0 d0 00 00 00 83 e5 01 <80> bc 28 56 01 00 00 00 0f 85 8b 01 00 00 80 bc 28 54 01 00 00 00
[433679.573766] RSP: 0018:ffffb1d41070fd00 EFLAGS: 00010202
[433679.573768] RAX: 0000000000000000 RBX: ffff8cb4e21ec0b8 RCX: ffff8cb4fa30ea40
[433679.573770] RDX: 0000000000000000 RSI: ffffffffb7797ba0 RDI: ffff8cb4e21ec0b8
[433679.573771] RBP: 0000000000000001 R08: 0000000000040001 R09: ffffb1d41070fd38
[433679.573773] R10: 0000000000000007 R11: 0000000000000000 R12: ffff8cb4fa30ea40
[433679.573774] R13: ffff8cb921aeeae0 R14: 000000001dcb2a80 R15: 0000000000000008
[433679.573775] FS:  0000000000000000(0000) GS:ffff8cc3bea40000(0000) knlGS:0000000000000000
[433679.573777] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[433679.573778] CR2: 0000000000000157 CR3: 0000000523804000 CR4: 00000000003506e0
[433679.573780] Call Trace:
[433679.573781]  <TASK>
[433679.573783]  ? __die+0x23/0x70
[433679.573787]  ? page_fault_oops+0x17d/0x4c0
[433679.573791]  ? exc_page_fault+0x74/0x170
[433679.573794]  ? asm_exc_page_fault+0x26/0x30
[433679.573798]  ? submit_bio_noacct+0x10d/0x4c0
[433679.573801]  handle_active_stripes.constprop.0+0x349/0x560 [raid456]
[433679.573810]  raid5d+0x4a0/0x760 [raid456]
[433679.573817]  ? __schedule+0x442/0xb50
[433679.573820]  ? _raw_spin_lock_irqsave+0x27/0x60
[433679.573822]  ? preempt_count_add+0x6e/0xa0
[433679.573825]  ? _raw_spin_lock_irqsave+0x27/0x60
[433679.573828]  ? __pfx_md_thread+0x10/0x10 [md_mod]
[433679.573837]  md_thread+0xae/0x190 [md_mod]
[433679.573846]  ? __pfx_autoremove_wake_function+0x10/0x10
[433679.573849]  kthread+0xed/0x120
[433679.573852]  ? __pfx_kthread+0x10/0x10
[433679.573854]  ret_from_fork+0x2c/0x50
[433679.573858]  </TASK>
[433679.573859] Modules linked in: cfg80211 xts ecb serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic algif_skcipher af_alg uas usb_storage xfs twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common essiv authenc dm_crypt vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) cpufreq_conservative cpufreq_userspace cpufreq_powersave rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs bridge stp llc binfmt_misc amdgpu intel_rapl_msr intel_rapl_common drm_buddy edac_mce_amd gpu_sched drm_display_helper kvm_amd cec snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi eeepc_wmi rc_core asus_wmi kvm drm_ttm_helper snd_hda_intel battery ttm snd_intel_dspcfg irqbypass ledtrig_audio snd_intel_sdw_acpi hid_dr hid_pl sparse_keymap ff_memless drm_kms_helper snd_hda_codec platform_profile ccp rfkill asus_wmi_sensors video snd_hda_core rapl sp5100_tco sg snd_hwdep watchdog rng_core mxm_wmi wmi_bmof k10temp pcspkr button acpi_cpufreq cpufreq_ondemand lm90 snd_intel8x0 snd_ac97_codec ac97_bus
[433679.573901]  snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore evdev psmouse i2c_dev sidewinder gameport nfsd joydev auth_rpcgss parport_pc nfs_acl lockd ppdev grace lp parport sunrpc drm dm_mod efi_pstore fuse loop configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear hid_generic bcache raid1 md_mod crc32_pclmul sd_mod usbhid crc32c_intel t10_pi hid crc64_rocksoft_generic crc64_rocksoft crc_t10dif crct10dif_generic crct10dif_pclmul ghash_clmulni_intel crc64 crct10dif_common sha512_ssse3 sha512_generic ahci libahci xhci_pci libata xhci_hcd aesni_intel crypto_simd cryptd i2c_piix4 e1000e scsi_mod usbcore igb i2c_algo_bit dca usb_common scsi_common wmi gpio_amdpt gpio_generic
[433679.573944] CR2: 0000000000000157
[433679.573945] ---[ end trace 0000000000000000 ]---
[433679.793060] RIP: 0010:submit_bio_noacct+0x10d/0x4c0
[433679.793065] Code: 81 01 00 00 41 8b 85 d8 01 00 00 85 c0 74 6d 48 8b 43 48 48 85 c0 74 0f 48 63 15 de b0 42 01 48 8b 84 d0 d0 00 00 00 83 e5 01 <80> bc 28 56 01 00 00 00 0f 85 8b 01 00 00 80 bc 28 54 01 00 00 00
[433679.793068] RSP: 0018:ffffb1d41070fd00 EFLAGS: 00010202
[433679.793070] RAX: 0000000000000000 RBX: ffff8cb4e21ec0b8 RCX: ffff8cb4fa30ea40
[433679.793071] RDX: 0000000000000000 RSI: ffffffffb7797ba0 RDI: ffff8cb4e21ec0b8
[433679.793072] RBP: 0000000000000001 R08: 0000000000040001 R09: ffffb1d41070fd38
[433679.793074] R10: 0000000000000007 R11: 0000000000000000 R12: ffff8cb4fa30ea40
[433679.793075] R13: ffff8cb921aeeae0 R14: 000000001dcb2a80 R15: 0000000000000008
[433679.793076] FS:  0000000000000000(0000) GS:ffff8cc3bea40000(0000) knlGS:0000000000000000
[433679.793078] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[433679.793079] CR2: 0000000000000157 CR3: 0000000523804000 CR4: 00000000003506e0
[433679.793081] note: md5_raid5[227438] exited with irqs disabled
[433679.793096] ------------[ cut here ]------------
[433679.793098] WARNING: CPU: 1 PID: 227438 at kernel/exit.c:814 do_exit+0x8cd/0xb10
[433679.793102] Modules linked in: cfg80211 xts ecb serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic algif_skcipher af_alg uas usb_storage xfs twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common essiv authenc dm_crypt vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) cpufreq_conservative cpufreq_userspace cpufreq_powersave rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs bridge stp llc binfmt_misc amdgpu intel_rapl_msr intel_rapl_common drm_buddy edac_mce_amd gpu_sched drm_display_helper kvm_amd cec snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi eeepc_wmi rc_core asus_wmi kvm drm_ttm_helper snd_hda_intel battery ttm snd_intel_dspcfg irqbypass ledtrig_audio snd_intel_sdw_acpi hid_dr hid_pl sparse_keymap ff_memless drm_kms_helper snd_hda_codec platform_profile ccp rfkill asus_wmi_sensors video snd_hda_core rapl sp5100_tco sg snd_hwdep watchdog rng_core mxm_wmi wmi_bmof k10temp pcspkr button acpi_cpufreq cpufreq_ondemand lm90 snd_intel8x0 snd_ac97_codec ac97_bus
[433679.793144]  snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore evdev psmouse i2c_dev sidewinder gameport nfsd joydev auth_rpcgss parport_pc nfs_acl lockd ppdev grace lp parport sunrpc drm dm_mod efi_pstore fuse loop configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear hid_generic bcache raid1 md_mod crc32_pclmul sd_mod usbhid crc32c_intel t10_pi hid crc64_rocksoft_generic crc64_rocksoft crc_t10dif crct10dif_generic crct10dif_pclmul ghash_clmulni_intel crc64 crct10dif_common sha512_ssse3 sha512_generic ahci libahci xhci_pci libata xhci_hcd aesni_intel crypto_simd cryptd i2c_piix4 e1000e scsi_mod usbcore igb i2c_algo_bit dca usb_common scsi_common wmi gpio_amdpt gpio_generic
[433679.793186] CPU: 1 PID: 227438 Comm: md5_raid5 Tainted: G      D    OE      6.3.0-2-amd64 #1  Debian 6.3.11-1
[433679.793189] Hardware name: ASUS System Product Name/ROG CROSSHAIR VII HERO (WI-FI), BIOS 4603 09/13/2021
[433679.793190] RIP: 0010:do_exit+0x8cd/0xb10
[433679.793193] Code: 88 13 00 00 65 01 05 46 34 77 4b e9 1e ff ff ff 48 8b bb 98 09 00 00 31 f6 e8 3f d9 ff ff e9 ac fd ff ff 0f 0b e9 75 f7 ff ff <0f> 0b e9 aa f7 ff ff 4c 89 e6 bf 05 06 00 00 e8 bf 07 01 00 e9 47
[433679.793195] RSP: 0018:ffffb1d41070fed8 EFLAGS: 00010286
[433679.793197] RAX: 0000000000000000 RBX: ffff8cb8b9498000 RCX: 0000000000000000
[433679.793199] RDX: 0000000000000001 RSI: 0000000000002710 RDI: 00000000ffffffff
[433679.793200] RBP: ffff8cbc1506e300 R08: 0000000000000000 R09: ffffb1d41070fde0
[433679.793202] R10: 0000000000000003 R11: ffff8cc3ff2f7fe8 R12: 0000000000000009
[433679.793203] R13: ffff8cb4e1236300 R14: 0000000000000000 R15: 0000000000000000
[433679.793204] FS:  0000000000000000(0000) GS:ffff8cc3bea40000(0000) knlGS:0000000000000000
[433679.793206] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[433679.793208] CR2: 0000000000000157 CR3: 0000000523804000 CR4: 00000000003506e0
[433679.793210] Call Trace:
[433679.793211]  <TASK>
[433679.793212]  ? do_exit+0x8cd/0xb10
[433679.793214]  ? __warn+0x81/0x130
[433679.793218]  ? do_exit+0x8cd/0xb10
[433679.793220]  ? report_bug+0x191/0x1c0
[433679.793223]  ? handle_bug+0x41/0x70
[433679.793226]  ? exc_invalid_op+0x17/0x70
[433679.793229]  ? asm_exc_invalid_op+0x1a/0x20
[433679.793232]  ? do_exit+0x8cd/0xb10
[433679.793234]  ? do_exit+0x70/0xb10
[433679.793237]  make_task_dead+0x81/0x170
[433679.793239]  rewind_stack_and_make_dead+0x17/0x20
[433679.793243] RIP: 0000:0x0
[433679.793247] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[433679.793248] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[433679.793250] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[433679.793251] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[433679.793253] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[433679.793254] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[433679.793255] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[433679.793257]  </TASK>
[433679.793258] ---[ end trace 0000000000000000 ]---
