Re: Cubox-i: Kernel-Oops: Unable to handle kernel NULL pointer dereference at virtual address 00000000
On Sun, Feb 9, 2025, at 15:38, Rainer Dorsch wrote:
>
> during reboot of Cubox-i with stable kernel 6.1.0-29-armmp, I got a kernel
> Oops (though the reboot did complete eventually):
Hi Rainer,
> [2406987.476525] 8<--- cut here ---
> [2406987.479798] Unable to handle kernel NULL pointer dereference at virtual
> address 00000000
A NULL pointer was dereferenced, which in this case is almost
certainly a logic bug in kernel code.
> [2406987.669806] CPU: 0 PID: 9106 Comm: rmmod Tainted: G C
> 6.1.0-29-armmp #1 Debian 6.1.123-1
> [2406987.679578] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [2406987.686300] PC is at zcomp_cpu_dead+0x14/0x58 [zram]
> [2406987.691486] LR is at cpuhp_invoke_callback+0xd4/0x6fc
You can get the exact code location by running the oops through
scripts/decode_stacktrace.sh (which uses 'addr2line' internally),
but the function is short enough to read directly.
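For reference, the "PC is at" notation reads symbol+offset/size: the
fault is 0x14 bytes into zcomp_cpu_dead, which is 0x58 bytes long. A
minimal sketch of splitting such a string, so the offset can be added
to the symbol address before it is handed to addr2line. The struct
and function names here are made up for illustration and are not part
of any kernel tool:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: one "symbol+0xOFF/0xSIZE" location from an oops. */
struct oops_loc {
	char symbol[64];
	unsigned long offset;	/* bytes from the start of the function */
	unsigned long size;	/* total size of the function in bytes */
};

/* Returns 0 on success, -1 if the text does not have the expected form. */
int parse_oops_loc(const char *text, struct oops_loc *loc)
{
	/* %lx accepts an optional "0x" prefix, as strtoul() does for base 16 */
	if (sscanf(text, "%63[^+]+%lx/%lx",
		   loc->symbol, &loc->offset, &loc->size) != 3)
		return -1;
	return 0;
}
```

Calling parse_oops_loc("zcomp_cpu_dead+0x14/0x58", &loc) fills in the
symbol name, offset 0x14 and size 0x58.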
> [2406987.816364] Process rmmod (pid: 9106, stack limit = 0x589ba9ab)
This happened while unloading a module (the process is rmmod).
> [2406987.961101] zcomp_cpu_dead [zram] from cpuhp_invoke_callback+0xd4/0x6fc
> [2406987.968039] cpuhp_invoke_callback from cpuhp_issue_call+0x54/0x1b4
> [2406987.974523] cpuhp_issue_call from
> __cpuhp_state_remove_instance+0xf8/0x1b4
> [2406987.981702] __cpuhp_state_remove_instance from zcomp_destroy+0x20/0x34
> [zram]
> [2406987.989153] zcomp_destroy [zram] from zram_reset_device+0x114/0x170
> [zram]
> [2406987.996345] zram_reset_device [zram] from zram_remove+0x10c/0x120 [zram]
> [2406988.003358] zram_remove [zram] from zram_remove_cb+0x14/0x5c [zram]
> [2406988.009941] zram_remove_cb [zram] from idr_for_each+0x5c/0x108
> [2406988.016084] idr_for_each from destroy_devices+0x38/0x68 [zram]
> [2406988.022240] destroy_devices [zram] from sys_delete_module+0x194/0x320
> [2406988.028990] sys_delete_module from ret_fast_syscall+0x0/0x1c
This is the entire backtrace, showing that only the zram module
was involved.
Linux-6.1 is fairly old, and this file (drivers/block/zram/zcomp.c)
has changed a bit between that and 6.13, though none of the changes
here immediately point to a NULL pointer dereference:
b8f03cb703a1 zram: move immutable comp params away from per-CPU context
6a81bdfeb350 zram: introduce zcomp_ctx structure
52c7b4e2ba50 zram: introduce zcomp_req structure
f2bac7ad187d zram: introduce zcomp_params structure
1a78390d8760 zram: check that backends array has at least one backend
1d3100cf148d zram: add 842 compression backend support
84112e314f69 zram: add zlib compression backend support
73e7d81abbc8 zram: add zstd compression backend support
c60a4ef54446 zram: add lz4hc compression backend support
22d651c3b339 zram: add lz4 compression backend support
2152247c55b6 zram: add lzo and lzorle compression backends support
917a59e81c34 zram: introduce custom comp backends API
45866e0e214f zram: do not allocate physically contiguous strm buffers
7ac07a26dea7 zram: preparation for multi-zcomp support
This is the code in question (from 6.13):
static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm)
{
	comp->ops->destroy_ctx(&zstrm->ctx);
	vfree(zstrm->buffer);
	zstrm->buffer = NULL;
}

int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node)
{
	struct zcomp *comp = hlist_entry(node, struct zcomp, node);
	struct zcomp_strm *zstrm;

	zstrm = per_cpu_ptr(comp->stream, cpu);
	zcomp_strm_free(comp, zstrm);
	return 0;
}
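To make the candidates explicit, here is a userspace sketch of the
pointers that the 6.13 excerpt above dereferences, in the order it
touches them. The structs are simplified stand-ins rather than the
kernel definitions, per_cpu_ptr() is modelled as a plain pointer, and
first_null_deref() and noop_destroy_ctx() are hypothetical names. The
6.1 code is laid out differently, so this only illustrates the idea:

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption, not the real layout). */
struct zcomp_ctx { void *context; };
struct zcomp_strm { void *buffer; struct zcomp_ctx ctx; };
struct zcomp_ops { void (*destroy_ctx)(struct zcomp_ctx *ctx); };
struct zcomp { struct zcomp_strm *stream; const struct zcomp_ops *ops; };

/* Stand-in backend callback that does nothing. */
void noop_destroy_ctx(struct zcomp_ctx *ctx)
{
	(void)ctx;
}

/*
 * Returns the name of the first NULL pointer the teardown path would
 * use, or NULL if every pointer in the chain is set.
 */
const char *first_null_deref(const struct zcomp *comp)
{
	if (!comp)
		return "comp";
	if (!comp->ops)			/* read for comp->ops->destroy_ctx */
		return "comp->ops";
	if (!comp->ops->destroy_ctx)	/* called with &zstrm->ctx */
		return "comp->ops->destroy_ctx";
	if (!comp->stream)		/* zstrm, read for zstrm->buffer */
		return "comp->stream";
	return NULL;
}
```

Note that &zstrm->ctx only computes an address, so in the excerpt
zstrm itself is first read at zstrm->buffer.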
If you disassemble the zram module with objdump (zram.ko rather than
vmlinux, since the faulting function lives in the module), you can
probably figure out if the bug is dereferencing zstrm or comp. The
other things I would try to narrow down the problem are:
- unload the module manually during runtime
- update the kernel to a more recent one, such as 6.12
- use a different compression backend for zram (zstd, deflate, lzo, ...)
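For the last item, the backend is normally selected by writing its
name to the device's comp_algorithm attribute in sysfs before the
disk size is set. A minimal sketch of such a write, assuming the
device is zram0; write_attr() is a hypothetical helper, and the path
in the usage note has to be adjusted to the actual setup:

```c
#include <stdio.h>

/*
 * Write a string to a sysfs-style attribute file.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	/* sysfs reports most errors when the buffered data is flushed */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, write_attr("/sys/block/zram0/comp_algorithm", "zstd")
run as root, followed by setting the disk size as usual.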
Arnd