Re: Cubox-i: Kernel-Oops: Unable to handle kernel NULL pointer dereference at virtual address 00000000

To: debian-arm@lists.debian.org
Subject: Re: Cubox-i: Kernel-Oops: Unable to handle kernel NULL pointer dereference at virtual address 00000000
From: "Arnd Bergmann" <arnd@arndb.de>
Date: Tue, 11 Feb 2025 08:19:12 +0100
Message-id: <e9680296-93e3-4307-9e85-e62642a8fd20@app.fastmail.com>
In-reply-to: <2257630.0IWmF9Yd3q@h370>
References: <2257630.0IWmF9Yd3q@h370>

On Sun, Feb 9, 2025, at 15:38, Rainer Dorsch wrote:
>
> during reboot of Cubox-i with stable kernel 6.1.0-29-armmp, I got a kernel 
> Oops (though the reboot did complete eventually):

Hi Rainer,

> [2406987.476525] 8<--- cut here ---
> [2406987.479798] Unable to handle kernel NULL pointer dereference at virtual 
> address 00000000

A NULL pointer was dereferenced, which in this case is almost
certainly a logic bug in kernel code.

> [2406987.669806] CPU: 0 PID: 9106 Comm: rmmod Tainted: G         C         
> 6.1.0-29-armmp #1  Debian 6.1.123-1
> [2406987.679578] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [2406987.686300] PC is at zcomp_cpu_dead+0x14/0x58 [zram]
> [2406987.691486] LR is at cpuhp_invoke_callback+0xd4/0x6fc

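You can get the exact source line by passing the PC offset to
'addr2line' (or the whole oops to scripts/decode_stacktrace.sh
from the kernel tree), given matching debug symbols for the zram
module, but the function is short enough to read directly.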
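
As a small aid for reading the "symbol+0xOFFSET/0xSIZE" notation
used in the PC and LR lines, here is a minimal userspace C sketch
that splits such a string into its parts. The names
struct oops_location and parse_oops_location() are hypothetical
and not part of the kernel or binutils; this only illustrates the
notation and is not a replacement for addr2line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: one decoded "symbol+0xOFFSET/0xSIZE" location. */
struct oops_location {
        char symbol[64];        /* function name, e.g. "zcomp_cpu_dead" */
        unsigned long offset;   /* bytes from the start of the function */
        unsigned long size;     /* total size of the function in bytes */
};

/*
 * Parse a location such as "zcomp_cpu_dead+0x14/0x58".
 * Returns 0 on success, -1 if the text does not have that shape.
 * Anything after the size (for example " [zram]") is ignored.
 */
static int parse_oops_location(const char *text, struct oops_location *loc)
{
        memset(loc, 0, sizeof(*loc));
        if (sscanf(text, "%63[^+]+0x%lx/0x%lx",
                   loc->symbol, &loc->offset, &loc->size) != 3)
                return -1;
        return 0;
}
```

For the PC line quoted above, parse_oops_location() would report
an offset of 0x14 (20 bytes) into a function of 0x58 (88 bytes),
which is why reading the function by hand is feasible here.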

> [2406987.816364] Process rmmod (pid: 9106, stack limit = 0x589ba9ab)

This happened while unloading a module (the process is rmmod).

> [2406987.961101]  zcomp_cpu_dead [zram] from cpuhp_invoke_callback+0xd4/0x6fc
> [2406987.968039]  cpuhp_invoke_callback from cpuhp_issue_call+0x54/0x1b4
> [2406987.974523]  cpuhp_issue_call from 
> __cpuhp_state_remove_instance+0xf8/0x1b4
> [2406987.981702]  __cpuhp_state_remove_instance from zcomp_destroy+0x20/0x34 
> [zram]
> [2406987.989153]  zcomp_destroy [zram] from zram_reset_device+0x114/0x170 
> [zram]
> [2406987.996345]  zram_reset_device [zram] from zram_remove+0x10c/0x120 [zram]
> [2406988.003358]  zram_remove [zram] from zram_remove_cb+0x14/0x5c [zram]
> [2406988.009941]  zram_remove_cb [zram] from idr_for_each+0x5c/0x108
> [2406988.016084]  idr_for_each from destroy_devices+0x38/0x68 [zram]
> [2406988.022240]  destroy_devices [zram] from sys_delete_module+0x194/0x320
> [2406988.028990]  sys_delete_module from ret_fast_syscall+0x0/0x1c

This is the entire backtrace, showing that apart from the generic
CPU hotplug and module unload code, only the zram module was
involved.

Linux 6.1 is fairly old, and this file (drivers/block/zram/zcomp.c)
has changed quite a bit between 6.1 and 6.13, though none of the
commits below obviously addresses a NULL pointer dereference:

b8f03cb703a1 zram: move immutable comp params away from per-CPU context
6a81bdfeb350 zram: introduce zcomp_ctx structure
52c7b4e2ba50 zram: introduce zcomp_req structure
f2bac7ad187d zram: introduce zcomp_params structure
1a78390d8760 zram: check that backends array has at least one backend
1d3100cf148d zram: add 842 compression backend support
84112e314f69 zram: add zlib compression backend support
73e7d81abbc8 zram: add zstd compression backend support
c60a4ef54446 zram: add lz4hc compression backend support
22d651c3b339 zram: add lz4 compression backend support
2152247c55b6 zram: add lzo and lzorle compression backends support
917a59e81c34 zram: introduce custom comp backends API
45866e0e214f zram: do not allocate physically contiguous strm buffers
7ac07a26dea7 zram: preparation for multi-zcomp support

This is the code in question (from 6.13):

static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm)
{
        comp->ops->destroy_ctx(&zstrm->ctx);
        vfree(zstrm->buffer);
        zstrm->buffer = NULL;
}

int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node)
{
        struct zcomp *comp = hlist_entry(node, struct zcomp, node);
        struct zcomp_strm *zstrm;

        zstrm = per_cpu_ptr(comp->stream, cpu);
        zcomp_strm_free(comp, zstrm);
        return 0;
}
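
To make the candidate pointers on this path explicit, here is a
minimal userspace sketch that walks them in the order the 6.13
code above reads them and reports the first NULL one instead of
faulting. Everything in it (the mock_* structures, enum null_site,
first_null_pointer(), noop_destroy_ctx()) is a hypothetical
stand-in; the layout is an assumption and matches no particular
kernel version, and the sketch does not model per_cpu_ptr(), which
adds a per-CPU offset in a real SMP kernel.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (assumed layout). */
struct mock_ops {
        void (*destroy_ctx)(void *ctx);
};

struct mock_strm {
        void *buffer;
        void *ctx;
};

struct mock_zcomp {
        struct mock_strm *stream;   /* per-CPU in the kernel, plain pointer here */
        const struct mock_ops *ops;
};

enum null_site {
        NULL_NONE,      /* every pointer on the path is valid */
        NULL_COMP,      /* comp itself is NULL */
        NULL_STREAM,    /* comp->stream is NULL */
        NULL_OPS,       /* comp->ops or comp->ops->destroy_ctx is NULL */
};

/* Hypothetical no-op callback so a fully valid mock can be built. */
static void noop_destroy_ctx(void *ctx)
{
        (void)ctx;
}

/*
 * Check the pointers in the order the quoted code reads them:
 * comp->stream first (in zcomp_cpu_dead), then comp->ops->destroy_ctx
 * (in zcomp_strm_free). Returns the first NULL site found.
 */
static enum null_site first_null_pointer(const struct mock_zcomp *comp)
{
        if (!comp)
                return NULL_COMP;
        if (!comp->stream)
                return NULL_STREAM;
        if (!comp->ops || !comp->ops->destroy_ctx)
                return NULL_OPS;
        return NULL_NONE;
}
```

The same ordering is what one would look for in the disassembly:
the register holding the faulting address tells which of these
loads came first. Note that the 6.1 version of the function differs
from the 6.13 code quoted here, so the sketch is only a guide to
the kind of question to ask, not a diagnosis.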

Since zcomp_cpu_dead() lives in the zram module, disassembling
zram.ko (rather than vmlinux) with objdump should let you figure
out whether the faulting instruction dereferences zstrm or comp.
The other things I would try to narrow down the problem are:

- unload the module manually during runtime
- update the kernel to a more recent one, such as 6.12
- use a different compression backend for zram (zstd, deflate, lzo, ...)

      Arnd
