
Bug#966846: Kernel panic (4.19.0-10): RIP __cgroup_bpf_run_filter_skb



On Mon, 3 Aug 2020 09:55:28 +0200 Cédric Dufour <cedric.dufour@ced-network.net> wrote:
> Package: linux-source-4.19
> Version: 4.19.132-1
> Severity: important
>
> Hello,
>
> Since linux-image-4.19.0-10-amd64, I have been hitting regular kernel panics - "RIP: 0010:__cgroup_bpf_run_filter_skb+0x26d/0x3d0" - resulting in a full freeze of the (file) *server*.
>
> The issue is pretty well described and summarized in https://forum.proxmox.com/threads/kernel-5-4-44-causes-system-freeze-on-hp-microserver-gen8.72050/page-2#post-323498
>
> The "culprit" commit - "netprio_cgroup: Fix unlimited memory leak of v2 cgroups" - has indeed been included in the Debian kernel (4.19) since changelog entry 4.19.131-1
>
> It *seems* there is already a patch proposed upstream (although here for kernel 4.9): https://lkml.org/lkml/2020/7/20/883
>
> Best regards,
>
> Cédric
>
> --
> Cédric Dufour
>
>

FWIW, I am seeing a very similar issue.  Some Debian 10 AWS instances that I use to run Guacamole via Docker recently started freezing at random.  I enabled kernel crash dumps and finally caught one of the machines misbehaving.  Looking at the kdump, I see this:
      KERNEL: /usr/lib/debug/vmlinux-4.19.0-10-cloud-amd64
    DUMPFILE: dump.202008101612  [PARTIAL DUMP]
        CPUS: 2
        DATE: Mon Aug 10 16:11:47 2020
      UPTIME: 00:05:44
LOAD AVERAGE: 0.21, 0.11, 0.04
       TASKS: 261
    NODENAME: guac.env0.staging.cool.cyber.dhs.gov
     RELEASE: 4.19.0-10-cloud-amd64
     VERSION: #1 SMP Debian 4.19.132-1 (2020-07-24)
     MACHINE: x86_64  (2499 Mhz)
      MEMORY: 4 GB
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000010"
         PID: 1453
     COMMAND: "sshd"
        TASK: ffff8a3f695115c0  [THREAD_INFO: ffff8a3f695115c0]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 1453   TASK: ffff8a3f695115c0  CPU: 0   COMMAND: "sshd"
 #0 [ffffb37740c77800] machine_kexec at ffffffff97a4b297
 #1 [ffffb37740c77858] __crash_kexec at ffffffff97b0e7dd
 #2 [ffffb37740c77920] crash_kexec at ffffffff97b0f62d
 #3 [ffffb37740c77938] oops_end at ffffffff97a2907d
 #4 [ffffb37740c77958] no_context at ffffffff97a5858e
 #5 [ffffb37740c779b0] __do_page_fault at ffffffff97a58c42
 #6 [ffffb37740c77a20] async_page_fault at ffffffff982010be
    [exception RIP: __cgroup_bpf_run_filter_skb+189]
    RIP: ffffffff97b94ffd  RSP: ffffb37740c77ad0  RFLAGS: 00010286
    RAX: 0000000000000000  RBX: ffff8a3ff55e5ee8  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: ffff8a3ff3d49800  RDI: ffff8a3ff52fd500
    RBP: ffff8a3ff52fd500   R8: ffff8a3ff55e5ee8   R9: 0000000000010000
    R10: 0000000000000001  R11: ffff8a3ef6dd7500  R12: 0000000000000000
    R13: 0000000000000000  R14: ffff8a3ff52fd840  R15: ffff8a3ff55e5ee8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb37740c77b30] ip_finish_output at ffffffff97f65988
 #8 [ffffb37740c77b68] ip_output at ffffffff97f6640c
 #9 [ffffb37740c77bc0] __ip_queue_xmit at ffffffff97f65e6d
#10 [ffffb37740c77c18] __tcp_transmit_skb at ffffffff97f80557
#11 [ffffb37740c77c88] tcp_write_xmit at ffffffff97f81e34
#12 [ffffb37740c77cf0] __tcp_push_pending_frames at ffffffff97f82ae1
#13 [ffffb37740c77d00] tcp_sendmsg_locked at ffffffff97f733ac
#14 [ffffb37740c77da8] tcp_sendmsg at ffffffff97f73507
#15 [ffffb37740c77dc8] sock_sendmsg at ffffffff97ee8aa6
#16 [ffffb37740c77de0] sock_write_iter at ffffffff97ee8b47
#17 [ffffb37740c77e50] new_sync_write at ffffffff97c49bfb
#18 [ffffb37740c77ed0] vfs_write at ffffffff97c4c7d5
#19 [ffffb37740c77f00] ksys_write at ffffffff97c4ca77
#20 [ffffb37740c77f38] do_syscall_64 at ffffffff97a04140
#21 [ffffb37740c77f50] entry_SYSCALL_64_after_hwframe at ffffffff98200088
    RIP: 00007fd74beba504  RSP: 00007ffc1d456638  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000084  RCX: 00007fd74beba504
    RDX: 0000000000000084  RSI: 000055785f33bb90  RDI: 0000000000000003
    RBP: 000055785f31d630   R8: 0000000000000000   R9: 0000000000001000
    R10: 0000000000000008  R11: 0000000000000246  R12: 00000000000001dd
    R13: 000055785ddc9b00  R14: 0000000000000003  R15: 00007ffc1d4566e0
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

crash> sym ffffffff97b94ffd
ffffffff97b94ffd (T) __cgroup_bpf_run_filter_skb+189 ./debian/build/build_amd64_none_cloud-amd64/./kernel/bpf/cgroup.c: 539
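
As a sanity check, crash prints the offset into the function in decimal (+189), while the kernel oops message prints it in hex (+0xbd/0x1e0). The short Python sketch below confirms that the two notations refer to the same instruction and derives the function's start address from the RIP value shown in the backtrace. The helper name parse_symbol_offset is made up for this illustration and is not part of crash or of any kernel tooling.

```python
import re

def parse_symbol_offset(text):
    """Return (symbol, offset) from 'sym+189' (crash) or 'sym+0xbd/0x1e0' (oops)."""
    m = re.match(r"(\w+)\+(0x[0-9a-fA-F]+|\d+)", text)
    if m is None:
        raise ValueError("not a symbol+offset string: %r" % text)
    # Base 0 lets int() accept both plain decimal and 0x-prefixed hex.
    return m.group(1), int(m.group(2), 0)

# Strings copied from the crash backtrace and from the oops message.
crash_sym, crash_off = parse_symbol_offset("__cgroup_bpf_run_filter_skb+189")
oops_sym, oops_off = parse_symbol_offset("__cgroup_bpf_run_filter_skb+0xbd/0x1e0")

# RIP as reported by crash for the exception frame.
rip = 0xffffffff97b94ffd
func_start = rip - crash_off

print(crash_sym == oops_sym, crash_off == oops_off)
print(hex(func_start))
```

Both traces therefore point at the same instruction, which crash resolves to kernel/bpf/cgroup.c line 539.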

crash> log
[    0.000000] Linux version 4.19.0-10-cloud-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.132-1 (2020-07-24)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.19.0-10-cloud-amd64 root=UUID=9ac8f5bd-5b64-48cd-9efd-2b2d35a30500 ro console=tty0 console=ttyS0,115200 earlyprintk=ttyS0,115200 nmi_watchdog=1 elevator=noop scsi_mod.use_blk_mq=Y crashkernel=384M-:128M
<SNIP>
[  478.686368] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[  478.693551] PGD 0 P4D 0
[  478.696291] Oops: 0000 [#1] SMP PTI
[  478.699431] CPU: 0 PID: 1453 Comm: sshd Kdump: loaded Not tainted 4.19.0-10-cloud-amd64 #1 Debian 4.19.132-1
[  478.706782] Hardware name: Amazon EC2 t3.medium/, BIOS 1.0 10/16/2017
[  478.711129] RIP: 0010:__cgroup_bpf_run_filter_skb+0xbd/0x1e0
[  478.715172] Code: 00 00 00 49 89 7f 18 48 89 0c 24 44 89 e1 48 29 c8 48 89 4c 24 08 49 89 87 d8 00 00 00 89 d2 48 8d 84 d6 b0 03 00 00 48 8b 00 <48> 8b 58 10 4c 8d 70 10 48 85 db 0f 84 01 01 00 00 4d 8d 6f 30 bd
[  478.727711] RSP: 0018:ffffb37740c77ad0 EFLAGS: 00010286
[  478.731595] RAX: 0000000000000000 RBX: ffff8a3ff55e5ee8 RCX: 0000000000000000
[  478.736351] RDX: 0000000000000001 RSI: ffff8a3ff3d49800 RDI: ffff8a3ff52fd500
[  478.741042] RBP: ffff8a3ff52fd500 R08: ffff8a3ff55e5ee8 R09: 0000000000010000
[  478.745697] R10: 0000000000000001 R11: ffff8a3ef6dd7500 R12: 0000000000000000
[  478.750446] R13: 0000000000000000 R14: ffff8a3ff52fd840 R15: ffff8a3ff55e5ee8
[  478.755161] FS:  00007fd74bb17e40(0000) GS:ffff8a3ff7e00000(0000) knlGS:0000000000000000
[  478.761724] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  478.765853] CR2: 0000000000000010 CR3: 00000000a94e6005 CR4: 00000000007606b0
[  478.770524] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  478.775273] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  478.779984] PKRU: 55555554
[  478.782901] Call Trace:
[  478.785756]  ip_finish_output+0x228/0x270
[  478.789204]  ? nf_hook_slow+0x44/0xc0
[  478.792490]  ip_output+0x6c/0xe0
[  478.795685]  ? ip_append_data.part.49+0xd0/0xd0
[  478.799403]  __ip_queue_xmit+0x15d/0x410
[  478.802945]  ? set_fd_set.part.7+0x40/0x40
[  478.806411]  __tcp_transmit_skb+0x527/0xb10
[  478.810032]  tcp_write_xmit+0x384/0x1000
[  478.813636]  ? _copy_from_iter_full+0x94/0x240
[  478.817438]  __tcp_push_pending_frames+0x31/0xd0
[  478.821170]  tcp_sendmsg_locked+0xc1c/0xd50
[  478.824714]  tcp_sendmsg+0x27/0x40
[  478.827921]  sock_sendmsg+0x36/0x40
[  478.831280]  sock_write_iter+0x97/0x100
[  478.834714]  new_sync_write+0xfb/0x160
[  478.838010]  vfs_write+0xa5/0x1a0
[  478.841129]  ksys_write+0x57/0xd0
[  478.844250]  do_syscall_64+0x50/0xf0
[  478.847526]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  478.851385] RIP: 0033:0x7fd74beba504
[  478.854598] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[  478.867315] RSP: 002b:00007ffc1d456638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  478.873758] RAX: ffffffffffffffda RBX: 0000000000000084 RCX: 00007fd74beba504
[  478.878456] RDX: 0000000000000084 RSI: 000055785f33bb90 RDI: 0000000000000003
[  478.883176] RBP: 000055785f31d630 R08: 0000000000000000 R09: 0000000000001000
[  478.887885] R10: 0000000000000008 R11: 0000000000000246 R12: 00000000000001dd
[  478.892646] R13: 000055785ddc9b00 R14: 0000000000000003 R15: 00007ffc1d4566e0
[  478.897480] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack ipt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nft_chain_nat_ipv4 nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_tables nfnetlink br_netfilter bridge stp llc binfmt_misc overlay crct10dif_pclmul crc32_pclmul ghash_clmulni_intel nls_ascii nls_cp437 vfat fat intel_rapl_perf evdev serio_raw button ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb crc32c_intel aesni_intel nvme aes_x86_64 crypto_simd ena nvme_core cryptd glue_helper
[  478.931979] CR2: 0000000000000010
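
The "Code:" line in the oops marks the first byte of the faulting instruction with angle brackets. The Python sketch below pulls out the four bytes starting at that marker and decodes them by hand; it only handles this one encoding (REX.W + 8b /r with an 8-bit displacement) and is not a general disassembler. The function name faulting_bytes is made up for the illustration. Under those assumptions, the bytes decode to a load from [rax+0x10] into rbx, and with RAX = 0 that gives exactly the address reported in CR2.

```python
# "Code:" bytes copied from the oops message above.
CODE_LINE = (
    "00 00 00 49 89 7f 18 48 89 0c 24 44 89 e1 48 29 c8 48 89 4c 24 08 "
    "49 89 87 d8 00 00 00 89 d2 48 8d 84 d6 b0 03 00 00 48 8b 00 "
    "<48> 8b 58 10 4c 8d 70 10 48 85 db 0f 84 01 01 00 00 4d 8d 6f 30 bd"
)

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def faulting_bytes(code_line, count=4):
    """Return `count` byte values starting at the <xx> marker."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            raw = [tok[1:-1]] + tokens[i + 1:i + count]
            return [int(b, 16) for b in raw]
    raise ValueError("no <xx> marker found in Code line")

insn = faulting_bytes(CODE_LINE)
rex, opcode, modrm, disp8 = insn

# This sketch only covers REX.W (0x48) + opcode 0x8b (mov r64, r/m64).
assert rex == 0x48 and opcode == 0x8B

mod = modrm >> 6          # 0b01 means [base + disp8]
reg = (modrm >> 3) & 7    # destination register
rm = modrm & 7            # base register
assert mod == 0b01

dest = REG64[reg]
base = REG64[rm]
if disp8 >= 0x80:         # disp8 is a signed byte
    disp8 -= 0x100

rax = 0x0                 # RAX value from the register dump
fault_addr = rax + disp8

print("mov %s, [%s+%#x]" % (dest, base, disp8))
print("faulting address: %#x" % fault_addr)
```

This matches the register dump: RAX is 0 and CR2 is 0000000000000010, so the code dereferenced a NULL pointer at offset 0x10.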

Let me know if I can provide any other information that may be of use.

Shane Frasier

