On Mon, 3 Aug 2020 09:55:28 +0200 Cédric Dufour <cedric.dufour@ced-network.net> wrote:
> Package: linux-source-4.19
> Version: 4.19.132-1
> Severity: important
>
> Hello,
>
> Since linux-image-4.19.0-10-amd64, I have been hitting regular kernel panics - "RIP: 0010:__cgroup_bpf_run_filter_skb+0x26d/0x3d0" - resulting in a full (file) *server freeze*.
>
> The issue is pretty well described and summarized in
> https://forum.proxmox.com/threads/kernel-5-4-44-causes-system-freeze-on-hp-microserver-gen8.72050/page-2#post-323498
>
> The "culprit" commit - "netprio_cgroup: Fix unlimited memory leak of v2 cgroups" - has indeed been included in the Debian kernel (4.19) since changelog entry 4.19.131-1.
>
> It *seems* there is already a patch proposed upstream (although here for kernel 4.9):
> https://lkml.org/lkml/2020/7/20/883
>
> Best regards,
>
> Cédric
>
> --
> Cédric Dufour
>
>
FWIW, I am seeing a very similar issue. Several Debian 10 AWS instances that run Guacamole via Docker recently started freezing at random. I enabled kernel dumps and finally caught one of the machines misbehaving. Looking at the kdump, I see this:
KERNEL: /usr/lib/debug/vmlinux-4.19.0-10-cloud-amd64
DUMPFILE: dump.202008101612 [PARTIAL DUMP]
CPUS: 2
DATE: Mon Aug 10 16:11:47 2020
UPTIME: 00:05:44
LOAD AVERAGE: 0.21, 0.11, 0.04
TASKS: 261
NODENAME: guac.env0.staging.cool.cyber.dhs.gov
RELEASE: 4.19.0-10-cloud-amd64
VERSION: #1 SMP Debian 4.19.132-1 (2020-07-24)
MACHINE: x86_64 (2499 Mhz)
MEMORY: 4 GB
PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000010"
PID: 1453
COMMAND: "sshd"
TASK: ffff8a3f695115c0 [THREAD_INFO: ffff8a3f695115c0]
CPU: 0
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 1453 TASK: ffff8a3f695115c0 CPU: 0 COMMAND: "sshd"
#0 [ffffb37740c77800] machine_kexec at ffffffff97a4b297
#1 [ffffb37740c77858] __crash_kexec at ffffffff97b0e7dd
#2 [ffffb37740c77920] crash_kexec at ffffffff97b0f62d
#3 [ffffb37740c77938] oops_end at ffffffff97a2907d
#4 [ffffb37740c77958] no_context at ffffffff97a5858e
#5 [ffffb37740c779b0] __do_page_fault at ffffffff97a58c42
#6 [ffffb37740c77a20] async_page_fault at ffffffff982010be
[exception RIP: __cgroup_bpf_run_filter_skb+189]
RIP: ffffffff97b94ffd RSP: ffffb37740c77ad0 RFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8a3ff55e5ee8 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff8a3ff3d49800 RDI: ffff8a3ff52fd500
RBP: ffff8a3ff52fd500 R8: ffff8a3ff55e5ee8 R9: 0000000000010000
R10: 0000000000000001 R11: ffff8a3ef6dd7500 R12: 0000000000000000
R13: 0000000000000000 R14: ffff8a3ff52fd840 R15: ffff8a3ff55e5ee8
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffffb37740c77b30] ip_finish_output at ffffffff97f65988
#8 [ffffb37740c77b68] ip_output at ffffffff97f6640c
#9 [ffffb37740c77bc0] __ip_queue_xmit at ffffffff97f65e6d
#10 [ffffb37740c77c18] __tcp_transmit_skb at ffffffff97f80557
#11 [ffffb37740c77c88] tcp_write_xmit at ffffffff97f81e34
#12 [ffffb37740c77cf0] __tcp_push_pending_frames at ffffffff97f82ae1
#13 [ffffb37740c77d00] tcp_sendmsg_locked at ffffffff97f733ac
#14 [ffffb37740c77da8] tcp_sendmsg at ffffffff97f73507
#15 [ffffb37740c77dc8] sock_sendmsg at ffffffff97ee8aa6
#16 [ffffb37740c77de0] sock_write_iter at ffffffff97ee8b47
#17 [ffffb37740c77e50] new_sync_write at ffffffff97c49bfb
#18 [ffffb37740c77ed0] vfs_write at ffffffff97c4c7d5
#19 [ffffb37740c77f00] ksys_write at ffffffff97c4ca77
#20 [ffffb37740c77f38] do_syscall_64 at ffffffff97a04140
#21 [ffffb37740c77f50] entry_SYSCALL_64_after_hwframe at ffffffff98200088
RIP: 00007fd74beba504 RSP: 00007ffc1d456638 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000084 RCX: 00007fd74beba504
RDX: 0000000000000084 RSI: 000055785f33bb90 RDI: 0000000000000003
RBP: 000055785f31d630 R8: 0000000000000000 R9: 0000000000001000
R10: 0000000000000008 R11: 0000000000000246 R12: 00000000000001dd
R13: 000055785ddc9b00 R14: 0000000000000003 R15: 00007ffc1d4566e0
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash> sym ffffffff97b94ffd
ffffffff97b94ffd (T) __cgroup_bpf_run_filter_skb+189 ./debian/build/build_amd64_none_cloud-amd64/./kernel/bpf/cgroup.c: 539
crash> log
[ 0.000000] Linux version 4.19.0-10-cloud-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.132-1 (2020-07-24)
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.19.0-10-cloud-amd64 root=UUID=9ac8f5bd-5b64-48cd-9efd-2b2d35a30500 ro console=tty0 console=ttyS0,115200 earlyprintk=ttyS0,115200 nmi_watchdog=1 elevator=noop scsi_mod.use_blk_mq=Y crashkernel=384M-:128M
<SNIP>
[ 478.686368] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 478.693551] PGD 0 P4D 0
[ 478.696291] Oops: 0000 [#1] SMP PTI
[ 478.699431] CPU: 0 PID: 1453 Comm: sshd Kdump: loaded Not tainted 4.19.0-10-cloud-amd64 #1 Debian 4.19.132-1
[ 478.706782] Hardware name: Amazon EC2 t3.medium/, BIOS 1.0 10/16/2017
[ 478.711129] RIP: 0010:__cgroup_bpf_run_filter_skb+0xbd/0x1e0
[ 478.715172] Code: 00 00 00 49 89 7f 18 48 89 0c 24 44 89 e1 48 29 c8 48 89 4c 24 08 49 89 87 d8 00 00 00 89 d2 48 8d 84 d6 b0 03 00 00 48 8b 00 <48> 8b 58 10 4c 8d 70 10 48 85 db 0f 84 01 01 00 00 4d 8d 6f 30 bd
[ 478.727711] RSP: 0018:ffffb37740c77ad0 EFLAGS: 00010286
[ 478.731595] RAX: 0000000000000000 RBX: ffff8a3ff55e5ee8 RCX: 0000000000000000
[ 478.736351] RDX: 0000000000000001 RSI: ffff8a3ff3d49800 RDI: ffff8a3ff52fd500
[ 478.741042] RBP: ffff8a3ff52fd500 R08: ffff8a3ff55e5ee8 R09: 0000000000010000
[ 478.745697] R10: 0000000000000001 R11: ffff8a3ef6dd7500 R12: 0000000000000000
[ 478.750446] R13: 0000000000000000 R14: ffff8a3ff52fd840 R15: ffff8a3ff55e5ee8
[ 478.755161] FS: 00007fd74bb17e40(0000) GS:ffff8a3ff7e00000(0000) knlGS:0000000000000000
[ 478.761724] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 478.765853] CR2: 0000000000000010 CR3: 00000000a94e6005 CR4: 00000000007606b0
[ 478.770524] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 478.775273] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 478.779984] PKRU: 55555554
[ 478.782901] Call Trace:
[ 478.785756] ip_finish_output+0x228/0x270
[ 478.789204] ? nf_hook_slow+0x44/0xc0
[ 478.792490] ip_output+0x6c/0xe0
[ 478.795685] ? ip_append_data.part.49+0xd0/0xd0
[ 478.799403] __ip_queue_xmit+0x15d/0x410
[ 478.802945] ? set_fd_set.part.7+0x40/0x40
[ 478.806411] __tcp_transmit_skb+0x527/0xb10
[ 478.810032] tcp_write_xmit+0x384/0x1000
[ 478.813636] ? _copy_from_iter_full+0x94/0x240
[ 478.817438] __tcp_push_pending_frames+0x31/0xd0
[ 478.821170] tcp_sendmsg_locked+0xc1c/0xd50
[ 478.824714] tcp_sendmsg+0x27/0x40
[ 478.827921] sock_sendmsg+0x36/0x40
[ 478.831280] sock_write_iter+0x97/0x100
[ 478.834714] new_sync_write+0xfb/0x160
[ 478.838010] vfs_write+0xa5/0x1a0
[ 478.841129] ksys_write+0x57/0xd0
[ 478.844250] do_syscall_64+0x50/0xf0
[ 478.847526] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 478.851385] RIP: 0033:0x7fd74beba504
[ 478.854598] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 478.867315] RSP: 002b:00007ffc1d456638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 478.873758] RAX: ffffffffffffffda RBX: 0000000000000084 RCX: 00007fd74beba504
[ 478.878456] RDX: 0000000000000084 RSI: 000055785f33bb90 RDI: 0000000000000003
[ 478.883176] RBP: 000055785f31d630 R08: 0000000000000000 R09: 0000000000001000
[ 478.887885] R10: 0000000000000008 R11: 0000000000000246 R12: 00000000000001dd
[ 478.892646] R13: 000055785ddc9b00 R14: 0000000000000003 R15: 00007ffc1d4566e0
[ 478.897480] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack ipt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nft_chain_nat_ipv4 nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_tables nfnetlink br_netfilter bridge stp llc binfmt_misc overlay crct10dif_pclmul crc32_pclmul ghash_clmulni_intel nls_ascii nls_cp437 vfat fat intel_rapl_perf evdev serio_raw button ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb crc32c_intel aesni_intel nvme aes_x86_64 crypto_simd ena nvme_core cryptd glue_helper
[ 478.931979] CR2: 0000000000000010
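As a cross-check of the dump above, here is a minimal Python sketch that pulls the faulting instruction out of the oops "Code:" line (the byte wrapped in <> marks where RIP pointed) and decodes it by hand. The helper names are made up for illustration, and the decoder only understands the single encoding seen here (REX.W 8B /r with an 8-bit displacement); it is not a general disassembler. Under those assumptions the faulting instruction reads as a load from RAX+0x10 into RBX, which, with RAX = 0, lines up with the fault address in CR2.

```python
# Kernel-side "Code:" bytes copied from the oops above.
CODE = ("00 00 00 49 89 7f 18 48 89 0c 24 44 89 e1 48 29 c8 48 89 4c 24 08 "
        "49 89 87 d8 00 00 00 89 d2 48 8d 84 d6 b0 03 00 00 48 8b 00 "
        "<48> 8b 58 10 4c 8d 70 10 48 85 db 0f 84 01 01 00 00 4d 8d 6f 30 bd")

# Register numbering for ModRM fields when no REX.R/REX.B extension bit is set.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def split_at_fault(code_line):
    """Split the byte dump at the <xx> marker: (bytes before RIP, bytes from RIP on)."""
    tokens = code_line.split()
    idx = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    clean = [tok.strip("<>") for tok in tokens]
    return bytes.fromhex("".join(clean[:idx])), bytes.fromhex("".join(clean[idx:]))


def decode_mov_load_disp8(insn):
    """Decode 'mov disp8(%base), %dest' for the REX.W 8B /r, mod=01 form only."""
    rex, opcode, modrm, disp = insn[0], insn[1], insn[2], insn[3]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if rex != 0x48 or opcode != 0x8B or mod != 0b01 or rm == 0b100:
        raise ValueError("encoding not handled by this sketch")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return REGS[reg], REGS[rm], disp


before, at_fault = split_at_fault(CODE)
dest, base, disp = decode_mov_load_disp8(at_fault)

rax = 0x0   # RAX from the register dump
cr2 = 0x10  # faulting address from CR2

print(f"mov {disp:#x}(%{base}),%{dest}")
print("fault address matches CR2:", rax + disp == cr2)
# crash reports the offset in decimal (+189), the oops in hex (+0xbd).
print("offsets agree:", 189 == 0xbd)
```

The three bytes just before the marker (48 8b 00) decode the same way as a load from (%rax) into %rax, so the NULL in RAX appears to have been read from memory immediately before the faulting access.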
Let me know if I can provide any other information that may be of use.
Shane Frasier