Bug#914505: linux: reference to netfilter chain not removed on rule replacement, subsequently system hangs
- To: Christoph Anton Mitterer <calestyo@scientia.net>, 914505@bugs.debian.org
- Subject: Bug#914505: linux: reference to netfilter chain not removed on rule replacement, subsequently system hangs
- From: Salvatore Bonaccorso <carnil@debian.org>
- Date: Sun, 2 May 2021 13:36:44 +0200
- Message-id: <YI6OzDcx1Gabk/ZY@eldamar.lan>
- Reply-to: Salvatore Bonaccorso <carnil@debian.org>, 914505@bugs.debian.org
- In-reply-to: <154302768656.4121.16485878097216388860.reportbug@heisenberg.scientia.net>
- References: <154302768656.4121.16485878097216388860.reportbug@heisenberg.scientia.net>
Control: tags -1 + moreinfo
On Sat, Nov 24, 2018 at 03:48:06AM +0100, Christoph Anton Mitterer wrote:
> Source: linux
> Version: 4.18.20-1
> Severity: important
> Tags: upstream
>
>
> Hi.
>
> Possibly the following is also partly the fault of iptables (i.e. the userland tool).
>
> I'm using fail2ban in a custom setup in which the hook rule for fail2ban's
> chain isn't just appended somewhere, but inserted at just the right point
> in my iptables rules (loaded at boot by netfilter-persistent).
>
> This looks e.g. like the following in terms of rules:
> ...
> -A INPUT --in-interface lo -m comment --comment "f2b-hook-sshd"
> -A INPUT --destination 0.ssh.srv.localhost --protocol tcp -m tcp --destination-port ssh --syn -j ACCEPT
> ...
> (where the first rule serves as a dummy rule)
>
> And an /etc/fail2ban/action.d/iptables-multiport.conf which looks like:
> ...
> actionstart = <iptables> -N f2b-<name>
> <iptables> -A f2b-<name> -j <returntype>
> rulenum="$( <iptables> -L <chain> --line-numbers | grep '/\* f2b-hook-<name> \*/' | cut -d ' ' -f 1 )"
> <iptables> -R <chain> "${rulenum}" -p <protocol> -m multiport --dports <port> -j f2b-<name>
> ...
> actionstop = rulenum="$( <iptables> -L <chain> --line-numbers | grep f2b-<name> | cut -d ' ' -f 1 )"
> <iptables> -R <chain> "${rulenum}" --in-interface lo -m comment --comment f2b-hook-<name>
> <iptables> -F f2b-<name>
> <iptables> -X f2b-<name>
> ...
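
A side note on the rule-number lookup in the action above: if the grep matches nothing (or more than one line), "${rulenum}" is empty or multi-valued and "-R" is called with a bogus rule number. The following is only a minimal sketch of a stricter lookup, not part of the reporter's setup; the function name find_hook_rulenum is hypothetical, and it assumes the listing format produced by "iptables -L <chain> --line-numbers" as quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): read the output of
# `iptables -L <chain> --line-numbers` on stdin and print the number of the
# single rule carrying the comment "f2b-hook-<name>".
# Returns non-zero unless exactly one rule matches.
find_hook_rulenum() {
    name=$1
    awk -v pat="/* f2b-hook-$name */" '
        # index() does a literal substring match, so the "*" and "/" in the
        # comment marker need no regex escaping.
        index($0, pat) { num = $1; count++ }
        END {
            if (count != 1) exit 1
            print num
        }
    '
}

# Possible usage (needs root and a real ruleset, so it is left commented out):
#   rulenum=$(iptables -L INPUT --line-numbers | find_hook_rulenum sshd) || exit 1
#   iptables -R INPUT "$rulenum" -p tcp -m multiport --dports ssh -j f2b-sshd
```

This does not address the reference-count problem described below; it only keeps the replace step from running with an empty rule number.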
>
> So far so good.
>
>
> When fail2ban starts I get something like this:
> # iptables -L
> Chain INPUT (policy DROP)
> target prot opt source destination
> ACCEPT all -- anywhere anywhere
> ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
> ACCEPT icmp -- anywhere anywhere
> DROP all -- anywhere anywhere state INVALID,UNTRACKED
> f2b-sshd tcp -- anywhere anywhere multiport dports ssh
> REJECT all -- anywhere anywhere reject-with icmp-port-unreachable
> ...
> Chain f2b-sshd (1 references)
> target prot opt source destination
> RETURN all -- anywhere anywhere
>
>
> Everything fine.
>
>
> But now:
>
>
> 1) Replacing the rule that causes the reference to f2b-sshd doesn't clear the reference.
> Now when I stop fail2ban it will do something like:
> iptables -R INPUT 5 --in-interface lo -m comment --comment f2b-hook-<name>
> i.e. bringing back the original dummy rule, but here an error occurs in
> iptables, the kernel, or both:
> # iptables -L
> Chain INPUT (policy DROP)
> target prot opt source destination
> ACCEPT all -- anywhere anywhere
> ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
> ACCEPT icmp -- anywhere anywhere
> DROP all -- anywhere anywhere state INVALID,UNTRACKED
> all -- anywhere anywhere /* f2b-hook-ssh */
> REJECT all -- anywhere anywhere reject-with icmp-port-unreachable
> ...
> Chain f2b-sshd (1 references)
> target prot opt source destination
>
> The dummy rule in INPUT is back and the chain f2b-sshd is flushed, but it is
> left behind with its reference count still at 1, which is obviously wrong, as
> the rule no longer references the chain.
>
> This also happens when just calling the iptables commands manually.
> It does not happen when e.g. deleting the rules (iptables -D), as fail2ban would
> do by default.
>
>
> If I repeat this multiple times, I can even make the reference count go up, e.g.:
> Chain f2b-sshd (2 references)
> target prot opt source destination
>
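
To make the stale count easier to check after each replace, the reference count can be read from the chain header. This is a rough sketch under the assumption that the header looks like the "Chain f2b-sshd (1 references)" lines quoted above; the function name chain_refcount is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): read `iptables -L`
# output on stdin and print the reference count of the named user-defined
# chain, taken from its "Chain NAME (N references)" header line.
# Returns non-zero if no such header is found (e.g. for built-in chains,
# whose header shows a policy instead of a reference count).
chain_refcount() {
    chain=$1
    awk -v chain="$chain" '
        $1 == "Chain" && $2 == chain && $4 ~ /^references?\)$/ {
            sub(/^\(/, "", $3)   # strip the opening parenthesis from "(N"
            print $3
            found = 1
        }
        END { exit !found }
    '
}

# Possible usage (needs root, so it is left commented out):
#   iptables -L | chain_refcount f2b-sshd
```

After the dummy rule has been restored with "iptables -R", a correct count for f2b-sshd would be 0; the listings in this report show 1 or 2 instead.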
>
>
> 2) The kernel now seems to be in a state from which it cannot recover.
>
> It doesn't seem to be possible to get rid of the broken chain,
> not even after deleting the rule that was replaced (or rather,
> its replacement).
>
> When I try to start from scratch with e.g.
> # iptables-restore < /etc/iptables/rules.v4
> The process hangs and I get a:
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115308] ------------[ cut here ]------------
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115320] kernel BUG at /build/linux-iActNR/linux-4.18.10/net/netfilter/nf_tables_api.c:1364!
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115367] invalid opcode: 0000 [#1] SMP PTI
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115379] CPU: 3 PID: 17642 Comm: iptables-restor Not tainted 4.18.0-2-amd64 #1 Debian 4.18.10-2
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115382] Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.21 03/19/2018
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115412] RIP: 0010:nf_tables_chain_destroy.isra.48+0x95/0xa0 [nf_tables]
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115414] Code: 51 bf ab d8 48 8b 7b 58 e8 78 5b b3 d8 48 89 ef 5b 5d e9 6e 5b b3 d8 48 8b 7b 58 e8 65 5b b3 d8 48 89 df 5b 5d e9 5b 5b b3 d8 <0f> 0b 0f 0b eb 9c 0f 1f 44 00 00 0f 1f 44 00 00 53 48 8b 07 8b 90
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115450] RSP: 0018:ffffa6c70aaf3998 EFLAGS: 00010202
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115454] RAX: 0000000000000001 RBX: ffffffff9a2dafc0 RCX: dead000000000200
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115456] RDX: ffff99a0758d3cc0 RSI: ffff99a18a8fa980 RDI: ffff99a22927ef00
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115457] RBP: ffff99a0758d3cc0 R08: 0000000000000000 R09: ffffffffc08ff600
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115459] R10: ffff99a18a8fae00 R11: 0000000000000001 R12: dead000000000200
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115461] R13: dead000000000100 R14: ffff99a18a8fa980 R15: ffffffff9a2dc220
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115464] FS: 00007fb5a8b28b80(0000) GS:ffff99a25dd80000(0000) knlGS:0000000000000000
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115466] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115467] CR2: 00005600467ffda4 CR3: 000000058cfa6005 CR4: 00000000003606e0
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115470] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115472] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115473] Call Trace:
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115495] nf_tables_commit+0xd13/0x1110 [nf_tables]
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115516] nfnetlink_rcv_batch+0x562/0x6d0 [nfnetlink]
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115538] ? kmem_cache_alloc_node_trace+0x1b0/0x1e0
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115549] ? alloc_vmap_area+0x7c/0x360
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115553] ? __insert_vmap_area+0x99/0x100
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115562] ? refcount_inc+0x5/0x30
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115571] ? apparmor_capable+0x72/0xb0
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115580] ? security_capable+0x35/0x50
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115587] ? nla_parse+0x32/0x100
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115592] nfnetlink_rcv+0x11e/0x13c [nfnetlink]
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115604] netlink_unicast+0x1c2/0x250
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115609] netlink_sendmsg+0x2c1/0x3b0
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115620] sock_sendmsg+0x36/0x40
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115626] ___sys_sendmsg+0x2a0/0x2f0
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115639] ? filemap_map_pages+0x385/0x3a0
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115642] ? refcount_inc+0x5/0x30
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115650] ? apparmor_capable+0x72/0xb0
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115655] ? security_capable+0x35/0x50
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115660] ? __sys_sendmsg+0x5e/0xa0
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115665] __sys_sendmsg+0x5e/0xa0
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115677] do_syscall_64+0x55/0x110
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115688] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115696] RIP: 0033:0x7fb5a8e36354
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115697] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 80 00 00 00 00 48 8d 05 91 36 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 41 54 55 41 89 d4 53 48 89 f5
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115733] RSP: 002b:00007fff63f0ea38 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115736] RAX: ffffffffffffffda RBX: 00007fff63f0ea50 RCX: 00007fb5a8e36354
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115739] RDX: 0000000000000000 RSI: 00007fff63f0fad0 RDI: 0000000000000003
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115740] RBP: 00007fff63f10150 R08: 0000000000000004 R09: 00007fb5a8af6f40
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115742] R10: 00007fff63f0fabc R11: 0000000000000246 R12: 00005628c5276740
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115744] R13: 00007fff63f12a20 R14: 00007fff63f0ea40 R15: 00007fff63f12a58
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115747] Modules linked in: udp_diag tcp_diag inet_diag nft_chain_route_ipv4 xt_CHECKSUM nft_chain_nat_ipv4 ipt_MASQUERADE nf_nat_ipv4 nf_nat tun bridge stp llc ctr ccm fuse devlink ebtable_filter ebtables cpufreq_userspace cpufreq_powersave cpufreq_conservative arc4 snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support intel_rapl snd_hda_codec_realtek nf_conntrack_ipv6 nf_defrag_ipv6 x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_generic xt_tcpudp kvm_intel iwlmvm snd_soc_skl snd_soc_skl_ipc ip6t_REJECT snd_soc_sst_ipc nf_reject_ipv6 snd_soc_sst_dsp kvm snd_hda_ext_core irqbypass snd_soc_acpi mac80211 crct10dif_pclmul snd_soc_core crc32_pclmul snd_compress btusb btrtl snd_hda_intel btbcm btintel snd_hda_codec ghash_clmulni_intel bluetooth snd_hda_core intel_cstate iwlwifi snd_hwdep uvcvideo
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115815] intel_uncore snd_pcm videobuf2_vmalloc videobuf2_memops cdc_mbim intel_rapl_perf videobuf2_v4l2 snd_timer cdc_wdm videobuf2_common nf_conntrack_ipv4 cdc_ncm nf_defrag_ipv4 usbnet videodev mii snd pcspkr sdhci_pci cqhci joydev media drbg i915 soundcore sdhci ansi_cprng idma64 nft_counter mmc_core cfg80211 sg ecdh_generic drm_kms_helper crc16 rfkill mei_me intel_lpss_pci drm i2c_i801 intel_lpss xt_comment mei i2c_algo_bit ipt_REJECT nf_reject_ipv4 wmi button battery xt_multiport xt_policy xt_state xt_conntrack nf_conntrack nft_compat tpm_crb fujitsu_laptop tpm_tis tpm_tis_core sparse_keymap video tpm pcc_cpufreq acpi_pad ac rng_core nf_tables nfnetlink binfmt_misc loop parport_pc sunrpc ppdev lp parport ip_tables x_tables autofs4 dm_crypt dm_mod raid10 raid456 async_raid6_recov async_memcpy
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115895] async_pq async_xor async_tx raid1 raid0 multipath linear md_mod btrfs libcrc32c crc32c_generic xor zstd_decompress zstd_compress xxhash raid6_pq uhci_hcd ehci_pci ehci_hcd usb_storage sd_mod crc32c_intel ahci libahci xhci_pci xhci_hcd aesni_intel aes_x86_64 crypto_simd libata cryptd glue_helper evdev scsi_mod psmouse serio_raw e1000e usbcore usb_common
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115967] ---[ end trace 78344f348b2da5ca ]---
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115982] RIP: 0010:nf_tables_chain_destroy.isra.48+0x95/0xa0 [nf_tables]
> Nov 24 03:00:01 heisenberg kernel: [ 9857.115984] Code: 51 bf ab d8 48 8b 7b 58 e8 78 5b b3 d8 48 89 ef 5b 5d e9 6e 5b b3 d8 48 8b 7b 58 e8 65 5b b3 d8 48 89 df 5b 5d e9 5b 5b b3 d8 <0f> 0b 0f 0b eb 9c 0f 1f 44 00 00 0f 1f 44 00 00 53 48 8b 07 8b 90
> Nov 24 03:00:01 heisenberg kernel: [ 9857.116015] RSP: 0018:ffffa6c70aaf3998 EFLAGS: 00010202
> Nov 24 03:00:01 heisenberg kernel: [ 9857.116017] RAX: 0000000000000001 RBX: ffffffff9a2dafc0 RCX: dead000000000200
> Nov 24 03:00:01 heisenberg kernel: [ 9857.116018] RDX: ffff99a0758d3cc0 RSI: ffff99a18a8fa980 RDI: ffff99a22927ef00
> Nov 24 03:00:01 heisenberg kernel: [ 9857.116020] RBP: ffff99a0758d3cc0 R08: 0000000000000000 R09: ffffffffc08ff600
> Nov 24 03:00:01 heisenberg kernel: [ 9857.116022] R10: ffff99a18a8fae00 R11: 0000000000000001 R12: dead000000000200
> Nov 24 03:00:01 heisenberg kernel: [ 9857.116023] R13: dead000000000100 R14: ffff99a18a8fa980 R15: ffffffff9a2dc220
> Nov 24 03:00:01 heisenberg kernel: [ 9857.116025] FS: 00007fb5a8b28b80(0000) GS:ffff99a25dd80000(0000) knlGS:0000000000000000
> Nov 24 03:00:01 heisenberg kernel: [ 9857.116027] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Nov 24 03:00:01 heisenberg kernel: [ 9857.116028] CR2: 00005600467ffda4 CR3: 000000058cfa6005 CR4: 00000000003606e0
> Nov 24 03:00:01 heisenberg kernel: [ 9857.116030] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Nov 24 03:00:01 heisenberg kernel: [ 9857.116032] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>
>
> Further, all networking seems dead now (probably because netfilter has said goodbye).
> Cleanly rebooting also fails, as systemd tries to shut down all kinds of (now hanging)
> networking stuff (including netfilter-persistent) and waits forever during shutdown.
>
>
> I'd guess this is clearly not just an error in the userland tools... or at least the kernel
> shouldn't allow userland to get it into such a bad state.
Is this issue still reproducible for you with a recent kernel from
unstable or buster-backports? There seem to have been a couple of
commits in this area after 4.18, which might have resolved the issue.
If it is no longer reproducible, let's close the issue.
Regards,
Salvatore