
Debian Server (NAT Gateway) Periodically Crashing



Hello All,

I am hoping to get some help with one of my virtual machines. I am running a KVM host with several virtual machines that provide internet services to a small network. The gateway machine is a Debian 8 minimal install that was updated to 8.5.

user1@gateway:~# sudo lsb_release -da
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 8.5 (jessie)
Release:        8.5
Codename:       jessie
user1@gateway:~# sudo uname -a
Linux gateway 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux

Minimal additional packages are installed, only enough to support a NAT gateway.

Periodically, the VM experiences a kernel Oops and crashes, taking down internet access for the network. This is the only VM that is crashing; the other VMs (based on the same minimal install, updated to 8.5, with minimal software installed) have uptimes of 100+ days. This VM seems to crash every few weeks.

I managed to somewhat stabilize the internet connection by enabling crash dumps and automatic reboots with instructions from here:
https://www.bentasker.co.uk/documentation/linux/312-installing-and-configuring-kdump-on-debian-jessie

All is mostly well. The machine reboots after a crash dump, so there is minimal impact on the network. But occasionally, PPPoE fails to redial after a reboot :-<  Aside from the PPPoE issue, I figured I would try to get to the root of why the VM is crashing to begin with. However, I am not a developer and have no idea how to interpret the crash dump. As far as I can tell, the swapper/0 process triggered the dump at instruction put_page+5. I have no idea what that means.

Any assistance as to why this host is crashing would be helpful. The only thing this host is doing is serving as a NAT gateway. I am having no problems with any other VMs with the same basic OS load.

As a start, I at least got the log and bt from the crash dump. I can provide additional crash info if needed (and given the commands).

user1@gateway:/var/crash/201607040851# sudo crash kernel_link dump.201607040851
...version info removed...
      KERNEL: kernel_link     
    DUMPFILE: dump.201607040851  [PARTIAL DUMP]
        CPUS: 1
        DATE: Mon Jul  4 08:51:23 2016
      UPTIME: 4 days, 13:17:27
LOAD AVERAGE: 0.00, 0.01, 0.05
       TASKS: 67
    NODENAME: gateway
     RELEASE: 3.16.0-4-amd64
     VERSION: #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08)
     MACHINE: x86_64  (1596 Mhz)
      MEMORY: 2 GB
       PANIC: "Oops: 0000 [#1] SMP " (check log for details)
         PID: 0
     COMMAND: "swapper/0"
        TASK: ffffffff8181a460  [THREAD_INFO: ffffffff81800000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 0      TASK: ffffffff8181a460  CPU: 0   COMMAND: "swapper/0"
 #0 [ffff88007fc039c8] machine_kexec at ffffffff8104c0a2
 #1 [ffff88007fc03a18] crash_kexec at ffffffff810df7da
 #2 [ffff88007fc03ad8] oops_end at ffffffff81016228
 #3 [ffff88007fc03af8] no_context at ffffffff8150b172
 #4 [ffff88007fc03b38] __do_page_fault at ffffffff810571c0
 #5 [ffff88007fc03c30] async_page_fault at ffffffff81516a58
    [exception RIP: put_page+5]
    RIP: ffffffff8114a935  RSP: ffff88007fc03ce8  RFLAGS: 00010206
    RAX: 0000000000000030  RBX: ffff88007974f4c0  RCX: 000000007974f400
    RDX: 0000000000000000  RSI: 00000000fffffe01  RDI: 0000000000000000
    RBP: 0000000000000001   R8: 0000000080000000   R9: ffff880036c500b0
    R10: 6db6db6db6db6db7  R11: 0000160000000000  R12: ffff880079a35d00
    R13: 0000000000000049  R14: ffff88007974f220  R15: ffff88007971bb00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff88007fc03ce0] ip_finish_output2 at ffffffff81459756
 #7 [ffff88007fc03d20] ip_fragment at ffffffff8145a1c8
 #8 [ffff88007fc03d98] ip_finish_output at ffffffff8145a9d4
 #9 [ffff88007fc03dd8] __netif_receive_skb_core at ffffffff8141f1a3
#10 [ffff88007fc03e28] netif_receive_skb_internal at ffffffff8141f42f
#11 [ffff88007fc03e48] virtnet_poll at ffffffffa00375aa [virtio_net]
#12 [ffff88007fc03ed0] net_rx_action at ffffffff8141f7b0
#13 [ffff88007fc03f20] __do_softirq at ffffffff8106c6a1
#14 [ffff88007fc03f78] irq_exit at ffffffff8106ca75
#15 [ffff88007fc03f80] do_IRQ at ffffffff81517822
--- <IRQ stack> ---
#16 [ffffffff81803e48] ret_from_intr at ffffffff8151566d
    [exception RIP: native_safe_halt+2]
    RIP: ffffffff81051c12  RSP: ffffffff81803ef0  RFLAGS: 00000246
    RAX: ffffffff8101c8b0  RBX: 0000000000000086  RCX: ffffffff81855220
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000105dbbc7c  R11: 0000000000000104  R12: 000000000000d160
    R13: 0000000000000040  R14: ffffffff8108ae2d  R15: 0000000000000086
    ORIG_RAX: ffffffffffffff8e  CS: 0010  SS: 0018
#17 [ffffffff81803ef0] default_idle at ffffffff8101c8c9
#18 [ffffffff81803f08] cpu_startup_entry at ffffffff810a83e0
#19 [ffffffff81803f68] start_kernel at ffffffff81903076
#20 [ffffffff81803fa0] x86_64_start_kernel at ffffffff8190271f
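
The frame lines in the bt output above follow a regular pattern: frame number, stack address in brackets, symbol name, the word "at", the text address, and an optional module name in brackets. In case it helps anyone compare this backtrace against others, here is a minimal sketch, assuming Python 3, that splits such lines into fields. The helper name parse_bt_frame is hypothetical and is not part of the crash utility.

```python
import re

# Matches crash "bt" frame lines such as:
#   "#11 [ffff88007fc03e48] virtnet_poll at ffffffffa00375aa [virtio_net]"
FRAME_RE = re.compile(
    r"^\s*#(?P<num>\d+)\s+"               # frame number
    r"\[(?P<stack>[0-9a-f]+)\]\s+"        # stack address in brackets
    r"(?P<symbol>\S+)\s+at\s+"            # symbol name, then "at"
    r"(?P<addr>[0-9a-f]+)"                # text address
    r"(?:\s+\[(?P<module>[^\]]+)\])?\s*$" # optional module name
)


def parse_bt_frame(line):
    """Return a dict describing one bt frame, or None for non-frame lines.

    Hypothetical helper for illustration; register dumps and separator
    lines such as "--- <IRQ stack> ---" do not match and yield None.
    """
    m = FRAME_RE.match(line)
    if m is None:
        return None
    return {
        "num": int(m.group("num")),
        "stack": int(m.group("stack"), 16),
        "symbol": m.group("symbol"),
        "addr": int(m.group("addr"), 16),
        "module": m.group("module"),  # None when the symbol is in the core kernel
    }
```

Feeding each line of the bt output through parse_bt_frame and keeping the non-None results gives the call chain as a list, which is easier to diff between two crash dumps than the raw text.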

crash> log
....cut.....
[    3.936064] FS-Cache: Loaded
[    3.948833] FS-Cache: Netfs 'nfs' registered for caching
[    3.967038] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[   34.781171] random: nonblocking pool is initialized
[ 6171.133280] IPv4: martian source 169.254.39.87 from 169.254.39.87, on dev eth0
[ 6171.134424] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14 08 06        ......0Y......
[ 6171.135287] IPv4: martian source 169.254.255.255 from 169.254.39.87, on dev eth0
[ 6171.135942] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14 08 00        ......0Y......
[ 6171.137267] IPv4: martian source 169.254.255.255 from 169.254.39.87, on dev eth0
[ 6171.137267] ll header: 00000000: 00 16 3e 00 00 01 00 19 5b 8d 78 ea 08 00        ..>.....[.x...
[ 6171.151332] IPv4: martian source 169.254.255.255 from 169.254.39.87, on dev eth0
[ 6171.152519] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14 08 00        ......0Y......
[ 6171.153594] IPv4: martian source 169.254.255.255 from 169.254.39.87, on dev eth0
[ 6171.154369] ll header: 00000000: 00 16 3e 00 00 01 00 19 5b 8d 78 ea 08 00        ..>.....[.x...
[ 6171.206225] IPv4: martian source 169.254.255.255 from 169.254.39.87, on dev eth0
[ 6171.207925] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14 08 00        ......0Y......
[ 6171.209235] IPv4: martian source 169.254.255.255 from 169.254.39.87, on dev eth0
[ 6171.210167] ll header: 00000000: 00 16 3e 00 00 01 00 19 5b 8d 78 ea 08 00        ..>.....[.x...
[ 6171.210203] IPv4: martian source 169.254.255.255 from 169.254.39.87, on dev eth0
[ 6171.210203] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14 08 00        ......0Y......
[ 6171.210203] IPv4: martian source 169.254.255.255 from 169.254.39.87, on dev eth0
[ 6171.210203] ll header: 00000000: 00 16 3e 00 00 01 00 19 5b 8d 78 ea 08 00        ..>.....[.x...
[ 6171.210203] IPv4: martian source 169.254.255.255 from 169.254.39.87, on dev eth0
[ 6171.210203] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14 08 00        ......0Y......
[393447.694364] BUG: unable to handle kernel NULL pointer dereference at           (null)
[393447.695801] IP: [<ffffffff8114a935>] put_page+0x5/0x30
[393447.697326] PGD 36ef2067 PUD 36ef1067 PMD 0
[393447.698305] Oops: 0000 [#1] SMP
[393447.698305] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc pppoe pppox ip6table_filter ppp_generic slhc ip6_tables xt_conntrack iptable_filter ipt_MASQUERADE xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_TCPMSS xt_tcpmss xt_tcpudp iptable_mangle ip_tables x_tables crc32_pclmul ppdev aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ttm pcspkr evdev drm_kms_helper serio_raw virtio_balloon drm i2c_piix4 i2c_core parport_pc parport pvpanic processor thermal_sys button autofs4 ext4 crc16 mbcache jbd2 dm_mod virtio_net virtio_blk ata_generic crct10dif_pclmul crct10dif_common crc32c_intel psmouse uhci_hcd ehci_pci ehci_hcd ata_piix usbcore virtio_pci virtio_ring floppy usb_common virtio libata scsi_mod
[393447.698305] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt25-2
[393447.698305] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[393447.698305] task: ffffffff8181a460 ti: ffffffff81800000 task.ti: ffffffff81800000
[393447.698305] RIP: 0010:[<ffffffff8114a935>]  [<ffffffff8114a935>] put_page+0x5/0x30
[393447.698305] RSP: 0018:ffff88007fc03ce8  EFLAGS: 00010206
[393447.698305] RAX: 0000000000000030 RBX: ffff88007974f4c0 RCX: 000000007974f400
[393447.698305] RDX: 0000000000000000 RSI: 00000000fffffe01 RDI: 0000000000000000
[393447.698305] RBP: 0000000000000001 R08: 0000000080000000 R09: ffff880036c500b0
[393447.698305] R10: 6db6db6db6db6db7 R11: 0000160000000000 R12: ffff880079a35d00
[393447.698305] R13: 0000000000000049 R14: ffff88007974f220 R15: ffff88007971bb00
[393447.698305] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[393447.698305] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[393447.698305] CR2: 0000000000000000 CR3: 0000000036ec4000 CR4: 00000000000406f0
[393447.698305] Stack:
[393447.698305]  ffffffff8140f377 0000000000005d00 ffff880079a35d00 0000000000000000
[393447.698305]  ffffffff8140f647 0000000000005d00 ffff880079a35d00 ffffffff8145a1c8
[393447.698305]  0000001400000000 0000005d00000020 0000059a7974f400 ffff88007a080000
[393447.698305] Call Trace:
[393447.698305]  <IRQ>
[393447.698305]
[393447.698305]  [<ffffffff8140f377>] ? skb_release_data+0x87/0x110
[393447.698305]  [<ffffffff8140f647>] ? consume_skb+0x27/0x80
[393447.698305]  [<ffffffff8145a1c8>] ? ip_fragment+0x5b8/0x880
[393447.698305]  [<ffffffff81459600>] ? ip_reply_glue_bits+0x50/0x50
[393447.698305]  [<ffffffff8145a9d4>] ? ip_finish_output+0x544/0x850
[393447.698305]  [<ffffffff8141f1a3>] ? __netif_receive_skb_core+0x543/0x750
[393447.698305]  [<ffffffff8105198b>] ? kvm_clock_get_cycles+0x1b/0x20
[393447.698305]  [<ffffffff8141f42f>] ? netif_receive_skb_internal+0x1f/0x80
[393447.698305]  [<ffffffffa00375aa>] ? virtnet_poll+0x52a/0x880 [virtio_net]
[393447.698305]  [<ffffffff8141f7b0>] ? net_rx_action+0x140/0x240
[393447.698305]  [<ffffffff8106c6a1>] ? __do_softirq+0xf1/0x290
[393447.698305]  [<ffffffff8106ca75>] ? irq_exit+0x95/0xa0
[393447.698305]  [<ffffffff81517822>] ? do_IRQ+0x52/0xe0
[393447.698305]  [<ffffffff8151566d>] ? common_interrupt+0x6d/0x6d
[393447.698305]  <EOI>
[393447.698305]
[393447.698305]  [<ffffffff8101c8b0>] ? idle_notifier_unregister+0x20/0x20
[393447.698305]  [<ffffffff81051c12>] ? native_safe_halt+0x2/0x10
[393447.698305]  [<ffffffff8101c8c9>] ? default_idle+0x19/0xb0
[393447.698305]  [<ffffffff810a83e0>] ? cpu_startup_entry+0x340/0x400
[393447.698305]  [<ffffffff81903076>] ? start_kernel+0x497/0x4a2
[393447.698305]  [<ffffffff81902a04>] ? set_init_arg+0x4e/0x4e
[393447.698305]  [<ffffffff81902120>] ? early_idt_handler_array+0x120/0x120
[393447.698305]  [<ffffffff8190271f>] ? x86_64_start_kernel+0x14d/0x15c
[393447.698305] Code: 45 00 48 89 ef f6 c4 40 74 0a e8 67 fe ff ff e9 ee fe ff ff 66 90 e8 7b fe ff ff e9 e2 fe ff ff 66 0f 1f 44 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 75 0f 3e ff 4f 1c 74 04 c3 0f 1f 00 e9 53
[393447.698305] RIP  [<ffffffff8114a935>] put_page+0x5/0x30
[393447.698305]  RSP <ffff88007fc03ce8>
[393447.698305] CR2: 0000000000000000
crash>
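
The "ll header" lines that accompany the martian source messages in the log are the 14-byte Ethernet header of the offending frame: 6 bytes of destination MAC, 6 bytes of source MAC, and a 2-byte EtherType (0x0800 is IPv4, 0x0806 is ARP). A minimal sketch, assuming Python 3, that decodes one of those hex strings; the function name decode_ll_header and the small ETHERTYPES table are hypothetical and only cover the two types seen in this log.

```python
# EtherType values that appear in the log above (not an exhaustive table).
ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP"}


def format_mac(raw):
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def decode_ll_header(hex_bytes):
    """Decode a space-separated 14-byte Ethernet header from a kernel log.

    Returns (destination MAC, source MAC, EtherType as an integer).
    Hypothetical helper for illustration only.
    """
    raw = bytes.fromhex("".join(hex_bytes.split()))
    if len(raw) < 14:
        raise ValueError("expected at least 14 bytes of Ethernet header")
    dst = format_mac(raw[0:6])
    src = format_mac(raw[6:12])
    ethertype = int.from_bytes(raw[12:14], "big")
    return dst, src, ethertype


if __name__ == "__main__":
    sample = "ff ff ff ff ff ff 30 59 b7 14 13 14 08 06"
    dst, src, ethertype = decode_ll_header(sample)
    print(dst, src, ETHERTYPES.get(ethertype, hex(ethertype)))
```

Decoding the headers this way shows which MAC addresses are sending the 169.254.x.x (link-local) traffic seen on eth0. Whether that traffic has anything to do with the crash is not something the log itself establishes.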
