
Re: Debian Server (NAT Gateway) Periodically Crashing



Hi,

I don't know much about kernel debugging myself, but this looks like
something to report. Take a look at [0] on how to do that.
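
In case it helps when writing the report, here is a rough Python sketch
that pulls the key facts out of the oops so they can go at the top of the
bug description. The function name summarize_oops and the choice of fields
are my own assumptions, not something the Debian bug tracker asks for, and
the sample text is copied from the quoted log below with the mail
client's line wrapping undone.

```python
import re

# Three lines from the quoted oops, with the mail client's wrapping undone.
OOPS = """\
[393447.695801] IP: [<ffffffff8114a935>] put_page+0x5/0x30
[393447.698305] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt25-2
[393447.698305] CR2: 0000000000000000
"""


def summarize_oops(text):
    """Hypothetical helper: extract faulting symbol, task and fault address."""
    ip = re.search(
        r"IP: \[<([0-9a-f]+)>\] (\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)", text
    )
    comm = re.search(r"Comm: (\S+)", text)
    cr2 = re.search(r"CR2: ([0-9a-f]+)", text)
    if not (ip and comm and cr2):
        raise ValueError("text does not look like a complete oops excerpt")
    return {
        "symbol": ip.group(2),                    # function that faulted
        "offset": int(ip.group(3), 16),           # byte offset into it
        "size": int(ip.group(4), 16),             # size of the function
        "comm": comm.group(1),                    # task running at the time
        "fault_address": int(cr2.group(1), 16),   # CR2 holds the bad address
    }


summary = summarize_oops(OOPS)
print(summary)
```

A fault address of 0 in CR2 matches the "NULL pointer dereference" line
in the log, which is worth stating explicitly in the report.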

Regards
/peter

[0] https://www.debian.org/Bugs/Reporting

On 07.07.2016 at 08:38, Christian Harris wrote:
> Hello All,
> 
> I am hoping to get some help with one of my virtual machines. I am
> running a KVM host with several virtual machines that provide internet
> services to a small network. The gateway machine is a Debian 8 minimal
> install that was updated to 8.5.
> 
> user1@gateway:~# sudo lsb_release -da
> No LSB modules are available.
> Distributor ID: Debian
> Description:    Debian GNU/Linux 8.5 (jessie)
> Release:        8.5
> Codename:       jessie
> user1@gateway:~# sudo uname -a
> Linux gateway 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08)
> x86_64 GNU/Linux
> 
> Minimal additional packages are installed, only enough to support a NAT
> gateway.
> 
> Periodically, the VM experiences a kernel Oops and crashes, taking down
> internet access for the network. This is the only VM that is crashing;
> the other VMs (based on the same minimal install, updated to 8.5, with
> minimal software installed) have uptimes of 100+ days. This VM seems to
> crash every few weeks.
> 
> I managed to somewhat stabilize the internet connection by enabling
> crash dumps and automatic reboots with instructions from here:
> https://www.bentasker.co.uk/documentation/linux/312-installing-and-configuring-kdump-on-debian-jessie
> 
> All is mostly well. The machine reboots after a crash dump, so there
> is minimal impact on the network. But occasionally, PPPoE fails to
> redial after a reboot :-<  Aside from the PPPoE issue, I figured I would
> try to get to the root of why the VM is crashing to begin with. However,
> I am not a developer and have no idea how to interpret the crash dump. As
> far as I can tell, the swapper/0 process caused the dump at
> instruction put_page+5. I have no idea what that means.
> 
> Any assistance as to why this host is crashing would be helpful. The
> only thing this host is doing is serving as a NAT gateway. I am having
> no problems with any other VMs with the same basic OS load.
> 
> As a start, I at least got the log and bt from the crash dump. I can
> provide additional crash info if needed (and given the commands).
> 
> user1@gateway:/var/crash/201607040851# sudo crash kernel_link dump.201607040851
> ...version info removed...
>       KERNEL: kernel_link     
>     DUMPFILE: dump.201607040851  [PARTIAL DUMP]
>         CPUS: 1
>         DATE: Mon Jul  4 08:51:23 2016
>       UPTIME: 4 days, 13:17:27
> LOAD AVERAGE: 0.00, 0.01, 0.05
>        TASKS: 67
>     NODENAME: gateway
>      RELEASE: 3.16.0-4-amd64
>      VERSION: #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08)
>      MACHINE: x86_64  (1596 Mhz)
>       MEMORY: 2 GB
>        PANIC: "Oops: 0000 [#1] SMP " (check log for details)
>          PID: 0
>      COMMAND: "swapper/0"
>         TASK: ffffffff8181a460  [THREAD_INFO: ffffffff81800000]
>          CPU: 0
>        STATE: TASK_RUNNING (PANIC)
> 
> crash> bt
> PID: 0      TASK: ffffffff8181a460  CPU: 0   COMMAND: "swapper/0"
>  #0 [ffff88007fc039c8] machine_kexec at ffffffff8104c0a2
>  #1 [ffff88007fc03a18] crash_kexec at ffffffff810df7da
>  #2 [ffff88007fc03ad8] oops_end at ffffffff81016228
>  #3 [ffff88007fc03af8] no_context at ffffffff8150b172
>  #4 [ffff88007fc03b38] __do_page_fault at ffffffff810571c0
>  #5 [ffff88007fc03c30] async_page_fault at ffffffff81516a58
>     [exception RIP: put_page+5]
>     RIP: ffffffff8114a935  RSP: ffff88007fc03ce8  RFLAGS: 00010206
>     RAX: 0000000000000030  RBX: ffff88007974f4c0  RCX: 000000007974f400
>     RDX: 0000000000000000  RSI: 00000000fffffe01  RDI: 0000000000000000
>     RBP: 0000000000000001   R8: 0000000080000000   R9: ffff880036c500b0
>     R10: 6db6db6db6db6db7  R11: 0000160000000000  R12: ffff880079a35d00
>     R13: 0000000000000049  R14: ffff88007974f220  R15: ffff88007971bb00
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #6 [ffff88007fc03ce0] ip_finish_output2 at ffffffff81459756
>  #7 [ffff88007fc03d20] ip_fragment at ffffffff8145a1c8
>  #8 [ffff88007fc03d98] ip_finish_output at ffffffff8145a9d4
>  #9 [ffff88007fc03dd8] __netif_receive_skb_core at ffffffff8141f1a3
> #10 [ffff88007fc03e28] netif_receive_skb_internal at ffffffff8141f42f
> #11 [ffff88007fc03e48] virtnet_poll at ffffffffa00375aa [virtio_net]
> #12 [ffff88007fc03ed0] net_rx_action at ffffffff8141f7b0
> #13 [ffff88007fc03f20] __do_softirq at ffffffff8106c6a1
> #14 [ffff88007fc03f78] irq_exit at ffffffff8106ca75
> #15 [ffff88007fc03f80] do_IRQ at ffffffff81517822
> --- <IRQ stack> ---
> #16 [ffffffff81803e48] ret_from_intr at ffffffff8151566d
>     [exception RIP: native_safe_halt+2]
>     RIP: ffffffff81051c12  RSP: ffffffff81803ef0  RFLAGS: 00000246
>     RAX: ffffffff8101c8b0  RBX: 0000000000000086  RCX: ffffffff81855220
>     RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
>     RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
>     R10: 0000000105dbbc7c  R11: 0000000000000104  R12: 000000000000d160
>     R13: 0000000000000040  R14: ffffffff8108ae2d  R15: 0000000000000086
>     ORIG_RAX: ffffffffffffff8e  CS: 0010  SS: 0018
> #17 [ffffffff81803ef0] default_idle at ffffffff8101c8c9
> #18 [ffffffff81803f08] cpu_startup_entry at ffffffff810a83e0
> #19 [ffffffff81803f68] start_kernel at ffffffff81903076
> #20 [ffffffff81803fa0] x86_64_start_kernel at ffffffff8190271f
> 
> crash> log
> ....cut.....
> [    3.936064] FS-Cache: Loaded
> [    3.948833] FS-Cache: Netfs 'nfs' registered for caching
> [    3.967038] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
> [   34.781171] random: nonblocking pool is initialized
> [ 6171.133280] IPv4: martian source 169.254.39.87 from 169.254.39.87, on
> dev eth0
> [ 6171.134424] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14
> 08 06        ......0Y......
> [ 6171.135287] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.135942] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14
> 08 00        ......0Y......
> [ 6171.137267] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.137267] ll header: 00000000: 00 16 3e 00 00 01 00 19 5b 8d 78 ea
> 08 00        ..>.....[.x...
> [ 6171.151332] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.152519] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14
> 08 00        ......0Y......
> [ 6171.153594] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.154369] ll header: 00000000: 00 16 3e 00 00 01 00 19 5b 8d 78 ea
> 08 00        ..>.....[.x...
> [ 6171.206225] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.207925] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14
> 08 00        ......0Y......
> [ 6171.209235] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.210167] ll header: 00000000: 00 16 3e 00 00 01 00 19 5b 8d 78 ea
> 08 00        ..>.....[.x...
> [ 6171.210203] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.210203] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14
> 08 00        ......0Y......
> [ 6171.210203] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.210203] ll header: 00000000: 00 16 3e 00 00 01 00 19 5b 8d 78 ea
> 08 00        ..>.....[.x...
> [ 6171.210203] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.210203] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14
> 08 00        ......0Y......
> [393447.694364] BUG: unable to handle kernel NULL pointer dereference
> at           (null)
> [393447.695801] IP: [<ffffffff8114a935>] put_page+0x5/0x30
> [393447.697326] PGD 36ef2067 PUD 36ef1067 PMD 0
> [393447.698305] Oops: 0000 [#1] SMP
> [393447.698305] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl
> nfs lockd fscache sunrpc pppoe pppox ip6table_filter ppp_generic slhc
> ip6_tables xt_conntrack iptable_filter ipt_MASQUERADE xt_nat iptable_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
> xt_TCPMSS xt_tcpmss xt_tcpudp iptable_mangle ip_tables x_tables
> crc32_pclmul ppdev aesni_intel aes_x86_64 lrw gf128mul glue_helper
> ablk_helper cryptd ttm pcspkr evdev drm_kms_helper serio_raw
> virtio_balloon drm i2c_piix4 i2c_core parport_pc parport pvpanic
> processor thermal_sys button autofs4 ext4 crc16 mbcache jbd2 dm_mod
> virtio_net virtio_blk ata_generic crct10dif_pclmul crct10dif_common
> crc32c_intel psmouse uhci_hcd ehci_pci ehci_hcd ata_piix usbcore
> virtio_pci virtio_ring floppy usb_common virtio libata scsi_mod
> [393447.698305] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-amd64
> #1 Debian 3.16.7-ckt25-2
> [393447.698305] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
> [393447.698305] task: ffffffff8181a460 ti: ffffffff81800000 task.ti:
> ffffffff81800000
> [393447.698305] RIP: 0010:[<ffffffff8114a935>]  [<ffffffff8114a935>]
> put_page+0x5/0x30
> [393447.698305] RSP: 0018:ffff88007fc03ce8  EFLAGS: 00010206
> [393447.698305] RAX: 0000000000000030 RBX: ffff88007974f4c0 RCX:
> 000000007974f400
> [393447.698305] RDX: 0000000000000000 RSI: 00000000fffffe01 RDI:
> 0000000000000000
> [393447.698305] RBP: 0000000000000001 R08: 0000000080000000 R09:
> ffff880036c500b0
> [393447.698305] R10: 6db6db6db6db6db7 R11: 0000160000000000 R12:
> ffff880079a35d00
> [393447.698305] R13: 0000000000000049 R14: ffff88007974f220 R15:
> ffff88007971bb00
> [393447.698305] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000)
> knlGS:0000000000000000
> [393447.698305] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [393447.698305] CR2: 0000000000000000 CR3: 0000000036ec4000 CR4:
> 00000000000406f0
> [393447.698305] Stack:
> [393447.698305]  ffffffff8140f377 0000000000005d00 ffff880079a35d00
> 0000000000000000
> [393447.698305]  ffffffff8140f647 0000000000005d00 ffff880079a35d00
> ffffffff8145a1c8
> [393447.698305]  0000001400000000 0000005d00000020 0000059a7974f400
> ffff88007a080000
> [393447.698305] Call Trace:
> [393447.698305]  <IRQ>
> [393447.698305]
> [393447.698305]  [<ffffffff8140f377>] ? skb_release_data+0x87/0x110
> [393447.698305]  [<ffffffff8140f647>] ? consume_skb+0x27/0x80
> [393447.698305]  [<ffffffff8145a1c8>] ? ip_fragment+0x5b8/0x880
> [393447.698305]  [<ffffffff81459600>] ? ip_reply_glue_bits+0x50/0x50
> [393447.698305]  [<ffffffff8145a9d4>] ? ip_finish_output+0x544/0x850
> [393447.698305]  [<ffffffff8141f1a3>] ? __netif_receive_skb_core+0x543/0x750
> [393447.698305]  [<ffffffff8105198b>] ? kvm_clock_get_cycles+0x1b/0x20
> [393447.698305]  [<ffffffff8141f42f>] ? netif_receive_skb_internal+0x1f/0x80
> [393447.698305]  [<ffffffffa00375aa>] ? virtnet_poll+0x52a/0x880
> [virtio_net]
> [393447.698305]  [<ffffffff8141f7b0>] ? net_rx_action+0x140/0x240
> [393447.698305]  [<ffffffff8106c6a1>] ? __do_softirq+0xf1/0x290
> [393447.698305]  [<ffffffff8106ca75>] ? irq_exit+0x95/0xa0
> [393447.698305]  [<ffffffff81517822>] ? do_IRQ+0x52/0xe0
> [393447.698305]  [<ffffffff8151566d>] ? common_interrupt+0x6d/0x6d
> [393447.698305]  <EOI>
> [393447.698305]
> [393447.698305]  [<ffffffff8101c8b0>] ? idle_notifier_unregister+0x20/0x20
> [393447.698305]  [<ffffffff81051c12>] ? native_safe_halt+0x2/0x10
> [393447.698305]  [<ffffffff8101c8c9>] ? default_idle+0x19/0xb0
> [393447.698305]  [<ffffffff810a83e0>] ? cpu_startup_entry+0x340/0x400
> [393447.698305]  [<ffffffff81903076>] ? start_kernel+0x497/0x4a2
> [393447.698305]  [<ffffffff81902a04>] ? set_init_arg+0x4e/0x4e
> [393447.698305]  [<ffffffff81902120>] ? early_idt_handler_array+0x120/0x120
> [393447.698305]  [<ffffffff8190271f>] ? x86_64_start_kernel+0x14d/0x15c
> [393447.698305] Code: 45 00 48 89 ef f6 c4 40 74 0a e8 67 fe ff ff e9 ee
> fe ff ff 66 90 e8 7b fe ff ff e9 e2 fe ff ff 66 0f 1f 44 00 00 66 66 66
> 66 90 <48> f7 07 00 c0 00 00 75 0f 3e ff 4f 1c 74 04 c3 0f 1f 00 e9 53
> [393447.698305] RIP  [<ffffffff8114a935>] put_page+0x5/0x30
> [393447.698305]  RSP <ffff88007fc03ce8>
> [393447.698305] CR2: 0000000000000000
> crash>
