Hi,

I don't know much about kernel debugging myself, but this looks like
something to report. Take a look at [0] on how to do that.

Regards
/peter

[0] https://www.debian.org/Bugs/Reporting

On 07.07.2016 at 08:38, Christian Harris wrote:
> Hello All,
>
> I am hoping to get some help with one of my virtual machines. I am
> running a KVM host with several virtual machines providing internet
> services to a small network. The gateway machine is a Debian 8 minimum
> install that was updated to 8.5.
>
> user1@gateway:~# sudo lsb_release -da
> No LSB modules are available.
> Distributor ID: Debian
> Description: Debian GNU/Linux 8.5 (jessie)
> Release: 8.5
> Codename: jessie
> user1@gateway:~# sudo uname -a
> Linux gateway 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08)
> x86_64 GNU/Linux
>
> Minimal additional packages are installed, only enough to support a NAT
> gateway.
>
> Periodically, the vm experiences a kernel Oops and crashes, taking down
> internet access for the network. This is the only vm that is crashing,
> the other VMs (based off the same minimal install, updated to 8.5,
> minimum software installs) have uptimes of 100+ days. This VM seems to
> crash every few weeks.
>
> I managed to somewhat stabilize the internet connection by enabling
> crash dumps and automatic reboots with instructions from here:
> https://www.bentasker.co.uk/documentation/linux/312-installing-and-configuring-kdump-on-debian-jessie
>
> All is mostly well. The machine reboots after a crash dump, so there
> is minimum impact to the network. But occasionally, the PPPOE fails to
> redial after a reboot :-< Aside from the PPPOE issue, I figured I would
> try to get to the root of why the vm is crashing to begin with. However,
> I am not a developer and have no idea how to interpret the crash dump. As
> much as I can tell, the swapper/0 process caused the dump with
> instruction put_page+5. I have no idea what that means.
>
> Any assistance as to why this host is crashing would be helpful. The
> only thing this host is doing is serving as a NAT gateway. I am having
> no problems with any other VMs with the same basic OS load.
>
> As a start, I at least got the log and bt from the crash dump. I can
> provide additional crash info if needed (and given the commands).
>
> user1@gateway:/var/crash/201607040851# sudo crash kernel_link
> dump.201607040851
> ...version info removed...
> KERNEL: kernel_link
> DUMPFILE: dump.201607040851 [PARTIAL DUMP]
> CPUS: 1
> DATE: Mon Jul 4 08:51:23 2016
> UPTIME: 4 days, 13:17:27
> LOAD AVERAGE: 0.00, 0.01, 0.05
> TASKS: 67
> NODENAME: gateway
> RELEASE: 3.16.0-4-amd64
> VERSION: #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08)
> MACHINE: x86_64 (1596 Mhz)
> MEMORY: 2 GB
> PANIC: "Oops: 0000 [#1] SMP " (check log for details)
> PID: 0
> COMMAND: "swapper/0"
> TASK: ffffffff8181a460 [THREAD_INFO: ffffffff81800000]
> CPU: 0
> STATE: TASK_RUNNING (PANIC)
>
> crash> bt
> PID: 0 TASK: ffffffff8181a460 CPU: 0 COMMAND: "swapper/0"
> #0 [ffff88007fc039c8] machine_kexec at ffffffff8104c0a2
> #1 [ffff88007fc03a18] crash_kexec at ffffffff810df7da
> #2 [ffff88007fc03ad8] oops_end at ffffffff81016228
> #3 [ffff88007fc03af8] no_context at ffffffff8150b172
> #4 [ffff88007fc03b38] __do_page_fault at ffffffff810571c0
> #5 [ffff88007fc03c30] async_page_fault at ffffffff81516a58
> [exception RIP: put_page+5]
> RIP: ffffffff8114a935 RSP: ffff88007fc03ce8 RFLAGS: 00010206
> RAX: 0000000000000030 RBX: ffff88007974f4c0 RCX: 000000007974f400
> RDX: 0000000000000000 RSI: 00000000fffffe01 RDI: 0000000000000000
> RBP: 0000000000000001 R8: 0000000080000000 R9: ffff880036c500b0
> R10: 6db6db6db6db6db7 R11: 0000160000000000 R12: ffff880079a35d00
> R13: 0000000000000049 R14: ffff88007974f220 R15: ffff88007971bb00
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #6 [ffff88007fc03ce0] ip_finish_output2 at ffffffff81459756
> #7 [ffff88007fc03d20] ip_fragment at ffffffff8145a1c8
> #8 [ffff88007fc03d98] ip_finish_output at ffffffff8145a9d4
> #9 [ffff88007fc03dd8] __netif_receive_skb_core at ffffffff8141f1a3
> #10 [ffff88007fc03e28] netif_receive_skb_internal at ffffffff8141f42f
> #11 [ffff88007fc03e48] virtnet_poll at ffffffffa00375aa [virtio_net]
> #12 [ffff88007fc03ed0] net_rx_action at ffffffff8141f7b0
> #13 [ffff88007fc03f20] __do_softirq at ffffffff8106c6a1
> #14 [ffff88007fc03f78] irq_exit at ffffffff8106ca75
> #15 [ffff88007fc03f80] do_IRQ at ffffffff81517822
> --- <IRQ stack> ---
> #16 [ffffffff81803e48] ret_from_intr at ffffffff8151566d
> [exception RIP: native_safe_halt+2]
> RIP: ffffffff81051c12 RSP: ffffffff81803ef0 RFLAGS: 00000246
> RAX: ffffffff8101c8b0 RBX: 0000000000000086 RCX: ffffffff81855220
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000000
> R10: 0000000105dbbc7c R11: 0000000000000104 R12: 000000000000d160
> R13: 0000000000000040 R14: ffffffff8108ae2d R15: 0000000000000086
> ORIG_RAX: ffffffffffffff8e CS: 0010 SS: 0018
> #17 [ffffffff81803ef0] default_idle at ffffffff8101c8c9
> #18 [ffffffff81803f08] cpu_startup_entry at ffffffff810a83e0
> #19 [ffffffff81803f68] start_kernel at ffffffff81903076
> #20 [ffffffff81803fa0] x86_64_start_kernel at ffffffff8190271f
>
> crash> log
> ....cut.....
> [ 3.936064] FS-Cache: Loaded
> [ 3.948833] FS-Cache: Netfs 'nfs' registered for caching
> [ 3.967038] Installing knfsd (copyright (C) 1996 okir@monad.swb.de
> <mailto:okir@monad.swb.de>).
> [ 34.781171] random: nonblocking pool is initialized
> [ 6171.133280] IPv4: martian source 169.254.39.87 from 169.254.39.87, on
> dev eth0
> [ 6171.134424] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14
> 08 06 ......0Y......
> [ 6171.135287] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.135942] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14
> 08 00 ......0Y......
> [ 6171.137267] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.137267] ll header: 00000000: 00 16 3e 00 00 01 00 19 5b 8d 78 ea
> 08 00 ..>.....[.x...
> [ 6171.151332] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.152519] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14
> 08 00 ......0Y......
> [ 6171.153594] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.154369] ll header: 00000000: 00 16 3e 00 00 01 00 19 5b 8d 78 ea
> 08 00 ..>.....[.x...
> [ 6171.206225] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.207925] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14
> 08 00 ......0Y......
> [ 6171.209235] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.210167] ll header: 00000000: 00 16 3e 00 00 01 00 19 5b 8d 78 ea
> 08 00 ..>.....[.x...
> [ 6171.210203] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.210203] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14
> 08 00 ......0Y......
> [ 6171.210203] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.210203] ll header: 00000000: 00 16 3e 00 00 01 00 19 5b 8d 78 ea
> 08 00 ..>.....[.x...
> [ 6171.210203] IPv4: martian source 169.254.255.255 from 169.254.39.87,
> on dev eth0
> [ 6171.210203] ll header: 00000000: ff ff ff ff ff ff 30 59 b7 14 13 14
> 08 00 ......0Y......
> [393447.694364] BUG: unable to handle kernel NULL pointer dereference
> at (null)
> [393447.695801] IP: [<ffffffff8114a935>] put_page+0x5/0x30
> [393447.697326] PGD 36ef2067 PUD 36ef1067 PMD 0
> [393447.698305] Oops: 0000 [#1] SMP
> [393447.698305] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl
> nfs lockd fscache sunrpc pppoe pppox ip6table_filter ppp_generic slhc
> ip6_tables xt_conntrack iptable_filter ipt_MASQUERADE xt_nat iptable_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
> xt_TCPMSS xt_tcpmss xt_tcpudp iptable_mangle ip_tables x_tables
> crc32_pclmul ppdev aesni_intel aes_x86_64 lrw gf128mul glue_helper
> ablk_helper cryptd ttm pcspkr evdev drm_kms_helper serio_raw
> virtio_balloon drm i2c_piix4 i2c_core parport_pc parport pvpanic
> processor thermal_sys button autofs4 ext4 crc16 mbcache jbd2 dm_mod
> virtio_net virtio_blk ata_generic crct10dif_pclmul crct10dif_common
> crc32c_intel psmouse uhci_hcd ehci_pci ehci_hcd ata_piix usbcore
> virtio_pci virtio_ring floppy usb_common virtio libata scsi_mod
> [393447.698305] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-amd64
> #1 Debian 3.16.7-ckt25-2
> [393447.698305] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
> [393447.698305] task: ffffffff8181a460 ti: ffffffff81800000 task.ti:
> ffffffff81800000
> [393447.698305] RIP: 0010:[<ffffffff8114a935>] [<ffffffff8114a935>]
> put_page+0x5/0x30
> [393447.698305] RSP: 0018:ffff88007fc03ce8 EFLAGS: 00010206
> [393447.698305] RAX: 0000000000000030 RBX: ffff88007974f4c0 RCX:
> 000000007974f400
> [393447.698305] RDX: 0000000000000000 RSI: 00000000fffffe01 RDI:
> 0000000000000000
> [393447.698305] RBP: 0000000000000001 R08: 0000000080000000 R09:
> ffff880036c500b0
> [393447.698305] R10: 6db6db6db6db6db7 R11: 0000160000000000 R12:
> ffff880079a35d00
> [393447.698305] R13: 0000000000000049 R14: ffff88007974f220 R15:
> ffff88007971bb00
> [393447.698305] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000)
> knlGS:0000000000000000
> [393447.698305] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [393447.698305] CR2: 0000000000000000 CR3: 0000000036ec4000 CR4:
> 00000000000406f0
> [393447.698305] Stack:
> [393447.698305] ffffffff8140f377 0000000000005d00 ffff880079a35d00
> 0000000000000000
> [393447.698305] ffffffff8140f647 0000000000005d00 ffff880079a35d00
> ffffffff8145a1c8
> [393447.698305] 0000001400000000 0000005d00000020 0000059a7974f400
> ffff88007a080000
> [393447.698305] Call Trace:
> [393447.698305] <IRQ>
> [393447.698305]
> [393447.698305] [<ffffffff8140f377>] ? skb_release_data+0x87/0x110
> [393447.698305] [<ffffffff8140f647>] ? consume_skb+0x27/0x80
> [393447.698305] [<ffffffff8145a1c8>] ? ip_fragment+0x5b8/0x880
> [393447.698305] [<ffffffff81459600>] ? ip_reply_glue_bits+0x50/0x50
> [393447.698305] [<ffffffff8145a9d4>] ? ip_finish_output+0x544/0x850
> [393447.698305] [<ffffffff8141f1a3>] ? __netif_receive_skb_core+0x543/0x750
> [393447.698305] [<ffffffff8105198b>] ? kvm_clock_get_cycles+0x1b/0x20
> [393447.698305] [<ffffffff8141f42f>] ? netif_receive_skb_internal+0x1f/0x80
> [393447.698305] [<ffffffffa00375aa>] ? virtnet_poll+0x52a/0x880
> [virtio_net]
> [393447.698305] [<ffffffff8141f7b0>] ? net_rx_action+0x140/0x240
> [393447.698305] [<ffffffff8106c6a1>] ? __do_softirq+0xf1/0x290
> [393447.698305] [<ffffffff8106ca75>] ? irq_exit+0x95/0xa0
> [393447.698305] [<ffffffff81517822>] ? do_IRQ+0x52/0xe0
> [393447.698305] [<ffffffff8151566d>] ? common_interrupt+0x6d/0x6d
> [393447.698305] <EOI>
> [393447.698305]
> [393447.698305] [<ffffffff8101c8b0>] ? idle_notifier_unregister+0x20/0x20
> [393447.698305] [<ffffffff81051c12>] ? native_safe_halt+0x2/0x10
> [393447.698305] [<ffffffff8101c8c9>] ? default_idle+0x19/0xb0
> [393447.698305] [<ffffffff810a83e0>] ? cpu_startup_entry+0x340/0x400
> [393447.698305] [<ffffffff81903076>] ? start_kernel+0x497/0x4a2
> [393447.698305] [<ffffffff81902a04>] ? set_init_arg+0x4e/0x4e
> [393447.698305] [<ffffffff81902120>] ? early_idt_handler_array+0x120/0x120
> [393447.698305] [<ffffffff8190271f>] ? x86_64_start_kernel+0x14d/0x15c
> [393447.698305] Code: 45 00 48 89 ef f6 c4 40 74 0a e8 67 fe ff ff e9 ee
> fe ff ff 66 90 e8 7b fe ff ff e9 e2 fe ff ff 66 0f 1f 44 00 00 66 66 66
> 66 90 <48> f7 07 00 c0 00 00 75 0f 3e ff 4f 1c 74 04 c3 0f 1f 00 e9 53
> [393447.698305] RIP [<ffffffff8114a935>] put_page+0x5/0x30
> [393447.698305] RSP <ffff88007fc03ce8>
> [393447.698305] CR2: 0000000000000000
> crash>
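
When putting a report together, it can help to pull the faulting symbol and the call chain out of the log first. In the Oops text, an entry such as put_page+0x5/0x30 means the fault address lies 0x5 bytes into put_page, a function 0x30 bytes long. Below is a minimal Python sketch of that extraction. The names FRAME_RE, parse_frames and faulting_frame are made up for illustration, and the regular expression is only assumed to cover the frame format seen in this particular log, not every kernel version.

```python
import re

# Hypothetical helper: matches call-trace entries of the form
#   [<ffffffff8114a935>] put_page+0x5/0x30
# optionally with the "? " marker the kernel prints before the symbol.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+(?:\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return (symbol, offset, function_size) tuples in order of appearance."""
    return [
        (m.group("sym"), int(m.group("off"), 16), int(m.group("size"), 16))
        for m in FRAME_RE.finditer(log_text)
    ]


def faulting_frame(log_text):
    """Return the first frame found on an 'IP:' or 'RIP' line, or None."""
    for line in log_text.splitlines():
        if "IP:" in line or "RIP" in line:
            frames = parse_frames(line)
            if frames:
                return frames[0]
    return None


# A few lines copied from the log quoted in this thread.
SAMPLE = """\
[393447.695801] IP: [<ffffffff8114a935>] put_page+0x5/0x30
[393447.698305] [<ffffffff8140f377>] ? skb_release_data+0x87/0x110
[393447.698305] [<ffffffff8140f647>] ? consume_skb+0x27/0x80
"""

if __name__ == "__main__":
    print("faulting frame:", faulting_frame(SAMPLE))
    for sym, off, size in parse_frames(SAMPLE):
        print(f"{sym} +{off:#x} of {size:#x}")
```

Frames that the mail client wrapped across two lines (address on one line, symbol on the next) are not matched by this sketch; running it on the unwrapped output of the log command in crash avoids that.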