--- Begin Message ---
- To: Debian Bug Tracking System <submit@bugs.debian.org>
- Subject: linux-image-3.16.0-4-amd64: [PATCH] Xen domU "unable to handle kernel NULL pointer dereference"
- From: Sebastian Pipping <sebastian@pipping.org>
- Date: Tue, 24 Nov 2015 13:18:16 +0100
- Message-id: <20151124121816.7035.99576.reportbug@localhost>
Package: linux-image-3.16.0-4-amd64
Version: 3.16.7-ckt11-1+deb8u6
Severity: important
Hi!
Inside a Xen domU, with the combination of
* the latest jessie kernel (3.16.7-ckt11-1+deb8u6)
or the corresponding kernel from wheezy-backports (3.16.7-ckt11-1+deb8u6~bpo70+1),
* 2 network interfaces and
* 24 VCPUs,
I see the error "unable to handle kernel NULL pointer dereference" during start-up:
...
[ 0.755434] xen_netfront: can't alloc rx grant refs
[ 0.758359] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 0.761622] IP: [<ffffffffa018bc09>] netback_changed+0x989/0xf00 [xen_netfront]
[ 0.761622] PGD 0
[ 0.761622] Oops: 0000 [#1] SMP
[ 0.761622] Modules linked in: ata_piix xen_blkfront(+) xen_netfront(+) libata crc32c_intel floppy scsi_mod
[ 0.761622] CPU: 1 PID: 129 Comm: xenwatch Not tainted 3.16.0-0.bpo.4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u6~bpo70+1
[ 0.761622] Hardware name: Xen HVM domU, BIOS 4.4.1 10/26/2015
[ 0.761622] task: ffff88003bbd53f0 ti: ffff88003bbd8000 task.ti: ffff88003bbd8000
[ 0.761622] RIP: 0010:[<ffffffffa018bc09>] [<ffffffffa018bc09>] netback_changed+0x989/0xf00 [xen_netfront]
[ 0.761622] RSP: 0018:ffff88003bbdbde8 EFLAGS: 00010202
[ 0.761622] RAX: 0000000000000000 RBX: ffff880032398d00 RCX: 0000000000000001
[ 0.761622] RDX: 00000000000322a7 RSI: ffff880032398d98 RDI: 0000000000005729
[ 0.761622] RBP: 0000000000098d00 R08: 0000000000000001 R09: ffffffff8172b600
[ 0.761622] R10: ffffea0000af94c0 R11: ffffea0000af9b38 R12: ffff880036a61000
[ 0.761622] R13: ffff8800322a6000 R14: ffff880036a618c0 R15: ffff8800322a7000
[ 0.761622] FS: 0000000000000000(0000) GS:ffff88003ce20000(0000) knlGS:0000000000000000
[ 0.761622] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.761622] CR2: 0000000000000018 CR3: 0000000001811000 CR4: 00000000001406e0
[ 0.761622] Stack:
[ 0.761622] ffff88003b5e0c20 ffff880032391381 ffff8800323912c4 ffff880000000018
[ 0.761622] ffff88003b5e0c00 0000001400000001 ffff880032398d98 ffff88003b5e0c00
[ 0.761622] 0000000000000000 ffff8800328798f1 0000000800000001 0000003800000001
[ 0.761622] Call Trace:
[ 0.761622] [<ffffffff81381d50>] ? xenbus_thread+0x2a0/0x2a0
[ 0.761622] [<ffffffff81381dea>] ? xenwatch_thread+0x9a/0x140
[ 0.761622] [<ffffffff810b13b0>] ? __wake_up_sync+0x20/0x20
[ 0.761622] [<ffffffff81090741>] ? kthread+0xc1/0xe0
[ 0.761622] [<ffffffff81090680>] ? flush_kthread_worker+0xb0/0xb0
[ 0.761622] [<ffffffff8154be58>] ? ret_from_fork+0x58/0x90
[ 0.761622] [<ffffffff81090680>] ? flush_kthread_worker+0xb0/0xb0
[ 0.761622] Code: 63 38 fe e9 5c fb ff ff 48 8b 7c 24 20 48 c7 c2 cb d2 18 a0 be f4 ff ff ff 31 c0 e8 72 4a 1f e1 eb a2 48 8b 43 20 48 8b 74 24 30 <48> 8b 78 18 e8 8e 4b 1f e1 85 c0 0f 88 d5 fd ff ff 48 8b 43 20
[ 0.761622] RIP [<ffffffffa018bc09>] netback_changed+0x989/0xf00 [xen_netfront]
[ 0.761622] RSP <ffff88003bbdbde8>
[ 0.761622] CR2: 0000000000000018
[ 0.761622] ---[ end trace 6123087ce2740115 ]---
... and the second network interface ends up unusable.
It turns out that what's happening is this:
* by default, the hypervisor allocates 32 grant table entries and
* a network interface can need more than 32.
* Now function talk_to_netback (drivers/net/xen-netfront.c) calls
function xennet_create_queues (drivers/net/xen-netfront.c) to create
num_queues many queues.
* xennet_create_queues goes on as long as it can and stores
the number of queues created at info->netdev->real_num_tx_queues.
* Now function talk_to_netback continues operation with the (wrong) assumption
that num_queues queues are in place, while it may be fewer than that.
So syncing num_queues with info->netdev->real_num_tx_queues fixes the
problem.
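To illustrate the mismatch, here is a minimal user-space sketch in C. It is
not the driver code and not Viktor's patch: all names (sketch_netdev,
sketch_create_queues, sketch_talk_to_backend) and the REFS_PER_QUEUE cost are
made up for illustration. It only models "create as many queues as the budget
allows, record the real count, and have the caller either trust or re-read
that count".

```c
#include <assert.h>

#define MAX_QUEUES 32       /* hypothetical upper bound for this sketch */
#define REFS_PER_QUEUE 4    /* arbitrary illustrative cost per queue */

/* Hypothetical stand-in for the driver's per-device state. */
struct sketch_netdev {
    unsigned int real_num_tx_queues;  /* queues that actually exist */
    int queue_ready[MAX_QUEUES];      /* 1 if queue i was set up */
};

/*
 * Models xennet_create_queues as described above: create queues until the
 * simulated grant reference budget runs out, then record how many exist.
 */
static void sketch_create_queues(struct sketch_netdev *dev,
                                 unsigned int wanted,
                                 unsigned int free_refs)
{
    unsigned int i;

    assert(wanted <= MAX_QUEUES);
    for (i = 0; i < wanted; i++) {
        if (free_refs < REFS_PER_QUEUE)
            break;                    /* cf. "can't alloc rx grant refs" */
        free_refs -= REFS_PER_QUEUE;
        dev->queue_ready[i] = 1;
    }
    dev->real_num_tx_queues = i;
}

/*
 * Models the caller. With sync_count == 0 it keeps trusting `wanted`
 * (the unpatched behaviour); with sync_count != 0 it re-reads the real
 * count first. Returns how many queues the caller would touch that were
 * never set up.
 */
static unsigned int sketch_talk_to_backend(struct sketch_netdev *dev,
                                           unsigned int wanted,
                                           unsigned int free_refs,
                                           int sync_count)
{
    unsigned int num_queues = wanted;
    unsigned int i, bad = 0;

    sketch_create_queues(dev, num_queues, free_refs);
    if (sync_count)
        num_queues = dev->real_num_tx_queues;

    for (i = 0; i < num_queues; i++)
        if (!dev->queue_ready[i])
            bad++;  /* the real driver would dereference unset state here */
    return bad;
}
```

Calling sketch_talk_to_backend with 24 wanted queues and a budget that only
covers 10 returns a non-zero count without the sync and zero with it, which
is the essence of the fix.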
Viktor Dukhovni already published a patch on 2015-09-09 at
http://lists.xenproject.org/archives/html/xen-users/2015-09/txtbaRgWqxpT4.txt .
His patch also fixes the "only created %d queues" message: unpatched, it
mistakenly prints the requested number of queues rather than the number of
queues actually created.
I'm hoping for an updated kernel package including Viktor's patch, soon.
As a workaround, one can pass something like gnttab_max_nr_frames=256 to the
hypervisor to increase the size of the grant table (via
GRUB_CMDLINE_XEN_DEFAULT in /etc/default/grub). It's no more than a workaround,
though, and it requires rebooting the hypervisor (which upgrading the domU to
a fixed kernel does not).
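For reference, the workaround might look roughly like this in
/etc/default/grub on the dom0. This is a sketch only: the value 256 is just
the example from above, and if GRUB_CMDLINE_XEN_DEFAULT already carries other
options on your system, the new one should be appended rather than replacing
them.

```sh
# /etc/default/grub (dom0), fragment
# Options passed to the Xen hypervisor itself, not to the dom0 kernel.
GRUB_CMDLINE_XEN_DEFAULT="gnttab_max_nr_frames=256"
# Afterwards: regenerate the GRUB configuration (update-grub on Debian)
# and reboot the hypervisor for the setting to take effect.
```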
Many thanks in advance,
Sebastian
-- System Information:
Debian Release: 7.9
APT prefers oldstable-updates
APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 3.2.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
--- End Message ---