
Bug#805971: marked as done (linux-image-3.16.0-4-amd64: [PATCH] Xen domU "unable to handle kernel NULL pointer dereference")



Your message dated Sun, 02 May 2021 17:15:13 +0200
with message-id <E1ldDo8-001MdL-3m@hullmann.westfalen.local>
and subject line Closing this bug
has caused the Debian Bug report #805971,
regarding linux-image-3.16.0-4-amd64: [PATCH] Xen domU "unable to handle kernel NULL pointer dereference"
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
805971: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=805971
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: linux-image-3.16.0-4-amd64
Version: 3.16.7-ckt11-1+deb8u6
Severity: important

Hi!


Inside a Xen domU, with the combination of

 * the latest jessie kernel (3.16.7-ckt11-1+deb8u6)
   or the corresponding kernel from wheezy-backports (3.16.7-ckt11-1+deb8u6~bpo70+1) and

 * 2 network interfaces and

 * 24 VCPUs ..

I see the error "unable to handle kernel NULL pointer dereference" during start-up
...

  [    0.755434] xen_netfront: can't alloc rx grant refs
  [    0.758359] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
  [    0.761622] IP: [<ffffffffa018bc09>] netback_changed+0x989/0xf00 [xen_netfront]
  [    0.761622] PGD 0
  [    0.761622] Oops: 0000 [#1] SMP
  [    0.761622] Modules linked in: ata_piix xen_blkfront(+) xen_netfront(+) libata crc32c_intel floppy scsi_mod
  [    0.761622] CPU: 1 PID: 129 Comm: xenwatch Not tainted 3.16.0-0.bpo.4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u6~bpo70+1
  [    0.761622] Hardware name: Xen HVM domU, BIOS 4.4.1 10/26/2015
  [    0.761622] task: ffff88003bbd53f0 ti: ffff88003bbd8000 task.ti: ffff88003bbd8000
  [    0.761622] RIP: 0010:[<ffffffffa018bc09>]  [<ffffffffa018bc09>] netback_changed+0x989/0xf00 [xen_netfront]
  [    0.761622] RSP: 0018:ffff88003bbdbde8  EFLAGS: 00010202
  [    0.761622] RAX: 0000000000000000 RBX: ffff880032398d00 RCX: 0000000000000001
  [    0.761622] RDX: 00000000000322a7 RSI: ffff880032398d98 RDI: 0000000000005729
  [    0.761622] RBP: 0000000000098d00 R08: 0000000000000001 R09: ffffffff8172b600
  [    0.761622] R10: ffffea0000af94c0 R11: ffffea0000af9b38 R12: ffff880036a61000
  [    0.761622] R13: ffff8800322a6000 R14: ffff880036a618c0 R15: ffff8800322a7000
  [    0.761622] FS:  0000000000000000(0000) GS:ffff88003ce20000(0000) knlGS:0000000000000000
  [    0.761622] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [    0.761622] CR2: 0000000000000018 CR3: 0000000001811000 CR4: 00000000001406e0
  [    0.761622] Stack:
  [    0.761622]  ffff88003b5e0c20 ffff880032391381 ffff8800323912c4 ffff880000000018
  [    0.761622]  ffff88003b5e0c00 0000001400000001 ffff880032398d98 ffff88003b5e0c00
  [    0.761622]  0000000000000000 ffff8800328798f1 0000000800000001 0000003800000001
  [    0.761622] Call Trace:
  [    0.761622]  [<ffffffff81381d50>] ? xenbus_thread+0x2a0/0x2a0
  [    0.761622]  [<ffffffff81381dea>] ? xenwatch_thread+0x9a/0x140
  [    0.761622]  [<ffffffff810b13b0>] ? __wake_up_sync+0x20/0x20
  [    0.761622]  [<ffffffff81090741>] ? kthread+0xc1/0xe0
  [    0.761622]  [<ffffffff81090680>] ? flush_kthread_worker+0xb0/0xb0
  [    0.761622]  [<ffffffff8154be58>] ? ret_from_fork+0x58/0x90
  [    0.761622]  [<ffffffff81090680>] ? flush_kthread_worker+0xb0/0xb0
  [    0.761622] Code: 63 38 fe e9 5c fb ff ff 48 8b 7c 24 20 48 c7 c2 cb d2 18 a0 be f4 ff ff ff 31 c0 e8 72 4a 1f e1 eb a2 48 8b 43 20 48 8b 74 24 30 <48> 8b 78 18 e8 8e 4b 1f e1 85 c0 0f 88 d5 fd ff ff 48 8b 43 20
  [    0.761622] RIP  [<ffffffffa018bc09>] netback_changed+0x989/0xf00 [xen_netfront]
  [    0.761622]  RSP <ffff88003bbdbde8>
  [    0.761622] CR2: 0000000000000018
  [    0.761622] ---[ end trace 6123087ce2740115 ]---

... and the second network interface ends up unusable.

It turns out that:

 * by default, the hypervisor allows at most 32 grant table frames, and

 * the network interfaces can need more grant references than fit into
   those 32 frames.

 * The function talk_to_netback (drivers/net/xen-netfront.c) calls
   xennet_create_queues (in the same file) to create num_queues queues.

 * xennet_create_queues creates as many queues as it can and stores
   the number actually created in info->netdev->real_num_tx_queues.

 * talk_to_netback then continues on the (wrong) assumption that
   num_queues queues are in place, although there may be fewer.
   Syncing num_queues with info->netdev->real_num_tx_queues fixes the
   problem.

Viktor Dukhovni already published a patch on 2015-09-09 at
http://lists.xenproject.org/archives/html/xen-users/2015-09/txtbaRgWqxpT4.txt .
His patch also fixes the "only created %d queues" message, which, unpatched,
mistakenly reports the requested number of queues rather than the number of
queues actually created.
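The failure mode and the fix described above can be modelled in plain user-space C. This is a minimal sketch under stated assumptions, not Viktor's patch and not the kernel code: struct info, struct queue, create_queues, setup_queues, free_queues, MAX_QUEUES and the `limit` parameter are hypothetical stand-ins for the driver state, xennet_create_queues and talk_to_netback, with `limit` simulating running out of grant references.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_QUEUES 32 /* hypothetical upper bound for this model */

/* Hypothetical, heavily simplified stand-ins for the driver's state;
 * these are not the real xen-netfront structures. */
struct queue {
    unsigned int id;
};

struct info {
    struct queue *queues[MAX_QUEUES]; /* NULL where no queue was created */
    unsigned int real_num_tx_queues;  /* number of queues actually created */
};

/* Create up to `wanted` queues, but pretend resources (e.g. grant
 * references) run out after `limit` of them. Records the real count. */
static void create_queues(struct info *info, unsigned int wanted,
                          unsigned int limit)
{
    unsigned int i;

    if (wanted > MAX_QUEUES)
        wanted = MAX_QUEUES;
    for (i = 0; i < wanted && i < limit; i++) {
        info->queues[i] = malloc(sizeof(*info->queues[i]));
        if (info->queues[i] == NULL)
            break;
        info->queues[i]->id = i;
    }
    info->real_num_tx_queues = i;
    if (i < wanted) /* report the number created (i), not the number wanted */
        fprintf(stderr, "only created %u queues\n", i);
}

/* Caller with the fix applied: it uses the real count from now on. */
static unsigned int setup_queues(struct info *info, unsigned int num_queues,
                                 unsigned int limit)
{
    unsigned int i, configured = 0;

    create_queues(info, num_queues, limit);
    num_queues = info->real_num_tx_queues; /* the fix: sync the counts */

    /* Without the line above, this loop would run up to the requested
     * count and dereference NULL entries of info->queues. */
    for (i = 0; i < num_queues; i++) {
        if (info->queues[i]->id == i) /* touches every queue pointer */
            configured++;
    }
    return configured;
}

/* Release the queues that were actually created. */
static void free_queues(struct info *info)
{
    unsigned int i;

    for (i = 0; i < info->real_num_tx_queues; i++)
        free(info->queues[i]);
    info->real_num_tx_queues = 0;
}
```

For example, calling setup_queues(&info, 24, 5) on a zero-initialized struct info prints "only created 5 queues" to stderr and returns 5; without the sync line, the loop would go on to dereference info->queues[5], which is NULL.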

I'm hoping for an updated kernel package including Viktor's patch soon.


As a workaround, one can pass something like gnttab_max_nr_frames=256 to the
hypervisor (via GRUB_CMDLINE_XEN_DEFAULT in /etc/default/grub) to increase the
size of the grant table.  It is no more than a workaround, though, and it
requires rebooting the hypervisor (which upgrading the domU to a fixed kernel
does not).
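A back-of-the-envelope check of why 2 interfaces with 24 VCPUs can exhaust the default grant table, and why raising gnttab_max_nr_frames helps. The figures are assumptions, not taken from the report: 4 KiB pages, 8-byte version 1 grant entries (512 per frame), a default limit of 32 frames, one queue per VCPU on each interface, and 256 TX plus 256 RX grant references per queue. Grants used by xen_blkfront and by the ring pages themselves are ignored. The helper names grant_entries and refs_needed are hypothetical.

```c
/* Assumptions, not from the report: 4 KiB pages, 8-byte version 1
 * grant entries, and 256 TX + 256 RX grant references per queue. */
#define PAGE_BYTES        4096u
#define GRANT_ENTRY_BYTES 8u
#define REFS_PER_QUEUE    (256u + 256u)

/* Grant entries available with a given number of grant table frames. */
static unsigned int grant_entries(unsigned int frames)
{
    return frames * (PAGE_BYTES / GRANT_ENTRY_BYTES);
}

/* Grant references needed by `interfaces` interfaces, each with
 * `queues` queues (assumed: one queue per VCPU). */
static unsigned int refs_needed(unsigned int interfaces, unsigned int queues)
{
    return interfaces * queues * REFS_PER_QUEUE;
}
```

Under these assumptions, 32 frames give 16384 entries: enough for one interface with 24 queues (12288 references) but not for two (24576), which would be consistent with only the second interface failing. With 256 frames there are 131072 entries.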

Many thanks in advance,



Sebastian



-- System Information:
Debian Release: 7.9
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.2.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

--- End Message ---
--- Begin Message ---
This bug was filed for a very old kernel. If you can reproduce it with
- the current version in unstable/testing
- the latest kernel from buster.backports
please reopen the bug, see https://www.debian.org/Bugs/server-control

--- End Message ---
