Handling irqbalance in virtual environments
Hi,
It turns out we have problems with irqbalance again.
It was added as a Recommends of the main image package in 3.16, because
it was reported that older kernels move all interrupts to CPU 0 without
help.[1]
In the meantime, the kernel can do the balancing on its own. In 4.9, I've
seen it working with aacraid: each queue gets hard pinned to its own
CPU, from 0 to $NRCPUS. In 4.19, I've seen the same working properly with
virtio-net.
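
One way to check whether an interrupt is hard pinned is to look at its
affinity list in /proc/irq/<N>/smp_affinity_list, which uses the usual
kernel CPU-list notation (for example "5" or "0-3,8"). The following is
only a minimal sketch under that assumption; the helper names
parse_cpulist and is_hard_pinned are made up for illustration, and the
strings in the usage note are invented, not measured values.

```python
def parse_cpulist(text):
    """Expand a kernel CPU list such as "0-3,8" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            # A range like "0-3" is inclusive on both ends.
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def is_hard_pinned(text):
    """Treat an IRQ as hard pinned if its affinity list names exactly one CPU."""
    return len(parse_cpulist(text)) == 1


def read_irq_affinity(irq):
    """Read the affinity list of one IRQ (assumes a Linux procfs is mounted)."""
    with open(f"/proc/irq/{irq}/smp_affinity_list") as f:
        return f.read()
```

With this, is_hard_pinned("5") is true, while is_hard_pinned("0-7")
is false, which corresponds to the "affinity for all CPUs" case.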
With 4.19, even on real hardware, where interrupts have an affinity for
all CPUs, each interrupt is actually delivered to a different CPU.
A random example of this (it even selects only one thread of each core):
| 26: 0 0 0 0 92 0 0 0 IR-PCI-MSI 3670017-edge eno1-TxRx-0
| 27: 0 0 0 0 0 167 0 0 IR-PCI-MSI 3670018-edge eno1-TxRx-1
| 28: 0 0 0 0 0 0 467 0 IR-PCI-MSI 3670019-edge eno1-TxRx-2
| 29: 0 0 0 0 0 0 0 454 IR-PCI-MSI 3670020-edge eno1-TxRx-3
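
To read such a table without counting columns by eye, the lines can be
parsed mechanically. This is a rough sketch, not a tool that was used
for the numbers above: the function name busiest_cpu_per_irq is
hypothetical, the CPU count of 8 is inferred from the eight counter
columns in the excerpt, and the sample lines are copied from it.

```python
def busiest_cpu_per_irq(lines, ncpus):
    """Map each IRQ number to (cpu_index, count) for the CPU with most deliveries."""
    result = {}
    for line in lines:
        # Drop the "| " quoting prefix used in the mail, then split on whitespace.
        fields = line.lstrip("| ").split()
        if len(fields) < 1 + ncpus or not fields[0].endswith(":"):
            continue
        irq = fields[0].rstrip(":")
        try:
            counts = [int(x) for x in fields[1:1 + ncpus]]
        except ValueError:
            # Not a per-CPU counter row (e.g. a header line); skip it.
            continue
        cpu = max(range(ncpus), key=lambda i: counts[i])
        result[irq] = (cpu, counts[cpu])
    return result


SAMPLE = [
    "| 26: 0 0 0 0 92 0 0 0 IR-PCI-MSI 3670017-edge eno1-TxRx-0",
    "| 27: 0 0 0 0 0 167 0 0 IR-PCI-MSI 3670018-edge eno1-TxRx-1",
    "| 28: 0 0 0 0 0 0 467 0 IR-PCI-MSI 3670019-edge eno1-TxRx-2",
    "| 29: 0 0 0 0 0 0 0 454 IR-PCI-MSI 3670020-edge eno1-TxRx-3",
]

if __name__ == "__main__":
    for irq, (cpu, count) in busiest_cpu_per_irq(SAMPLE, 8).items():
        print(f"IRQ {irq}: CPU {cpu} ({count} interrupts)")
```

For the excerpt, this assigns IRQs 26 to 29 to CPUs 4 to 7, one queue
per CPU.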
Now irqbalance comes along and redoes the existing pinning, and the result
is no longer correct but $RANDOM for the hard queue-to-CPU case of virtio.
At least Google considers the work irqbalance does to "correct" the existing
balancing a large problem.
I'm not sure how to go forward. I have a workaround pending for our
cloud images that hard-excludes the installation of irqbalance.[2]
Regards,
Bastian
[1]: https://bugs.debian.org/577788
[2]: https://salsa.debian.org/cloud-team/debian-cloud-images/merge_requests/81
--
Youth doesn't excuse everything.
-- Dr. Janice Lester (in Kirk's body), "Turnabout Intruder",
stardate 5928.5.