Re: Spreading NIC interrupts across multiple CPUs



On 3/26/2014 5:23 PM, Aaron Seelye wrote:
> On 3/26/2014 2:44 PM, Stan Hoeppner wrote:
>>
>> Please read this for educational background, especially the Note at the
>> bottom of the page.
>>
>> https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-cpu-irq.html
>>
>>
>> Then ask an intelligent question about IRQ balancing and steering, WRT
>> the two specific and different hardware systems, and Debian kernel
>> versions, being used on each.
> 
> I'd seen other things similar to that, however, it doesn't seem to get
> me any closer to the solution.

Please post the full output of "cat /proc/interrupts" without line wrapping.

> The output from one of the Dell (not balanced) systems:
> 
> root@conf-2:~# uname -a
> Linux conf-2 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux
> root@conf-2:~# grep eth /proc/interrupts
>   79:  704642666          0          0          0          0          0
>          0          0          0          0          0          0    
> 0          0          0          0   PCI-MSI-edge      eth0
> root@conf-2:~# cat /proc/irq/79/smp_affinity
> 0000ffff
> root@conf-2:~# cat /proc/irq/79/smp_affinity_list
> 0-15

This is an 8 core machine with HT enabled, 16 logical CPUs, so right off
the bat it is dramatically different than the Compaq machine below as
far as the kernel is concerned and how scheduling is performed.  The
current mask may or may not be correct for this configuration.  I never
use HT and I can't find any docs about HT and /proc/irq/xx/smp_affinity.

If this is a production machine and you can't easily reboot it to
disable HT, first try a mask that includes only the physical CPUs and
not the logical:

~# echo ff > /proc/irq/79/smp_affinity

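This should schedule IRQs only on the 1st logical processor (physical
CPU) of each core, assuming the kernel numbers the HT siblings as CPUs
8-15.  /sys/devices/system/cpu/cpu0/topology/thread_siblings_list will
tell you whether that is the case.  If that doesn't do the trick, reboot
the box and disable HT.  If that doesn't do it, I'll dig further into the
scheduler to figure out what's going on.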
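
If the sibling numbering turns out to be different, a mask of "first
thread of each core" can be derived from the topology files instead of
guessed.  The following is only a rough sketch: first_thread_mask is a
made-up helper name, it assumes fewer than 63 logical CPUs, and it
assumes each input line is one thread_siblings_list value such as "0,8"
or "0-1".

```shell
# Build a hex affinity mask holding only the lowest-numbered hardware
# thread of each core.  Reads one thread_siblings_list value per line
# on stdin.  Hypothetical helper, assumes CPU numbers below 63.
first_thread_mask() (
    mask=0
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        # Everything before the first ',' or '-' is the lowest sibling.
        first=${line%%[,-]*}
        # Siblings report the same list, so OR-ing duplicates is harmless.
        mask=$(( mask | (1 << first) ))
    done
    printf '%x\n' "$mask"
)

# Read-only usage on a live system:
#   cat /sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list | first_thread_mask
```

The result can then be compared against the "ff" used above before
anything is written to /proc/irq/79/smp_affinity.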

> The output from the HP (balanced) system:
> 
> root@deb-test:~# grep eth /proc/interrupts
>   68:       4251       4190       4212       4264       4226       4257
>       4251       4214   PCI-MSI-edge      eth0
> root@deb-test:~# cat /proc/irq/68/smp_affinity
> ff
> root@deb-test:~# cat /proc/irq/68/smp_affinity_list
> 0-7

This is an 8 core machine without HyperThreading.  The mask is correct
for 8 physical CPUs.  Oddly though, one box outputs the leading zeros of
the mask while the other does not.  Or did you mung either output?
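
Leading zeros do not change which CPUs a mask selects, which is easy to
confirm by counting the set bits.  A small sketch, with mask_cpu_count
being a made-up helper name and the mask assumed to fit in 63 bits:

```shell
# Count how many CPUs a hex smp_affinity mask selects.
# Hypothetical helper, assumes the mask fits in 63 bits.
mask_cpu_count() (
    # Larger systems separate 32-bit groups with commas; drop them.
    hex=$(printf '%s' "$1" | tr -d ',')
    mask=$(( 0x$hex ))
    count=0
    i=0
    # Inspect one bit per iteration, lowest bit first.
    while [ "$i" -lt 64 ]; do
        count=$(( count + (mask & 1) ))
        mask=$(( mask >> 1 ))
        i=$(( i + 1 ))
    done
    echo "$count"
)

# Usage:
#   mask_cpu_count "$(cat /proc/irq/79/smp_affinity)"
```

Feeding it the two masks quoted here gives 16 for the Dell and 8 for
the HP, matching the smp_affinity_list values of 0-15 and 0-7.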

> As you can see, both systems are running identical kernels, and both
> have affinity set to spread across all CPUs.  

The latter may not be a correct statement, as HT logical processors are
not CPUs.  Each logical processor has its own architectural state, but
the two on a core share that core's execution units and caches.  Also,
the eight digit smp_affinity mask on the Dell is wide enough for 32
processors, though only the low 16 bits are set.  Just as you do not
want to schedule two compute intensive tasks to both logical processors
on a core leaving the other cores idle, you also do not want to assign
any interrupts to the 2nd logical processor in a given core.  All this
does is pile up context and state switches on said core.  The net effect
is a decrease in the overall work that can be performed.

And to this point, it's not usually a good idea to spread interrupts
round robin from any device evenly across all cores in a system.  This
is inefficient as each core must load the ISR for every interrupt.  This
decreases the effectiveness of L1/L2 caches on all cores, causing
additional cache misses for other processes executing on those cores.
This is precisely why irqbalance was created.

> However, the Dell is using
> CPU0 exclusively for the ethernet device interrupts, while the HP
> spreads them pretty evenly.

This could be as simple as HT being enabled on the Dell.  If not, the
contents of your /proc/interrupts files should help me narrow this down
for you.
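
In the meantime, the per-CPU counts for one device can be pulled out of
/proc/interrupts in a form that survives mail clients.  This is only a
sketch: irq_counts is a made-up helper name, it assumes the device name
is the last field on the line, and it needs the unwrapped file as input.

```shell
# Print "CPU<n> <count>" for each per-CPU column on the interrupt
# lines whose last field is the given device name.
# Hypothetical helper, expects /proc/interrupts content on stdin.
irq_counts() {
    awk -v dev="$1" '
        $NF == dev {
            # Field 1 is the IRQ number ("79:").  The purely numeric
            # fields that follow are the per-CPU counters.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                printf "CPU%d %s\n", i - 2, $i
        }'
}

# Usage:
#   irq_counts eth0 < /proc/interrupts
```

A multi-queue NIC registers names such as eth0-TxRx-0, which an exact
match on "eth0" will not pick up, so pass the full queue name there.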

For future reference, kernel scheduler problems such as this should be
posted on LKML, not a distro list, no matter which distro you use.
There are very few people on debian-user or any of the distro general
help lists with significant knowledge of the kernel, let alone the
scheduler.  You typically get help with this kind of thing much faster,
and with more thorough knowledge transfer on LKML.

Cheers,

Stan

