Re: No irq handler for vector



On 2016-03-17 at 20:23, Henrique de Moraes Holschuh wrote:

> On Wed, 16 Mar 2016, The Wanderer wrote:
> 
>> On 2016-03-16 at 11:35, Henrique de Moraes Holschuh wrote:

>>> What processor is this, please?
>> 
>> Core i7-990X Extreme. /proc/cpuinfo reports it as:
>> 
>> cpu family: 6
>> model: 44
>> model name: Intel(R) Core(TM) i7 CPU X 990 @ 3.47 GHz
>> stepping: 2
> 
> Crap.  I am looking into this.

I'm afraid I don't quite understand. Is there still a problem worth
being concerned about, now that the message has stopped appearing for
me?

>> and currently (with the problem not happening) also reports
>> microcode: 0x14
> 
> Lucky you, 0x14 is safe enough on non-server systems.
> 
>> The problem apparently only happens with some motherboards, whose
>> BIOS or UEFI doesn't handle something correctly (I used to know
>> what, but I've forgotten the details). My motherboard is an Asus
>> Sabertooth X58,
> 
> The broken IOMMU interrupt remapping on the X58/S55xx chipsets,
> maybe?

Could be. I'll try to find time tomorrow to re-do some of my previous
research and dig up what I had deduced the original claimed problem to
be.

> I'd expect any BIOS with a 0x14 microcode to have the fix to the
> above (which is to disable the broken interrupt remapping feature of
> the IOMMU), so it might have been fixed when you updated that BIOS.

The recurring messages persisted after the BIOS update (although they
seemed, at least at first, to get less frequent), so while this may have
helped, it doesn't seem to have been enough on its own.

FWIW, I think the previous microcode on my system was either 0x11 or
0x10, although I can't swear to that. (I might be mixing it up with some
of the computers at my workplace; I don't exactly check the BIOS on this
machine very often.)

It's also (at least faintly) possible that the 0x14 microcode is being
put in place after boot, despite the change to stop doing that
automatically. I did install iucode-tool and some other
microcode-related packages in my attempts to find a fix; although that
didn't seem to produce any results initially, it's not impossible that
some later package update introduced a change which caused the
microcode to be applied on-the-fly again.
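
For anyone wanting to check which revision is actually active, here is a
minimal sketch, assuming a Linux system whose /proc/cpuinfo carries a
"microcode" field like the one quoted above. The function names are
purely illustrative. It only shows the revision currently running; it
can't tell whether the BIOS or a later on-the-fly update put it there.

```python
def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions (as ints) found in cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        # Lines look like "microcode\t: 0x14"; split on the first colon.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))
    return revisions


def local_revisions(path="/proc/cpuinfo"):
    """Read the local cpuinfo file (assumed path) and return its revisions."""
    with open(path) as f:
        return microcode_revisions(f.read())
```

Calling local_revisions() on the machine in question should give a
single value if every core is running the same microcode; more than one
value would suggest that the cores are not all at the same revision.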

> And I *think* our 3.14 kernel eventually got the patch that bitches
> about BIOSes that get this wrong and tries to disable it, but I am
> not sure about this, so a kernel update can certainly fix it (if
> that's indeed the root cause of the "no irq handler for vector" on
> X58/S55xx systems).

That's interesting to be aware of; thanks. Is there anything in
particular I should look for, in kernel messages, to determine whether
this is taking effect on my system?

> There was also an erratum that caused the uncore frequency multiplier
> to be stuck and locked on "max".  This got fixed somewhere between
> microcode 0x10 and microcode 0x13, AFAIK...
> 
> Does any of the above ring a bell?

I think it may well have been the broken interrupt remapping that was
the problem, but unfortunately, it's been long enough since I gave up on
the research that I don't have the details anymore.

-- 
   The Wanderer

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man.         -- George Bernard Shaw
