
RE: pyxis_machine_check ---- strange message!



dannyhp@pop.web.net writes:
 > 
 > uname -a
 > Linux arrhenius 2.0.35 #1 Sun Nov 15 07:09:00 CET 1998 alpha
 > unknown

Ok thanks, so the 0xfffffc00003159c8 address is in
"cabriolet_and_eb66p_device_interrupt()"

at line:
"	/* read the interrupt summary registers */
	pld = inb(0x804) | (inb(0x805) << 8) | (inb(0x806) << 16);
"

Here is the corresponding assembly code:
0xfffffc00003159b4:     lda     t0,-887(zero)
0xfffffc00003159b8:     sll     t0,0x20,t0
0xfffffc00003159bc:     ldbu    t2,2052(t0)
0xfffffc00003159c0:     lda     t0,-887(zero)
0xfffffc00003159c4:     sll     t0,0x20,t0
0xfffffc00003159c8:     ldbu    t1,2053(t0)
0xfffffc00003159cc:     lda     t0,-887(zero)
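
The address arithmetic in the listing above can be checked by hand. The sketch below only reproduces what the lda/sll/ldbu sequence computes (lda sign-extends its 16-bit displacement, sll shifts left by 0x20 bits); the helper names io_base and ldbu_address are made up for illustration and do not come from the kernel source.

```c
#include <stdint.h>

/* Value left in t0 by "lda t0,-887(zero)" followed by "sll t0,0x20,t0".
 * The shift is done on an unsigned value to avoid undefined behaviour. */
uint64_t io_base(void)
{
    return ((uint64_t)(int64_t)-887) << 32;
}

/* Effective address of "ldbu tN,disp(t0)": base plus the sign-extended
 * 16-bit displacement taken from the instruction. */
uint64_t ldbu_address(int16_t disp)
{
    return io_base() + (uint64_t)(int64_t)disp;
}
```

With these definitions io_base() is 0xfffffc8900000000, and the displacements 2052 and 2053 are 0x804 and 0x805, so the two ldbu instructions correspond to the first two inb() calls of the C line.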

Now I can only make assumptions, so take it with care:

I would say that it is the inb(0x804) instruction that is provoking
the machine check, because it shows up on the PCI bus as a read to
location 800804 instead of location 000804 (a one-bit change).
I have already seen exactly the same bug on an lx164, with the wrong
bit in the same place. That was while accessing a network card; the
behaviour was debugged with a logic analyser at the time, and I was
told that, given its history, it was not unlikely to be a bug in the
pyxis chipset that shows up occasionally.
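
To see which bit differs between the two locations, one can XOR them. This is a minimal sketch that assumes both locations are hexadecimal; the function name single_bit_difference is hypothetical.

```c
#include <stdint.h>

/* Return the index of the single differing bit between two addresses,
 * or -1 if they are equal or differ in more than one bit. */
int single_bit_difference(uint32_t expected, uint32_t observed)
{
    uint32_t diff = expected ^ observed;
    int bit = 0;

    /* diff & (diff - 1) is zero only when at most one bit is set. */
    if (diff == 0 || (diff & (diff - 1)) != 0)
        return -1;

    /* Count how many times the set bit can be shifted right. */
    while ((diff >>= 1) != 0)
        bit++;
    return bit;
}
```

For 0x000804 and 0x800804 the XOR is 0x800000, so the function returns 23: only address bit 23 differs.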

Or maybe you want to check that all your cards are properly seated.

 > I don't have disas --- which package is it in?  I assume it is a disassembler
 > --- it it like /usr/bin/as with some command argument?

My suggestion was to use the disassemble command inside gdb (for
instance through a pipe); it is not a standalone program.

 > Also, where would milo/palcode/... be? --- is this in the kernel source?  I
 > have kernel source for 2.0.36 (in tar.gz form) --- should I be unpacking it
 > (even though we are running 2.0.35)?

No, the milo/palcode reference was just a hint that people who want to
interpret machine checks should look in the milo source (available as
a tar.gz, for instance at ftp://genie.ucd.ie/pub/alpha/milo or on
gatekeeper). Anyway, unless you have the hardware reference for your
processor and chipset, it will not be very useful.

Is it a highly loaded machine? How often do you get the error message?
And do you need to reboot the machine after the error?

Maybe we can try to insert some nop or mb operations around each inb() 
to see if that changes something.
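
As a rough idea of what that experiment could look like, here is a user-space sketch, not a kernel patch. fake_inb() and fake_port_set() are hypothetical stand-ins for the real inb() so that the code can run anywhere, and barrier_mb() is an assumed wrapper that uses the Alpha mb instruction when compiled for Alpha and the GCC/Clang builtin __sync_synchronize() otherwise.

```c
#include <stdint.h>

/* Hypothetical stand-in for the I/O port space (user-space only). */
static uint8_t fake_ports[0x1000];

/* Test helper: preload a value into a fake port. */
void fake_port_set(unsigned port, uint8_t value)
{
    fake_ports[port & 0xfff] = value;
}

/* Stand-in for the kernel's inb(); the real one reads PCI I/O space. */
static unsigned fake_inb(unsigned port)
{
    return fake_ports[port & 0xfff];
}

/* Memory barrier: "mb" on Alpha, a full compiler/CPU barrier elsewhere. */
#if defined(__alpha__)
#define barrier_mb() __asm__ __volatile__("mb" : : : "memory")
#else
#define barrier_mb() __sync_synchronize()
#endif

/* Same computation as the interrupt summary read, with a barrier
 * around each port access. */
unsigned long read_summary_with_barriers(void)
{
    unsigned long pld;

    barrier_mb();
    pld = fake_inb(0x804);
    barrier_mb();
    pld |= fake_inb(0x805) << 8;
    barrier_mb();
    pld |= fake_inb(0x806) << 16;
    barrier_mb();
    return pld;
}
```

In the kernel itself the same pattern would be applied to the real inb() calls in the interrupt handler; whether it changes the behaviour is exactly what the experiment would have to show.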

Loic

