Bug#339080: Frequent crash in handle_IRQ_event on alpha with kernel 2.6
Package: linux-2.6
Tags: patch
Since the beginning of 2005 I have tried several 2.6.x kernel images
on my AlphaStation 500/500:
=============================================================================================
cpu : Alpha
cpu model : EV56
cpu variation : 7
cpu revision : 0
cpu serial number :
system type : Alcor
system variation : Alcor
system revision : 0
system serial number :
cycle frequency [Hz] : 500000000
timer frequency [Hz] : 1024.00
page size [bytes] : 8192
phys. address bits : 40
max. addr. space # : 127
BogoMIPS : 994.44
kernel unaligned acc : 0 (pc=0,va=0)
user unaligned acc : 0 (pc=0,va=0)
platform string : Digital AlphaStation 500/500
cpus detected : 1
L1 Icache : 8K, 1-way, 32b line
L1 Dcache : 8K, 1-way, 32b line
L2 cache : 96K, 3-way, 64b line
L3 cache : 8192K, 1-way, 64b line
The kernels I tried were 2.6.8-1, 2.6.8-2, 2.6.10, 2.6.12,... All of them
crash on this machine with the following message (decoded with ksymoops):
ksymoops 2.4.9 on alpha 2.6.8-2-generic. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.6.8-2-generic/ (default)
-m /boot/System.map-2.6.8-2-generic (default)
Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
Error (regular_file): read_ksyms stat /proc/ksyms failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Trace:
[<fffffc000031a164>] handle_IRQ_event+0x74/0xf0
[<fffffc000031ab50>] handle_irq+0xe0/0x1c0
[<fffffc0000329b04>] srm_device_interrupt+0x24/0x40
[<fffffc000031b1f4>] do_entInt+0xf4/0x140
[<fffffc0000315260>] ret_from_sys_call+0x0/0x10
[<fffffc0000316e30>] default_idle+0x0/0x10
[<fffffc0000316e98>] cpu_idle+0x58/0x80
[<fffffc0000316e30>] default_idle+0x0/0x10
[<fffffc0000316e30>] default_idle+0x0/0x10
[<fffffc0000310234>] rest_init+0x34/0x50
[<fffffc000031001c>] __start+0x1c/0x20
Code: 243f0010 245f0020 21c10100 21a20200 a4490008 a4290000
<b4410008> b4220000
Using defaults from ksymoops -t elf64-alpha -a alpha
Trace; fffffc000031a164 <handle_IRQ_event+74/f0>
Trace; fffffc000031ab50 <handle_irq+e0/1c0>
Trace; fffffc0000329b04 <srm_device_interrupt+24/40>
Trace; fffffc000031b1f4 <do_entInt+f4/140>
Trace; fffffc0000315260 <ret_from_sys_call+0/10>
Trace; fffffc0000316e30 <default_idle+0/10>
Trace; fffffc0000316e98 <cpu_idle+58/80>
Trace; fffffc0000316e30 <default_idle+0/10>
Trace; fffffc0000316e30 <default_idle+0/10>
Trace; fffffc0000310234 <rest_init+34/50>
Trace; fffffc000031001c <_stext+1c/20>
Code; ffffffffffffffe8 <END_OF_CODE+3ffff9a83a8/????>
0000000000000000 <_PC>:
Code; ffffffffffffffe8 <END_OF_CODE+3ffff9a83a8/????>
0: 10 00 3f 24 ldah t0,16
Code; ffffffffffffffec <END_OF_CODE+3ffff9a83ac/????>
4: 20 00 5f 24 ldah t1,32
Code; fffffffffffffff0 <END_OF_CODE+3ffff9a83b0/????>
8: 00 01 c1 21 lda s5,256(t0)
Code; fffffffffffffff4 <END_OF_CODE+3ffff9a83b4/????>
c: 00 02 a2 21 lda s4,512(t1)
Code; fffffffffffffff8 <END_OF_CODE+3ffff9a83b8/????>
10: 08 00 49 a4 ldq t1,8(s0)
Code; fffffffffffffffc <END_OF_CODE+3ffff9a83bc/????>
14: 00 00 29 a4 ldq t0,0(s0)
Code; 0000000000000000 Before first symbol
18: 08 00 41 b4 stq t1,8(t0)
Code; 0000000000000004 Before first symbol
1c: 00 00 22 b4 stq t0,0(t1)
Kernel panic: Aiee, killing interrupt handler!
=============================================================================================
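For readers decoding the oops: the instructions in the Code: dump appear to match an inlined doubly linked list deletion with pointer poisoning. The two ldah/lda pairs build 0x00100100 and 0x00200200 in s5 and s4, which are the LIST_POISON1 and LIST_POISON2 values of the 2.6 <linux/list.h>, and the marked faulting instruction stq t1,8(t0) is the "next->prev = prev" store. The user-space C sketch below is only an interpretation of the dump, assuming s0 holds the entry being removed and the usual struct list_head layout (next at offset 0, prev at offset 8 on 64-bit Alpha). The function names are hypothetical, and the stores of the poison values fall after the dumped window, so they are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: next at offset 0, prev at offset 8 on a 64-bit machine. */
struct list_head {
	struct list_head *next;
	struct list_head *prev;
};

/* Same values as LIST_POISON1/LIST_POISON2 in 2.6 <linux/list.h>;
 * they match "ldah t0,16; lda s5,256(t0)" and "ldah t1,32; lda s4,512(t1)". */
#define LIST_POISON1 ((void *) 0x00100100UL)
#define LIST_POISON2 ((void *) 0x00200200UL)

/* Hypothetical helper: empty circular list. */
static void list_init_sketch(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Hypothetical helper: append node before head. */
static void list_add_tail_sketch(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Interpretation of the dumped sequence, with s0 = entry. */
static void list_del_sketch(struct list_head *entry)
{
	struct list_head *prev = entry->prev; /* ldq t1,8(s0) */
	struct list_head *next = entry->next; /* ldq t0,0(s0) */

	next->prev = prev; /* stq t1,8(t0): the instruction marked in the oops */
	prev->next = next; /* stq t0,0(t1) */

	/* Assumed to follow the dump: poison the removed entry (s5, s4). */
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}
```

A fault on the marked store means that entry->next did not point to a valid list node. One possible cause is removing an entry that had already been removed and poisoned, in which case next would be LIST_POISON1 and the store would go to address 0x00100108.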
The time until the crash varies: sometimes it happens after 3 hours,
sometimes after two days, but mostly during idle time. The crash does
not seem to be caused by a particular device driver, because it always
occurs inside handle_IRQ_event() in arch/alpha/kernel/irq.c. It could
still be a problem that shows up after a driver's interrupt handler
has been called, but the crash also happens with changed hardware and
drivers (2 different drivers for SCSI, 3 different drivers for
Ethernet, with and without SATA, with and without USB). Nevertheless,
here is the hardware configuration:
=============================================================================================
0000:00:06.0 Ethernet controller: Digital Equipment Corporation
DECchip 21040 [Tulip] (rev 26)
Flags: bus master, medium devsel, latency 255, IRQ 29
I/O ports at 9400 [size=128]
Memory at 00000000022dd000 (32-bit, non-prefetchable) [size=128]
0000:00:07.0 RAID bus controller: Silicon Image, Inc. (formerly CMD
Technology Inc) SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
Subsystem: Silicon Image, Inc. (formerly CMD Technology Inc)
SiI 3114 SATARaid Controller
Flags: bus master, 66MHz, medium devsel, latency 240, IRQ 24
I/O ports at 9810 [size=8]
I/O ports at 9820 [size=4]
I/O ports at 9818 [size=8]
I/O ports at 9824 [size=4]
I/O ports at 9800 [size=16]
Memory at 00000000022db000 (32-bit, non-prefetchable) [size=1K]
Expansion ROM at 0000000002200000 [disabled] [size=512K]
Capabilities: [60] Power Management version 2
0000:00:08.0 VGA compatible controller: Digital Equipment Corporation
PBXGB [TGA2] (rev 22) (prog-if 00 [VGA])
Flags: bus master, medium devsel, latency 255, IRQ 32
Memory at 0000000002400000 (32-bit, prefetchable) [size=4M]
Expansion ROM at 00000000022d0000 [disabled] [size=32K]
0000:00:09.0 SCSI storage controller: QLogic Corp. ISP1020 Fast-wide
SCSI (rev 02)
Flags: bus master, medium devsel, latency 248, IRQ 28
I/O ports at 9000 [size=256]
Memory at 00000000022d8000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at 00000000022c0000 [disabled] [size=64K]
0000:00:0a.0 Non-VGA unclassified device: Intel Corporation
82375EB/SB PCI to EISA Bridge (rev 15)
Flags: bus master, medium devsel, latency 248
0000:00:0b.0 Ethernet controller: Digital Equipment Corporation
DECchip 21140 [FasterNet] (rev 20)
Subsystem: Digital Equipment Corporation: Unknown device 500a
Flags: bus master, medium devsel, latency 255, IRQ 16
I/O ports at 9480 [size=128]
Memory at 00000000022de000 (32-bit, non-prefetchable) [size=128]
Expansion ROM at 0000000002280000 [disabled] [size=256K]
0000:00:0c.0 USB Controller: NEC Corporation USB (rev 43) (prog-if 10 [OHCI])
Subsystem: NEC Corporation USB
Flags: bus master, medium devsel, latency 252, IRQ 20
Memory at 00000000022d9000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [40] Power Management version 2
0000:00:0c.1 USB Controller: NEC Corporation USB (rev 43) (prog-if 10 [OHCI])
Subsystem: NEC Corporation USB
Flags: bus master, medium devsel, latency 252, IRQ 21
Memory at 00000000022da000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [40] Power Management version 2
0000:00:0c.2 USB Controller: NEC Corporation USB 2.0 (rev 04)
(prog-if 20 [EHCI])
Subsystem: HaSoTec GmbH: Unknown device 2928
Flags: bus master, medium devsel, latency 252, IRQ 22
Memory at 00000000022dc000 (32-bit, non-prefetchable) [size=256]
Capabilities: [40] Power Management version 2
=============================================================================================
But I have a fix:
Looking through the kernel sources of the different architectures, I
have seen that almost all architectures use the same irq.c code. In
newer kernels (>2.6.8), some architectures, for example x86, ia64,
amd64, powerpc and parisc, switched to the generic IRQ handler code.
The others have not switched yet or have their own IRQ handlers.
Alpha has not switched yet either, but it also uses the same IRQ code,
with some "small" changes that matter for this bug. The alpha version
of handle_IRQ_event() seems to be outdated (it appears unmodified
since the 2.2 kernels!), while all other architectures have changed it
since 2.2.
Moving alpha to the generic code was a little too much work, but what
helped was copying the handle_IRQ_event() code from x86 to alpha.
After that the crash was gone: the machine ran 40 days with 2.6.8,
90 days with 2.6.10, has been running since September with 2.6.12 and
since yesterday with 2.6.14 (all kernels patched with this patch). If
I had not shut it down for kernel updates, the first patched 2.6.8
would surely still be running today :-)
==================================================================================================================
diff -ru
--- arch/alpha/kernel/irq.c 2005-03-02 08:38:18.000000000 +0100
+++ arch/alpha/kernel/irq.c 2005-05-15 23:32:09.000000000 +0200
@@ -79,29 +79,27 @@
.end = no_irq_enable_disable,
};
-int
-handle_IRQ_event(unsigned int irq, struct pt_regs *regs,
- struct irqaction *action)
+int handle_IRQ_event(unsigned int irq, struct pt_regs *regs,
+ struct irqaction *action)
{
- int status = 1; /* Force the "do bottom halves" bit */
- int ret;
+ int ret, retval = 0, status = 0;
- do {
- if (!(action->flags & SA_INTERRUPT))
- local_irq_enable();
- else
- local_irq_disable();
+ if (!(action->flags & SA_INTERRUPT))
+ local_irq_enable();
+ do {
ret = action->handler(irq, action->dev_id, regs);
if (ret == IRQ_HANDLED)
status |= action->flags;
+ retval |= ret;
action = action->next;
} while (action);
+
if (status & SA_SAMPLE_RANDOM)
add_interrupt_randomness(irq);
local_irq_disable();
- return status;
+ return retval;
}
/*
=====================================================================================================================
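To make the change easier to follow, here is a user-space C sketch of the patched dispatch loop. It is a model, not kernel code: struct pt_regs is dropped from the handler signature, local_irq_enable(), local_irq_disable() and add_interrupt_randomness() are stubs that only record what happened, and the SA_INTERRUPT and SA_SAMPLE_RANDOM values are illustrative rather than alpha's real ones. IRQ_NONE (0) and IRQ_HANDLED (1) follow the 2.6 kernel headers. Compared with the removed lines, interrupts are enabled once before the loop instead of being switched on or off for every handler, and the function returns the OR of the handlers' irqreturn_t values instead of the OR of their flags (which the old code seeded with 1).

```c
#include <assert.h>
#include <stddef.h>

typedef int irqreturn_t;
#define IRQ_NONE    (0)
#define IRQ_HANDLED (1)

/* Illustrative flag values only, not the real alpha definitions. */
#define SA_INTERRUPT     0x1UL
#define SA_SAMPLE_RANDOM 0x2UL

/* Simplified: the real 2.6 handler also takes a struct pt_regs *. */
struct irqaction {
	irqreturn_t (*handler)(int irq, void *dev_id);
	unsigned long flags;
	void *dev_id;
	struct irqaction *next;
};

/* Stubs that only record their effect. */
static int irqs_enabled;
static int randomness_calls;
static void local_irq_enable(void)  { irqs_enabled = 1; }
static void local_irq_disable(void) { irqs_enabled = 0; }
static void add_interrupt_randomness(int irq) { (void)irq; randomness_calls++; }

/* Mirrors the "+" side of the patch. */
static irqreturn_t handle_IRQ_event_sketch(unsigned int irq,
					   struct irqaction *action)
{
	irqreturn_t ret, retval = IRQ_NONE;
	unsigned long status = 0;

	/* Decided once, from the first handler's flags. */
	if (!(action->flags & SA_INTERRUPT))
		local_irq_enable();

	do {
		ret = action->handler(irq, action->dev_id);
		if (ret == IRQ_HANDLED)
			status |= action->flags; /* flags of claiming handlers */
		retval |= ret;                   /* value returned to caller */
		action = action->next;
	} while (action);

	if (status & SA_SAMPLE_RANDOM)
		add_interrupt_randomness(irq);
	local_irq_disable();
	return retval;
}

/* Two sample handlers sharing one line; only the second claims it. */
static irqreturn_t not_mine(int irq, void *dev_id)
{
	(void)irq; (void)dev_id;
	return IRQ_NONE;
}

static irqreturn_t mine(int irq, void *dev_id)
{
	(void)irq;
	++*(int *)dev_id;
	return IRQ_HANDLED;
}
```

With the two sample handlers chained on one line, the sketch returns IRQ_HANDLED and calls add_interrupt_randomness() once, because only the flags of handlers that returned IRQ_HANDLED are collected in status.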
The patch is also available on the machine itself:
http://alpha.thetaphi.de/alpha-irq.patch
The best solution would be to move alpha to the generic IRQ code as
well (if I have time, I would help with that), but in the meantime
this patch fixes the crash.
Another person has also seen this crash, but said it only happens
with udev/hotplug running, so hotplug could be the trigger: the old
kernel 2.2 code may not be compatible with the hotplug features. I did
not test this, because my machine needs hotplug for USB and not all
drivers are included in the initrd or listed in /etc/modules.