Bug#461407: kernel-image: Kernel panic with clusterip
Package: kernel-image
Version: 2.6-amd64
Severity: normal
Using the CLUSTERIP feature of the Linux kernel often provokes a kernel panic
when the module ipt_CLUSTERIP is removed.
This behaviour is not consistent, but it happens frequently.
The steps to reproduce this bug are:
1. set up two Debian machines on the same subnet
2. choose a new (and unused) IP address on the same subnet.
3. on machines A and B type:
ip addr add $NEW_IP dev $IFACE
4. on machine A type:
iptables -I INPUT -d $NEW_IP -i $IFACE -j CLUSTERIP --new --hashmode sourceip-sourceport --clustermac 01:02:03:04:05:06 --total-nodes 2 --local-node 1
on machine B type:
iptables -I INPUT -d $NEW_IP -i $IFACE -j CLUSTERIP --new --hashmode sourceip-sourceport --clustermac 01:02:03:04:05:06 --total-nodes 2 --local-node 2
5. from a third machine, ping $NEW_IP and check that it works.
If it does not, recheck the previous steps and find the error.
6. on machine A type:
iptables -D INPUT -d $NEW_IP -i $IFACE -j CLUSTERIP --new --hashmode sourceip-sourceport --clustermac 01:02:03:04:05:06 --total-nodes 2 --local-node 1
on machine B type:
echo "+1" > /proc/net/ipt_CLUSTERIP/$NEW_IP
7. At this point machine A should show a kernel panic.
If it does not, type on machine B:
echo "-1" > /proc/net/ipt_CLUSTERIP/$NEW_IP
and on machine A:
iptables -I INPUT -d $NEW_IP -i $IFACE -j CLUSTERIP --new --hashmode sourceip-sourceport --clustermac 01:02:03:04:05:06 --total-nodes 2 --local-node 1
then return to step 6.
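Since the panic does not occur on every attempt, the machine A side of steps 6 and 7 may need to be repeated several times. A minimal sketch of one remove/re-insert cycle follows. The default values of NEW_IP and IFACE, the DRY_RUN switch and the function names are assumptions made for this sketch and are not part of the setup described above; the echo commands on machine B still have to be issued by hand between the two calls.

```shell
#!/bin/sh
# Sketch only: one cycle of steps 6 and 7 on machine A.
# NEW_IP and IFACE defaults are placeholders; DRY_RUN is a helper
# switch invented for this sketch (1 = print commands, 0 = run them).
NEW_IP="${NEW_IP:-192.0.2.10}"
IFACE="${IFACE:-eth0}"
DRY_RUN="${DRY_RUN:-1}"

# Print the rule specification shared by "iptables -I" and "iptables -D".
# $1 is the local node number (1 on machine A, 2 on machine B).
clusterip_rule() {
    printf '%s' "INPUT -d $NEW_IP -i $IFACE -j CLUSTERIP --new --hashmode sourceip-sourceport --clustermac 01:02:03:04:05:06 --total-nodes 2 --local-node $1"
}

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 6 on machine A: delete the CLUSTERIP rule.
# The command substitution is left unquoted on purpose so that the
# rule specification is split into separate arguments.
remove_rule() {
    run iptables -D $(clusterip_rule 1)
}

# Step 7 on machine A: put the rule back before returning to step 6.
insert_rule() {
    run iptables -I $(clusterip_rule 1)
}

# One cycle: remove the rule, then re-insert it.
remove_rule
insert_rule
```

With DRY_RUN left at 1 the script only prints the two iptables commands. Setting DRY_RUN=0 and running it as root executes them, which on an affected kernel may trigger the panic.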
This is the log from the first kernel panic I got:
NMI Watchdog detected LOCKUP on CPU 0
CPU 0
Modules linked in: ipv6 button ac battery xt_tcpudp xt_state ip_conntrack nfnetlink ipt_CLUSTERIP xt_multiport iptable_filter ip_tables x_tables dm_snapshot dm_mirror dm_mod loop i2c_amd756 i2c_core psmouse floppy amd_rng serio_raw pcspkr shpchp pci_hotplug evdev ext3 jbd mbcache sd_mod ide_cd cdrom generic mptspi mptscsih mptbase scsi_transport_spi scsi_mod tg3 amd74xx ohci_hcd ide_core thermal processor fan
Pid: 1650, comm: df_inode Not tainted 2.6.18-5-amd64 #1
RIP: 0010:[<ffffffff8025716f>] [<ffffffff8025716f>] cache_alloc_refill+0x14e/0x1da
RSP: 0018:ffff810073b4bcf8 EFLAGS: 00000017
RAX: ffff81000179b000 RBX: 000000000000000a RCX: 0000000000000008
RDX: ffff81000179b000 RSI: ffff810001773000 RDI: ffff810037b243c0
RBP: ffff810001773000 R08: ffff810037b03400 R09: ffff810037b06000
R10: ffffffff8024bd1e R11: 0000000000000000 R12: ffff810037b20dc0
R13: ffff810037b03400 R14: 0000000000000032 R15: ffff810037b243c0
FS: 00002b4ffe8176d0(0000) GS:ffffffff80521000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000005cae30 CR3: 0000000073adf000 CR4: 00000000000006e0
Process df_inode (pid: 1650, threadinfo ffff810073b4a000, task ffff81007df5a080)
Stack: 000000d07d9ef900 ffff810037b243c0 0000000000000286 00000000000000d0
ffff810073b4be14 ffff81007e2ab9c0 ffff810037de0980 ffffffff802b5b71
ffff81007c47a2c0 0000000000000400 00000000000000ff ffffffff8022e7b5
Call Trace:
[<ffffffff802b5b71>] __kmalloc+0x8a/0x94
[<ffffffff8022e7b5>] expand_files+0xc8/0x2b1
[<ffffffff802205c1>] dup_fd+0x13b/0x287
[<ffffffff80264024>] do_gettimeofday+0x50/0x94
[<ffffffff80245186>] copy_files+0x47/0x63
[<ffffffff8021d0ad>] copy_process+0x50f/0x1490
[<ffffffff8022f102>] do_fork+0xcd/0x1d0
[<ffffffff80257bd6>] system_call+0x7e/0x83
[<ffffffff80257ee3>] ptregscall_common+0x67/0xac
Code: 4c 89 65 08 49 89 2c 24 45 85 f6 0f 8f 42 ff ff ff 41 8b 45
console shuts up ...
<4>get_unused_fd: slot 0 not NULL!
NMI Watchdog detected LOCKUP on CPU 1
CPU 1
Modules linked in: ipv6 button ac battery xt_tcpudp xt_state ip_conntrack nfnetlink ipt_CLUSTERIP xt_multiport iptable_filter ip_tables x_tables dm_snapshot dm_mirror dm_mod loop i2c_amd756 i2c_core psmouse floppy amd_rng serio_raw pcspkr shpchp pci_hotplug evdev ext3 jbd mbcache sd_mod ide_cd cdrom generic mptspi mptscsih mptbase scsi_transport_spi scsi_mod tg3 amd74xx ohci_hcd ide_core thermal processor fan
Pid: 0, comm: swapper Not tainted 2.6.18-5-amd64 #1
RIP: 0010:[<ffffffff8025dff6>] [<ffffffff8025dff6>] .text.lock.spinlock+0x2/0x8a
RSP: 0018:ffff81000164fe78 EFLAGS: 00000082
RAX: 0000000000000000 RBX: ffff810080012cc0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff810080012cc0 RDI: ffff810037b20e00
RBP: ffff810037b20dc0 R08: ffff81000164fec0 R09: ffff81000164fec0
R10: ffff81000164ff30 R11: 00000000ffffffff R12: 0000000000000000
R13: ffff810037b243c0 R14: 0000000000000282 R15: 0000000000000000
FS: 00002b2ccb19cc80(0000) GS:ffff8100800833c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b2ccaff2160 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff81008008c000, task ffff810001636830)
Stack: ffffffff802b629d 00000002ffffffff ffff810080012cc0 ffff81007ce79240
0000000000000000 ffff810037b243c0 ffffffff8020ae3d 0000000000000000
ffff81007ce79240 ffff81007c47a8a0 ffff81007c47a880 0000000000000000
Call Trace:
<IRQ> [<ffffffff802b629d>] __drain_alien_cache+0x2c/0x66
[<ffffffff8020ae3d>] kfree+0x12c/0x1bc
[<ffffffff802c2e67>] free_fdtable_rcu+0x75/0xe5
[<ffffffff8028db45>] __rcu_process_callbacks+0x122/0x1a8
[<ffffffff8028dbee>] rcu_process_callbacks+0x23/0x43
[<ffffffff80283db8>] tasklet_action+0x62/0xac
[<ffffffff80210381>] __do_softirq+0x5e/0xd5
[<ffffffff80258dac>] call_softirq+0x1c/0x28
[<ffffffff80263749>] do_softirq+0x2c/0x7d
[<ffffffff802617fd>] default_idle+0x0/0x50
[<ffffffff8025874a>] apic_timer_interrupt+0x66/0x6c
<EOI> [<ffffffff8026ea49>] physflat_send_IPI_mask+0x0/0x6a
[<ffffffff80261826>] default_idle+0x29/0x50
[<ffffffff8024508b>] cpu_idle+0x95/0xb8
[<ffffffff8026c0f1>] start_secondary+0x43e/0x44d
Code: 83 3f 00 7e f9 e9 6d fe ff ff e8 ff d7 ff ff e9 7d fe ff ff
console shuts up ...
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
hash=1 ct_hash=1 <4>warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interupts
rip vprintk+0x29e/0x2ea
responsible
And this is the second one:
Unable to handle kernel NULL pointer dereference at 0000000000000007 RIP:
[<ffffffff8819c011>] :ipt_CLUSTERIP:__clusterip_config_find+0x11/0x22
PGD 7d1fe067 PUD 7d1ff067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: ipv6 button ac battery xt_tcpudp xt_state ip_conntrack nfnetlink ipt_CLUSTERIP xt_multiport iptable_filter ip_tables x_tables dm_snapshot dm_mirror dm_mod loop evdev psmouse shpchp serio_raw i2c_amd756 pcspkr i2c_core floppy pci_hotplug amd_rng ext3 jbd mbcache ide_cd cdrom generic sd_mod amd74xx ide_core mptspi mptscsih ohci_hcd mptbase scsi_transport_spi scsi_mod tg3 thermal processor fan
Pid: 0, comm: swapper Not tainted 2.6.18-5-amd64 #1
RIP: 0010:[<ffffffff8819c011>] [<ffffffff8819c011>] :ipt_CLUSTERIP:__clusterip_config_find+0x11/0x22
RSP: 0018:ffffffff804c0db8 EFLAGS: 00010293
RAX: 0000000000000007 RBX: 0000000018016e9e RCX: ffff810037f2c000
RDX: 0000000000000007 RSI: ffffffff804c0ea0 RDI: 0000000018016e9e
RBP: ffff81007dcc0a10 R08: ffffffff8022d7d5 R09: ffffffff8819e080
R10: ffff810037f2c1a8 R11: 00000000ffffffff R12: ffff81007dcc0a18
R13: ffff810037f2c000 R14: ffffffff804c0ea0 R15: ffffffff80513190
FS: 00002b56d29bc6d0(0000) GS:ffffffff80521000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000007 CR3: 000000007d1fd000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff80530000, task ffffffff804494c0)
Stack: ffffffff8819c62c 0000000000000082 ffffffff804c0e50 ffff810037f2c000
0000000000000000 0000000000000001 ffffffff80231d56 0000000000000080
0000000000000001 ffffffff804c0ea0 ffffffff80513190 ffffffff8022d7d5
Call Trace:
<IRQ> [<ffffffff8819c62c>] :ipt_CLUSTERIP:arp_mangle+0x63/0xd3
[<ffffffff80231d56>] nf_iterate+0x41/0x7d
[<ffffffff8022d7d5>] dev_queue_xmit+0x0/0x25c
[<ffffffff802523bf>] nf_hook_slow+0x58/0xc4
[<ffffffff8022d7d5>] dev_queue_xmit+0x0/0x25c
[<ffffffff803c0c83>] arp_xmit+0x3d/0x4f
[<ffffffff803c1fba>] arp_solicit+0x129/0x183
[<ffffffff80399ca1>] neigh_timer_handler+0x2ab/0x2fc
[<ffffffff803999f6>] neigh_timer_handler+0x0/0x2fc
[<ffffffff80287107>] run_timer_softirq+0x133/0x1b1
[<ffffffff80210381>] __do_softirq+0x5e/0xd5
[<ffffffff80258dac>] call_softirq+0x1c/0x28
[<ffffffff80263749>] do_softirq+0x2c/0x7d
[<ffffffff802617fd>] default_idle+0x0/0x50
[<ffffffff8025874a>] apic_timer_interrupt+0x66/0x6c
<EOI> [<ffffffff80261826>] default_idle+0x29/0x50
[<ffffffff8024508b>] cpu_idle+0x95/0xb8
[<ffffffff8053a799>] start_kernel+0x216/0x21b
[<ffffffff8053a288>] _sinittext+0x288/0x28c
Code: 48 8b 10 0f 18 0a 48 3d 40 e1 19 88 75 ea 31 c0 c3 48 89 f7
RIP [<ffffffff8819c011>] :ipt_CLUSTERIP:__clusterip_config_find+0x11/0x22
RSP <ffffffff804c0db8>
CR2: 0000000000000007
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
-- System Information:
Debian Release: 4.0
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-5-amd64
Locale: LANG=it_IT.UTF-8, LC_CTYPE=it_IT.UTF-8 (charmap=UTF-8)