
Bug#461407: kernel-image: Kernel panic with clusterip



Package: kernel-image
Version: 2.6-amd64
Severity: normal

Using the CLUSTERIP feature of the Linux kernel often provokes a kernel panic
when the module ipt_CLUSTERIP is removed.
This behaviour is not consistent, but it happens frequently.
The steps to reproduce this bug are:
1. set up two Debian machines on the same subnet.
2. choose a new (and unused) IP address on the same subnet.
3. on both machine A and machine B, type:
   ip addr add $NEW_IP dev $IFACE
4. on machine A, type:
   iptables -I INPUT -d $NEW_IP -i $IFACE -j CLUSTERIP --new --hashmode sourceip-sourceport --clustermac 01:02:03:04:05:06 --total-nodes 2 --local-node 1
   on machine B, type:
   iptables -I INPUT -d $NEW_IP -i $IFACE -j CLUSTERIP --new --hashmode sourceip-sourceport --clustermac 01:02:03:04:05:06 --total-nodes 2 --local-node 2
5. ping $NEW_IP from a third machine and check that it answers.
   If not, reread the previous steps and find the error.
6. on machine A, type:
   iptables -D INPUT -d $NEW_IP -i $IFACE -j CLUSTERIP --new --hashmode sourceip-sourceport --clustermac 01:02:03:04:05:06 --total-nodes 2 --local-node 1
   on machine B, type:
   echo "+1" > /proc/net/ipt_CLUSTERIP/$NEW_IP
7. At this point machine A should show a kernel panic.
   If not, type on machine B:
   echo "-1" > /proc/net/ipt_CLUSTERIP/$NEW_IP
   and on machine A:
   iptables -I INPUT -d $NEW_IP -i $IFACE -j CLUSTERIP --new --hashmode sourceip-sourceport --clustermac 01:02:03:04:05:06 --total-nodes 2 --local-node 1
   then return to step 6.
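
Steps 6 and 7 have to be repeated until the panic shows up, so they can be
wrapped in a small shell helper. This is only a sketch and not something that
was run for this report: the function names, the DRY_RUN switch and the
default values of NEW_IP and IFACE (192.0.2.10 is a documentation address) are
assumptions made for illustration. With DRY_RUN=1, the default, it only prints
the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper for steps 6 and 7; all names here are illustrative.
NEW_IP="${NEW_IP:-192.0.2.10}"   # placeholder address, replace with the real one
IFACE="${IFACE:-eth0}"           # placeholder interface name
DRY_RUN="${DRY_RUN:-1}"          # 1 = print commands only, 0 = execute them
CLUSTER_MAC="01:02:03:04:05:06"

# Print the rule specification used in the steps for local node $1.
rule_args() {
    printf '%s' "INPUT -d $NEW_IP -i $IFACE -j CLUSTERIP --new --hashmode sourceip-sourceport --clustermac $CLUSTER_MAC --total-nodes 2 --local-node $1"
}

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Write $1 ("+1" or "-1") to the CLUSTERIP proc file of $NEW_IP.
proc_write() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "echo \"$1\" > /proc/net/ipt_CLUSTERIP/$NEW_IP"
    else
        echo "$1" > "/proc/net/ipt_CLUSTERIP/$NEW_IP"
    fi
}

# $(rule_args 1) is left unquoted on purpose so it splits into arguments.
node_a_remove()   { run iptables -D $(rule_args 1); }   # step 6, machine A
node_b_takeover() { proc_write "+1"; }                  # step 6, machine B
node_b_release()  { proc_write "-1"; }                  # step 7, machine B
node_a_restore()  { run iptables -I $(rule_args 1); }   # step 7, machine A

# Optional command-line entry point; does nothing when no argument is given.
case "${1:-}" in
    a-remove)   node_a_remove ;;
    b-takeover) node_b_takeover ;;
    b-release)  node_b_release ;;
    a-restore)  node_a_restore ;;
    "")         ;;
    *)          echo "usage: $0 {a-remove|b-takeover|b-release|a-restore}" >&2 ;;
esac
```

Saved under a name of your choice, the script would be called with a-remove
and a-restore on machine A and with b-takeover and b-release on machine B, in
the order given in steps 6 and 7, with DRY_RUN=0 set once the printed commands
look right.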

This is the log from the first kernel panic I got:
NMI Watchdog detected LOCKUP on CPU 0
CPU 0
Modules linked in: ipv6 button ac battery xt_tcpudp xt_state ip_conntrack nfnetlink ipt_CLUSTERIP xt_multiport iptable_filter ip_tables x_tables dm_snapshot dm_mirror dm_mod loop i2c_amd756 i2c_core psmouse floppy amd_rng serio_raw pcspkr shpchp pci_hotplug evdev ext3 jbd mbcache sd_mod ide_cd cdrom generic mptspi mptscsih mptbase scsi_transport_spi scsi_mod tg3 amd74xx ohci_hcd ide_core thermal processor fan
Pid: 1650, comm: df_inode Not tainted 2.6.18-5-amd64 #1
RIP: 0010:[<ffffffff8025716f>]  [<ffffffff8025716f>] cache_alloc_refill+0x14e/0x1da
RSP: 0018:ffff810073b4bcf8  EFLAGS: 00000017
RAX: ffff81000179b000 RBX: 000000000000000a RCX: 0000000000000008
RDX: ffff81000179b000 RSI: ffff810001773000 RDI: ffff810037b243c0
RBP: ffff810001773000 R08: ffff810037b03400 R09: ffff810037b06000
R10: ffffffff8024bd1e R11: 0000000000000000 R12: ffff810037b20dc0
R13: ffff810037b03400 R14: 0000000000000032 R15: ffff810037b243c0
FS:  00002b4ffe8176d0(0000) GS:ffffffff80521000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000005cae30 CR3: 0000000073adf000 CR4: 00000000000006e0
Process df_inode (pid: 1650, threadinfo ffff810073b4a000, task ffff81007df5a080)
Stack:  000000d07d9ef900 ffff810037b243c0 0000000000000286 00000000000000d0
 ffff810073b4be14 ffff81007e2ab9c0 ffff810037de0980 ffffffff802b5b71
 ffff81007c47a2c0 0000000000000400 00000000000000ff ffffffff8022e7b5
Call Trace:
 [<ffffffff802b5b71>] __kmalloc+0x8a/0x94
 [<ffffffff8022e7b5>] expand_files+0xc8/0x2b1
 [<ffffffff802205c1>] dup_fd+0x13b/0x287
 [<ffffffff80264024>] do_gettimeofday+0x50/0x94
 [<ffffffff80245186>] copy_files+0x47/0x63
 [<ffffffff8021d0ad>] copy_process+0x50f/0x1490
 [<ffffffff8022f102>] do_fork+0xcd/0x1d0
 [<ffffffff80257bd6>] system_call+0x7e/0x83
 [<ffffffff80257ee3>] ptregscall_common+0x67/0xac


Code: 4c 89 65 08 49 89 2c 24 45 85 f6 0f 8f 42 ff ff ff 41 8b 45
console shuts up ...
 <4>get_unused_fd: slot 0 not NULL!
NMI Watchdog detected LOCKUP on CPU 1
CPU 1
Modules linked in: ipv6 button ac battery xt_tcpudp xt_state ip_conntrack nfnetlink ipt_CLUSTERIP xt_multiport iptable_filter ip_tables x_tables dm_snapshot dm_mirror dm_mod loop i2c_amd756 i2c_core psmouse floppy amd_rng serio_raw pcspkr shpchp pci_hotplug evdev ext3 jbd mbcache sd_mod ide_cd cdrom generic mptspi mptscsih mptbase scsi_transport_spi scsi_mod tg3 amd74xx ohci_hcd ide_core thermal processor fan
Pid: 0, comm: swapper Not tainted 2.6.18-5-amd64 #1
RIP: 0010:[<ffffffff8025dff6>]  [<ffffffff8025dff6>] .text.lock.spinlock+0x2/0x8a
RSP: 0018:ffff81000164fe78  EFLAGS: 00000082
RAX: 0000000000000000 RBX: ffff810080012cc0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff810080012cc0 RDI: ffff810037b20e00
RBP: ffff810037b20dc0 R08: ffff81000164fec0 R09: ffff81000164fec0
R10: ffff81000164ff30 R11: 00000000ffffffff R12: 0000000000000000
R13: ffff810037b243c0 R14: 0000000000000282 R15: 0000000000000000
FS:  00002b2ccb19cc80(0000) GS:ffff8100800833c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b2ccaff2160 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff81008008c000, task ffff810001636830)
Stack:  ffffffff802b629d 00000002ffffffff ffff810080012cc0 ffff81007ce79240
 0000000000000000 ffff810037b243c0 ffffffff8020ae3d 0000000000000000
 ffff81007ce79240 ffff81007c47a8a0 ffff81007c47a880 0000000000000000
Call Trace:
 <IRQ> [<ffffffff802b629d>] __drain_alien_cache+0x2c/0x66
 [<ffffffff8020ae3d>] kfree+0x12c/0x1bc
 [<ffffffff802c2e67>] free_fdtable_rcu+0x75/0xe5
 [<ffffffff8028db45>] __rcu_process_callbacks+0x122/0x1a8
 [<ffffffff8028dbee>] rcu_process_callbacks+0x23/0x43
 [<ffffffff80283db8>] tasklet_action+0x62/0xac
 [<ffffffff80210381>] __do_softirq+0x5e/0xd5
 [<ffffffff80258dac>] call_softirq+0x1c/0x28
 [<ffffffff80263749>] do_softirq+0x2c/0x7d
 [<ffffffff802617fd>] default_idle+0x0/0x50
 [<ffffffff8025874a>] apic_timer_interrupt+0x66/0x6c
 <EOI> [<ffffffff8026ea49>] physflat_send_IPI_mask+0x0/0x6a
 [<ffffffff80261826>] default_idle+0x29/0x50
 [<ffffffff8024508b>] cpu_idle+0x95/0xb8
 [<ffffffff8026c0f1>] start_secondary+0x43e/0x44d


Code: 83 3f 00 7e f9 e9 6d fe ff ff e8 ff d7 ff ff e9 7d fe ff ff
console shuts up ...
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!


hash=1 ct_hash=1 <4>warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interupts
rip vprintk+0x29e/0x2ea
responsible


And this is the second one:
Unable to handle kernel NULL pointer dereference at 0000000000000007 RIP:
 [<ffffffff8819c011>] :ipt_CLUSTERIP:__clusterip_config_find+0x11/0x22
PGD 7d1fe067 PUD 7d1ff067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: ipv6 button ac battery xt_tcpudp xt_state ip_conntrack nfnetlink ipt_CLUSTERIP xt_multiport iptable_filter ip_tables x_tables dm_snapshot dm_mirror dm_mod loop evdev psmouse shpchp serio_raw i2c_amd756 pcspkr i2c_core floppy pci_hotplug amd_rng ext3 jbd mbcache ide_cd cdrom generic sd_mod amd74xx ide_core mptspi mptscsih ohci_hcd mptbase scsi_transport_spi scsi_mod tg3 thermal processor fan
Pid: 0, comm: swapper Not tainted 2.6.18-5-amd64 #1
RIP: 0010:[<ffffffff8819c011>]  [<ffffffff8819c011>] :ipt_CLUSTERIP:__clusterip_config_find+0x11/0x22
RSP: 0018:ffffffff804c0db8  EFLAGS: 00010293
RAX: 0000000000000007 RBX: 0000000018016e9e RCX: ffff810037f2c000
RDX: 0000000000000007 RSI: ffffffff804c0ea0 RDI: 0000000018016e9e
RBP: ffff81007dcc0a10 R08: ffffffff8022d7d5 R09: ffffffff8819e080
R10: ffff810037f2c1a8 R11: 00000000ffffffff R12: ffff81007dcc0a18
R13: ffff810037f2c000 R14: ffffffff804c0ea0 R15: ffffffff80513190
FS:  00002b56d29bc6d0(0000) GS:ffffffff80521000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000007 CR3: 000000007d1fd000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff80530000, task ffffffff804494c0)
Stack:  ffffffff8819c62c 0000000000000082 ffffffff804c0e50 ffff810037f2c000
 0000000000000000 0000000000000001 ffffffff80231d56 0000000000000080
 0000000000000001 ffffffff804c0ea0 ffffffff80513190 ffffffff8022d7d5
Call Trace:
 <IRQ> [<ffffffff8819c62c>] :ipt_CLUSTERIP:arp_mangle+0x63/0xd3
 [<ffffffff80231d56>] nf_iterate+0x41/0x7d
 [<ffffffff8022d7d5>] dev_queue_xmit+0x0/0x25c
 [<ffffffff802523bf>] nf_hook_slow+0x58/0xc4
 [<ffffffff8022d7d5>] dev_queue_xmit+0x0/0x25c
 [<ffffffff803c0c83>] arp_xmit+0x3d/0x4f
 [<ffffffff803c1fba>] arp_solicit+0x129/0x183
 [<ffffffff80399ca1>] neigh_timer_handler+0x2ab/0x2fc
 [<ffffffff803999f6>] neigh_timer_handler+0x0/0x2fc
 [<ffffffff80287107>] run_timer_softirq+0x133/0x1b1
 [<ffffffff80210381>] __do_softirq+0x5e/0xd5
 [<ffffffff80258dac>] call_softirq+0x1c/0x28
 [<ffffffff80263749>] do_softirq+0x2c/0x7d
 [<ffffffff802617fd>] default_idle+0x0/0x50
 [<ffffffff8025874a>] apic_timer_interrupt+0x66/0x6c
 <EOI> [<ffffffff80261826>] default_idle+0x29/0x50
 [<ffffffff8024508b>] cpu_idle+0x95/0xb8
 [<ffffffff8053a799>] start_kernel+0x216/0x21b
 [<ffffffff8053a288>] _sinittext+0x288/0x28c


Code: 48 8b 10 0f 18 0a 48 3d 40 e1 19 88 75 ea 31 c0 c3 48 89 f7
RIP  [<ffffffff8819c011>] :ipt_CLUSTERIP:__clusterip_config_find+0x11/0x22
 RSP <ffffffff804c0db8>
CR2: 0000000000000007
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
 
-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-5-amd64
Locale: LANG=it_IT.UTF-8, LC_CTYPE=it_IT.UTF-8 (charmap=UTF-8)

