
Bug#657309: linux image 3.1 panics on brctl addif bond to bridge



Package: linux-image-3.1.0-1-amd64
Version: 3.1.8-2



Hello,

After upgrading to linux-image-3.1.0-1-amd64, we can no longer use brctl addif to add interfaces that sit on a bond (active-backup mode) to a bridge; every attempt ends in a kernel panic. The problem does not occur with 2.6.32-5-amd64, but it does occur with the 3.1 kernel from backports as well.

/etc/network/interfaces:


        auto eth0
        iface eth0 inet manual
                up ifconfig $IFACE mtu 9000 || true
                up echo 0 > /proc/sys/net/ipv6/conf/$IFACE/autoconf

        auto eth1
        iface eth1 inet manual
                up ifconfig $IFACE mtu 9000 || true
                up echo 0 > /proc/sys/net/ipv6/conf/$IFACE/autoconf

        auto eth2
        iface eth2 inet manual
                up ifconfig $IFACE mtu 9000 || true
                up echo 0 > /proc/sys/net/ipv6/conf/$IFACE/autoconf

        auto eth3
        iface eth3 inet manual
                up ifconfig $IFACE mtu 9000 || true
                up echo 0 > /proc/sys/net/ipv6/conf/$IFACE/autoconf

        # The primary network interface
        auto bond0
        iface bond0 inet static
                address 10.1.5.123
                netmask 255.255.255.192
                broadcast 10.1.5.127
                gateway 10.1.5.65
                mtu     1500
                bond-mode               active-backup
                primary                 eth0
                bond-miimon             100
                slaves                  eth0 eth1 eth2 eth3
                up echo 0 > /proc/sys/net/ipv6/conf/$IFACE/autoconf

        auto prv
        iface prv inet manual
                up   prv-net-helper up   bond0 1 100 2999
                down prv-net-helper down bond0 1 100 2999

/usr/sbin/prv-net-helper:

#!/bin/bash

function usage {
        echo "Usage: $0 <mode> <parent interface> <prv min> <prv max> <offset>"
        exit 1
}

if [ $# -ne 5 ]; then
        usage
fi

function up {
        iface=$1
        prv_min=$2
        prv_max=$3
        offset=$4

        echo "Adding VLANs $2 - $3"
        vconfig set_name_type DEV_PLUS_VID_NO_PAD
        for prv in $(seq $prv_min $prv_max); do
                vlan=$(($prv+$offset))
                bridge=prv$prv

                vconfig add $iface $vlan
                ifconfig $iface.$vlan up
                brctl addbr $bridge
                brctl setfd $bridge 0
                brctl addif $bridge $iface.$vlan
                ifconfig $bridge up
                sleep 3
        done
}

function down {
        iface=$1
        prv_min=$2
        prv_max=$3
        offset=$4

        echo "Removing VLANs $2 - $3"
        for prv in $(seq $prv_min $prv_max); do
                vlan=$(($prv+$offset))
                bridge=prv$prv

                (
                ifconfig $bridge down
                brctl delif $bridge $iface.$vlan # dev_plus_vid
                vconfig rem $iface.$vlan
                brctl delbr $bridge
                ) 2>/dev/null
        done
}

mode=$1; shift
if [ "$mode" = "up" ]; then
        up "$@"
elif [ "$mode" = "down" ]; then
        down "$@"
else
        usage
fi

After the script runs, we should have bridges prv1 through prv100, each containing a different bond0.<VLAN> interface.
For example:
# brctl show
bridge name     bridge id               STP enabled     interfaces
prv1            8000.001517cff668       no              bond0.3000

Instead, we get a kernel panic at "brctl addif $bridge $iface.$vlan".
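
The command sequence leading up to that call can be printed without touching the network. This is only a minimal dry-run sketch, assuming the same arguments as in the interfaces stanza above (bond0, offset 2999); plan_prv is a hypothetical name that is not part of prv-net-helper, and it echoes the commands rather than running them:

```shell
#!/bin/bash
# Dry run: print the commands that the "up" function of prv-net-helper
# would issue for a single prv index. Nothing is executed.
# plan_prv is a hypothetical helper introduced only for this illustration.
plan_prv() {
        local iface=$1 prv=$2 offset=$3
        local vlan=$((prv + offset))
        local bridge=prv$prv

        echo "vconfig add $iface $vlan"
        echo "ifconfig $iface.$vlan up"
        echo "brctl addbr $bridge"
        echo "brctl setfd $bridge 0"
        echo "brctl addif $bridge $iface.$vlan"
        echo "ifconfig $bridge up"
}

# First iteration of the loop: prv=1, offset=2999
plan_prv bond0 1 2999
```

With these arguments the fifth printed line is "brctl addif prv1 bond0.3000", which matches the brctl show example; prv=2 maps to bond0.3001 in the same way.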

Backtrace:

rados0-01 login: [  586.287504] device bond0.3001 entered promiscuous mode
[  586.293343] device bond0 entered promiscuous mode
[  586.298691] device eth1 entered promiscuous mode
[ 588.195088] skb_over_panic: text:ffffffffa009fa8e len:2048 put:2048 head:ffff880626066000 data:ffff880626066040 tail:0x840 end:0x640 dev:eth1
[  588.209409] ------------[ cut here ]------------
[ 588.214651] kernel BUG at /build/buildd-linux-2.6_3.1.8-2-amd64-XPJTbL/linux-2.6-3.1.8/debian/build/source_amd64_none/net/core/skbuff.c:128!
[  588.228851] invalid opcode: 0000 [#1] SMP
[  588.233650] CPU 0
[ 588.235758] Modules linked in: 8021q garp bridge stp drbd lru_cache cn nfnetlink_queue nfnetlink kvm_intel kvm ip6table_raw ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_mangle ip6_tables xt_NOTRACK iptable_raw ipt_REJECT xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state iptable_filter xt_tcpudp xt_NFQUEUE iptable_mangle ip_tables x_tables ext4 jbd2 crc16 ipmi_devintf ipmi_si nf_conntrack ipmi_poweroff ipmi_msghandler mptctl bonding psmouse ohci_hcd ioatdma i2c_i801 i7core_edac edac_core i2c_core snd_pcm snd_timer snd soundcore snd_page_alloc joydev evdev tpm_tis processor ac acpi_power_meter tpm pcspkr tpm_bios button container power_supply thermal_sys ext3 jbd mbcache dm_mod sd_mod crc_t10dif usbhid hid sg sr_mod cdrom ata_generic uhci_hcd mptsas mptscsih ata_piix mptbase libata scsi_transport_sas ehci_hcd usbcore igb e1000e scsi_mod dca [last unloaded: scsi_wait_scan]
[  588.331722]
[  588.333469] Pid: 0, comm: swapper Not tainted 3.1.0-1-amd64 #1 FUJITSU PRIMERGY RX200 S5 /D2786
[  588.347093] RIP: 0010:[<ffffffff81267df5>] [<ffffffff81267df5>] skb_put+0x78/0x82
[  588.355736] RSP: 0018:ffff88063fc03d70  EFLAGS: 00010282
[  588.361755] RAX: 0000000000000097 RBX: ffff880c25360200 RCX: 0000000000000dc7
[  588.369813] RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
[  588.377871] RBP: ffff880625d9fe80 R08: 0000000000000000 R09: 0000000000000000
[  588.385930] R10: 0000000000000001 R11: 0000000000000000 R12: ffffc9000775e4c8
[  588.393988] R13: ffffc9000775e4a0 R14: ffff880c25be3850 R15: ffff880c25be3840
[  588.402047] FS: 0000000000000000(0000) GS:ffff88063fc00000(0000) knlGS:0000000000000000
[  588.411199] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  588.417704] CR2: 0000000000b29008 CR3: 0000000001605000 CR4: 00000000000006f0
[  588.425760] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  588.433808] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  588.441866] Process swapper (pid: 0, threadinfo ffffffff81600000, task ffffffff8160d020)
[  588.451017] Stack:
[  588.453346]  0000000000000840 0000000000000640 ffff880c267a6000 ffff880c25360200
[  588.461996]  ffff880625d9fe80 ffffffffa009fa8e ffff880c257cd0c4 0000000000000044
[  588.470647]  ffff88063fc03e38 ffffffffa0054eef ffffffff81601fd8 000000008108e8b1
[  588.479296] Call Trace:
[  588.482111]  <IRQ>
[  588.484613]  [<ffffffffa009fa8e>] ? igb_poll+0x44c/0x9d1 [igb]
[  588.491220]  [<ffffffffa0054eef>] ? e1000_clean_rx_irq+0x257/0x291 [e1000e]
[  588.499089]  [<ffffffffa00554a1>] ? e1000_clean+0x1f7/0x208 [e1000e]
[  588.506274]  [<ffffffff81271694>] ? net_rx_action+0xa1/0x1af
[  588.512684]  [<ffffffff8104ad58>] ? __do_softirq+0xb9/0x177
[  588.518997]  [<ffffffff8102333c>] ? __setup_APIC_LVTT+0x4a/0x66
[  588.525696]  [<ffffffff8133506c>] ? call_softirq+0x1c/0x30
[  588.531914]  [<ffffffff8100f845>] ? do_softirq+0x3c/0x7b
[  588.537934]  [<ffffffff8104afc0>] ? irq_exit+0x3c/0x9a
[  588.543760]  [<ffffffff8100f575>] ? do_IRQ+0x82/0x98
[  588.549392]  [<ffffffff8132e16e>] ? common_interrupt+0x6e/0x6e
[  588.555992]  <EOI>
[  588.558483]  [<ffffffff811d586f>] ? intel_idle+0xd4/0xf9
[  588.564503]  [<ffffffff811d584e>] ? intel_idle+0xb3/0xf9
[  588.570523]  [<ffffffff81251e5a>] ? cpuidle_idle_call+0xf0/0x175
[  588.577319]  [<ffffffff8100d250>] ? cpu_idle+0x9c/0xe0
[  588.583146]  [<ffffffff816a6b4e>] ? start_kernel+0x3bd/0x3c8
[  588.589553]  [<ffffffff816a6140>] ? early_idt_handlers+0x140/0x140
[  588.596543]  [<ffffffff816a63c4>] ? x86_64_start_kernel+0x104/0x111
[ 588.603628] Code: 8b 57 68 48 89 44 24 10 8b 87 d0 00 00 00 48 89 44 24 08 8b bf cc 00 00 00 31 c0 48 89 3c 24 48 c7 c7 79 80 50 8 58 fc 0b 00 <0f> 0b 4c 01 c0 48 83 c4 28 c3 41 57 41 56 41 55 41 54 41 89 d4
[  588.629346] RIP  [<ffffffff81267df5>] skb_put+0x78/0x82
[  588.635327]  RSP <ffff88063fc03d70>
[  588.639558] ---[ end trace 83fa0875c297a122 ]---
[  588.644923] Kernel panic - not syncing: Fatal exception in interrupt
[  588.652217] Pid: 0, comm: swapper Tainted: G      D      3.1.0-1-amd64 #1
[  588.660003] Call Trace:
[  588.662930]  <IRQ>  [<ffffffff8132793d>] ? panic+0x95/0x1a5
[  588.669429]  [<ffffffff8132eecb>] ? oops_end+0xa9/0xb6
[  588.675328]  [<ffffffff8100e8c0>] ? do_invalid_op+0x87/0x91
[  588.681755]  [<ffffffff81267df5>] ? skb_put+0x78/0x82
[  588.687580]  [<ffffffff810464d1>] ? vprintk+0x39e/0x3d9
[  588.693682]  [<ffffffff81334deb>] ? invalid_op+0x1b/0x20
[  588.699777]  [<ffffffff81267df5>] ? skb_put+0x78/0x82
[  588.705627]  [<ffffffffa009fa8e>] ? igb_poll+0x44c/0x9d1 [igb]
[  588.712334]  [<ffffffffa0054eef>] ? e1000_clean_rx_irq+0x257/0x291 [e1000e]
[  588.720319]  [<ffffffffa00554a1>] ? e1000_clean+0x1f7/0x208 [e1000e]
[  588.727611]  [<ffffffff81271694>] ? net_rx_action+0xa1/0x1af
[  588.734136]  [<ffffffff8104ad58>] ? __do_softirq+0xb9/0x177
[  588.740563]  [<ffffffff8102333c>] ? __setup_APIC_LVTT+0x4a/0x66
[  588.747376]  [<ffffffff8133506c>] ? call_softirq+0x1c/0x30
[  588.753712]  [<ffffffff8100f845>] ? do_softirq+0x3c/0x7b
[  588.759852]  [<ffffffff8104afc0>] ? irq_exit+0x3c/0x9a
[  588.765792]  [<ffffffff8100f575>] ? do_IRQ+0x82/0x98
[  588.771538]  [<ffffffff8132e16e>] ? common_interrupt+0x6e/0x6e
[  588.778258]  <EOI>  [<ffffffff811d586f>] ? intel_idle+0xd4/0xf9
[  588.785140]  [<ffffffff811d584e>] ? intel_idle+0xb3/0xf9
[  588.791332]  [<ffffffff81251e5a>] ? cpuidle_idle_call+0xf0/0x175
[  588.798245]  [<ffffffff8100d250>] ? cpu_idle+0x9c/0xe0
[  588.804189]  [<ffffffff816a6b4e>] ? start_kernel+0x3bd/0x3c8
[  588.810721]  [<ffffffff816a6140>] ? early_idt_handlers+0x140/0x140
[  588.817826]  [<ffffffff816a63c4>] ? x86_64_start_kernel+0x104/0x111
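
The numbers on the skb_over_panic line can be decoded with a little arithmetic. The sketch below is only a reading aid and rests on one assumption: that tail and end are printed as offsets from head, as 64-bit kernels of this generation do. The headroom value is the difference between the data and head pointers in the trace; the variable names are chosen only for this illustration.

```shell
#!/bin/bash
# Values copied from the skb_over_panic line of the trace.
# Assumption: tail and end are offsets relative to head.
headroom=0x40   # data - head (...066040 - ...066000)
tail=0x840      # write position after the failed skb_put
end=0x640       # end of the buffer
put=2048        # bytes the driver tried to append

room=$((end - headroom))   # bytes available for packet data
overrun=$((tail - end))    # bytes written past the end

printf 'headroom: %d bytes\n' "$((headroom))"
printf 'room:     %d bytes\n' "$room"
printf 'put:      %d bytes\n' "$put"
printf 'overrun:  %d bytes\n' "$overrun"
```

Under that assumption, the buffer had room for 1536 bytes of data after 64 bytes of headroom, while 2048 bytes were put into it, which overshoots the end by 512 bytes (64 + 2048 = 2112 = 0x840).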



If the VLANs are created on a plain Ethernet interface instead of bond0, everything works as expected.

The cards are Intel Corporation PRO/1000 PT Dual Port Server Adapters using the e1000e driver (lspci -n reports 8086:105e, rev 06).

The machine runs Debian squeeze; only the kernel comes from wheezy or backports (we tried both).


Thanks in advance,
Costas Drogos


