
Bug#483005: linux-image-2.6.24-1-686: Sky2 driver's watchdog hangs vlans



Package: linux-image-2.6.24-1-686
Severity: important

Note: I also posted this bug at
http://bugzilla.kernel.org/show_bug.cgi?id=10693 because I think it is
a generic kernel bug, not one specific to Debian.  I suppose this is
also the right place to report it.  Sorry for the double posting.

I configured multiple VLANs on eth1 (vlan161 to vlan166, plus vlan170)
using /etc/network/interfaces.  Everything works fine for a while
(from a few minutes up to a few days), then for some unknown reason
the sky2 driver suddenly hangs and restarts.  Unfortunately, VLAN
support seems to be broken after sky2 restarts.  I get the following
dmesg output:

May 13 14:07:42 wibox kernel: sky2 eth1: hung mac 0:124 fifo 195 (115:110)
May 13 14:07:42 wibox kernel: sky2 eth1: receiver hang detected
May 13 14:07:42 wibox kernel: sky2 eth1: disabling interface
May 13 14:07:42 wibox kernel: sky2 eth1: enabling interface
May 13 14:07:44 wibox kernel: sky2 eth1: Link is up at 100 Mbps, full duplex, flow control rx
May 13 14:08:14 wibox kernel: sky2 eth1: rx length error: status 0x402300 length 64
May 13 14:08:14 wibox last message repeated 5 times
May 13 14:08:14 wibox kernel: sky2 eth1: rx length error: status 0x522100 length 82
May 13 14:08:14 wibox kernel: sky2 eth1: rx length error: status 0x402300 length 64
[...]

Steps to reproduce:
Configure some VLANs on a sky2-managed Gigabit Ethernet port, and get
the sky2 driver to hang and automatically restart.  I don't know how
to force the sky2 driver to hang; I just pass some traffic through the
VLANs for some time, but there is probably a better way.  You will
then see that VLAN-tagged packets are dropped, while untagged packets
are still received correctly.

My opinion:
Basically, it seems that everything is properly initialized at first,
including the VLAN tags associated with each interface.  But if for
some reason the sky2 watchdog detects a hang, it restarts the
interface without setting up the VLANs again.  From then on, all
received packets are rejected because they are tagged and the sky2
driver expects untagged packets (hence the "rx length error" message).

Therefore, after any hang, the watchdog does not restart the interface
properly when VLAN tagging is used.

In sky2.c (line 2195), the error message "%s: rx length error: status
%#x length %d\n" is printed only if (line 2177) length != count, that
is, if the actual length differs from the expected length.  The VLAN
header bytes are taken into account (on line 2151) like this:
#ifdef SKY2_VLAN_TAG_USED
        /* Account for vlan tag */
        if (sky2->vlgrp && (status & GMR_FS_VLAN))
                count -= VLAN_HLEN;
#endif

In my error messages, the status is equal to 0x402300 or 0x522100, for
example (see above), and therefore I know that (status & GMR_FS_VLAN)
is true (GMR_FS_VLAN is equal to 1<<13).  Since I get rx length
errors, I believe that the count does not take the VLAN header bytes
into account, and I think that the only way this can happen is if
sky2->vlgrp is NULL.

Apparently, sky2->vlgrp gets set properly upon driver initialization,
but it gets unset when the sky2 watchdog restarts the device.

sky2->vlgrp seems to be set only in function sky2_vlan_rx_register (on
line 1155).

And function sky2_vlan_rx_register is referenced only in
sky2_init_netdev (on line 4011), where it is installed as the
dev->vlan_rx_register callback:
#ifdef SKY2_VLAN_TAG_USED
        /* The workaround for FE+ status conflicts with VLAN tag
         * detection. */
        if (!(sky2->hw->chip_id == CHIP_ID_YUKON_FE_P &&
              sky2->hw->chip_rev == CHIP_REV_YU_FE2_A0)) {
                dev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
                dev->vlan_rx_register = sky2_vlan_rx_register;
        }
#endif

So the solution might be to cut and paste this code from
sky2_init_netdev (which does not seem to be called when the sky2
watchdog restarts the device) into sky2_up (which is called both upon
initialization of the device, and when the watchdog restarts the
device)?

Thanks for your kind help!

-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.24-1-686
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8
(charmap=ANSI_X3.4-1968) (ignored: LC_ALL set to C)


