v100 ethernet problems
I'm seeing several problems with the network on my v100 machine. I'm
running 2.4.21. I compared the sources between this and 2.4.25, and
it looks like nothing relevant has changed, but I'm willing to try a
newer kernel if someone tells me it fixes any of these problems.
Problem 1: Somewhat correlated with periods of moderate to heavy
traffic, the interface stops working and the kernel logs an endless
stream of these messages:
NETDEV WATCHDOG: eth1: transmit timed out
NETDEV WATCHDOG: eth1: transmit timed out
NETDEV WATCHDOG: eth1: transmit timed out
NETDEV WATCHDOG: eth1: transmit timed out
NETDEV WATCHDOG: eth1: transmit timed out
NETDEV WATCHDOG: eth1: transmit timed out
NETDEV WATCHDOG: eth1: transmit timed out
NETDEV WATCHDOG: eth1: transmit timed out
If I log in on the serial console and ifdown/ifup the interface,
everything works OK until it happens again.
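
To make the workaround for Problem 1 less manual, here is a minimal sketch that scans kernel log text for the watchdog message quoted above and prints the ifdown/ifup commands for any interface that has timed out repeatedly. The function names and the threshold of 3 are assumptions made for illustration; nothing here comes from the kernel or the tulip driver, and the script only prints commands rather than running them.

```python
import re
import sys
from collections import Counter

# Matches the watchdog message format quoted in the report.
WATCHDOG_RE = re.compile(r"NETDEV WATCHDOG: (\S+): transmit timed out")


def count_tx_timeouts(log_text):
    """Return a Counter mapping interface name to number of timeout lines."""
    return Counter(m.group(1) for m in WATCHDOG_RE.finditer(log_text))


def interfaces_to_reset(log_text, threshold=3):
    """List interfaces with at least `threshold` timeouts.

    The threshold is an arbitrary assumption, not a value from the kernel.
    """
    counts = count_tx_timeouts(log_text)
    return sorted(name for name, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Read log text from stdin and print (not run) the reset commands.
    for iface in interfaces_to_reset(sys.stdin.read()):
        print(f"ifdown {iface} && ifup {iface}")
```

It could be fed from the kernel log, for example by piping dmesg output into it. This only automates the workaround; it does not address whatever is stalling the transmitter.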
Problem 2: Sometimes, after many timeouts as in problem 1, the host
panics:
\|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
   \__U_/
swapper(0): Kernel bad sw trap 5
TSTATE: 0000004480f09600 TPC: 0000000000428cb0 TNPC: 0000000000428cb4 Y: 00000000 Not tainted
Using defaults from ksymoops -t elf32-sparc -a sparc
g0: 000000000000002d g1: 0000000000000000 g2: 0000000000000000 g3: fffff800659b0000
g4: fffff80000000000 g5: 0000000000001de2 g6: 0000000000414000 g7: 0000000000000000
o0: 0000000000000000 o1: 0000000000000001 o2: 00000000003fffff o3: 00000000007ab000
o4: 00000000007ab230 o5: 0000000000000000 sp: 0000000000417091 ret_pc: 0000000000428be8
l0: fffff80066ff8220 l1: fffff80067830000 l2: fffff8006058e010 l3: 000000000000000f
l4: fffff80067eb8820 l5: 0000000000000003 l6: 0000000000000003 l7: fffff800650a7650
i0: 0000000000001fff i1: fffff80067831de2 i2: 0000000000000001 i3: 0000000000000001
i4: 000000000000001b i5: 0000000006898702 i6: 0000000000417151 i7: 00000000005463f8
Caller[00000000005463f8]
Caller[00000000005f9df0]
Caller[00000000005f0c8c]
Caller[000000000044de84]
Caller[000000000040ef40]
Caller[000000000041a504]
Caller[00000000007206f4]
Caller[0000000000404638]
Caller[0000000000000000]
Instruction DUMP: 10680004 01000000 9194c000 <91d02005> 9194c000 81cfe008 91316000 9de3bf40 83366000
ksymoops says:
>>PC; 00428cb0 <pci_map_single+110/120> <=====
>>g6; 00414000 <init_task_union+0/4000>
>>o3; 007ab000 <reserve.2+38/e0>
>>o4; 007ab230 <xtime+0/10>
>>sp; 00417091 <init_task_union+3091/4000>
>>ret_pc; 00428be8 <pci_map_single+48/120>
>>i6; 00417151 <init_task_union+3151/4000>
>>i7; 005463f8 <tulip_start_xmit+38/160>
Trace; 005463f8 <tulip_start_xmit+38/160>
Trace; 005f9df0 <qdisc_restart+50/120>
Trace; 005f0c8c <net_tx_action+ac/100>
Trace; 0044de84 <do_softirq+e4/100>
Trace; 0040ef40 <__handle_softirq+0/10>
Trace; 0041a504 <cpu_idle+44/60>
Trace; 007206f4 <start_kernel+1b4/1e0>
Trace; 00404638 <tlb_fixup_done+54/5c>
Trace; 00000000 Before first symbol
Code; 00428ca4 <pci_map_single+104/120>
00000000 <_PC>:
Code; 00428ca4 <pci_map_single+104/120>
0: 10 68 00 04 unknown
Code; 00428ca8 <pci_map_single+108/120>
4: 01 00 00 00 nop
Code; 00428cac <pci_map_single+10c/120>
8: 91 94 c0 00 unknown
Code; 00428cb0 <pci_map_single+110/120> <=====
c: 91 d0 20 05 ta 5 <=====
Code; 00428cb4 <pci_map_single+114/120>
10: 91 94 c0 00 unknown
Code; 00428cb8 <pci_map_single+118/120>
14: 81 cf e0 08 rett %i7 + 8
Code; 00428cbc <pci_map_single+11c/120>
18: 91 31 60 00 srl %g5, 0, %o0
Code; 00428cc0 <pci_unmap_single+0/160>
1c: 9d e3 bf 40 save %sp, -192, %sp
Code; 00428cc4 <pci_unmap_single+4/160>
20: 83 36 60 00 srl %i1, 0, %g1
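
As a cross-check on the disassembly, the bracketed word in the instruction dump (91d02005) can be decoded by hand. The sketch below follows the SPARC Ticc (trap on condition codes) encoding as I understand it: op = 2, op3 = 0x3a, a 4-bit condition field, and a 7-bit immediate trap number. The function name decode_ticc is made up for illustration, and only the immediate form is handled.

```python
# Ticc mnemonics indexed by the 4-bit cond field (bits 28..25).
TICC_COND = ["tn", "te", "tle", "tl", "tleu", "tcs", "tneg", "tvs",
             "ta", "tne", "tg", "tge", "tgu", "tcc", "tpos", "tvc"]


def decode_ticc(word):
    """Decode a 32-bit SPARC instruction word if it is an immediate-form Ticc.

    Returns (mnemonic, rs1, trap_number), or None for anything else.
    """
    op = (word >> 30) & 0x3       # bits 31..30: instruction format
    op3 = (word >> 19) & 0x3F     # bits 24..19: opcode within the format
    if op != 2 or op3 != 0x3A:
        return None               # not a Ticc instruction
    if not (word >> 13) & 1:
        return None               # register form, not handled in this sketch
    cond = (word >> 25) & 0xF     # bits 28..25: trap condition
    rs1 = (word >> 14) & 0x1F     # bits 18..14: source register (0 is %g0)
    trap = word & 0x7F            # bits 6..0: software trap number
    return TICC_COND[cond], rs1, trap


if __name__ == "__main__":
    print(decode_ticc(0x91D02005))
```

This decodes to "ta" with %g0 and trap number 5, which agrees with both the ksymoops line "ta 5" and the "Kernel bad sw trap 5" message, so the trap was executed unconditionally inside pci_map_single rather than being a corrupted instruction.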
Problem 3: Full duplex doesn't work correctly. At boot, the interface
autonegotiates to 100baseTx-HD. If I use mii-tool to force it to full
duplex, I see reduced throughput, and a transmit error is reported for
every packet (even though the packets are actually sent).
Marc