
Bug#572201: forcedeth driver hangs under heavy load



On Monday 12 April 2010 at 17:11 +0100, stephen mulcahy wrote:
> Eric Dumazet wrote:
> > On Monday 12 April 2010 at 14:19 +0100, stephen mulcahy wrote:
> > 
> > Do you have any netfilter rules?
> > 
> 
> Hi Eric,
> 
> I don't have any netfilter rules:
> 
> root@node34:~# for table in filter nat mangle raw; do iptables -t $table -L; done
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
> Chain PREROUTING (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain POSTROUTING (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
> Chain PREROUTING (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain POSTROUTING (policy ACCEPT)
> target     prot opt source               destination
> Chain PREROUTING (policy ACCEPT)
> target     prot opt source               destination
> 
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
> 
> 
> I re-ran this on the 2.6.32 kernel (with the 2.6.32 forcedeth module) 
> just in case that was screwing something up.
> 
> node33 is in the unresponsive state this time. I'm running tcpdump on 
> node34. On node33 I try to ssh to node34 (using the IP address of 
> node34). I note that I can ping between node33 and node34.
> 
> root@node34:~# tcpdump -v host node34 and node33
> tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 
> bytes
> 17:05:19.622384 IP (tos 0x0, ttl 64, id 21435, offset 0, flags [DF], 
> proto TCP (6), length 60)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [S], cksum 0xb994 
> (correct), seq 1675314077, win 5840, options [mss 1460,sackOK,TS val 
> 331814 ecr 0,nop,wscale 7], length 0
> 17:05:19.622754 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto 
> TCP (6), length 60)
>      node34.ssh > node33.webstar.cnet.43653: Flags [S.], cksum 0x9d81 
> (correct), seq 1669769379, ack 1675314078, win 5792, options [mss 
> 1460,sackOK,TS val 331779 ecr 331814,nop,wscale 7], length 0
> 17:05:19.622813 IP (tos 0x0, ttl 64, id 21436, offset 0, flags [DF], 
> proto TCP (6), length 52)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [.], cksum 0xe2bf 
> (correct), ack 1, win 46, options [nop,nop,TS val 331814 ecr 331779], 
> length 0
> 17:05:19.627666 IP (tos 0x0, ttl 64, id 47271, offset 0, flags [DF], 
> proto TCP (6), length 84)
>      node34.ssh > node33.webstar.cnet.43653: Flags [P.], seq 1:33, ack 
> 1, win 46, options [nop,nop,TS val 331780 ecr 331814], length 32
> 17:05:19.627748 IP (tos 0x0, ttl 64, id 21437, offset 0, flags [DF], 
> proto TCP (6), length 52)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [.], cksum 0xe29c 
> (correct), ack 33, win 46, options [nop,nop,TS val 331816 ecr 331780], 
> length 0
> 17:05:19.627833 IP (tos 0x0, ttl 64, id 21438, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum 1f8a (->d189)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
> 23413:23445, ack 2749038625, win 46, options [nop,nop,TS val 331816 ecr 
> 331780], length 32
> 17:05:19.831634 IP (tos 0x0, ttl 64, id 21439, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum d189 (->d188)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 1:33, ack 
> 33, win 46, options [nop,nop,TS val 331867 ecr 331780], length 32
> 17:05:20.239603 IP (tos 0x0, ttl 64, id 21440, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum 15c6 (->d187)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
> 30492:30524, ack 809893921, win 46, options [nop,nop,TS val 331969 ecr 
> 331780], length 32
> 17:05:21.055534 IP (tos 0x0, ttl 64, id 21441, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum d187 (->d186)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 1:33, ack 
> 33, win 46, options [nop,nop,TS val 332173 ecr 331780], length 32
> 17:05:22.687386 IP (tos 0x0, ttl 64, id 21442, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum d186 (->d185)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 1:33, ack 
> 33, win 46, options [nop,nop,TS val 332581 ecr 331780], length 32
> 17:05:25.950935 IP (tos 0x0, ttl 64, id 21443, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum 15c4 (->d184)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
> 30492:30524, ack 809893921, win 46, options [nop,nop,TS val 333397 ecr 
> 331780], length 32
> 17:05:32.478527 IP (tos 0x0, ttl 64, id 21444, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum c01 (->d183)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
> 43997:44029, ack 1311047713, win 46, options [nop,nop,TS val 335029 ecr 
> 331780], length 32
> 17:05:45.533370 IP (tos 0x0, ttl 64, id 21445, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum 23d (->d182)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 3348:3380, 
> ack 4054450209, win 46, options [nop,nop,TS val 338293 ecr 331780], 
> length 32
> 17:06:08.719187 IP (tos 0x0, ttl 64, id 27660, offset 0, flags [DF], 
> proto TCP (6), length 1500, bad cksum 5360 (->b3b3)!)
>      node33.webstar.cnet.50060 > node34.35725: Flags [.], seq 
> 1203473738:1203475186, ack 1191452767, win 54, options [nop,nop,TS val 
> 344089 ecr 256770], length 1448
> 17:06:11.643080 IP (tos 0x0, ttl 64, id 21446, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum e4f2 (->d181)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
> 47331:47363, ack 4110811169, win 46, options [nop,nop,TS val 344821 ecr 
> 331780], length 32
> 17:06:13.715233 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 
> node34 tell node33.webstar.cnet, length 46
> 17:06:13.715257 ARP, Ethernet (len 6), IPv4 (len 4), Reply node34 is-at 
> 00:30:48:f0:06:72 (oui Unknown), length 28
> 17:07:03.866492 IP (tos 0x0, ttl 64, id 21447, offset 0, flags [DF], 
> proto TCP (6), length 84, bad cksum b413 (->d180)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [P.], seq 
> 28939:28971, ack 1913782305, win 46, options [nop,nop,TS val 357877 ecr 
> 331780], length 32
> 17:07:08.862055 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 
> node34 tell node33.webstar.cnet, length 46
> 17:07:08.862370 ARP, Ethernet (len 6), IPv4 (len 4), Reply node34 is-at 
> 00:30:48:f0:06:72 (oui Unknown), length 28
> 17:07:19.627910 IP (tos 0x0, ttl 64, id 47272, offset 0, flags [DF], 
> proto TCP (6), length 52)
>      node34.ssh > node33.webstar.cnet.43653: Flags [F.], cksum 0x6d6b 
> (correct), seq 33, ack 1, win 46, options [nop,nop,TS val 361780 ecr 
> 331816], length 0
> 17:07:19.628403 IP (tos 0x0, ttl 64, id 21448, offset 0, flags [DF], 
> proto TCP (6), length 844, bad cksum aa4d (->ce87)!)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [FP.], seq 
> 20399:21191, ack 2356871202, win 46, options [nop,nop,TS val 361818 ecr 
> 361780], length 792
> 17:07:19.833456 IP (tos 0x0, ttl 64, id 47273, offset 0, flags [DF], 
> proto TCP (6), length 52)
>      node34.ssh > node33.webstar.cnet.43653: Flags [F.], cksum 0x6d37 
> (correct), seq 33, ack 1, win 46, options [nop,nop,TS val 361832 ecr 
> 331816], length 0
> 17:07:19.833517 IP (tos 0x0, ttl 64, id 21449, offset 0, flags [DF], 
> proto TCP (6), length 64)
>      node33.webstar.cnet.43653 > node34.ssh: Flags [.], cksum 0xa5e9 
> (correct), ack 34, win 46, options [nop,nop,TS val 361870 ecr 
> 361832,nop,nop,sack 1 {33:34}], length 0
> 
> At this point, I see a "Connection closed by 10.141.0.34" message on 
> node33 (from where I am attempting to ssh).
> 
> Again, if I run ifdown and then ifup on node33, I can then reach 
> node34 from node33 without problems.
> 

OK, it seems forcedeth has a problem with checksums?

Could you try changing the offload settings listed by "ethtool -k eth0"?
For example, to turn off TSO and TX checksum offload:

ethtool -K eth0 tso off tx off
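
To compare before and after the change, it may help to count how many IP headers tcpdump flags as "bad cksum". The following is only a rough sketch: count_bad_cksum is a hypothetical helper name, not part of tcpdump or ethtool, and it assumes "tcpdump -v" text output in the format shown in the capture above.

```shell
# Hypothetical helper (not part of tcpdump or ethtool): reads "tcpdump -v"
# text output on stdin and reports how many IP headers were flagged
# "bad cksum". Assumes the output format seen in the capture above.
count_bad_cksum() {
    awk '/ IP \(tos /  { total++ }   # one match per IP packet header
         /bad cksum/   { bad++ }     # IP header checksum flagged as bad
         END { printf "%d bad of %d IP packets\n", bad, total }'
}
```

One possible use, on node34 while node33 is in the broken state, would be "tcpdump -v -c 200 host node34 and node33 | count_bad_cksum", run once before and once after the ethtool change on node33. If the bad count drops to zero afterwards, that would point at the offload path.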





