
Re: that "tcp_v4_rechecksum" thing



It's been on my "todo" list too.  I originally made this patch, and it
was never meant to stay in nearly as long as it has.  On Netwinder.org
we see similar behaviour in the logs (running 2.2.19).

I don't think it is slowing down networking that much (other than the
extra disk activity due to syslog).  If the routine were not called,
the packets would go out with an incorrect checksum and TCP would
eventually need to retransmit them, which doesn't help your throughput
at all.

A fix that should go in is to only print the message if the checksum
has actually changed - that should be pretty easy, and will avoid some
of the messages (I'll try to do that soon).
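
A rough sketch of that check, in plain userspace C rather than the
actual kernel patch.  The name report_rechecksum and the use of
fprintf in place of printk are assumptions made for illustration; the
message format simply mirrors the log lines quoted below.

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the real kernel routine: report a recomputed
 * checksum only when it differs from the value that was stored before.
 * Returns 1 if a message was written, 0 if it was suppressed.
 */
int report_rechecksum(FILE *log, uint32_t old_csum, uint32_t new_csum)
{
    if (old_csum == new_csum)
        return 0;   /* nothing changed, so stay quiet */

    fprintf(log, "tcp_v4_rechecksum: %x => %x\n",
            (unsigned int)old_csum, (unsigned int)new_csum);
    return 1;
}
```

With a check like this, the "0 => 0" and other identical before/after
pairs would no longer reach syslog, while genuine corrections still
would.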

To actually fix the problem we need to figure out why the checksum is
occasionally wrong.  It looks like somebody's data is being walked
over, and could well be a cache issue.  Any suggestions on how to
detect this would be welcomed.

One thing that might be useful would be to see if the problem occurs
with a particular interface only, i.e. whether it is related to the
network drivers.  I guess we should log the interface so we can start
to make some educated guesses about it.
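
As a sketch of what such a log line could look like (again userspace
C, not kernel code): format_rechecksum and its ifname parameter are
hypothetical, and where the interface name would come from in the
kernel is left open here.

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Hypothetical formatter: build the rechecksum message with the
 * interface name included, so log entries can be grouped per device.
 * A NULL ifname is printed as "?".  Returns what snprintf returns.
 */
int format_rechecksum(char *buf, size_t len, const char *ifname,
                      uint32_t old_csum, uint32_t new_csum)
{
    return snprintf(buf, len, "tcp_v4_rechecksum: %s: %x => %x",
                    ifname ? ifname : "?",
                    (unsigned int)old_csum, (unsigned int)new_csum);
}
```

Sorting the resulting log by the interface field would then show
fairly quickly whether one driver accounts for most of the messages.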

The underlying problem could be present (but unfixed) in 2.4.  This
problem was originally found when running the SPECweb test suite, which
counts "bad" packets as well as good ones in its results.  I don't
think that anyone at Rebel ever got around to measuring SPECweb on ARM
on 2.4 kernels (they were focussed on the Crusoe-based model then).

-Ralph

On Tue, Feb 05, 2002 at 10:23:04AM +0000, Phil Blundell wrote:
> Did anybody ever really get to the bottom of that problem with TCP
> checksums in 2.2?  I just noticed that elara and europa, running a
> 2.2.19 build from back in April, are both spewing out large numbers of
> these debug messages, which might account for the poor network
> performance that we're seeing on those machines.  (The amusing thing is
> that most of the "old" checksum values look more like kernel pointers
> than actual checksums.)
> 
> tcp_v4_rechecksum: c29a4510 => f5be936d
> tcp_v4_rechecksum: c29a4110 => 9e64904b
> tcp_v4_rechecksum: c29a4d10 => 9ec21e00
> tcp_v4_rechecksum: a2967359 => 98647d8b
> tcp_v4_rechecksum: c29a4710 => ba338715
> tcp_v4_rechecksum: 6e86feae => 611d0c18
> tcp_v4_rechecksum: c29a4510 => 73b5b42e
> tcp_v4_rechecksum: c29a4b10 => c2af117d
> 
> FWIW, medusa is back on 2.2.13 or something; it shows some of these
> messages too, but apparently less often.  And this time, most of the
> numbers are identical in the "before" and "after" case:
> 
> tcp_v4_rechecksum: 0 => 0
> tcp_v4_rechecksum: 0 => 0
> tcp_v4_rechecksum: 0 => 0
> tcp_v4_rechecksum: 0 => 0
> tcp_v4_rechecksum: 0 => 0
> tcp_v4_rechecksum: 3aa8dc8e => 3aa8dc8e
> tcp_v4_rechecksum: ab340fb3 => ab340fb3
> tcp_v4_rechecksum: 90dd61d7 => 90dd61d7
> tcp_v4_rechecksum: 90dd61d7 => 90dd61d7
> tcp_v4_rechecksum: 90dd61d7 => 90dd61d7
> tcp_v4_rechecksum: cc9d6d4d => cc9d6d4d
> tcp_v4_rechecksum: cc9d6d4d => cc9d6d4d
> 
> So, what's to be done - any suggestions?  I'd rather not upgrade to a
> 2.4 kernel at the moment.
> 
> Anyone else on debian-arm seeing this kind of effect?
> 
> p.


