
Bug#45912: `ping', and name resolution in general, hangs



Package: libc6
Version: 2.1.2-3
Severity: critical

I'm not sure which package to report this against; libc6 is my best
guess.  Other relevant packages might be:

Package: netbase
Version: 3.16-2

Package: kernel-image-2.2.9
Version: 2.2.9-2

  My network card driver is 3c59x:

    Sep 24 07:21:13 potato kernel: 3c59x.c:v0.99H 11/17/98 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/vortex.html
    Sep 24 07:21:13 potato kernel: eth0: 3Com 3Com Boomerang (unknown version) at 0xb800,  00:50:04:1b:f6:df, IRQ 11
    Sep 24 07:21:13 potato kernel:   8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface.
    Sep 24 07:21:13 potato kernel:   MII transceiver found at address 24, status 182d.
    Sep 24 07:21:13 potato kernel:   Enabling bus-master transmits and whole-frame receives.

Here's the problem:

When I type `ping blarg.net' at a shell, `ping' hangs.  I expect it to display

	PING blarg.net (206.124.128.1): 56 data bytes
	64 bytes from 206.124.128.1: icmp_seq=0 ttl=62 time=25.7 ms
	...

Other name resolution also fails.  For example, Netscape hangs when
trying to visit web pages on machines other than mine.

I've never sat around and waited to see if `ping' eventually gets
unstuck; I've always given up and hit control-C after no more than
perhaps a minute.
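
One way to tell a hung name lookup from a hung ICMP exchange is to run
the lookup by itself with a time limit.  What follows is a minimal
Python sketch under stated assumptions: Python 3 is available, and
`resolve_with_timeout' is a hypothetical helper written for this
illustration, not part of any package named in this report.

```python
import socket
import threading


def resolve_with_timeout(name, timeout=5.0, resolver=socket.gethostbyname):
    """Return the address for `name`, or None if no answer arrives in time.

    `resolver` defaults to the C library lookup; it is a parameter only so
    the helper can be exercised without a network.
    """
    result = {}

    def worker():
        try:
            result["addr"] = resolver(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            result["error"] = exc

    # A daemon thread will not keep the interpreter alive if the lookup
    # never returns.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)

    if thread.is_alive():
        return None  # still blocked: the lookup itself is hanging
    if "error" in result:
        raise result["error"]  # the resolver answered, but with a failure
    return result["addr"]
```

If `resolve_with_timeout("blarg.net")' returns None while pinging the
numeric address 206.124.128.1 still works, that would point at name
resolution rather than at ICMP.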

In short, the entire network is completely unusable.

I'm using potato, which I installed by first installing slink from an
official CD-ROM, and then using `apt-get dist-upgrade' from 

	 http://http.us.debian.org/debian unstable main

This problem didn't always happen, although I don't remember exactly
when it started.  I know for certain that it didn't happen immediately
after I installed slink, nor did it happen immediately after I
upgraded to potato the first time.

I've also seen this problem on a different installation of slink (on
the same machine with the same hardware), but that problem
mysteriously went away, and I never reported it.

     * The exact and complete text of any error messages printed or
       logged. This is very important!

I haven't noticed any error messages -- certainly none at the shell on
which I ran `ping', and none in /var/log.

     * Exactly what you typed or did to demonstrate the problem.

As above.

     * A description of the incorrect behaviour: exactly what behaviour
       you were expecting, and what you observed. A transcript of an
       example session is a good way of showing this.

As above.

     * A suggested fix, or even a patch, if you have one.

Sorry!  But see the bizarre workaround involving `tcpdump', below.

     * Details of the configuration of the program with the problem.
       Include the complete text of its configuration files.

I don't think a particular program is at fault; if the problem is in
Debian at all, I assume it's in the resolver library, or perhaps in
the driver for the network card.  And I'm not aware of any
configuration files for either the library or the network card.

     * The versions of any packages on which the buggy package depends.

I don't believe libc6 or the net card driver depends on any package.

     * What kernel version you're using (type uname -a).

    Linux potato 2.2.9 #2 Fri Jun 4 23:14:38 EST 1999 i686 unknown

... but note that, as I explain above, I've seen the same problem on
slink, using kernel 2.0.36 with version 0.99E of the net card driver,
and libc6 version 2.0.7.

     * What shared C library you're using (type ls -l /lib/libc.so.6).

    bash-2.02$ ls -l /lib/libc.so.6
    lrwxrwxrwx   1 root     root           13 Sep 24 07:35 /lib/libc.so.6 -> libc-2.1.2.so

     * Any other details of your Linux system, if it seems appropriate.
       For example, if you had a problem with a Debian Perl script, you
       would want to provide the version of the `perl' binary (perl -v).

I connect to the Internet via DSL, using a Cisco 675 router, which is
a little grey box that sits on the floor.  I have a phone cord that
connects the router and my phone jack; I have an Ethernet cable that
connects the router and my network card.

The router is quite configurable, and perhaps its configuration is
relevant: 

* I've got it set to act as a DHCP server, although since I don't know
  how to make Debian use DHCP, I've told Debian to use a static IP
  address.  Since I only have one computer, there is no risk of having
  two IP addresses conflict.

* It's doing something called `network address translation', which, as
  I understand it, means that my machine "appears" to the outside
  world to have a different IP address than what the machine thinks.
  That is (as you can see below in my network configuration files), my
  machine thinks its IP address is 10.0.0.2, but the outside world
  uses 206.124.128.30 (that address might change from time to time,
  because the router might be a DHCP client of my ISP).  Also, if I
  were to connect other machines to the router (with an Ethernet hub),
  they would get IP addresses like 10.0.0.3, 10.0.0.4, etc.; but they
  would *all* appear to the outside world as 206.124.128.30.  You might
  expect this to cause total confusion, but it doesn't;
  somehow this `network address translation' keeps things from getting
  confused.  I don't understand how it does this, but it seems to work
  OK.  (The place I work used to have a similar setup; they had five
  machines connected to the Internet, all "sharing" an outside IP
  address; the machines all worked fine.)  The one tradeoff that I
  know of is that nobody in the outside world can connect to any
  servers that I run, because the network address translation
  apparently futzes with port numbers.  For example, my SMTP server
  listens on port 25, but someone who tries to connect to that port
  using my outside IP address 206.124.128.30 won't be able to.
  Presumably, if they could guess the port to which the router has
  "mapped" port 25, they could connect to that port.

  There may be some more information about the configuration of this
  box that is relevant.  Please feel free to ask.
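
The address sharing described in the network address translation item
above can be sketched as a table of port mappings.  This is a toy model
only, not the Cisco 675's actual implementation: the class name
`PortTranslator', the starting port 40000, and the inside port 1025 are
all assumptions made for illustration.  A real translator would also
track protocols, remote endpoints, and timeouts.

```python
class PortTranslator:
    """Toy model of port-based network address translation."""

    def __init__(self, outside_ip, first_port=40000):
        self.outside_ip = outside_ip
        self.next_port = first_port  # arbitrary starting point (assumption)
        self.out_map = {}  # (inside_ip, inside_port) -> outside_port
        self.in_map = {}   # outside_port -> (inside_ip, inside_port)

    def outbound(self, inside_ip, inside_port):
        """Rewrite the source of an outgoing connection."""
        key = (inside_ip, inside_port)
        if key not in self.out_map:
            self.out_map[key] = self.next_port
            self.in_map[self.next_port] = key
            self.next_port += 1
        return (self.outside_ip, self.out_map[key])

    def inbound(self, outside_port):
        """Find the inside host for a reply, or None if nothing maps there."""
        return self.in_map.get(outside_port)


nat = PortTranslator("206.124.128.30")
a = nat.outbound("10.0.0.2", 1025)  # first machine
b = nat.outbound("10.0.0.3", 1025)  # second machine, same inside port

print(a, b)               # same outside address, different outside ports
print(nat.inbound(b[1]))  # a reply is routed back to 10.0.0.3
print(nat.inbound(25))    # None: no inside host opened a mapping for port 25
```

In this model each inside machine gets its own outside port, which is
how replies find their way back.  An unsolicited connection to port 25
matches no mapping, so it goes nowhere.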

Perhaps some of the following network configuration files are
relevant:

/etc/resolv.conf:
    nameserver 206.124.128.1
    nameserver 206.124.128.3

/etc/hosts:
    127.0.0.1	localhost loopback
     10.0.0.1	cisco-router
     10.0.0.2	potato

/etc/init.d/network:
    #! /bin/sh
    ifconfig lo 127.0.0.1
    route add -net 127.0.0.0
    IPADDR=10.0.0.2
    NETMASK=255.255.255.0
    NETWORK=10.0.0.0
    BROADCAST=10.0.0.255
    GATEWAY=10.0.0.1
    ifconfig eth0 ${IPADDR} netmask ${NETMASK} broadcast ${BROADCAST}
    route add -net ${NETWORK}
    [ "${GATEWAY}" ] && route add default gw ${GATEWAY} metric 1
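
As a sanity check on the values in /etc/init.d/network, the network
and broadcast addresses can be derived from IPADDR and NETMASK and
compared with what the script sets.  This is a minimal sketch that
assumes Python 3.3 or later, for the standard `ipaddress' module; the
variable names are chosen here for illustration.

```python
import ipaddress

# Values copied from /etc/init.d/network and /etc/resolv.conf above.
iface = ipaddress.ip_interface("10.0.0.2/255.255.255.0")
net = iface.network
gateway = ipaddress.ip_address("10.0.0.1")
nameservers = [ipaddress.ip_address(a)
               for a in ("206.124.128.1", "206.124.128.3")]

print(net.network_address)                 # 10.0.0.0, matches NETWORK
print(net.broadcast_address)               # 10.0.0.255, matches BROADCAST
print(gateway in net)                      # True: gateway is on the local net
print([ns in net for ns in nameservers])   # [False, False]
```

Neither nameserver is on the local network, so every lookup has to go
through the default route via 10.0.0.1.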

     * Appropriate details of the hardware in your system. If you're
       reporting a problem with a device driver please list all the
       hardware in your system, as problems are often caused by IRQ and
       I/O address conflicts.

    bash-2.02$ cat /proc/devices 
    Character devices:
      1 mem
      2 pty
      3 ttyp
      4 ttyS
      5 cua
      7 vcs
     10 misc
     12 tpqic02
     29 fb

    Block devices:
      1 ramdisk
      2 fd
      3 ide0
      7 loop
      9 md
     22 ide1
     36 ed

Oddly, the problem goes away if I run `tcpdump': I do

       tcpdump &
       ping blarg.net

and `ping' responds correctly.  I can then kill `tcpdump', and until
the next time I boot, the network works fine.  It's as if `tcpdump'
changed something, and that change allows name resolution to work.

**

I'd be happy to help debug this, by perhaps making some program run
more verbosely, and then reporting the output; or perhaps installing a
special debugging version of something, and trying it out.  Let me
know.

