
Re: ssh hangs for 5 seconds for a particular machine



Vincent Lefevre wrote:
> Vincent Lefevre wrote:
> > Bob Proulx wrote:

> nameserver 140.77.1.32
> nameserver 140.77.167.2

That is a second potential source of timeout.

  man resolv.conf

       nameserver Name server IP address
              Internet address of a  name  server  that  the  resolver  should
              query,  either  an  IPv4  address  (in dot notation), or an IPv6
              address in colon (and possibly dot) notation as  per  RFC  2373.
              Up  to  MAXNS  (currently 3, see <resolv.h>) name servers may be
              listed, one per keyword.  If there  are  multiple  servers,  the
              resolver  library queries them in the order listed.  If no name-
              server entries are present, the  default  is  to  use  the  name
              server  on  the  local machine.  (The algorithm used is to try a
              name server, and if the query times out, try the next, until out
              of name servers, then repeat trying all the name servers until a
              maximum number of retries are made.)

Two are listed.  All fine.

              timeout:n
                     sets the amount of time the  resolver  will  wait  for  a
                     response  from  a  remote name server before retrying the
                     query via a different name server.  Measured in  seconds,
                     the default is RES_TIMEOUT (currently 5, see <resolv.h>).
                     The value for this option is silently capped to 30.

The default timeout is 5 seconds.  DNS lookups query the first
nameserver listed and wait up to 5 seconds for a response.  If that
time elapses without an answer the resolver falls through to the next
nameserver.  So if the first nameserver is not responding but the
second one is, every DNS lookup incurs a 5 second delay.
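To check what a given resolv.conf actually asks for, here is a small
Python sketch.  It only reads "nameserver" lines and the "timeout:n"
option, using the limits quoted from the man page above (MAXNS of 3,
default of 5 seconds, cap of 30).  The function name and the returned
dictionary are my own choices for illustration, not part of any
resolver API.

```python
MAXNS = 3          # per resolv.conf(5): up to MAXNS name servers
RES_TIMEOUT = 5    # per resolv.conf(5): default timeout in seconds
TIMEOUT_CAP = 30   # per resolv.conf(5): timeout is silently capped


def parse_resolv_conf(text: str) -> dict:
    """Return the nameservers and effective timeout from resolv.conf text.

    Hypothetical helper: a simplified reading of the file format, not
    the resolver library's own parser.
    """
    nameservers = []
    timeout = RES_TIMEOUT
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            # Entries beyond MAXNS are ignored.
            if len(nameservers) < MAXNS:
                nameservers.append(fields[1])
        elif fields[0] == "options":
            for opt in fields[1:]:
                name, _, value = opt.partition(":")
                if name == "timeout" and value.isdigit():
                    timeout = min(int(value), TIMEOUT_CAP)
    return {"nameservers": nameservers, "timeout": timeout}


if __name__ == "__main__":
    with open("/etc/resolv.conf") as fh:
        print(parse_resolv_conf(fh.read()))
```

If the first address it prints is a server that does not answer, that
is where the 5 seconds go.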

              attempts:n
                     sets the number of times the resolver will send  a  query
                     to  its  name  servers  before giving up and returning an
                     error  to  the  calling  application.   The  default   is
                     RES_DFLRETRY  (currently  2,  see <resolv.h>).  The value
                     for this option is silently capped to 5.

If both nameservers are down then it will cycle back to the first one
again and repeat.  The overall timeout depends on the number of
nameservers listed.  The per-server timeout for the second round of
queries is ten seconds divided by the number of nameservers configured,
which is 5 seconds each with two nameservers.  With two attempts and
two nameservers the overall timeout will be about 20 seconds if both
are down (5 + 5 in the first round, then 5 + 5 in the second).
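A rough Python sketch of that arithmetic, assuming the backoff rule as
described here: the per-server wait starts at the configured timeout,
doubles each round, and after the first round is divided by the number
of nameservers.  The function name and the one-second floor are my
assumptions for illustration; the real resolver code may differ in
detail.

```python
def worst_case_seconds(nameservers: int, timeout: int = 5, attempts: int = 2) -> int:
    """Estimate the total wait when no configured nameserver answers.

    Hypothetical helper modelling the rule described in the text above,
    not the resolver library's actual implementation.
    """
    if nameservers < 1:
        raise ValueError("need at least one nameserver")
    total = 0
    for round_index in range(attempts):
        # The base wait doubles with each round of queries.
        per_server = timeout << round_index
        # After the first round it is divided among the nameservers.
        if round_index > 0:
            per_server //= nameservers
        # Assumed floor so a round never waits zero seconds.
        per_server = max(per_server, 1)
        total += per_server * nameservers
    return total


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(count, "nameserver(s):", worst_case_seconds(count), "seconds")
```

Lowering timeout or attempts in the options line shrinks this number,
but it does not help if the delay comes from somewhere other than DNS.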

> > >   grep "^hosts" /etc/nsswitch.conf
> > 
> > hosts:          files mdns4_minimal [NOTFOUND=return] dns mdns4

I see mdns there and immediately have an immune reaction because of
the many problems I have had with it before.  As you have already
determined, it is the source of your current 5 second timeout.  I
recommend purging libnss-mdns from your system.  That will solve your
problem.

When libnss-mdns is purged, the package postrm script will clean up
the mdns entries in /etc/nsswitch.conf.  Alternatively you can leave
the package installed but remove that configuration from the file.

  hosts:          files dns

The above config will return the system to fast operation.
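If you would rather script the edit than do it by hand, something
along these lines would produce that line from the existing one.  This
is only a sketch in Python: the function name is made up, and it
assumes every service whose name starts with "mdns" should go, along
with any bracketed action such as [NOTFOUND=return] that immediately
follows it.

```python
import re


def strip_mdns(hosts_line: str) -> str:
    """Return an nsswitch.conf hosts line with the mdns services removed.

    Hypothetical helper for illustration; it does not touch any file.
    """
    key, _, rest = hosts_line.partition(":")
    # Tokens are either a bracketed action list or a service name.
    tokens = re.findall(r"\[[^\]]*\]|\S+", rest)
    kept = []
    dropped_previous = False
    for token in tokens:
        if token.startswith("["):
            # An action belongs to the service before it, so it is
            # dropped together with that service.
            if not dropped_previous:
                kept.append(token)
            continue
        dropped_previous = token.startswith("mdns")
        if not dropped_previous:
            kept.append(token)
    return key + ":          " + " ".join(kept)


if __name__ == "__main__":
    line = "hosts:          files mdns4_minimal [NOTFOUND=return] dns mdns4"
    print(strip_mdns(line))
```

Keep a backup of /etc/nsswitch.conf before writing any result back to
it.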

> According to
> 
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=414569
>   "avahi-daemon: delay on resolving IP addresses when mdns is
>   specified in /etc/nsswitch.conf"
> 
> the 5-second delay shouldn't occur, as this bug was fixed with the
> changelog:
> 
>    * On new installations, do not add "mdns4" to nsswitch.conf, only
>      "mdns4_minimal [NOTFOUND=return]". This means we can't
>      perform reverse DNS using mDNS for addresses outside 169.254.x.x and
>      fe80::/10, but avoids a 5 second delay if such addresses do not
>      have a PTR record in DNS (Closes: #412714, #414569, #561622 for
>      new installations).
> 
> The "mdns4" in my /etc/nsswitch.conf file occurs after
> "mdns4_minimal [NOTFOUND=return]" so that I don't see why I get
> the delay.

You could probably debug it completely to root cause.  Wouldn't take
more than a few days at the most.  Dig through the source code and
compile debug versions.  But do you need it in your environment?
Removing it avoids the problem completely.  I always purge it from my
systems.

Bob
