
Re: Help debug : DNS failed when recover from suspend



Thank you very much for your help, Étienne! I will try to be as
precise as possible, following your tips.

On Tue, 17 Sept 2019 at 23:13, Étienne Mollier
<etienne.mollier@mailoo.org> wrote:
>
> Baptiste, on 2019-09-17:
> > I have two critical systemd services running on my clients:
> > -> "puppet", which ensures propagation of my whole network configuration.
> > -> "samba winbind", which allows PAM user authentication and Name Service Switch lookups.
> > These two services use DNS to find their servers.
>
> Hi Baptiste,
>
> > Any idea where I can start searching?
>
> DNS resolvers are listed in /etc/resolv.conf: have a look and
> see if the content is consistent with your intended configuration.
> Network components tend to conflict for the control of
> /etc/resolv.conf (dhclient, NetworkManager, the admin and their
> trusty "vi" editor, etc).  The program "resolvconf" can be
> installed to arbitrate this if necessary, although I've never
> used it myself yet.
>

My resolv.conf contains the following entries:

~# cat /etc/resolv.conf
domain my.domain.lan
search my.domain.lan.
nameserver 172.16.0.30

Everything seems correct: I have only one name server, and the other
parameters correspond to my domain.
The stat command says that the file is accessed and modified regularly
(including just after resuming from suspend), but the content does not
seem to change. I checked the file as soon as possible after the host
woke up and could not see any change.

If I change the file manually, for example by removing the domain
line, the file is restored a few minutes later.
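
One way to see exactly when the file is rewritten, and whether the
content really stays the same, is a small polling script. This is only
a sketch, assuming Python 3 is available on the client; the function
names (snapshot, describe_change, watch) are made up for illustration.
It reports when the file changes, not who changed it. Finding the
writer would need something like an audit rule on the file.

```python
import hashlib
import os
import time


def snapshot(path):
    """Return (mtime_ns, sha256 hex digest) for path, or None if it is missing."""
    try:
        with open(path, "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
        return os.stat(path).st_mtime_ns, digest
    except FileNotFoundError:
        return None


def describe_change(old, new):
    """Classify the difference between two snapshots."""
    if old == new:
        return "unchanged"
    if old is None or new is None:
        return "created or removed"
    if old[1] == new[1]:
        # Same bytes, different modification time: someone rewrote the file.
        return "rewritten with identical content"
    return "content changed"


def watch(path, interval=1.0, rounds=None):
    """Poll path and print a timestamped line each time it changes."""
    previous = snapshot(path)
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval)
        current = snapshot(path)
        if current != previous:
            print(time.strftime("%H:%M:%S"), describe_change(previous, current))
            previous = current
        done += 1
```

Calling watch("/etc/resolv.conf") before suspending the host and
leaving it running across the wake-up should show whether the file is
merely rewritten with the same content or briefly holds something else.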

I will give resolvconf a try.

> > Is there anything I can try to identify the origin of the problem?
>
> If those particular services are critical, and assuming your IP
> address assignments are static, at least for these core
> components of your network, then maybe you will want to consider
> using the plain IP address instead of relying on the DNS
> resolver's availability.  At least, it would be worth trying
> this with a given machine, to see if services are starting
> correctly, or if you hit the next error message instead.
>

My IP address assignment for clients is not "really" static: I use a
DHCP server. As lease times are not too short, a host's IP changes
very rarely. The domain search parameters, however, are given by the
DHCP server (see below).


I use static IPs for servers, so I can try your trick with puppet.
With samba this will be difficult, as many DNS entries are used for
the various Active Directory services.
Moreover, I have more than one domain controller, so I absolutely need
correct DNS resolution.

> [... rewinding ...]
> > But when the client recovers from suspend, these two services fail to
> > work until the next DNS query.
> >
> > -> Puppet gives the following error:
> > puppet-agent[3312]: Failed to open TCP connection to puppet:8140
> > (getaddrinfo: Name or service not known)
>
> Make sure your search domains are present in /etc/resolv.conf,
> otherwise your machine will certainly not be able to resolve the
> name "puppet".
>

The "domain" and "search" parameters are present in resolv.conf.
Maybe the "domain" line is useless, but it is added automatically by
the DHCP client. Or maybe there is a misconfigured option on my DHCP
server:

~# cat /etc/dhcpd.conf
....
subnet 172.16.0.0 netmask 255.255.0.0 {

   option routers 172.16.0.1;
   option domain-name "my.domain.lan";
   option domain-search "my.domain.lan";
   option domain-name-servers 172.16.0.30;
....
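
On the question of whether the "domain" line matters: resolv.conf(5)
describes "domain" and "search" as mutually exclusive, with the last
one in the file taking effect. The sketch below applies that rule to
the text of a resolv.conf so the effective search list can be checked
quickly. It is a simplified reading of the format, assuming Python 3;
parse_resolv_conf is a hypothetical helper, not part of any library,
and it ignores "options" and other keywords.

```python
def parse_resolv_conf(text):
    """Return the nameservers and the effective search list from resolv.conf text."""
    nameservers = []
    search = []
    for raw in text.splitlines():
        # Blank lines and lines starting with '#' or ';' carry no settings.
        if not raw.strip() or raw[0] in "#;":
            continue
        fields = raw.split()
        keyword, values = fields[0], fields[1:]
        if keyword == "nameserver" and values:
            nameservers.append(values[0])
        elif keyword == "domain" and values:
            # "domain" and "search" are mutually exclusive: the last one wins.
            search = [values[0]]
        elif keyword == "search":
            search = values
    return {"nameservers": nameservers, "search": search}
```

Applied to the three lines shown earlier, only the "search" entry
survives, which suggests the extra "domain" line is redundant rather
than harmful.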

> When you mention "suspend", is it after an actual hibernation ?
> There is a bug (more like poor wording in a startup message
> actually) in Debian 10.0 where the machine always seems to wake up
> from hibernation, which has been fixed in Debian 10.1.  It could
> be worth upgrading to clarify this point, if it is not already
> the case.

Yes, I am talking about a real "suspend", not hibernation. The
computer resumes in only 2 seconds.
I use unattended upgrades, so all my hosts are fully upgraded. And
you're right, the startup message is gone. My users never shut down
the computers, as they suspend 5 seconds after logout to save power.

>
> Are the affected machines mobile ones?  If so, it could be caused
> by a complete change of network during the hibernation (while
> moving from home to the high school typically), and the resolver
> configuration was still the one from home somehow.
>

No, they are not mobile stations. But I use a DHCP server, so the host
IP can change. I don't see any connectivity problem, though: I can ssh
into the host 2 seconds after it wakes.

> See you,  :)
> --
> Étienne Mollier <etienne.mollier@mailoo.org>
> Fingerprint:  5ab1 4edf 63bb ccff 8b54  2fa9 59da 56fe fff3 882d
>
>

So, with your help, here is my current checklist:
-> Maybe a bug in access to the resolv.conf file just after resuming
from suspend. I need to find out who is accessing the file and when,
and why this prevents DNS resolution from working.

-> Maybe a bug in the systemd configuration files that wakes services
in the wrong order? (I will soon file an unrelated bug report with
Debian: puppet.service does not contain any "After=" line.)

-> Maybe a bug in network-manager when the host receives a response
from the DHCP server. As the IP can change, maybe this makes DNS fail.
But a NACK is not often sent, so it cannot explain the problem
completely: the problem appears even if the IP does not change.
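
To narrow down the checklist above, it may help to record how long
name resolution actually fails after a wake-up, from a process that
was already running before the suspend (as puppet and winbind are).
The following is a rough sketch, assuming Python 3; probe and
log_probes are hypothetical names, and the resolver argument exists
only so the logic can be exercised without a network. The name
"puppet" and port 8140 are taken from the error message quoted earlier.

```python
import socket
import time


def probe(name, port, resolver=socket.getaddrinfo):
    """Try to resolve name once; return (ok, detail)."""
    try:
        infos = resolver(name, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # Same failure class as "getaddrinfo: Name or service not known".
        return False, str(exc)
    addresses = sorted({info[4][0] for info in infos})
    return True, ", ".join(addresses)


def log_probes(name, port, attempts, delay=1.0, resolver=socket.getaddrinfo):
    """Probe repeatedly; return a list of (elapsed seconds, ok, detail)."""
    start = time.monotonic()
    results = []
    for attempt in range(attempts):
        if attempt:
            time.sleep(delay)
        ok, detail = probe(name, port, resolver)
        results.append((round(time.monotonic() - start, 1), ok, detail))
    return results
```

Starting something like log_probes("puppet", 8140, attempts=600) just
before the host suspends, and printing the result afterwards, should
show whether resolution fails only for the first lookup after the
wake-up or for a longer window.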

If someone has an idea, I would be glad to hear it!

Thanks again.
Baptiste.

