[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Network services started before NIC UP.



On Wed, Dec 11, 2013 at 07:02:14PM CET, Bob Proulx <bob@proulx.com> said:

> 
> In that case we will need to keep peeling back layers until the root
> cause is found.  How are you starting the network?  Is this a section
> listed in /etc/network/interfaces?  Please show us the section.  Or is
> this using NetworkManager / wicd?  (If NetworkManager/wicd then there
> will be no section in /etc/network/interfaces for your network
> device.  No config there means that NM/wicd manages it.)

Everything in /etc/networkinterfaces.

It is a bit complicated let me explain the situation before going to configuration :

The machine has 2 IPv4 addresses and 1 IPv6 on same physical
interface.  IPv6 is fixed, but I must get a prefix delegation by a
dhcpv6 client for the prefix I get from hoster to be routed to the
machine (I use dibbler per the hoster doc, but I could you another one)

So here is /etc/network/interfaces

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
    address 88.190.17.120
    netmask 255.255.255.0
    gateway 88.190.17.1
    up ip addr add 88.191.245.121/32 dev eth0 label eth0:0
    up ip -6 addr add 2001:0bc8:30d3::1/64 dev eth0
    down ip addr del 88.191.245.121/32 dev eth0 label eth0:0
    down ip -6 addr del 2001:0bc8:30d3::1/64 dev eth0

88.190.17.120 is the "private" address (if I change server I will get
another address) 88.191.245.121 and 2001:0bc8:30d3::1 are the "public
addresses", becaus I may migrate them to another machine at same
hoster, making them more robust for public facing services (web email
and ntp server in pool.ntp.org for this one)

The router for IPv6 is given through the RA (I have the correct sysctl
set up for accepting teh RA *and* routing IPv6)

resolv.conf is 

search rail.eu.org
nameserver 127.0.0.1


unbound listen on loopback when it is started:

unbound 3048 unbound    3u  IPv4  11035      0t0  UDP 127.0.0.1:domain 
unbound 3048 unbound    4u  IPv4  11036      0t0  TCP 127.0.0.1:domain (LISTEN)
unbound 3048 unbound    5u  IPv6  11037      0t0  UDP [::1]:domain 
unbound 3048 unbound    6u  IPv6  11038      0t0  TCP [::1]:domain (LISTEN)



> To get ahead in the discussion I will suggest this excellent reference
> for understanding the /etc/network/interfaces file.
> 
>   http://www.debian.org/doc/manuals/debian-reference/ch05.en.html#_the_basic_syntax_of_etc_network_interfaces

I know it thanks

> 
> Just for the purposes of debugging if you are using "allow-hotplug"
> then try switching that to "auto".  In theory allow-hotplug should
> always work but since it is the newer event driven method sometimes
> there are still bugs to be found.  It is possible that your case is
> one of those.  Try "auto" instead and see if that older start ordering
> causes things to work in the correct way.

I always use auto for fixed machines, like this server.

> The errors you showed in the log file were from dns name resolution
> failures.  How are nameservers configured for your machine?  Are you
> using DHCP to set them?  Or are they statically definited?  Are you
> running a local machine nameserver daemon such as bind9 or dnsmasq or
> other?  What is in the /etc/resolv.conf file?

I use 2 dns servers, on different IP addresses : NSD on public
addresses, authoritative for the rail.eu.org zone and
2001:0bc8:30d3::/48 reverse zone, unbound on loopback and
88.190.17.120 as recursive server for my small infrastructure

But the problem is not here. I realize that my choice of logs was
rather poor. Here is another excerpt that I will comment

Dec 10 18:21:24 tee kernel: [   14.722600] loop: module loaded
Dec 10 18:21:24 tee kernel: [   15.276191] bnx2 0000:02:00.0: firmware: agent loaded bnx2/bnx2-mips-09-6.2.1b.fw into memory
Dec 10 18:21:24 tee kernel: [   15.282623] bnx2 0000:02:00.0: firmware: agent loaded bnx2/bnx2-rv2p-09-6.0.17.fw into memory
Dec 10 18:21:24 tee kernel: [   15.282687] bnx2 0000:02:00.0: irq 49 for MSI/MSI-X
Dec 10 18:21:24 tee kernel: [   15.282699] bnx2 0000:02:00.0: irq 50 for MSI/MSI-X
Dec 10 18:21:24 tee kernel: [   15.282712] bnx2 0000:02:00.0: irq 51 for MSI/MSI-X
Dec 10 18:21:24 tee kernel: [   15.282718] bnx2 0000:02:00.0: irq 52 for MSI/MSI-X
Dec 10 18:21:24 tee kernel: [   15.282725] bnx2 0000:02:00.0: irq 53 for MSI/MSI-X
Dec 10 18:21:24 tee kernel: [   15.282732] bnx2 0000:02:00.0: irq 54 for MSI/MSI-X
Dec 10 18:21:24 tee kernel: [   15.282739] bnx2 0000:02:00.0: irq 55 for MSI/MSI-X
Dec 10 18:21:24 tee kernel: [   15.282746] bnx2 0000:02:00.0: irq 56 for MSI/MSI-X
Dec 10 18:21:24 tee kernel: [   15.282752] bnx2 0000:02:00.0: irq 57 for MSI/MSI-X
Dec 10 18:21:24 tee kernel: [   15.347478] bnx2 0000:02:00.0 eth0: using MSIX
Dec 10 18:21:24 tee kernel: [   15.347685] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready

-> link not ready : no IPv6

Dec 10 18:21:24 tee acpid: starting up with netlink and the input layer
Dec 10 18:21:24 tee acpid: 1 rule loaded
Dec 10 18:21:24 tee acpid: waiting for events: event logging is off
Dec 10 18:21:24 tee ntpd[2361]: ntpd 4.2.6p5@1.2349-o Mon May 20 14:24:35 UTC 2013 (1)
Dec 10 18:21:24 tee ntpd[2479]: proto: precision = 0.134 usec
Dec 10 18:21:24 tee ntpd[2479]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
Dec 10 18:21:24 tee ntpd[2479]: Listen and drop on 1 v6wildcard :: UDP 123
Dec 10 18:21:24 tee ntpd[2479]: Listen normally on 2 lo 127.0.0.1 UDP 123
Dec 10 18:21:24 tee ntpd[2479]: Listen normally on 3 eth0 88.190.17.120 UDP 123
Dec 10 18:21:24 tee ntpd[2479]: Listen normally on 4 eth0:0 88.191.245.121 UDP 123
Dec 10 18:21:24 tee ntpd[2479]: Listen normally on 5 lo ::1 UDP 123
Dec 10 18:21:24 tee ntpd[2479]: peers refreshed
Dec 10 18:21:24 tee ntpd[2479]: Listening on routing socket on fd #22 for interface updates
Dec 10 18:21:24 tee ntpd[2479]: Deferring DNS for canon.inria.fr 1
Dec 10 18:21:24 tee ntpd[2479]: Deferring DNS for ntp.obspm.fr 1
Dec 10 18:21:24 tee ntpd[2479]: Deferring DNS for ntp.sceen.net 1
Dec 10 18:21:24 tee ntpd[2479]: Deferring DNS for clock.tix.ch 1
Dec 10 18:21:24 tee ntpd[2479]: Deferring DNS for ntp.proserve.nl 1
Dec 10 18:21:24 tee ntpd[2519]: signal_no_reset: signal 17 had flags 4000000

-> Here we see that ntpd is started before NIC is ready, and the IPv6
   address iis clearly absent

Dec 10 18:21:25 tee nsd[2521]: can't bind udp socket: Cannot assign requested address
Dec 10 18:21:25 tee nsd[2521]: server initialization failed, nsd could not be started

-> Same thing for nsd (it cannot bind to 2001:bc8:30d3::1 thus fails)

Dec 10 18:21:26 tee kernel: [   18.584193] bnx2 0000:02:00.0 eth0: NIC Copper Link is Up, 1000 Mbps full duplex
Dec 10 18:21:26 tee kernel: [   18.584196] , receive & transmit flow control ON
Dec 10 18:21:26 tee kernel: [   18.584281] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

-> There the link becomes ready, but too late...

Dec 10 18:21:26 tee ntpd_intres[2519]: host name not found: canon.inria.fr
Dec 10 18:21:26 tee ntpd_intres[2519]: host name not found: ntp.obspm.fr
Dec 10 18:21:26 tee ntpd_intres[2519]: host name not found: ntp.sceen.net
Dec 10 18:21:26 tee ntpd_intres[2519]: host name not found: clock.tix.ch
Dec 10 18:21:26 tee ntpd_intres[2519]: host name not found: ntp.proserve.nl

-> Here we see that no DNS yet : unbound did not start 

Dec 10 18:21:27 tee kernel: [   19.684249] Netfilter messages via NETLINK v0.30.
Dec 10 18:21:27 tee kernel: [   19.686807] ip_set: protocol 6

Dec 10 18:21:30 tee ntpd[2479]: Listen normally on 6 eth0 2001:bc8:30d3::1 UDP 123
Dec 10 18:21:30 tee ntpd[2479]: Listen normally on 7 eth0 fe80::be30:5bff:fecf:8bcb UDP 123
Dec 10 18:21:30 tee ntpd[2479]: peers refreshed

-> IPv6 is now on, ntpd adapts...


Dec 10 18:21:33 tee unbound-anchor: /var/lib/unbound/root.key has content
Dec 10 18:21:33 tee unbound-anchor: success: the anchor is ok
Dec 10 18:21:33 tee unbound: [3048:0] notice: init module 0: validator
Dec 10 18:21:33 tee unbound: [3048:0] notice: init module 1: iterator
Dec 10 18:21:33 tee unbound: [3048:0] info: start of service (unbound 1.4.21).

-> Recursive DNS resolver starts now...


In parrallel lets see the logs for the DHCPv6 client :

2013.12.10 18:21:24 Client Notice    Detected iface eth1/3, MAC=bc:30:5b:cf:8b:cc.
2013.12.10 18:21:24 Client Notice    Detected iface eth0/2, MAC=bc:30:5b:cf:8b:cb.
2013.12.10 18:21:24 Client Notice    Detected iface lo/1, MAC=00:00:00:00:00:00.
2013.12.10 18:21:24 Client Notice    Parsing /etc/dibbler/client.conf config file...
2013.12.10 18:21:24 Client Critical  Interface eth0/2 is down or doesn't have any link-local address.
2013.12.10 18:21:24 Client Critical  Fatal error during CfgMgr initialization.

See it starts at 18:21:24 2 s before link is on, and fails : The IPv6
network is thus not routed to me...


So the problem is clearly that several network services (ntpd, nsd,
dibbler-client at least) are started *before* interface is
completely up.


Thanks for reading this long message, but I think this logs
examination is important to understand what happens.


Reply to: