autopkgtest-build-lxd failing with bionic
On Tue, Feb 20, 2018 at 10:44:42PM +0100, Martin Pitt wrote:
> Steve Langasek [2018-02-16 11:12 -0800]:
> > > > [ -n "$(ip route show to 0/0)" ]
> > > This is better though, and works too. Please take a look at the attached
> > > patch. Thanks! :-)
> > Actually no, this is racy, because the route comes up before DNS resolution
> > is in place.
> I'm not sure that network-online.target would actually guard against
> that with all implementations.
Then to be blunt, the definition of the target should be fixed in those
implementations so that it's not useless.
I understand and agree with the argument that modern services should be
robust in the face of intermittent networks. But I don't agree that
network-online is "legacy" only for sysvinit compatibility, or that its
definition is too mushy to be useful. For oneshot-style operations (such
as... things you want to do on a one-time basis on first boot of an
autopkgtest runner VM, without having to write a daemon around them that
listens to netlink), network-online.target is precisely the right semantic.
autopkgtest is *not* the only thing that cares about this. The problem
should be solved once, well, in the systemd network stack, not pushed onto
the consumers to repeatedly reimplement poorly.
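To make the oneshot case concrete, here is a minimal sketch of a first-boot unit ordered after network-online.target. The helper name write_firstboot_unit, the unit description, and the apt-get command are placeholders for illustration, not anything autopkgtest ships. It also assumes the network stack in use provides a wait-online service that the target pulls in.

```shell
# Hypothetical helper: write a oneshot unit that runs only after
# network-online.target has been reached.
# $1: destination path for the unit file (chosen by the caller)
write_firstboot_unit() {
    cat > "$1" <<'EOF'
[Unit]
Description=Hypothetical first-boot setup that needs a working network
# Pull the target in (it is passive otherwise) and order after it.
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
# Placeholder command; substitute the real first-boot work.
ExecStart=/usr/bin/apt-get update

[Install]
WantedBy=multi-user.target
EOF
}
```

Wants= plus After= is the pairing that matters here: After= alone only orders the unit, it does not cause the target to be started.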
> But in practice, in most cases you'll get DNS either via static
> configuration (in which case there's nothing further to wait for) or via
> DHCP (in which case your address and DNS servers ought to arrive at the
> same time).
With systemd-networkd and systemd-resolved, we have genuinely seen races
in autopkgtests because the time between networkd applying the routes from
DHCP, and resolved applying the DNS settings from the same DHCP source, is
non-zero. Maybe us catching this race points to missing optimizations; but
the race will always remain: the route is always going to be configured
before the DNS in this setup, so if you're only watching for the route,
there is a race.
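A rough sketch of why the route check alone is not enough: the two conditions are separate, and there is a window in which the first holds and the second does not. The function names and the host name argument are made up for illustration. The route test is the one quoted earlier in this thread, and the DNS test assumes getent is available.

```shell
# Returns 0 if a default route exists (the check quoted earlier in the thread).
has_default_route() {
    [ -n "$(ip route show to 0/0)" ]
}

# Returns 0 if the given host name resolves through NSS.
# $1: host name to look up (caller-supplied placeholder)
can_resolve() {
    getent hosts "$1" >/dev/null 2>&1
}

# Prints which of the three states the system is currently in.
# $1: host name to use for the resolution check
network_state() {
    if ! has_default_route; then
        echo "no-route"
    elif ! can_resolve "$1"; then
        # This is the racy window: route is up, DNS is not yet configured.
        echo "route-but-no-dns"
    else
        echo "route-and-dns"
    fi
}
```

Anything that proceeds as soon as has_default_route succeeds can land in the "route-but-no-dns" state.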
> And there's still the "apt retries several times" fallback (which is why I
> do see the initial apt failure, but the retry works).
But we have all the tools at our disposal to run apt at the /right/ time,
without polling or retrying, for maximum efficiency :)
> > It's also not forwards-compatible with ipv6-only deploys.
> Right now the container network config created by lxc/lxd/netplan assumes
> IPv4 only, so let's cross that bridge when we get to it. Indeed adding an
> alternative `ip -6 show...` would easily rectify that.
But any way you slice it, you're encoding network policy information in the
autopkgtest runner that is appropriately the domain of the network
configuration manager. You can't know, without evil introspection, whether
you're *supposed* to have a default route on ipv4, ipv6, or both.
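To illustrate the policy problem, here is a sketch of what a route-based check ends up looking like once both address families are in play. The function names and the "ipv4"/"ipv6"/"both" policy argument are hypothetical. The point is that the caller has to be told which families are expected, which is exactly the information the network configuration manager already has.

```shell
# Returns 0 if a default route exists for the given address family.
# $1: "4" or "6"
has_default_route_for() {
    [ -n "$(ip "-$1" route show default)" ]
}

# Returns 0 if the default routes required by the given policy are present.
# $1: hypothetical policy knob, one of "ipv4", "ipv6", "both"
network_ready_for_policy() {
    case "$1" in
        ipv4) has_default_route_for 4 ;;
        ipv6) has_default_route_for 6 ;;
        both) has_default_route_for 4 && has_default_route_for 6 ;;
        *)    return 2 ;;  # unknown policy
    esac
}
```

Nothing in the runner can derive the right value for that argument without inspecting the network configuration itself.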
> > I think the network-online.target is the better thing to key on.
> I still don't like that much, though:
> - there is no requirement that this actually gets "implemented" or even
> started (it's a passive target)
Right, which is addressed by the explicit call to 'systemctl start'
(granted, not pretty).
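For reference, a minimal sketch of that explicit call. The wrapper name run_when_online is made up for this example, and it assumes a wait-online service is enabled so that the target actually waits for something.

```shell
# Hypothetical wrapper: start network-online.target, then run the given command.
# "$@": the command to run once the target has been reached
run_when_online() {
    # Without --no-block, 'systemctl start' returns only after the
    # start job for the target has finished.
    systemctl start network-online.target || return 1
    "$@"
}
```

Usage would be along the lines of `run_when_online apt-get update`, with no polling or retry loop in the caller.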
> - it's supposed to be a SysV backwards compat shim for LSB's "network"
> dependency, and not well-defined