Finishing deprecation of isc-dhcp-client in ifupdown*
Hi Debianites,
tl;dr: ifupdown + dhcpcd-base in Trixie is broken in all but the most
defaultest configurations, consequently isc-dhcp-client deprecation is a
wash. It's a mess. Let's fix it.
Proposed ifupdown unstable upload:
  https://salsa.debian.org/debian/ifupdown/-/merge_requests/28
After a comprehensive review of the situation I see the following problems
with ifupdown + dhcpcd-base:
 - [Independent control] over IPv6 and IPv4 operation from ifupdown is
   broken. A single `inet` stanza will enable both, so `inet6 static` also
   does SLAAC/DHCPv6.
 - Non-deterministic [daemon conflicts] between ifup's dhcpcd instances and
   dhcpcd.service may cause boot or renew problems.
 - dhcpcd's [home grown IPv6 RA/SLAAC] client defies ifupdown assumptions,
   has [missing RA features] and integration holds [privacy surprises].
 - interfaces(5) [settings not respected]: privext, autoconf, accept_ra, dhcp.
I'm working on an ifupdown upload to fix most problems in unstable, but we
also need to figure out what to do about Trixie and how much of the fixing
should be ifupdown integration glue (which ended up being a good chunk of
hairy code) vs dhcpcd patches (which would be **much** easier).
## What happened so far?
As a reminder: starting with Trixie dhcpcd-base is now Priority:important
(#1038882) instead of isc-dhcp-client. AFAICT this only affects new
installations, not upgrades, so we've split the userbase.
The ifupdown in Trixie and unstable still prefers dhclient over dhcpcd as
#1006263 was never acted on. Therefore systems upgraded to Trixie continue
to use dhclient unless the user manually replaced the isc-dhcp-client
package based on advice from release-notes ("The isc-dhcp suite is
deprecated upstream.").
## Users facing problems
Since we released Trixie at least some users have actually attempted to
switch to dhcpcd-base. Reports indicate a pattern of running into at least
the [independent control] problem:
  - ifupdown: Stateful DHCPv6 with dhcpcd-base causes ifup to fail
    (#1110741)
  - ifupdown: Stateless DHCPv6 not working with dhcpd-base
    (#1110071)
  - ifupdown: dhclient not started after upgrading to Trixie
    (#1115725 msg#15).
Before release we already had
  - ifupdown: ifup/down fail on inet6 interfaces with auto/dhcp method when
    using dhcpcd (#1065085)
Unfortunately nobody uploaded the patch that was included.
## Implementation Context
Before diving into details let's set the scene for those unfamiliar with
these packages: The dhcpcd-base package provides the /usr/bin/dhcpcd
binary, a combined DHCP client daemon and corresponding control interface,
but no system service. ifupdown calls this binary when bringing up/down an
interface (see inet*.defn source files).
A dhcpcd call will either connect to a running instance or start a fresh
one. It has three daemonised modes of operation with separate control
sockets: per-interface, per-interface-and-address-family and global
"manager" mode. Several per-iface* and one global manager instance can end
up running at the same time. A running per-iface instance shadows the
global manager for control calls involving that interface.
Currently ifupdown's dhcpcd integration uses per-iface instances, but when
a global manager is running it connects to that instead, see [daemon
conflicts].
Each dhcpcd daemon can run with dual-stack (the implicit default),
ipv6-only (-6) or ipv4-only (-4) enabled. The set of enabled address
families cannot be changed once a daemon is running. This is important in
the global manager case, however with per-iface instances distinct daemons
handling 6/4/dual-stack can be started.
The dhcpcd *package* additionally provides a system service that will start
dhcpcd in dual-stack global manager mode at boot. It will bring UP most
interfaces automatically with IPv4 DHCP, IPv4-LL and IPv6 SLAAC and/or
Stateful DHCPv6 as appropriate, but regardless of ifupdown configuration or
whether a per-iface instance is already running(!), see [daemon conflicts].
## Problem [Independent control]
Currently ifupdown calls dhcpcd with dual-stack commands for the `inet`
stanza. The `inet6` stanza otoh doesn't support dhcpcd at all. A
/etc/network/interfaces that was working with dhclient such as:
    iface eth0 inet dhcp
    iface eth0 inet6 dhcp
will fail to ifup with dhcpcd-base as `inet6 dhcp` doesn't support dhcpcd
and it will bail with "No DHCPv6 client software found!".
Not all users hit this problem as debian-installer is only capable of
generating `inet6 auto` stanzas currently, and since dhcpcd's defaults
(somewhat) match what you'd expect from that it's not obviously broken.
However this also implies ifupdown users have no control over whether IPv6
is brought up or not other than by fiddling with /etc/dhcpcd.conf.
What's more concerning, this /etc/network/interfaces will bring up
SLAAC/DHCPv6 in addition to static addresses:
    iface eth0 inet dhcp
    iface eth0 inet6 static
        address 2001:db8::1/64
I doubt that's what anybody expects from this config.
All this is clearly completely broken, it only superficially seems to work
as our default config is dual-stack with `inet6 auto` which is what dhcpcd
does by default.
Action [dhcpcd per-af]: I'm switching ifupdown to calling dhcpcd such that
address families are handled independently.
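
To illustrate the idea (a minimal sketch only, not the code from the merge request): each interfaces(5) address family maps to the dhcpcd flag that restricts a daemon to that family, as described in the implementation context above. The helper name `dhcpcd_af_flag` is made up for this example, and the command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper, not the actual ifupdown integration: map an
# interfaces(5) address family to the dhcpcd flag for that family.
dhcpcd_af_flag() {
    case "$1" in
        inet)  printf '%s\n' "-4" ;;   # IPv4 only
        inet6) printf '%s\n' "-6" ;;   # IPv6 only
        *)     echo "unsupported address family: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) what an `iface eth0 inet6 dhcp` stanza would call.
echo "dhcpcd $(dhcpcd_af_flag inet6) eth0"
```

With this split an `inet` stanza never enables IPv6 as a side effect, and an `inet6 static` stanza simply never calls dhcpcd.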
## Problem [daemon conflicts]
The global manager started by dhcpcd.service has interface autodiscovery
enabled. If it starts after a per-iface instance is already running the
manager will also try to configure this interface i.e. we can end up with
two DHCP daemons fighting over an interface. Clearly completely broken.
In unstable dhcpcd.service starts After=networking.service (ifup) meaning
it triggers this conflict reproducibly. In Trixie no Before/After on
networking is declared so the conflict will happen non-deterministically --
oh joy.
The design space of how to fix this is complicated. I've explored most of
it and the current implementation is version four or so ;-).
The main challenges are 1) --waitip doesn't always work, 2) instance
selection is fickle, 3) changing instance type needs special
handling.
Brave souls wanting to know the details of all the frustrating quirks may
consult my long string of upstream issues:
https://github.com/NetworkConfiguration/dhcpcd/issues?q=is%3Aissue%20author%3ADanielG
Long-term plan [dhcpcd interface conflict patch]: All we really need to do
to fix this properly in dhcpcd is ignore interfaces that already have a
daemon running during discovery. My code review suggests this would be a
simple change of a couple of lines.
A dhcpcd patch could be suitable for Trixie even, but so far dhcpcd's
maintainer has refused to consider downstream changes -- else I'd have
fixed this well before Trixie already :-]. Truth be told I'm pretty annoyed
I had to spend more than a full blown focused work week on designing,
implementing, documenting this hacky fix on the ifupdown side instead of
spinning a 10 LoC dhcpcd patch in half an hour, but here we are. Ugh.
Anyway.
Action [use global dhcpcd from ifupdown]: For now the ifupdown centric
solution I came up with is to switch from per-interface instances to using
the global manager (starting it from ifup as needed). This way ifupdown and
dhcpcd.service interoperate nicely, though the glue code needed to make it
work is substantial.
## Problem [home grown IPv6 RA/SLAAC]
With dhclient we used to use the kernel's RA/SLAAC implementation. When
dhcpcd starts IPv6 on an interface it disables these in favor of its own
userspace implementation.
Problem is ifupdown currently assumes it can control the kernel
implementation using several sysctls which dhcpcd doesn't respect. Looking
at dhcpcd's git history it seems it used to respect these years ago but not
anymore.
dhcpcd sets the following sysctls:
    net.ipv6.conf.$IFACE.addr_gen_mode=1  (disables auto link-local address)
    net.ipv6.conf.$IFACE.autoconf=0   (disables SLAAC)
    net.ipv6.conf.$IFACE.accept_ra=0  (disables picking up (default) routes)
Naturally it doesn't clean these up when releasing an interface, so you
run into fun when you switch to static addressing at runtime. Gah.
Obviously none of this works correctly anymore, but we can fix it with some
more glue code :-).
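
As a rough sketch of the cleanup dhcpcd skips: the helper below prints the sysctl calls that would put the three knobs back to the stock kernel defaults. The function name `restore_ipv6_sysctls` is hypothetical, and the default values (addr_gen_mode=0, autoconf=1, accept_ra=1) are an assumption about an unmodified system. Real glue code would more likely save and restore whatever was configured before dhcpcd took over.

```shell
#!/bin/sh
# Hypothetical cleanup helper: print (do not run) the sysctl calls that
# restore kernel defaults for the knobs dhcpcd changes.
# Assumes stock defaults; interface names containing dots need extra care.
restore_ipv6_sysctls() {
    iface="$1"
    for kv in addr_gen_mode=0 autoconf=1 accept_ra=1; do
        printf 'sysctl -w net.ipv6.conf.%s.%s\n' "$iface" "$kv"
    done
}

restore_ipv6_sysctls eth0
```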
## Problem [settings not respected]
Action [dhcpcd.sh wrapper]: To fix the impedance mismatch between ifupdown
settings and dhcpcd I implemented a wrapper (dhcpcd.sh) that generates
dhcpcd cmdline options.
This works out well, except one particular configuration simply has no
equivalent in dhcpcd settings: accept_ra=1.
What this should do is only allow picking up routes on interfaces that
aren't configured for forwarding. Right now my implementation errors in
this case but since it's the default for `inet6 dhcp` that's not very
satisfactory.
Question [silently ignore accept_ra=1]?: Maybe we should accept the
misconfiguration risk and silently do the equivalent of accept_ra=2 here
instead? I've filed an upstream bug about it in any case:
https://github.com/NetworkConfiguration/dhcpcd/issues/540
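
For reference, the kernel semantics at stake can be spelled out as a small truth table. This is a sketch of the documented accept_ra behaviour in combination with the per-interface forwarding sysctl, not of anything in dhcpcd or the wrapper. The function name `ra_accepted` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper modelling the kernel's accept_ra semantics:
#   0 = never accept RAs
#   1 = accept RAs only if forwarding is disabled
#   2 = accept RAs even if forwarding is enabled
ra_accepted() {
    accept_ra="$1"; forwarding="$2"
    case "$accept_ra" in
        0) echo no ;;
        1) if [ "$forwarding" = 0 ]; then echo yes; else echo no; fi ;;
        2) echo yes ;;
        *) echo "invalid accept_ra: $accept_ra" >&2; return 1 ;;
    esac
}

for fwd in 0 1; do
    for ra in 0 1 2; do
        printf 'forwarding=%s accept_ra=%s -> RA accepted: %s\n' \
            "$fwd" "$ra" "$(ra_accepted "$ra" "$fwd")"
    done
done
```

The two settings only differ on interfaces with forwarding enabled, so treating accept_ra=1 as accept_ra=2 would change behaviour for routers and leave plain hosts unaffected.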
## Problem [privacy surprises]
In addition to RA/SLAAC options interfaces(5) also supports the `privext`
option. Naturally this is also ignored by our current dhcpcd integration.
Since the /etc/dhcpcd.conf default of enabling RFC 7217 privacy addressing
(`slaac private`) is incongruent with the interfaces(5) default of
privext=0 we cause nasty surprises for upgrading users that have hardcoded
EUI/MAC based SLAAC addresses into DNS.
## Problem [missing RA features]
Action [Avoid dhcpcd in inet6/auto]: I think some users may need features
only available in the kernel RA implementation (I do) so I'm
including an escape hatch: dhcpcd is not used for `inet6 auto` unless
dhcp=1 (non-default).
Note: The noipv6rs option isn't good enough here as it will still set
addr_gen_mode=1, preventing link-local assignment entirely. There is an
upstream PR for this one at least:
https://github.com/NetworkConfiguration/dhcpcd/pull/380
## Next steps
As things stand dhcpcd integration with ifupdown has many problems. I'm
planning to make an ifupdown upload to unstable if there are no major
objections. There are still a couple of TODOs (prefix delegation, RFC 7217,
accept_ra=1 risk assessment) but the code should be good enough for
unstable.
That leaves us to think about what to do about current and future Trixie
users: Retract the isc-dhcp-client deprecation? Fix it all in a stable
update? Leave it broken-as-is and leave extra-fun^TM transition problems
for Forky? Something in between?
Thoughts?
--Daniel