
Adding steps to debian-installer before downloading debconf



Hello,
I asked a similar question last week on debian-user, but I realize now that this is probably a better list to try.  I apologize if this is the wrong place for this question; I notice most posts seem to be about pull requests.

tl;dr: I want to add some custom scripts and/or programs to the debian installer that execute after it configures networking, but before it attempts to download the preseed (debconf) file.
I've had limited success by injecting my program into the initrd image that pxeboot downloads from our local tftpboot server, replacing the existing /bin/netcfg binary, but this causes other problems. Not insurmountable ones, but ones I'd rather not have.
I've tried tracing the init scripts that busybox and then debian-installer call, but most of them seem to call precompiled binaries. I've tried digging through their source code, but I'm not a C programmer, or even a software engineer. I'm a sysadmin tasked with finding generalized solutions for annoying, esoteric problems in our cluster.
The solution need not be an atomic step in the menu; the only requirement is that it happens after netcfg detects the network hardware, and before debian-installer attempts to download the preseed/url specified in the pxeconfig file.

Long version:
I work with a production cluster with a large number of eccentricities with regard to the networking configuration on any given host. This manifests as DHCP failing on any host where the interface linked to a DHCP server is anything but the first one the kernel configures.

While playing around with it manually, I found that if I either let DHCP fail on the host, or disable networking autoconfig entirely in the pxeconfig file, I can get an IP address assigned by executing `dhclient` in a shell on the host. I can then select 'download the preconfiguration file' manually and everything will work without issue from there. 
This tells me that it should be possible to automate this recovery scenario, as long as I can find the code that defines the order in which the steps are executed. My hope is that I'll be able to inject scripts/custom programs into the initrd image, or modify existing scripts, to add the functionality I need. The wiki for debian-installer declares that it was designed for modularity and customizability so that it can work on any hardware, no matter how esoteric. Most of the documentation I've found for modifying debian-installer, though, seems to center on adding drivers.
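To make the recovery step concrete, here is a minimal shell sketch of what automating the manual `dhclient` fallback could look like: try DHCP on each interface in turn and stop at the first one that gets a lease. The `/sys/class/net` enumeration, the `SYS_NET` and `DHCP_CMD` variables, and both function names are hypothetical, not anything debian-installer provides. It also assumes the ISC client, whose `-1` flag tries once and exits non-zero on failure. Where in the installer's sequence to call this is still the open question above.

```shell
#!/bin/sh
# Sketch: try DHCP on each network interface in turn, stopping at the first
# one that obtains a lease. All names here are hypothetical; nothing in
# debian-installer provides them.

# Command used to request a lease; the interface name is appended as the last
# argument. ISC dhclient's -1 flag makes it try once and exit non-zero on failure.
DHCP_CMD=${DHCP_CMD:-"dhclient -1"}

# Print every interface under /sys/class/net (overridable via SYS_NET) except lo.
list_candidate_ifaces() {
    for path in "${SYS_NET:-/sys/class/net}"/*; do
        [ -e "$path" ] || continue   # glob matched nothing
        iface=${path##*/}
        [ "$iface" = lo ] && continue
        printf '%s\n' "$iface"
    done
}

# Print the first interface that got a lease; return 1 if none did.
dhcp_first_working_iface() {
    for iface in $(list_candidate_ifaces); do
        # $DHCP_CMD is deliberately unquoted so "dhclient -1" splits into words.
        # Client output goes to stderr so only the interface name reaches stdout.
        if $DHCP_CMD "$iface" >&2; then
            printf '%s\n' "$iface"
            return 0
        fi
    done
    return 1
}
```

Interfaces are tried in the glob's sort order (by name), which is not necessarily the order in which the kernel configures them; that is one reason the sketch does not care which one comes first.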
My first attempt was to modify /init in the initrd to get a DHCP lease before `busybox init` is executed. That did get a lease and resolve internal addresses before netcfg ran, but even if I set `netcfg/enable boolean false` in the pxeconfig, the lease is wiped out when netcfg detects the networking hardware. I've also been able to reliably solve this problem with custom programs injected into the initrd that I run by hand when the problem happens, but that requires manual intervention and is not something we consider acceptable as a permanent solution.
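For anyone reproducing the initrd experiments, here is a minimal sketch of adding a file to the netboot initrd. It assumes the image is a gzip-compressed cpio archive in `newc` format (as the Debian netboot `initrd.gz` is) and that `gzip` and `cpio` are installed. The function and helper names are hypothetical. Run it as root (or under `fakeroot`) so that device nodes and ownership in the image survive the round trip.

```shell
#!/bin/sh
# Sketch: add (or overwrite) one file inside a gzip-compressed newc cpio initrd.
# Assumes gzip and cpio are available; run as root or under fakeroot so
# device nodes and ownership in the image are preserved.
set -eu

# Turn a possibly relative path into an absolute one (hypothetical helper).
abspath() {
    case $1 in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s\n' "$PWD/$1" ;;
    esac
}

# inject_into_initrd IN_INITRD SRC_FILE DEST_PATH OUT_INITRD  (hypothetical)
#   DEST_PATH is relative to the root of the image, e.g. bin/my-hook
inject_into_initrd() {
    in_initrd=$(abspath "$1")
    src=$(abspath "$2")
    dest=$3
    out_initrd=$(abspath "$4")
    workdir=$(mktemp -d)

    # Unpack the existing image into a scratch directory.
    (cd "$workdir" && gzip -dc "$in_initrd" | cpio -id --quiet)

    # Copy the new file in, keeping its mode (e.g. the executable bit).
    mkdir -p "$workdir/$(dirname "$dest")"
    cp -p "$src" "$workdir/$dest"

    # Repack in the newc format the kernel expects, then recompress.
    (cd "$workdir" && find . | cpio -o -H newc --quiet | gzip -9 > "$out_initrd")

    rm -rf "$workdir"
}
```

Usage would look like `inject_into_initrd initrd.gz ./my-hook bin/my-hook initrd-custom.gz`, with the pxeconfig then pointed at `initrd-custom.gz` (all file names here are placeholders).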

We've tried a number of solutions to this problem that have been suggested in previous posts: separate links for imaging and application traffic, manually setting the interface to use for DHCP in the pxeconfig with `interface=fooX`, disabling unneeded interfaces in the system BIOS on affected hosts, manually configuring the network interactively in the event that DHCP fails, and disabling all network devices on the host and installing our own NIC. Unfortunately, these solutions either will not work universally for every host in our cluster, cost too much to be scalable, or require more hands-on attention than we are willing to dedicate. We've considered replacing our imaging system with something that doesn't have this problem, but the work involved in that is not trivial. It may be the route we go if I can't find a solution within our current system.

The distros we are currently installing are Ubuntu Trusty and Bionic, as well as Debian Stretch. All three (and every Ubuntu release we've supported in the past) show the same behavior.

I appreciate any support, and apologize for the length of this question. I do humbly and politely request that suggestions stay within the confines of what I'm asking for. I know that when presented with a weird problem where the proposed solutions are unorthodox or not immediately apparent, the urge is to find ways to work around the problem rather than solve it.
