Re: speeding up installs
On Thu, Jun 13, 2019 at 05:04:34PM +0200, Guus Sliepen wrote:
> On Fri, Jun 07, 2019 at 07:29:49PM +0200, Adam Borowski wrote:
>
> > I care about two use cases:
> > * boxes with HDDs or SD cards
> > * datacenter VMs, buildds
> [...]
> > No, there's no such thing as a 1-way machine that can
> > install a modern distro anymore[3]: oldest machine I own, a non-NX Pentium4,
> > is already -j2; when 3 years ago I needed the cheapest possible box with
> > • USB, • local storage, • ethernet; it had 4 cores and 512MB RAM. Non-SMP
> > is dead and buried, forget about ever optimizing for that.
>
> Non-SMP is pretty alive when it comes to VM guests. So if you claim you
> care about that use case, please do optimize for that as well.
That's -j1, which is a degenerate but valid case. It won't run any slower
than current dpkg does.
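A minimal sketch of "unpack each package the moment it's fetched" with a job count, assuming two hypothetical callbacks: `fetch` (stands in for apt's download, returns a local path) and `unpack` (stands in for dpkg's unpack step). Neither is a real apt or dpkg API, and no dependency order is enforced. With `jobs=1` both pools have a single worker, so it collapses into an ordinary one-at-a-time loop.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List


def install_streaming(names: Iterable[str],
                      fetch: Callable[[str], str],
                      unpack: Callable[[str], str],
                      jobs: int = os.cpu_count() or 1) -> List[str]:
    """Unpack each package as soon as its download finishes.

    `fetch` and `unpack` are hypothetical callbacks. Returns the values
    returned by `unpack`, in completion order.
    """
    unpacked: List[str] = []
    with ThreadPoolExecutor(max_workers=jobs) as downloads, \
         ThreadPoolExecutor(max_workers=jobs) as unpackers:
        fetched = [downloads.submit(fetch, n) for n in names]
        pending = []
        # Hand each .deb to an unpacker the moment it lands, regardless
        # of the order in which the downloads were requested.
        for fut in as_completed(fetched):
            pending.append(unpackers.submit(unpack, fut.result()))
        # Collect results; unpacking has been running concurrently already.
        for fut in as_completed(pending):
            unpacked.append(fut.result())
    return unpacked
```

Separate pools keep a slow mirror from starving the unpackers: while package Y is still downloading, X can already be unpacking.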
> > * let's not care about power loss during install. So no fsyncs, and no
> > writing a single byte that's going to be overwritten later. Do a global
> > sync() only when entering grub-install.
>
> With KVM installs, I usually configure it to use unsafe IO, which
> basically has the same effect as eatmydata. If the installation was
> successful, I can switch the IO mode back to something reliable. This
> indeed makes a huge difference in install speeds.
Yeah, but even with eatmydata, it's pointless to write the whole status file
after every step, only to sometimes parse it back.
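A sketch of the alternative, under loud assumptions: package states live in memory, the file is written once, nothing is fsync()ed, and a single global sync() happens right before grub-install. `BatchedStatus`, `before_grub_install`, and the simplified stanza layout are hypothetical illustrations, not dpkg's status handling or its exact file format.

```python
import os
from typing import Dict


class BatchedStatus:
    """Hypothetical in-memory package status, written out only once."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.states: Dict[str, str] = {}

    def mark(self, pkg: str, state: str) -> None:
        # Pure memory update: a later state for the same package replaces
        # the earlier one, so no byte is written only to be overwritten.
        self.states[pkg] = state

    def write(self) -> None:
        # Plain buffered write, deliberately without fsync().
        # Simplified stanza layout, not dpkg's real status format.
        with open(self.path, "w", encoding="utf-8") as f:
            for pkg in sorted(self.states):
                f.write(f"Package: {pkg}\nStatus: {self.states[pkg]}\n\n")


def before_grub_install(status: BatchedStatus) -> None:
    """Persist everything at the one point where it matters."""
    status.write()
    os.sync()  # one global sync() instead of per-file fsyncs
```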
> > * being able to unpack in parallel also means you don't need to care about
> > order: install can go before apt-download has finished. This is awesome
> > when your mirror has a slower link than that 10Gb... We can install
> > package X the moment apt has fetched it even though it's still downloading
> > packages Y and Z.
> > (Nb: what's a good way to know apt is done? I screen-scrape
> > -oDebug::pkgAcquire looking for "Dequeuing" which is a nasty hack.)
>
> We already know before downloading packages what their dependencies are,
> so we can order the download such that the ones with the least
> dependencies are downloaded first, and so on. This will allow starting
> to install stuff while downloading other packages in a safe way.
Good idea, but I've already decided to ignore dependencies altogether, to
parallelize unpacking further. Ordering could improve apt+dpkg for
upgrades, though.
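Guus's ordering idea can be sketched as Kahn-style topological layering: every package lands in a "wave" after all of its dependencies, so wave 0 can be installed while later waves are still downloading. The input shape (a dict of package to the set of names it depends on) is an assumption, dependencies outside the transaction are ignored, and real dependency cycles, which do occur in Debian, are merely reported here rather than broken.

```python
from typing import Dict, List, Set


def download_order(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group packages into waves; each package's in-set dependencies
    are all in earlier waves. Raises ValueError on a cycle."""
    # Only count dependencies that are part of this transaction.
    remaining = {p: {d for d in ds if d in deps} for p, ds in deps.items()}
    waves: List[List[str]] = []
    while remaining:
        ready = sorted(p for p, ds in remaining.items() if not ds)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        for p in ready:
            del remaining[p]
        # Packages in this wave no longer block anything.
        for ds in remaining.values():
            ds.difference_update(ready)
    return waves
```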
> It might be interesting to create a bootgraph-like chart of the
> installation process, to identify the actual bottlenecks and potentials
> for parallelization. Maybe we already have such a tool?
The current graph looks pretty linear. You can just timestamp debootstrap /
apt messages.
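A minimal sketch of that timestamping, assuming you simply pipe debootstrap's or apt's output through it; the function name is hypothetical, and moreutils' `ts` covers much the same ground. The clock is injectable so it can be tested without waiting.

```python
import time
from typing import Callable, Iterable, Iterator


def timestamped(lines: Iterable[str],
                clock: Callable[[], float] = time.monotonic) -> Iterator[str]:
    """Prefix each log line with seconds elapsed since reading began."""
    start = clock()
    for line in lines:
        elapsed = clock() - start
        # rstrip() drops the trailing newline (and any trailing spaces).
        yield f"{elapsed:8.3f} {line.rstrip()}"
```

Wired up as `for out in timestamped(sys.stdin): print(out, flush=True)`, the gaps between consecutive stamps show where the time actually goes.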
> > So... any comments so far? Any hints how to cheat the configure step?
>
> If two packages don't (reverse-)depend on each other in some way, how
> safe is it to configure them in parallel?
Alas, not at all -- postinsts assume they have exclusive control, dpkg
errors out instead of waiting when a lock is taken, etc.
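To illustrate the difference between erroring out and waiting on a held lock, here is a sketch using flock(2) on a hypothetical lock file. It is not dpkg's locking code and does not take dpkg's own lock; it only contrasts the two behaviours a parallel installer would have to choose between.

```python
import fcntl
import os
from typing import Optional


def try_lock(path: str) -> Optional[int]:
    """Fail-fast behaviour: return a locked fd, or None if it's held."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def wait_lock(path: str) -> int:
    """Blocking behaviour: sleep until the lock is free, then take it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    fcntl.flock(fd, fcntl.LOCK_EX)
    return fd


def unlock(fd: int) -> None:
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```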
Roughly six in seven packages have a nice ENOENT-compliant postinst, but
that's about the only case that can be cleanly parallelized. Anything else
would require some heuristics and/or manual review.
For example, update-alternatives calls apparently need to be serialized on a
single lock, but can run in parallel with anything else.
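That "serialize within a group, ignore everything else" rule can be sketched with one lock per group. The job tuple shape and the group name "alternatives" as a plain string are assumptions for illustration; the callables stand in for actually running maintainer scripts.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical job shape: (script name, lock group or None, runner).
Job = Tuple[str, Optional[str], Callable[[], None]]


def run_postinsts(jobs: List[Job], workers: int = 4) -> None:
    """Run scripts concurrently; scripts sharing a lock group (e.g.
    "alternatives") run one at a time, all others run freely."""
    # One lock per distinct group, created before anything runs.
    locks: Dict[str, threading.Lock] = {
        group: threading.Lock() for _, group, _ in jobs if group is not None}

    def run(job: Job) -> None:
        _, group, fn = job
        if group is None:
            fn()
        else:
            with locks[group]:
                fn()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so any exception is re-raised here.
        list(pool.map(run, jobs))
```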
A large majority of postinsts apply only to upgrades, but most are
hand-written, with more unique snowflake cases than my sock drawer.
There are quite a few "# Automatically added by dh_ponies" stanzas, but a
cursory look shows they're accompanied by manual parts often enough not to
warrant even automating them away. Even worse, while manually written
postinsts stay unchanged across subsequent uploads, the generated stanzas
don't: e.g. dh_installdeb inserts its version number, making the builds
non-reproducible.
My current plan is to do the hard work manually and store data about
elidable postinsts keyed by their mangled hash, such as: deadbeef -> ignore
(for completely skippable ones) or deadbeef -> Lock: alternatives (to mark
scripts that need to run exclusively within their group but without regard
to anything else).
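A sketch of such a lookup, assuming the "mangling" only strips the debhelper version from stanza markers of the form "# Automatically added by dh_foo/VERSION" and that the key is the first 8 hex digits of a SHA-256. The regex, the hash choice, and the verdict strings' exact spelling are all assumptions, not a settled format.

```python
import hashlib
import re
from typing import Dict, Optional

# Assumption: the only upload-to-upload noise is the debhelper version in
# markers like "# Automatically added by dh_installdeb/12.1.1".
_DH_STAMP = re.compile(r"(# Automatically added by dh_\w+)/\S+")


def mangled_hash(script: str) -> str:
    """Hash a maintainer script with debhelper version stamps removed."""
    normalised = _DH_STAMP.sub(r"\1", script)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:8]


def classify(postinst: Optional[str], verdicts: Dict[str, str]) -> str:
    """Map a package's postinst to a scheduling decision.

    `verdicts` is the hand-curated table, e.g. {"deadbeef": "ignore"}
    or {"deadbeef": "Lock: alternatives"}.
    """
    if postinst is None:
        return "parallel"   # no postinst at all: nothing to serialize
    verdict = verdicts.get(mangled_hash(postinst))
    if verdict is None:
        return "exclusive"  # not reviewed yet: run alone, as dpkg does today
    return verdict          # "ignore" or "Lock: <group>"
```

Unknown scripts fall back to exclusive execution, so an incomplete table only costs speed, never correctness.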
Meow!
--
How to squander your resources: those silly Swedes have a sauce named
"hovmästarsås", the best thing ever to put on cheese, yet they waste it
solely on mere salmon.