
Re: speeding up installs



On Thu, Jun 13, 2019 at 05:04:34PM +0200, Guus Sliepen wrote:
> On Fri, Jun 07, 2019 at 07:29:49PM +0200, Adam Borowski wrote:
> 
> > I care about two use cases:
> > * boxes with HDDs or SD cards
> > * datacenter VMs, buildds
> [...]
> >  No, there's no such thing as a 1-way machine that can
> > install a modern distro anymore[3]: oldest machine I own, a non-NX Pentium4,
> > is already -j2; when 3 years ago I needed the cheapest possible box with
> > • USB, • local storage, • ethernet; it had 4 cores and 512MB RAM.   Non-SMP
> > is dead and buried, forget about ever optimizing for that.
> 
> Non-SMP is pretty alive when it comes to VM guests. So if you claim you
> care about that usecase, please do optimize for that as well.

That's -j1, which is a degenerate but valid case.  It won't run any slower
than current dpkg does.
 
> > * let's not care about power loss during install.  So no fsyncs, and no
> >   writing a single byte that's going to be overwritten later.  Do a global
> >   sync() only when entering grub-install.
> 
> With KVM installs, I usually configure it to use unsafe IO, which
> basically has the same effect as eatmydata. If the installation was
> successful, I can switch the IO mode back to something reliable. This
> indeed makes a huge difference in install speeds.

Yeah but even with eatmydata it's pointless to write the whole status file
after every step, then sometimes parse it back.
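A minimal sketch of the batching idea.  StatusDB is a hypothetical
in-memory status store, and JSON is only a stand-in for dpkg's real status
format.  Per-package state changes touch memory only, and the file is
written once at the end.  Following the "don't care about power loss" rule
above, it does no fsync at all.

```python
import json
import os
import tempfile


class StatusDB:
    """Hypothetical status store: changes stay in memory until flush()."""

    def __init__(self, path):
        self.path = path
        self.status = {}
        self.writes = 0  # how many times the file was actually written

    def set(self, package, state):
        # No I/O per step, unlike rewriting the whole file every time.
        self.status[package] = state

    def flush(self):
        # Write the full file once, replacing the old one via rename.
        # Deliberately no fsync: power loss during install is ignored.
        directory = os.path.dirname(self.path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.status, f, sort_keys=True)
        os.replace(tmp, self.path)
        self.writes += 1


# Usage: two state changes per package, but only one write.
with tempfile.TemporaryDirectory() as d:
    db = StatusDB(os.path.join(d, "status"))
    for pkg in ["libfoo", "libbar", "app"]:
        db.set(pkg, "unpacked")
        db.set(pkg, "installed")
    db.flush()
    print(db.writes)  # 1
```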

> > * being able to unpack in parallel also means you don't need to care about
> >   order: install can go before apt-download has finished.  This is awesome
> >   when your mirror has a slower link than that 10Gb...  We can install
> >   package X the moment apt has fetched it even though it's still downloading
> >   packages Y and Z.
> >   (Nb: what's a good way to know apt is done?  I screen-scrape
> >   -oDebug::pkgAcquire looking for "Dequeuing" which is a nasty hack.)
> 
> We already know before downloading packages what their dependencies are,
> so we can order the download such that the ones with the least
> dependencies are downloaded first, and so on. This will allow starting
> to install stuff while downloading other packages in a safe way.

Good idea but I already decided to ignore dependencies altogether, to
further improve parallelizing unpack.  Ordering could improve apt+dpkg for
upgrades, though.
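Guus's ordering suggestion boils down to a topological sort of the
download list.  A minimal sketch using Kahn's algorithm, assuming every
dependency is itself in the set being installed; among packages that are
ready, the one with the fewest direct dependencies goes first.  Each
package then arrives after everything it depends on, so it could be
installed as soon as it lands.  Package names below are made up.  Real
Debian does have dependency loops; this sketch just reports them instead of
breaking them.

```python
import heapq


def download_order(deps):
    """deps maps package -> set of packages it depends on."""
    remaining = {pkg: set(d) for pkg, d in deps.items()}
    dependents = {pkg: set() for pkg in deps}
    for pkg, d in deps.items():
        for dep in d:
            dependents[dep].add(pkg)

    # Heap keyed by (number of direct deps, name) for a deterministic order.
    ready = [(len(deps[p]), p) for p, d in remaining.items() if not d]
    heapq.heapify(ready)
    order = []
    while ready:
        _, pkg = heapq.heappop(ready)
        order.append(pkg)
        for rdep in dependents[pkg]:
            remaining[rdep].discard(pkg)
            if not remaining[rdep]:
                heapq.heappush(ready, (len(deps[rdep]), rdep))

    if len(order) != len(deps):
        stuck = sorted(set(deps) - set(order))
        raise ValueError("dependency cycle among: %s" % stuck)
    return order


deps = {
    "libfoo": set(),
    "libbar": {"libfoo"},
    "app": {"libfoo", "libbar"},
    "data": set(),
}
print(download_order(deps))  # ['data', 'libfoo', 'libbar', 'app']
```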

> It might be interesting to create a bootgraph-like chart of the
> installation process, to identify the actual bottlenecks and potentials
> for parallelization. Maybe we already have such a tool?

The current graph looks pretty linear.  You can just timestamp debootstrap /
apt messages.
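A minimal sketch of such timestamping: prefix every line with the seconds
elapsed since the first one was read.  To use it, wrap it in a tiny script
that loops over sys.stdin and prints each result, then pipe debootstrap's
output through that.  moreutils' ts (with -s) should do much the same
without any code.

```python
import time


def timestamp_lines(lines, clock=time.monotonic):
    """Yield each input line prefixed with seconds elapsed since the start."""
    start = clock()
    for line in lines:
        yield "%9.3f %s" % (clock() - start, line.rstrip("\n"))
```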

> > So... any comments so far?  Any hints how to cheat the configure step?
> 
> If two packages don't (reverse-)depend on each other in some way, how
> safe is it to configure them in parallel?

Alas, not at all -- postinsts assume they have exclusive control, dpkg
errors out instead of waiting when a lock is taken, etc.

Roughly 6 in 7 packages have a nice ENOENT-compliant postinst (that is, none
at all), but that's about the only case that can be cleanly parallelized.
Anything else would require some heuristics and/or manual review.

For example, update-alternatives calls apparently require serializing on a
single lock, but are parallelizable with anything else.
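A minimal sketch of that scheduling rule, assuming each postinst has been
tagged with an optional lock group (a hypothetical "alternatives" group
here).  Scripts sharing a group run one at a time; everything else runs
freely on up to `jobs` workers, and jobs=1 degenerates to plain sequential
execution.  The callables stand in for actually running the scripts.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_postinsts(scripts, jobs):
    """scripts: list of (name, lock_group_or_None, callable)."""
    # Create all group locks up front so workers never race to create them.
    locks = {group: threading.Lock() for _, group, _ in scripts if group}

    def run(name, group, func):
        if group is None:
            return name, func()
        with locks[group]:  # serialize only within the group
            return name, func()

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = [pool.submit(run, *s) for s in scripts]
        return dict(f.result() for f in futures)
```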

A big majority of postinsts in general apply only to upgrades, but most are
written by hand, with more unique snowflake cases than my socks drawer.

There's quite a bit of "# Automatically added by dh_ponies" stanzas, but a
cursory look shows they're accompanied by manual parts often enough to not
warrant even automating them away.  Even worse, while manually written
postinsts stay unchanged in subsequent uploads, the generated parts don't:
e.g. dh_installdeb inserts its version number, making the builds
non-reproducible.

My current plan is to do the hard work manually, and store data about
elidable postinsts by their mangled hash, such as: deadbeef -> ignore (for
completely skippable) or deadbeef -> Lock: alternatives, to mark scripts
that need to be run exclusively within their group but without regard to
anything else.
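A minimal sketch of such a lookup.  The mangling rule is an assumption: it
strips the debhelper version from marker comments shaped like "# Automatically
added by dh_installinit/12.1.1", so a rebuild with a newer debhelper keeps
the same hash.  SHA-256 and the "exclusive" fallback (run alone, as dpkg does
today) are likewise choices made for illustration.

```python
import hashlib
import re

# Assumed marker format; drops the "/<version>" suffix after the dh_ tool name.
_DH_MARKER = re.compile(r"^(# Automatically added by dh_\w+)/\S+", re.MULTILINE)


def mangled_hash(script):
    normalized = _DH_MARKER.sub(r"\1", script)
    return hashlib.sha256(normalized.encode()).hexdigest()


def classify(script, table):
    """Return 'ignore', 'Lock: <group>', or 'exclusive' for unknown scripts."""
    if script is None:
        return "ignore"  # no postinst at all
    return table.get(mangled_hash(script), "exclusive")
```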


Meow!
-- 
How to squander your resources: those silly Swedes have a sauce named
"hovmästarsås", the best thing ever to put on cheese, yet they waste it
solely on mere salmon.

