Speeding up dpkg, a proposal
I have recently been looking into where dpkg spends most of its time
when installing very many small packages, and came up with the following
idea to speed it up.
- Most of the time is spent writing files very carefully, and there are
a lot of them
- We can avoid this by writing the files less carefully (without fsync)
and even skipping the journal entries in /var/lib/dpkg/updates
- Instead, we move all packages that are to be unpacked into
half-installed / reinstreq before touching the first one, and put a
big sync() right before carefully writing /var/lib/dpkg/status.
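
To make the ordering above concrete, here is a rough sketch in Python
rather than dpkg's C. It is not the experimental code: the
one-line-per-package status format and the names write_carefully,
write_status, unpack_batch and demo are made up for illustration, and
the real dpkg status file looks nothing like this.

```python
import os
import tempfile


def write_carefully(path, data):
    """The slow, careful way: temp file, fsync, rename, fsync the directory."""
    tmp = path + ".new"
    with open(tmp, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)
    dirfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)


def write_status(status_path, states):
    # Hypothetical format: "<package> <state>" per line, sorted by name.
    text = "".join(f"{name} {states[name]}\n" for name in sorted(states))
    write_carefully(status_path, text)


def unpack_batch(root, status_path, states, packages):
    """packages maps a package name to {relative path: file contents}."""
    # 1. Mark every package in the batch before touching the first one.
    for name in packages:
        states[name] = "half-installed"
    write_status(status_path, states)

    # 2. Write the files less carefully: no fsync, no journal entries.
    for name, files in packages.items():
        for rel, content in files.items():
            dest = os.path.join(root, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with open(dest, "w") as f:
                f.write(content)
        states[name] = "unpacked"

    # 3. One big sync() for the whole batch ...
    os.sync()
    # 4. ... right before carefully writing the status file.
    write_status(status_path, states)


def demo():
    with tempfile.TemporaryDirectory() as root:
        status_path = os.path.join(root, "status")
        packages = {
            "foo": {"usr/share/foo/README": "hello\n"},
            "bar": {"usr/share/bar/README": "world\n"},
        }
        unpack_batch(root, status_path, {}, packages)
        with open(status_path) as f:
            status_text = f.read()
        with open(os.path.join(root, "usr/share/foo/README")) as f:
            file_text = f.read()
    return status_text, file_text
```

The point of the sketch is only the order of operations: one careful
write up front, many cheap writes, one sync(), one careful write at the
end. os.sync() and fsync on a directory descriptor assume a Unix-like
system.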
[ There are more details to this than described here; please check the
code before trying to find the holes in this short version of the idea. ]
This should be just as safe as writing very many small journal entries,
but if dpkg does get interrupted harshly, it leaves its database behind
in a correct but quite outdated and not so friendly state. Many
packages that have not been touched will have to be reinstalled because
dpkg can't be sure that they have in fact not been touched.
This should only happen when the system goes down abruptly without any
chance for dpkg to write a checkpoint and without unmounting the
filesystem cleanly. In any other case, such as a maintainer script
failing or the user interrupting dpkg with C-c, dpkg will write an
accurate checkpoint as the last thing it does.
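
The difference between the two kinds of interruption can be sketched as
follows, again in Python and again with made-up names (checkpoint,
mark_batch, process_batch, needs_reinstall, demo) and a made-up status
format. It assumes that an "accurate checkpoint" means untouched
packages go back to their previous state; the real code may handle this
differently.

```python
import os
import tempfile


def checkpoint(status_path, states):
    """Carefully write the (hypothetical) status file."""
    tmp = status_path + ".new"
    with open(tmp, "w") as f:
        for name in sorted(states):
            f.write(f"{name} {states[name]}\n")
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, status_path)


def needs_reinstall(status_path):
    """Packages the on-disk status says must be reinstalled."""
    with open(status_path) as f:
        pairs = [line.split() for line in f if line.strip()]
    return [name for name, state in pairs if state == "half-installed"]


def mark_batch(status_path, states, names):
    """Pessimistically mark the whole batch; return the previous states."""
    previous = dict(states)
    for name in names:
        states[name] = "half-installed"
    checkpoint(status_path, states)
    return previous


def process_batch(status_path, states, previous, steps):
    """steps is a list of (package name, callable doing the unpack)."""
    touched = set()
    try:
        for name, step in steps:
            touched.add(name)
            step()
            states[name] = "unpacked"
    finally:
        # Soft interruption (failing script, C-c): untouched packages
        # get their old state back before the final checkpoint.
        for name, _ in steps:
            if name not in touched:
                states[name] = previous.get(name, "not-installed")
        os.sync()
        checkpoint(status_path, states)


def demo():
    def ok():
        pass

    def fail():
        raise RuntimeError("maintainer script failed")

    with tempfile.TemporaryDirectory() as d:
        status_path = os.path.join(d, "status")
        states = {"a": "installed", "b": "installed", "c": "installed"}
        previous = mark_batch(status_path, states, ["a", "b", "c"])
        # What a reboot would find if the machine died right now:
        after_hard_crash = needs_reinstall(status_path)
        try:
            process_batch(status_path, states, previous,
                          [("a", ok), ("b", fail), ("c", ok)])
        except RuntimeError:
            pass
        after_soft_failure = needs_reinstall(status_path)
    return after_hard_crash, after_soft_failure
```

In this toy run a hard crash leaves all three packages flagged for
reinstallation, while the failing step leaves only the package that was
actually being worked on flagged.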
I have experimental code for this here, based on dpkg 126.96.36.199:
It shows a speed-up of between a factor of two and six in our environment
(ext4 on a slowish flash drive). I am not sure whether messing with the
fundamentals of dpkg is worth a factor of two in performance, but I
still think the idea is sound and worth sharing here, if only to be shot