Re: dpkg 1.15.6 is slow as hell
[ Colin, CCing you as I'm not sure you follow the list. ]
On Fri, 12 Mar 2010 15:57:28 +0000, Colin Watson wrote:
> I'm worried about the syncing changes though;
> apparently they're *really* *really* pessimal on some systems, e.g. ext4
> with data=ordered (which considers rename() as a barrier itself so the
> fsync() isn't necessary in that configuration). Scott James Remnant
> reported that it took over an hour to unpack a linux-headers-* package!
That's probably going to be really bad on buildds...
> I don't know what the right answer is here. On the one hand, not
> fsyncing kills reliability on some systems; on the other hand, fsyncing
> kills performance on other systems.
Yeah, it's a bit sad though that the changes to ext4 to accommodate
non-behaving applications penalize so much the ones that try to do the
right thing.
On Fri, 2010-03-12 at 10:19:50 +0100, Raphael Hertzog wrote:
> On Fri, 12 Mar 2010, Sven Joachim wrote:
> > The decision to immediately fsync() all files written to disk has a
> > detrimental influence on dpkg's performance, especially when unpacking
> > large packages. On my system which has a 2.5" hard disk with an ext4
> > filesystem, installing emacs23-common (containing 2123 files) with a hot
> > cache takes 76 seconds, almost all of which is spent during unpacking.
> > With dpkg 1.15.5.6, it takes 5.3 seconds, including processing three
> > triggers. This is really painful. :-(
> Not as bad here with ext3 but still worrying, taking gnome-icon-theme
> (6534 files) I get ~3 seconds for 1.15.5.6 and 12 seconds for 1.15.6.
> That's still a 300% increase.
Right, I didn't see much degradation on ext3, or I'd probably have
considered doing alternative changes instead. I'll be testing on a
slower box with ext3 to see how it behaves there though.
> Removing the single fsync() added in tarobject() completely restores the
> original performance. Adding a single sync() after the whole unpack has
> way less impact (1 or 2 seconds more).
> Other possibility would be to use the loop afterwards to reopen all
> installed files and call fsync() on them. The disadvantage of sync() is
> obviously when unrelated disk activity happens in parallel to dpkg, it
> will have to wait more due to this.
Neither of those is a good replacement, as the fsync() must be done before
the rename(): we want the guarantee that there's always a valid file in
place in case of a crash, either the old or the new, which dpkg should be
able to discern and roll back if needed on reexecution. Doing a sync()
afterwards might only guarantee the package is not wrongly marked as
properly installed if there's a system crash, but that's it. And in such
a case there's a high probability the files will be zero-length, which would
be pretty bad for, say, Essential packages. In addition, POSIX does not
guarantee sync() will wait until the writes have finished (only Linux
seems to be doing that).
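
To make the ordering concrete, here is a minimal sketch in Python (dpkg
itself is written in C, and this is not its actual code). The helper name
atomic_replace() is made up for illustration; only the ".dpkg-new" suffix
and the fsync()-before-rename() order come from the discussion above.

```python
import os


def atomic_replace(path, data):
    """Replace `path` with `data` so a crash leaves the old or the new file.

    Hypothetical helper for illustration only; not dpkg's implementation.
    """
    tmp = path + ".dpkg-new"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # os.write() may write fewer bytes than asked, so loop until done.
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Flush the file contents to disk *before* the rename, so the new
        # name never points at a file whose data has not been written yet.
        os.fsync(fd)
    finally:
        os.close(fd)
    # rename() atomically switches the name from the old file to the new one.
    os.rename(tmp, path)
```

Dropping the os.fsync() call is what makes the zero-length case possible:
the rename can reach the disk before the data does.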
A possible solution could be to do the unpack for all files in a package,
just leaving the new files as file.dpkg-new, without doing either the
fsync() or the replace, and then with one pass afterwards fsync() and do
the "atomic" (except for dirs etc) replace. I guess this might improve the
situation a bit for packages with lots of files, but I'm not sure how much.
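
A rough sketch of that two-pass idea, again in Python rather than dpkg's C,
and only under stated assumptions: unpack_deferred() is a hypothetical name,
the package is modelled as a dict of file name to contents instead of a tar
stream, and all files live in one existing directory.

```python
import os


def unpack_deferred(target_dir, files):
    """Two-pass unpack sketch: stage everything, then fsync() and rename().

    `files` maps a plain file name to its bytes (a stand-in for tar entries).
    Hypothetical helper for illustration only.
    """
    staged = []

    # Pass 1: write every file as <name>.dpkg-new, with no fsync() and no
    # rename(), so the kernel is free to batch the writes.
    for name, data in files.items():
        final = os.path.join(target_dir, name)
        tmp = final + ".dpkg-new"
        with open(tmp, "wb") as f:
            f.write(data)
        staged.append((tmp, final))

    # Pass 2: reopen each staged file, fsync() it, and only then rename it
    # over the final name, keeping the fsync()-before-rename() order.
    for tmp, final in staged:
        fd = os.open(tmp, os.O_RDWR)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
        os.rename(tmp, final)

    return [final for _, final in staged]
```

Whether this is actually faster would need measuring; the hope is simply
that by the time the second pass runs, much of the data has already been
written back.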