Re: Bug#605009: serious performance regression with ext4


On Fri, 2010-11-26 at 16:52:54 -0500, Ted Ts'o wrote:
> On Fri, Nov 26, 2010 at 03:53:27PM +0100, Raphael Hertzog wrote:
> > Just to sum up what dpkg --unpack does in
> > 1/ set the package status as half-installed/reinst-required
> > 2/ extract all the new files as *.dpkg-new
> > 3/ for all the unpacked files: fsync(foo.dpkg-new) followed by
> >    rename(foo.dpkg-new, foo)

> What are you doing?

We already had this conversation some time ago in

> 1) Suppose package contains files "a", "b", and "c".  Which are you
> doing?


Anyway, dpkg is currently doing the variation on (c) that Raphaël
posted, including making backups so that it can roll back the entire
package if something goes wrong.

> (c) will perform the best for most file systems, including ext4.

Well, it does not, and that's also what was mentioned in the bug
report. Something we've found out recently (as Raphaël mentioned too)
is that with nodelalloc both the performance issues *and* the
zero-length issues disappear. That seems like a clear win to me, so
IMO changing the default file system mount option to nodelalloc is
the way to go.

> As a further optimization, if "b" and "c" do not exist, of course
> it would be better to extract into "b" and "c" directly, and skip the
> rename, i.e.:

> d)  extract(a.dpkg-new);
>     extract(b);			# assuming the file "b" does not yet exist
>     extract(c);			# assuming the file "c" does not yet exist
>     fsync(a.dpkg-new);
>     fsync(b);
>     fsync(c);
>     rename(a.dpkg-new, a);
> ... and then set the package status as unpacked.

That would make it possible for partial files to appear at their final
path, and thus be available for external use, while they are still
being extracted. I don't think that's a good idea.
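
To illustrate the concern, here is a minimal Python sketch of the
extract, fsync, rename sequence for a single file. It is not dpkg's
actual code: the function names are hypothetical, and only the
".dpkg-new" suffix is taken from the thread. The point is that the
final path holds either the old file or the complete new one, never a
partially written one.

```python
import os
import tempfile


def extract_then_rename(final_path, data, suffix=".dpkg-new"):
    """Write data next to final_path, flush it to disk, then rename it.

    Hypothetical helper for illustration only; the suffix mirrors the
    *.dpkg-new naming mentioned in the thread.
    """
    tmp_path = final_path + suffix
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to write the data to disk
    # Up to here, final_path is either absent or still the old version.
    # os.replace renames over an existing file, like POSIX rename().
    os.replace(tmp_path, final_path)


def demo():
    """Replace an existing file; return its contents and the dir listing."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "a")
        with open(path, "wb") as f:
            f.write(b"old")
        extract_then_rename(path, b"new contents")
        with open(path, "rb") as f:
            return f.read(), sorted(os.listdir(d))
```

Extracting straight into the final path, as in (d), skips the
temporary name, so a reader opening the file during the write could
see a truncated file.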

