Re: dpkg --smallmem has a larger footprint than --largemem
Ian Jackson wrote:
>
> Ben Collins writes ("Re: dpkg --smallmem has a larger footprint than --largemem"):
> > Simply calling stat() on all those files is a real performance jerk.
>
> This is because the .../info directory is so large - because it
> contains at least one file (and usually more) per package. This is
> fine for the maintainer scripts, only a few of which are touched for
> each thing dpkg does. But it's not so good for the .list files. It
> used to be OK when we only had a few '00 packages :-).
>
> So I think this problem will go away if we amalgamate several packages'
> file listings into one file, provided we put the new amalgamated
> *.list files in a different directory to all the maintainer scripts.
Could they all be amalgamated into *one* locatedb-style file? I realize we
would have to add "which package this file belongs to" information, but
locate seems to be usably fast for arbitrary substring searches as it
stands.
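To make that concrete, here is a rough Python sketch of the kind of file I
have in mind. The record layout (shared-prefix length, suffix, package) is
my own assumption for illustration, not the real locatedb format or anything
dpkg does today, and the names build_index and search are hypothetical.

```python
# Sketch only: a front-coded, locatedb-style index of path -> owning package.
# The record layout is an assumption, not dpkg's or locate's on-disk format.
import os


def build_index(lists):
    """lists maps a package name to its file paths (hypothetical input)."""
    entries = sorted((path, pkg) for pkg, paths in lists.items() for path in paths)
    index, prev = [], ""
    for path, pkg in entries:
        # Store only the part of the path that differs from the previous one.
        shared = len(os.path.commonprefix([prev, path]))
        index.append((shared, path[shared:], pkg))
        prev = path
    return index


def search(index, needle):
    """Yield (path, package) for every path containing needle."""
    prev = ""
    for shared, suffix, pkg in index:
        path = prev[:shared] + suffix  # undo the front coding
        if needle in path:
            yield path, pkg
        prev = path
```

A search is a single sequential pass over one file, so no per-package
stat() or open() is needed.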
> Transition to this new on-disk format, in a way where the whole thing
> doesn't go wrong if dpkg somehow dies halfway through, will be fun.
I suggest keeping the .list files (although compression seems a likely
enhancement) and regenerating the whole derived locatedb-style database
only when any of the *.list files change. Consistency checks could be run
offline regularly for the paranoid, and the fallback on detecting
database corruption would be to rebuild it from the .list files.
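As a rough sketch of that rebuild step, in Python: everything here is an
assumption for illustration (the function names, the tab-separated
"path<TAB>package" line format, and using mtimes to detect change). The new
database is written to a temporary file and renamed into place, so a crash
halfway through leaves the old one intact.

```python
# Sketch only: rebuild a derived database from the *.list files.
# File layout, names, and the mtime check are assumptions, not dpkg behaviour.
import os
import tempfile


def needs_rebuild(list_dir, db_path):
    """True if the database is missing or older than any *.list file."""
    try:
        db_mtime = os.stat(db_path).st_mtime_ns
    except FileNotFoundError:
        return True
    with os.scandir(list_dir) as entries:
        return any(
            e.name.endswith(".list") and e.stat().st_mtime_ns > db_mtime
            for e in entries
        )


def rebuild(list_dir, db_path):
    """Regenerate the whole database from the *.list files."""
    lines = []
    for name in sorted(os.listdir(list_dir)):
        if not name.endswith(".list"):
            continue
        pkg = name[: -len(".list")]
        with open(os.path.join(list_dir, name)) as f:
            lines.extend(f"{p}\t{pkg}\n" for p in f.read().splitlines() if p)
    lines.sort()
    # Write to a temporary file in the same directory, then rename over the
    # old database so readers never see a half-written file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(db_path) or ".")
    with os.fdopen(fd, "w") as out:
        out.writelines(lines)
    os.replace(tmp, db_path)
```

Note that the mtime check still stats every .list file and does not notice
a removed one; having dpkg mark the database stale itself would avoid both.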
> Also, of course, the new format will break all those programs that
> parse the *.list files directly like they're not supposed to (though I
> suppose we *could* keep them up to date too).
Compressing the *.list files would kill this for sure :) Leaving them
uncompressed is an (expensive?) option.
Albert.
--
Albert den Haan, Lead Developer @ Linux Port Team . Corel Corporation
albertd@corel.com (613) 728-0826 x 5318