
Re: dpkg --smallmem has a larger footprint than --largemem



Ian Jackson wrote:
> 
> Ben Collins writes ("Re: dpkg --smallmem has a larger footprint than --largemem"):
> > Simply calling stat() on all those files is a real performance jerk.
> 
> This is because the .../info directory is so large - because it
> contains at least one file (and usually more) per package.  This is
> fine for the maintainer scripts, only a few of which are touched for
> each thing dpkg does.  But it's not so good for the .list files.  It
> used to be OK when we only had a few '00 packages :-).
> 
> So I think this problem will go away if we amalgamate several packages'
> file listings into one file, provided we put the new amalgamated
> *.list files in a different directory to all the maintainer scripts.

Could they all be amalgamated into *one* locatedb-style file? I realize we
would have to add "which package this file belongs to" information, but
locate seems to be usably fast for arbitrary substring searches as it
stands.
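
To make the idea concrete, here is a rough sketch in Python of what I have
in mind. It is not how dpkg or locate actually work: the function names
build_index() and search() are made up for illustration, and a real
locatedb would also front-compress the sorted paths, which this does not
attempt. It only assumes that each <package>.list holds one path per line.

```python
import os


def build_index(info_dir):
    """Merge every <package>.list in info_dir into one sorted list of
    (path, package) pairs. Maintainer scripts and other files are skipped."""
    entries = []
    for name in sorted(os.listdir(info_dir)):
        if not name.endswith(".list"):
            continue
        package = name[:-len(".list")]  # hypothetical naming: foo.list -> foo
        with open(os.path.join(info_dir, name)) as fh:
            for line in fh:
                path = line.rstrip("\n")
                if path:
                    entries.append((path, package))
    entries.sort()
    return entries


def search(entries, substring):
    """Linear scan for an arbitrary substring, in the style of locate."""
    return [(path, pkg) for path, pkg in entries if substring in path]
```

A lookup is then one pass over a single file's worth of data instead of a
stat() and open() per package.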

> Transition to this new on-disk format, in a way where the whole thing
> doesn't go wrong if dpkg somehow dies halfway through, will be fun.

I suggest keeping the .list files (although compression seems a likely
enhancement) and regenerating the whole derived locatedb-style database
only when any of the *.list files change.  Consistency checks could be run
offline regularly for the paranoid, and the fallback on detecting
database corruption would be to rebuild it from the .list files.
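
Again only as a sketch, under my own assumptions: the derived database
could carry a signature of the .list files it was built from, and be
rebuilt whenever that signature no longer matches. The names signature()
and refresh(), the tab-separated layout and the ".new" suffix are all
hypothetical. Writing to a temporary file and renaming it into place is
one way to cope with dying halfway through: the old database stays intact
until the new one is complete.

```python
import os


def signature(info_dir):
    """Name, size and mtime of every .list file; changes whenever a list
    is added, removed or rewritten."""
    parts = []
    for name in sorted(os.listdir(info_dir)):
        if name.endswith(".list"):
            st = os.stat(os.path.join(info_dir, name))
            parts.append(f"{name}:{st.st_size}:{st.st_mtime_ns}")
    return ",".join(parts)


def refresh(info_dir, db_path):
    """Rebuild db_path from the .list files if they changed.
    Returns True if a rebuild happened, False if the database was current."""
    sig = signature(info_dir)
    try:
        with open(db_path) as fh:
            if fh.readline().rstrip("\n") == sig:
                return False
    except FileNotFoundError:
        pass  # no database yet: fall through and build it

    tmp_path = db_path + ".new"  # hypothetical temporary name
    with open(tmp_path, "w") as out:
        out.write(sig + "\n")
        for name in sorted(os.listdir(info_dir)):
            if not name.endswith(".list"):
                continue
            package = name[:-len(".list")]
            with open(os.path.join(info_dir, name)) as fh:
                for line in fh:
                    path = line.rstrip("\n")
                    if path:
                        out.write(f"{path}\t{package}\n")
        out.flush()
        os.fsync(out.fileno())
    # Atomic rename: readers see either the old or the new database.
    os.replace(tmp_path, db_path)
    return True
```

A database with a damaged first line fails the signature comparison, so it
is simply rebuilt from the .list files, which is the fallback described
above. Damage further into the file would still need the offline check.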

> Also, of course, the new format will break all those programs that
> parse the *.list files directly like they're not supposed to (though I
> suppose we *could* keep them up to date too).

Compressing the *.list files would break those programs for sure :)
Leaving them uncompressed is an (expensive?) option.

	Albert.

--
Albert den Haan, Lead Developer @ Linux Port Team . Corel Corporation
albertd@corel.com  (613) 728-0826 x 5318


