
Re: Bug reporting script



Richard Kettlewell <richard@elmail.co.uk> said:

> >>Bill Mitchell writes:
> 
> >> > [package overhead file with name:mode:md5sum:size of package files]
> 
> Point to note: the only thing that is slow to extract from a .deb file
> if it isn't stored separately is a file signature.  Everything else
> can be squirted out fast enough by dpkg --contents.

I did a quick test with dpkg-0.93.66 on 40 devel section packages using
an otherwise unbusy Gateway 2000 4DX2-66V.  The times for dpkg --contents
ranged from 0.20 seconds to 11.93 seconds, and averaged 2.08 seconds.
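
A timing run like this could be scripted roughly as below. This is only a
sketch, not the procedure actually used for the figures above: the helper
names time_contents and summarize and the devel/*.deb path are made up for
illustration, and it assumes dpkg is on the PATH.

```python
import glob
import subprocess
import time


def time_contents(deb_path):
    """Return wall-clock seconds taken by `dpkg --contents` on one package."""
    start = time.perf_counter()
    # The listing itself is discarded; only the elapsed time matters here.
    subprocess.run(["dpkg", "--contents", deb_path],
                   stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def summarize(times):
    """Return (min, max, mean) for a non-empty list of timings."""
    return min(times), max(times), sum(times) / len(times)


if __name__ == "__main__":
    # "devel/*.deb" is a hypothetical location for the packages under test.
    times = [time_contents(p) for p in sorted(glob.glob("devel/*.deb"))]
    if times:
        print("min %.2fs  max %.2fs  mean %.2fs" % summarize(times))
```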

That may be acceptable, or not.  If a large number of packages are
processed, two seconds per package adds up.  If we're talking about
interactive use, 12 seconds for a large package is a long time.
If the information was pre-packaged at package build time, it could
be extracted quickly, and in a pretty uniform time per package (modulo
package file format).
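
For illustration, one record of the name:mode:md5sum:size form mentioned
above could be produced at build time along these lines.  This is a sketch
under assumptions: overhead_line is a hypothetical helper, and the octal
mode format and field order are just one plausible reading of the proposal,
not an agreed format.

```python
import hashlib
import os
import stat


def overhead_line(path):
    """Build one name:mode:md5sum:size record for a regular file."""
    st = os.stat(path)
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
    # Permission bits only, as four octal digits (an assumed convention).
    mode = "%04o" % stat.S_IMODE(st.st_mode)
    return "%s:%s:%s:%d" % (path, mode, md5.hexdigest(), st.st_size)
```

Reading such pre-computed lines back is a plain text scan, which is why the
extraction time would be nearly uniform per package.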

> If the decision were taken *not* to store this information in
> /var/lib/dpkg/info - as is currently the case - then there would be
> little point in changing the deb format to add in signatures for all
> files; the audit operation under discussion will be sufficiently rare
> that performance is not an issue, and tar can do most of the work for
> us.

Yeah.  I had other uses for some of that info besides audit in the back
of my mind.

> OTOH if the decision were taken to include this information in
> /var/lib/dpkg/info, I remain unconvinced that calculating it on the
> fly would be a serious performance hit.  [...]
>
> That's from 153 packages; about 3Kb extra per package, just from the
> md5sum.  Next I gzip'd all the *.md5 files (these contain both
> filenames and signatures.)

This isn't all 'extra'.  The filename part of it is equivalent to
what's currently in the .list files (or should be, except that
my .list files add up to 325 kB for 97 packages today -- about
3 kB per package).  Anyhow, I estimated that adding this other
info would about double that.  `cat *list | wc -l` gives
about 10K files listed in the .list files, a surprising average
of 100 per package.  A one-char field separator and 32 md5sum chars
per file gives about 3K per package for md5sum.  Adding field
separators and data for filemode and filesize would add about another
800 bytes or so per package to the average uncompressed .list file.

> I don't know how much extra it would be with permissions and ownership
> information; I'd guess at 10-20%.

Another 20 bytes per file to store group,user ownerships would
add another 2K per .list file.  However, since most files in a package
have the same group,user owners, this could be substantially
reduced at the cost of a little complexity by not repeating this info
for every file in the package, not including meaningless md5sum info
for directories, possibly eliminating filesize info from individual
files, and providing totals per directory (in the place of the md5sum
info?), etc.  Lots of design options to be explored there.
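
To make two of those options concrete, here is a sketch of one possible
compact encoding: the most common group,user pair is written once in a
header line, other owners are appended only where they differ, and
directories get an empty md5sum field.  The "#owner=" header, the field
layout and the name compact_listing are all invented for illustration;
nothing here is an existing dpkg format.

```python
from collections import Counter


def compact_listing(entries):
    """Encode entries (dicts with name, owner, is_dir, md5) compactly."""
    # The most frequent owner becomes the package-wide default.
    default_owner = Counter(e["owner"] for e in entries).most_common(1)[0][0]
    lines = ["#owner=" + default_owner]
    for e in entries:
        # Directories carry no md5sum, so their field is left empty.
        fields = [e["name"], "" if e["is_dir"] else e["md5"]]
        # Ownership is spelled out only when it differs from the default.
        if e["owner"] != default_owner:
            fields.append(e["owner"])
        lines.append(":".join(fields))
    return lines
```

With this layout the 20 bytes of ownership are paid once per package plus
once per exceptional file, rather than once per file.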

So, we're looking at an increase from the current 3K per package to
7-9K per package.  That means going from around 300K to 800K on a sparse
system, or perhaps triple those numbers for a pretty loaded system.
All based on uncompressed .list file sizes, of course, and assuming
that all the info is added to each line of the .list file.
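
The arithmetic behind those figures can be replayed as below.  The inputs
are the numbers quoted in this message; the 8 bytes per file for mode and
size is simply 800 bytes spread over 100 files, and treating 1K as 1000
bytes is an assumption.  Because it uses the unrounded 325 kB / 97 base
rather than a flat 3K, the results come out somewhat above the rounded
figures in the text.

```python
PACKAGES = 97            # packages installed on the sparse system
LIST_BYTES = 325 * 1000  # current total size of the .list files (1K = 1000)
FILES_PER_PACKAGE = 100  # rounded average from `cat *list | wc -l`

MD5_BYTES = 1 + 32       # one field separator plus 32 md5sum characters
MODE_SIZE_BYTES = 8      # about 800 bytes per package over 100 files
OWNER_BYTES = 20         # group,user ownership, if repeated on every line


def per_package_bytes(with_owner):
    """Estimated uncompressed .list size per package after the additions."""
    base = LIST_BYTES / PACKAGES
    extra = MD5_BYTES + MODE_SIZE_BYTES + (OWNER_BYTES if with_owner else 0)
    return base + FILES_PER_PACKAGE * extra


for with_owner in (False, True):
    per_pkg = per_package_bytes(with_owner)
    print("owner=%s: %.1fK per package, %.0fK total"
          % (with_owner, per_pkg / 1000, per_pkg * PACKAGES / 1000))
```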

> [...]

