
Re: Bug reporting script



Bill Mitchell writes:
>richard@elmail.co.uk (Richard Kettlewell) said:
>>Bill Mitchell writes:

>> > [package overhead file with name:mode:md5sum:size of package files]
>> I don't think that's right - all the information required is already
>> in the package in the form of the files themselves; why duplicate it?
>> If it's going to be anywhere it's much more sensible to generate it at
>> install time.  There's no sense in bloating .deb files more than is
>> necessary.
>
>I argue that the info is useful. If that point is granted, one question
>which arises is whether to generate the info at package install time or
>at package build time.  I'd argue for generating it at build time as a
>size vs. time tradeoff to allow a fairly fast operation to extract the
>pre-packaged info vs. a series of fairly slow operations per-package
>to generate it on the fly.  Also, some of the info, size info in
>particular, would likely be most useful at package browse time while
>the user is deciding whether he wants to unpack the packages.

Point to note: the only thing that is slow to extract from a .deb file
if it isn't stored separately is a file signature.  Everything else
can be squirted out fast enough by dpkg --contents.

If the decision were taken *not* to store this information in
/var/lib/dpkg/info - as is currently the case - then there would be
little point in changing the deb format to add in signatures for all
files; the audit operation under discussion will be sufficiently rare
that performance is not an issue, and tar can do most of the work for
us.

OTOH if the decision were taken to include this information in
/var/lib/dpkg/info, I remain unconvinced that calculating it on the
fly would be a serious performance hit.  Permissions, sizes and
ownerships are easy; md5sum involves reading the whole file but if we
do it just after it has been extracted, it will mostly be in RAM - so
the operation will be fast.  We'd have the further advantages of being
able to take advantage of the new feature without having to change any
existing packages other than dpkg repair.
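
As a rough illustration of how cheap the per-file record is to build at
install time, here is a minimal Python sketch.  The function name
overhead_record and the exact field layout are assumptions for
illustration, following the name:mode:md5sum:size shape quoted above;
nothing like this exists in dpkg.

```python
import hashlib
import os
import stat


def overhead_record(path):
    """Return a 'name:mode:md5sum:size' line for one file (hypothetical format)."""
    st = os.stat(path)
    # Permission bits only, as four octal digits (e.g. 0644).
    mode = format(stat.S_IMODE(st.st_mode), "04o")
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return f"{path}:{mode}:{digest.hexdigest()}:{st.st_size}"
```

Mode and size come from a single stat call; only the md5sum needs the
file contents, which is why it dominates the cost.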

>I'm guessing that a package file with this info would be about double
>the size of the current /var/lib/dpkg/info/<package>.list files.  For
>the packages which I've currently got installed, those files average a
>bit over 3K per package.  Compressed with gzip -9, they'd average a bit
>over 500 bytes per package.

As an experiment, I md5sum'd all 16000 or so files mentioned in
/var/lib/dpkg/info/*.list.  (It wasn't a very quick operation, which
perhaps somewhat contradicts my assertion above.  More experimentation
is required.  Perhaps one could get acceptable speed by writing code
which checksummed the file in the same loop as extracting it to disc.)
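
A sketch of that single-loop idea in Python, purely to show the shape:
each chunk read from the archive is written to disc and fed to the
checksum at the same time, so the file is never read back.  It assumes
the package's filesystem tar stream has already been pulled out of the
.deb (for instance with dpkg-deb --fsys-tarfile); the function name
extract_with_md5 is hypothetical, and only regular files are handled.

```python
import hashlib
import os
import tarfile


def extract_with_md5(tar_stream, dest):
    """Extract regular files from a tar stream, hashing each as it is written."""
    sums = {}
    # "r|*" reads the archive strictly sequentially, so a pipe works too.
    with tarfile.open(fileobj=tar_stream, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # directories, links and devices are skipped here
            # Crude safety check: ignore absolute or parent-relative names.
            if member.name.startswith("/") or ".." in member.name.split("/"):
                continue
            target = os.path.join(dest, member.name)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            src = tar.extractfile(member)
            digest = hashlib.md5()
            with open(target, "wb") as out:
                for chunk in iter(lambda: src.read(65536), b""):
                    out.write(chunk)      # write to disc ...
                    digest.update(chunk)  # ... and checksum in the same pass
            sums[member.name] = digest.hexdigest()
    return sums
```

Whether this is actually fast enough would still need measuring; the
sketch only removes the second read of each file.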

richard@muskogee:/var/lib/dpkg/info$ cat *.list | wc -c
 494379
richard@muskogee:/var/lib/dpkg/info$ du *.list | awk '{x+=$1}END{print x}'
588
richard@muskogee:/var/lib/dpkg/info$ cat *.md5 | wc -c
 970404
richard@muskogee:/var/lib/dpkg/info$ du *.md5 | awk '{x+=$1}END{print x}'
1053

That's from 153 packages; about 3Kb extra per package, just from the
md5sum.  Next I gzip'd all the *.md5 files (these contain both
filenames and signatures.)

richard@muskogee:/var/lib/dpkg/info$ cat *.md5.gz | wc -c
 329183
richard@muskogee:/var/lib/dpkg/info$ du *.md5.gz | awk '{x+=$1}END{print x}'
421

So we're looking at something like 400-500Kb of disc space on a
reasonably set up machine; over twice that if we don't compress.  I
could cope with an extra meg of such stuff, but on a machine which was
short of disc space it might be quite irritating.
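
The per-package figures follow from the byte totals in the transcripts
above by simple division.  A quick check in Python, using only those
numbers (the variable names are just labels for this sketch):

```python
packages = 153

list_bytes = 494379    # cat *.list   | wc -c
md5_bytes = 970404     # cat *.md5    | wc -c
md5_gz_bytes = 329183  # cat *.md5.gz | wc -c

# Average size per package of each kind of file.
per_pkg_list = list_bytes / packages
per_pkg_md5 = md5_bytes / packages
per_pkg_md5_gz = md5_gz_bytes / packages

# Extra bytes per package that the signatures add on top of the .list files.
extra_per_pkg = (md5_bytes - list_bytes) / packages

print(f"list:    {per_pkg_list:.0f} bytes/package")
print(f"md5:     {per_pkg_md5:.0f} bytes/package")
print(f"md5.gz:  {per_pkg_md5_gz:.0f} bytes/package")
print(f"extra:   {extra_per_pkg:.0f} bytes/package")
```

The uncompressed difference comes to a little over 3100 bytes per
package, which is the "about 3Kb extra" quoted earlier.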

I don't know how much extra it would be with permissions and ownership
information; I'd guess at 10-20%.

>So, if the file is stored in the package in
>compressed form (reasonable, I think), we'd be talking about adding an
>average of about 1K per package.

I'm not clear what Ian's problem with compressing large chunks of the
dpkg database is, but I've been assuming he has some good reason.  I
propose to discuss this off-line this evening ;-)

>> Your suggestion also has the disadvantage that every package in the
>> distribution would need updating; whereas if the extra information is
>> generated at install time, all current packages will just work.
>
>I suggested that the package admin tool could recognize whether
>packages contained this info or not, and handle them appropriately.

Yes, one could do that.  Sorry I didn't spot it the first time round.

-- 
Richard Kettlewell
<richard@elmail.co.uk>                  http://www.elmail.co.uk/staff/richard/

