
Re: Bug reporting script



Bill Mitchell writes:
>Richard Kettlewell <richard@elmail.co.uk> said:
>>>>Bill Mitchell writes:

>Yeah.  I had other uses for some of that info besides audit in the
>back of my mind.

Perhaps it would help the discussion if we knew something about them?

>>That's from 153 packages; about 3Kb extra per package, just from the
>>md5sum.  Next I gzip'd all the *.md5 files (these contain both
>>filenames and signatures.)
>
>This isn't all 'extra'.  The filename part of it is equivalent to
>what's currently in the .list files (or should be, except that my
>.list files add up to 325 kB for 97 packages today -- about 3 Kb per
>package).

I already subtracted the size due to filenames - it is all extra.

>Anyhow, I estimated that adding this other info would about double
>that.  `cat *list | wc -l` gives about 10K files listed in the .list
>files, a surprising average of 100 per package. A one-char field
>separator and 32 md5sum chars per file gives about 3K per package for
>md5sum. Adding field separators and data for filemode and filesize
>would add about another 800 bytes or so per package to the average
>uncompressed .list file.

An md5sum can be stored in 16 bytes, of course; and one can't really
compress it beyond this.  (I should have done the previous experiment
with this in mind, in fact.  Silly me.)  MD5 may be overkill; a CRC
might be OK for recording signatures of all the files in a package.
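
To put rough numbers on that, here is a minimal Python sketch.  It is
only an illustration, not anything dpkg does: the sample data is made
up, the figure of 100 files per package is the rough average from the
estimate quoted above, and the 4-byte CRC is just one possible choice
of CRC.

```python
import hashlib
import zlib

data = b"example file contents\n"  # hypothetical file contents

hex_digest = hashlib.md5(data).hexdigest()  # text form, as md5sum prints it
raw_digest = hashlib.md5(data).digest()     # the same signature as raw bytes
crc = zlib.crc32(data)                      # 32-bit CRC, fits in 4 bytes

FILES_PER_PACKAGE = 100  # assumption: rough average from the quoted estimate


def per_package_bytes(bytes_per_file, files=FILES_PER_PACKAGE):
    """Signature overhead for one package, ignoring filenames."""
    return bytes_per_file * files


hex_cost = per_package_bytes(len(hex_digest) + 1)  # +1 for a field separator
raw_cost = per_package_bytes(len(raw_digest))
crc_cost = per_package_bytes(4)

print("hex md5:", hex_cost, "bytes per package")
print("raw md5:", raw_cost, "bytes per package")
print("crc32:  ", crc_cost, "bytes per package")
```

So the hex form costs roughly twice what the raw digest does, which is
about what one would hope a general-purpose compressor recovers.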

>Another 20 bytes per file to store group,user ownerships would add
>another 2K per .list file.  However, since most files in a package
>have the same group,user owners, this could be substantially reduced
>at the cost of a little complexity by not repeating this info for
>every file in the package, not including meaningless md5sum info for
>directories, possibly eliminating filesize info from individual
>files, and providing totals per directory (in the place of the md5sum
>info?), etc.  Lots of design options to be explored there.

What do you want file size information in addition to a signature for?
If the file gets truncated (or has junk added to the end) then the
signature will change anyway.
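
A small sketch of the point, assuming MD5 as the signature; the file
contents and the helper name `signature` are made up for the example:

```python
import hashlib


def signature(data: bytes) -> str:
    """MD5 signature of a file's contents, in hex."""
    return hashlib.md5(data).hexdigest()


original = b"#!/bin/sh\necho hello\n"  # hypothetical installed file
truncated = original[:-5]              # file cut short
padded = original + b"junk"            # junk added to the end

# Both kinds of damage change the length, and both change the signature,
# so a separate size field would not catch anything extra here.
for name, damaged in (("truncated", truncated), ("padded", padded)):
    changed = signature(damaged) != signature(original)
    print(name, "detected by signature:", changed)
```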

>So, we're looking at an increase from the current 3K per package to
>7-9K per package.  Increasing from around 300K to 800K on a sparse
>system, or perhaps triple those numbers for a pretty loaded system.
>All based on uncompressed .list file sizes, of course, and assuming
>that all the info is added to each line of the .list file.

Apparently the problems with compressing .list files include 1) huge
numbers of extra calls to gzip, which will slow things down a great
deal, and 2) too much extra scope for things to go wrong.  However, Ian and
I discussed a faster and simpler method of compression which could
easily be implemented as part of dpkg and would - for lists of files -
give pretty good results (perhaps comparable to gzip).  It might even
get done some time ;-)

-- 
Richard Kettlewell
<richard@elmail.co.uk>                   http://www.elmail.co.uk/staff/richard/

