
Re: On adding size info to Packages files [very long]



Charles Briscoe-Smith <cpbs@debian.org> writes:

> Hello everyone,
> 
> I've been working on the 'du -S' stuff over the last week (approx),
> and I think it's time I 'went public' with it.  I'm afraid there's a
> lot of stuff here; I thought it worth presenting the supporting data to
> back up my arguments...  Put it down to practice for the PhD thesis if
> you want.  ;-)

> Here's a sample of the output:
> 
> | Package: stow
> | Version: 1.3.2-9
> [...]
> | installed-size: 140
> | Du: 1	usr
> |  16	usr/bin
> |  1	usr/doc
> |  13	usr/doc/stow
> |  74	usr/doc/stow/html
> |  19	usr/info
> |  1	usr/lib
> |  2	usr/lib/menu
> |  1	usr/man
> |  7	usr/man/man8

Would that already be a valid Packages file, or would dpkg and
similar tools scream about unknown entries? Could older versions of
dpkg handle the new entries?

> A couple of points.  Firstly, the size of the 'contrib' section's
> Packages.du file is large because of the picon-* packages.  These packages
> contain a very deep directory hierarchy with a few small files in each
> leaf directory.  That gives us a very large 'du' output, even wrt the
> size of the package, and the packages are large anyway.  We might want
> to prune some of those directories, despite the inelegance of doing so.
> './fnfilter' does that.

Let's trim those to a reasonable size. Directories internal to a
package will hardly ever be on a separate partition, especially when
they are small. Any package-specific directory of less than 100 blocks
can be assumed to sit on a single partition. In the above example we
could assume that /usr/doc/stow is all on one partition and that
/usr/doc/stow/html can be omitted, because it is less than 100K
(/usr/doc/stow/html should be counted towards /usr/doc/stow, giving
13 + 74 = 87 blocks for it). Creating a partition for
/usr/doc/stow/html would waste more space than it saves, and linking
it somewhere else would not move enough data to make it worthwhile.
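
The folding rule above could be sketched roughly like this. It is only
an illustration, not what './fnfilter' does: the function names, the
'package_dirs' parameter (the list of directories considered
package-specific) and the default threshold of 100 blocks are
assumptions made for the example. Entries are (blocks, path) pairs as
printed by 'du -S', so each size excludes subdirectories.

```python
# Hypothetical sketch: fold small package-specific subtrees into one entry.
# Input is 'du -S' style data, i.e. sizes do NOT include subdirectories.

def subtree_total(entries, root):
    """Sum the blocks of root and everything below it."""
    return sum(blocks for blocks, path in entries
               if path == root or path.startswith(root + "/"))


def collapse_small_dirs(entries, package_dirs, threshold=100):
    """Merge each package-specific subtree below `threshold` blocks
    into a single entry named after its root directory."""
    small = [d for d in package_dirs
             if subtree_total(entries, d) < threshold]
    merged = {}  # dicts keep insertion order, so the listing order survives
    for blocks, path in entries:
        target = next((d for d in small
                       if path == d or path.startswith(d + "/")), path)
        merged[target] = merged.get(target, 0) + blocks
    return [(blocks, path) for path, blocks in merged.items()]


# The stow example quoted earlier in this mail.
STOW_DU = [
    (1, "usr"), (16, "usr/bin"), (1, "usr/doc"), (13, "usr/doc/stow"),
    (74, "usr/doc/stow/html"), (19, "usr/info"), (1, "usr/lib"),
    (2, "usr/lib/menu"), (1, "usr/man"), (7, "usr/man/man8"),
]

pruned = collapse_small_dirs(STOW_DU, package_dirs=["usr/doc/stow"])
```

For the stow data, usr/doc/stow/html disappears from the list and
usr/doc/stow carries 13 + 74 = 87 blocks; the total over all entries
is unchanged.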

[snip]
> I looked at several possibilities (percentage expansion is relative to
> the original Packages files):
> 
>  - Leave the 'Du' entry unencoded:
>  - Encode using 'sort +1 | ./myfrcode':
>  - Encode with 'gzip -9n | uuencode -':
>  - Encode with './squish':

There's another possibility:
Normal users won't back up their system, repartition differently and
restore the backup (at least not often). Nor will they often move
directories around and symlink them. We could thus trim the du tree
down to what the current system's layout reflects. If the user does
repartition or shift directories around, the reported gain in free
space when deleting a package would be wrong. The extra space needed
for an update should not be affected, since that would use the
untrimmed tree from the Packages.gz file and the current disk layout.
If it is wrong, then only by a bit. Also, the du trees could easily be
recalculated by regenerating them from Packages.gz. Since we need a
function to generate a du tree anyway, so that people can update to
the new system, this wouldn't mean more work.
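
One reading of "trimming to the current system" is to keep only one
total per mounted partition. A rough sketch under that assumption
follows; the function names and the example mount point lists are made
up for illustration, and symlinked directories are ignored here.

```python
# Hypothetical sketch: reduce a 'du -S' tree to one total per partition.

def mount_point_for(path, mount_points):
    """Return the longest mount point that contains the absolute path."""
    best = "/"
    for mp in mount_points:
        prefix = mp.rstrip("/") + "/"
        # Compare with a trailing slash so "/usr" does not match "/usrlocal".
        if (path + "/").startswith(prefix) and len(mp) > len(best):
            best = mp
    return best


def trim_to_partitions(entries, mount_points):
    """Sum the blocks of every directory onto the partition holding it."""
    totals = {}
    for blocks, path in entries:
        mp = mount_point_for("/" + path.lstrip("/"), mount_points)
        totals[mp] = totals.get(mp, 0) + blocks
    return totals


# The stow example quoted earlier in this mail (sizes exclude subdirs).
STOW_DU = [
    (1, "usr"), (16, "usr/bin"), (1, "usr/doc"), (13, "usr/doc/stow"),
    (74, "usr/doc/stow/html"), (19, "usr/info"), (1, "usr/lib"),
    (2, "usr/lib/menu"), (1, "usr/man"), (7, "usr/man/man8"),
]

# Assumed layouts: "/" and "/usr" only, then with a separate "/usr/doc".
simple = trim_to_partitions(STOW_DU, ["/", "/usr"])
split = trim_to_partitions(STOW_DU, ["/", "/usr", "/usr/doc"])
```

With only "/" and "/usr" mounted, the whole package collapses to a
single number for /usr. With a separate /usr/doc, the three doc
directories are summed onto it and the rest stays on /usr.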

> Summarising:
> 
> If network bandwidth is the only factor, we should do some form of
> front-coding on the du entries (preferably a non-broken form), because
> this gives the smallest compressed size.
> 
> If the size of the 'available' file is the only factor, we could use
> './squish', because this gives the smallest uncompressed size.
> 
> On balance, I'd say that front-coding gives a reasonable compromise
> between network bandwidth, size of available file, and human-readableness.
> If human-readability is more important than size of 'available' file, then
> we should simply put the un-encoded 'du' output into the Packages files.
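
For anyone not familiar with front-coding: each path is stored as the
number of leading characters it shares with the previous path, plus
the remaining suffix. The sketch below shows the plain technique only;
it is not taken from './myfrcode' and may differ from it, and the
function names are made up for the example.

```python
# Hypothetical sketch of plain front-coding for a sorted list of paths.

def front_encode(paths):
    """Encode each path as (shared_prefix_length, suffix) relative to
    the path before it."""
    encoded = []
    prev = ""
    for path in paths:
        n = 0
        limit = min(len(prev), len(path))
        while n < limit and prev[n] == path[n]:
            n += 1
        encoded.append((n, path[n:]))
        prev = path
    return encoded


def front_decode(encoded):
    """Rebuild the original paths from (shared_prefix_length, suffix)."""
    paths = []
    prev = ""
    for n, suffix in encoded:
        path = prev[:n] + suffix
        paths.append(path)
        prev = path
    return paths


paths = ["usr", "usr/bin", "usr/doc", "usr/doc/stow", "usr/doc/stow/html"]
encoded = front_encode(paths)
```

Because 'du' lists directories in tree order, neighbouring paths share
long prefixes, which is why this shrinks the entries while keeping
them more or less readable.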

May the Source be with you.
			Mrvn

