
Re: On adding size info to Packages files [very long]

Manoj Srivastava writes:
>        I think we should look at the possibility of not including the
> information in either the Packages file or the available file. The
> Du files should be kept separately on the archives; they may be
> compressed with gzip (bzip2?), downloaded and kept in
> /var/lib/dpkg/DU.gz or something on the user's machine, and they need
> only be downloaded if required by the user. Keeping this information
> separate makes using it optional.
>        I see no technical advantage in encoding this in the Packages file
> and the available file.


  $ for x in main contrib non-free; do
  >   gzip -cd Packages.hamm.$x.du.gz \
  >   | sed -n '/^Package:/p;/^Du:/,/^$/p' \
  >   | gzip -9n > Sizes.hamm.$x.gz
  > done
  $ gzip -l Sizes.hamm.*.gz
  compressed  uncompr. ratio uncompressed_name
     105201    795450  86.7% Sizes.hamm.contrib
      66294    402982  83.5% Sizes.hamm.main
      11446     63766  82.0% Sizes.hamm.non-free
     182941   1262198  85.5% (totals)
  $ gzip -cd Sizes.hamm.main.gz | head -15
  Package: 2utf
  Du: 3	etc
   1	usr
   111	usr/bin
   1	usr/doc
   8	usr/doc/2utf
   25	usr/doc/2utf/examples
   1	usr/man
   5	usr/man/man1
   1	var
   12	var/lib
  Package: 3dchess
  Du: 1	usr
   1	usr/doc

Looks reasonable...  In practice, this information is only going to be
used while installing packages, and 180K (compressed, all three sections
together) isn't much anyway.  We could save far more space by compressing
the available, available-old, status and status-old files (2.5MB on my
system).
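As a rough sketch of that install-time use, the shell function below reads a Sizes file in the layout shown above and totals one package's Du entries per top-level directory, which is the kind of figure an installer would compare against free space on each filesystem. The name du_for_package is hypothetical. It assumes the Du sizes are per-directory rather than cumulative (the "1 usr" / "111 usr/bin" pair in the sample suggests this) and leaves the unit as whatever the Du field uses.

```sh
# Hypothetical helper: du_for_package SIZES_GZ PACKAGE
# Prints "dir<TAB>total" for each top-level directory of PACKAGE.
# Assumption: Du sizes are per-directory (not cumulative), so summing is valid.
du_for_package() {
    gzip -cd "$1" | awk -v pkg="$2" '
        # A "Package:" line starts a new record; note whether it is the one wanted.
        /^Package:/ { want = ($2 == pkg); next }
        !want { next }
        {
            line = $0
            sub(/^Du:[ \t]*/, "", line)   # first line carries the field name
            sub(/^[ \t]+/, "", line)      # continuation lines are indented
            split(line, f, "\t")          # f[1] = size, f[2] = path
            if (f[2] == "") next          # skip blank or malformed lines
            top = f[2]
            sub(/\/.*/, "", top)          # usr/doc/2utf -> usr
            total[top] += f[1]
        }
        END { for (d in total) print d "\t" total[d] }
    ' | sort                              # awk "for (d in ...)" order is unspecified
}
```

For example, du_for_package Sizes.hamm.main.gz 2utf would print one line per top-level directory (etc, usr and var for the 2utf entry above).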

>        We have conflicting data here. Mrvn says that the total du
> data is only 76k. Charles says that the data is about 400k (which is
> way more in line with my off the cuff calculations).

The 400K figure was for the normal hamm Packages files with the Du data
added to them.  That puts my numbers far closer to Mrvn's.  Also, weren't
Mrvn's figures for main only?

>        I am inclined to believe the 400k figures. I would, for
> scalability reasons, advocate that we re-run our scripts on a _full_
> i386 mirror (which I do not have at the moment -- ran out of space).

I generated my data from unix.hensa.ac.uk's mirror.

>        I also would strongly advocate *NOT* stuffing this data into
> the Packages or the Available files, but keeping this apart on the
> archive and, when downloaded, on the user's disk.

I'm now with you on this one.  Given the sizes involved, I don't think we
even need to go to the trouble of generating the "top N levels" versions.
Using them would also make it difficult to take symlinks into account.
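For comparison, a "top N levels" version could be derived from the full data rather than generated separately. The sketch below (hypothetical du_top_levels, same per-directory assumption as before) folds every entry deeper than N path components into its ancestor at depth N, summing the sizes, and writes the result back in the Sizes layout. Because it folds purely by path text, a symlinked directory below depth N is lumped in with its parent, which is one way the symlink difficulty shows up.

```sh
# Hypothetical helper: du_top_levels N < Sizes > Sizes.topN
# Folds Du entries deeper than N path components into their depth-N ancestor.
# Assumption: Du sizes are per-directory (not cumulative), so folding = summing.
du_top_levels() {
    awk -v n="$1" '
        BEGIN { n += 0 }                  # force a numeric depth
        # Print the entries collected for the current package, in input order.
        function emit_du(   i, k) {
            for (i = 1; i <= cnt; i++) {
                k = order[i]
                printf "%s%s\t%s\n", (i == 1 ? "Du: " : " "), sum[k], k
            }
            cnt = 0
            split("", sum)                # portable way to empty the array
        }
        /^Package:/ { emit_du(); print; next }
        {
            line = $0
            sub(/^Du:[ \t]*/, "", line)
            sub(/^[ \t]+/, "", line)
            if (split(line, f, "\t") < 2) next
            # Keep at most n components: usr/doc/2utf -> usr/doc when n = 2.
            m = split(f[2], part, "/")
            path = part[1]
            for (j = 2; j <= n && j <= m; j++) path = path "/" part[j]
            if (!(path in sum)) order[++cnt] = path
            sum[path] += f[1]
        }
        END { emit_du() }
    '
}
```

Usage might look like: gzip -cd Sizes.hamm.main.gz | du_top_levels 2 | gzip -9n > Sizes.hamm.main.top2.gz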

Charles Briscoe-Smith
White pages entry, with PGP key: <URL:http://alethea.ukc.ac.uk/wp?95cpb4>
PGP public keyprint: 74 68 AB 2E 1C 60 22 94  B8 21 2D 01 DE 66 13 E2
