Re: On adding size info to Packages files [very long]
Manoj Srivastava writes:
> I think we should look at the possibility of not including the
> information in either the Packages file or the available file. The
> Du files should be kept separately on the archives, and they may be
> compressed with gzip (bzip2?); and downloaded and kept in
> /var/lib/dpkg/DU.gz or something on the user's machine; and they need
> only be downloaded if required by the user. Keeping this information
> separate makes using this optional.
> I see no technical advantage in encoding this in the Packages
> and available files.
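To make the "only downloaded if required" part concrete, here is a minimal sketch of how a front end might fetch the file on first use. The URL is a placeholder, the cache path is only the "or something" suggested above, and ensure_du, DU_URL, DU_CACHE and FETCH are names I made up for illustration; nothing here is an existing dpkg interface.

```shell
# Hypothetical locations; neither is an agreed-upon path.
DU_URL=${DU_URL:-http://mirror.example.org/debian/hamm/DU.gz}
DU_CACHE=${DU_CACHE:-/var/lib/dpkg/DU.gz}
# FETCH is invoked as: $FETCH <destination> <source>
FETCH=${FETCH:-wget -q -O}

ensure_du() {
    # Reuse the local copy if it already exists and is non-empty.
    [ -s "$DU_CACHE" ] && return 0
    # Download under a temporary name so a failed transfer
    # never leaves a partial file in place of the real one.
    if $FETCH "$DU_CACHE.tmp" "$DU_URL"; then
        mv "$DU_CACHE.tmp" "$DU_CACHE"
    else
        rm -f "$DU_CACHE.tmp"
        return 1
    fi
}
```

A tool that never needs the size data never calls ensure_du, so users who don't want the file never pay for it.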
$ for x in main contrib non-free; do
>   gzip -cd Packages.hamm.$x.du.gz \
>   | sed -n '/^Package:/p;/^Du:/,/^$/p' \
>   | gzip -9n > Sizes.hamm.$x.gz
> done
$ gzip -l Sizes.hamm.*.gz
compressed  uncompr. ratio uncompressed_name
    105201    795450 86.7% Sizes.hamm.contrib
     66294    402982 83.5% Sizes.hamm.main
     11446     63766 82.0% Sizes.hamm.non-free
    182941   1262198 85.5% (totals)
$ gzip -cd Sizes.hamm.main.gz | head -15
Du: 3 etc
Du: 1 usr
Looks reasonable... In practice, this information is only going to be
used while installing packages, and 180K isn't much anyway. We could
save far more space by compressing the available, available-old, status
and status-old files (2.5MB on my system).
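As a rough sketch of the install-time use I have in mind, something like the following could total the Du entries for a set of selected packages. It assumes each record in a Sizes file is a "Package: <name>" line followed by "Du: <size> <directory>" lines and a blank line, as the sed command above produces. du_totals is a made-up name, and I'm not assuming any particular unit for the sizes.

```shell
# du_totals SIZES_FILE PACKAGE...
# Sum the "Du: <size> <dir>" lines of the named packages, per directory.
# Assumed record layout: "Package: <name>", then "Du:" lines, then a blank line.
du_totals() {
    sizes_file=$1
    shift
    gzip -cd "$sizes_file" | awk -v wanted="$*" '
        BEGIN {
            # Turn the space-separated package list into a lookup table.
            n = split(wanted, names, " ")
            for (i = 1; i <= n; i++) want[names[i]] = 1
        }
        # Each Package: line decides whether the following Du: lines count.
        $1 == "Package:" { selected = ($2 in want) }
        $1 == "Du:" && selected { total[$3] += $2 }
        END { for (dir in total) print total[dir], dir }
    ' | sort -k2
}
```

For example, "du_totals Sizes.hamm.main.gz foo bar" (with foo and bar standing in for real package names) would print one "<total> <directory>" line per directory touched by those two packages, which an installer could compare against the free space on each filesystem.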
> We have conflicting data here. Mrvn says that the total du
> data is only 76k. Charles says that the data is about 400k (which is
> way more in line with my off the cuff calculations).
The 400K was for the normal hamm Packages files with the Du data added
to them. That makes my numbers far closer to Mrvn's. Also, weren't Mrvn's
figures for main only?
> I am inclined to believe the 400k figures. I would, for
> scalability reasons, advocate that we re-run our scripts on a _full_
> i386 mirror (which I do not have at the moment -- ran out of space).
I generated my data from unix.hensa.ac.uk's mirror.
> I also would strongly advocate *NOT* stuffing this data into
> the Packages or the Available files, but keeping it separate, both on
> the archive and, once downloaded, on the user's disk.
I'm now with you on this one. Given the sizes involved, I don't think we
even need to go to the trouble of generating the "top N levels" versions.
Using those would also make it difficult to take symlinks into account.
White pages entry, with PGP key: <URL:http://alethea.ukc.ac.uk/wp?95cpb4>
PGP public keyprint: 74 68 AB 2E 1C 60 22 94 B8 21 2D 01 DE 66 13 E2