Re: Let's shrink Packages.xz

On Mon, Jul 14, 2014 at 06:25:47PM +0200, Jakub Wilk wrote:
> Description-md5     794.3 KiB   11.9%

Needed to provide a mapping as versions change a lot more often than
descriptions do; also, historically, Translation-* were outside of the
control of ftpmasters (at least, that is what history digging told me).
It is also relatively new in the Packages file, which leads me to:

With a slight change in semantics we could drop the field from the
Packages file again anyhow: At the moment it is the MD5sum of the long
description. If it isn't present the clients are expected to calculate
it for themselves (well, this was required to work with Translation-*
before we moved long descriptions to Translation-en, so very new clients
might not know about that). So if we change this to MD5sum of whatever
is in the description field (short or long), we could drop it from the
Packages file and clients will again calculate this themselves to look
stuff up with it in the Translation files (where this field came from).
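A minimal sketch of the client-side fallback described above: use the
Description-md5 field if the server sent it, otherwise hash whatever is
in the Description field. The function names and the dict-based stanza
are made up for illustration, and the exact byte sequence that gets
hashed (here: the field value plus one trailing newline, encoded as
UTF-8) is an assumption, not something this mail specifies.

```python
import hashlib


def description_md5(description_value: str) -> str:
    """MD5 hex digest of a Description field value.

    Assumption: the digest covers the field value as written in the
    stanza plus a single trailing newline. Check the real client
    before relying on this canonical form.
    """
    data = (description_value + "\n").encode("utf-8")
    return hashlib.md5(data).hexdigest()


def translation_lookup_key(stanza: dict) -> str:
    """Key used to find a package's description in a Translation-* file."""
    # Prefer the value shipped in Packages, if it is still there.
    if "Description-md5" in stanza:
        return stanza["Description-md5"]
    # Otherwise compute it from whatever the Description field holds.
    return description_md5(stanza["Description"])
```

With this fallback in place, a Packages file without Description-md5
still yields a lookup key, as long as server and client agree on what
exactly is hashed.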

I haven't tested, but that should work without any change in apt (okay,
apt-ftparchive needs to be patched), so first stop for someone wanting
to drive this is probably dak - takers? Other servers and maybe clients
need to be adapted, but that could be done in a rather uncoordinated
way, as there is usually just one server creating both Packages and
Translation-* files, so it will have the same semantic interpretation,
and clients either take what they get or already implicitly have the
"whatever is in the field" semantics.

(sidenote: see my other mail for the non-existent security implications
of using md5 here if you care)

> Description         463.4 KiB    7.0%

The ftpmasters actually wanted to drop that in their final
implementation of the long description split-out. We got the short
description back because dropping it wasn't part of the initial plan
and clients didn't like that (= apt-cache search would segfault, for
example); besides that, I prefer to have at least a short description
around in any case. I think if we drop one of them, it should be the
-md5 field, as it isn't as compressible as human-readable text… (not to
mention quite useless for a human).
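The compressibility point can be illustrated with a small experiment.
This is only a sketch on synthetic data: the word list and the
descriptions are made up, and the tiny vocabulary makes the text
compress far better than real descriptions would. The hex digests,
however, carry 16 bytes of effectively random data in 32 characters,
so no compressor can get them much below half their size.

```python
import hashlib
import lzma
import random


def xz_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(lzma.compress(data)) / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical vocabulary, not taken from any real Packages file.
words = ["library", "for", "parsing", "files", "the", "development",
         "documentation", "tools", "and", "support", "common", "data"]

# 2000 made-up one-line descriptions and their MD5 hex digests.
descriptions = [" ".join(rng.choice(words) for _ in range(6))
                for _ in range(2000)]
digests = [hashlib.md5(d.encode("utf-8")).hexdigest() for d in descriptions]

text_ratio = xz_ratio("\n".join(descriptions).encode("utf-8"))
hash_ratio = xz_ratio("\n".join(digests).encode("utf-8"))
print(f"descriptions: {text_ratio:.2f}  digests: {hash_ratio:.2f}")
```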

> SHA256             1463.8 KiB   22.0%
> SHA1                938.9 KiB   14.1%
> MD5sum              752.4 KiB   11.3%

I *guess* the most painless drop would be SHA1. Entirely dropping it
from the archive means changing the pdiff infrastructure though.
Someone ought to check that claim…

Dropping MD5 will break some scripts parsing apt output. I personally
hate breaking users, so are there any takers to check (and fix) that at
least Debian tools do not break? Dropping it entirely would be easy
after this is done (modulo Description-md5 of course, but see there).

Adding or changing to SHA512 in the indexes is probably close to
useless; in the Release file the benefit is probably not worthwhile
either, but it is there if the need should arise. I have some hope that
with apt/experimental we will be able to add new hashsums with less
pain (aka: no ABI break), too, but that is just a side note.

> [other fields - present hopefully only for comparison proposes]

For the rest it is hopefully clear why we can't drop them, even though
I kinda like the idea of dropping dependencies… would make installing
stuff so much simpler… ;)

> Format changes ala base-whatever, \0, …

Changing the format is _*EXTREMELY*_ painful. It is also nice to have
a text file you can work with easily… If you want to improve things,
the improvement should be factored into a compression algorithm so that
not every parser in the universe needs to be rewritten… (one of apt's
test cases uses 'rev' as a "compression" algorithm.  You just need to set
some options, advertise the availability in the Release file and you are
good to go…)
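To make the layering idea concrete, here is a toy sketch: a reversible
transform (per-line reversal, like the 'rev' utility) sits below the
parser, so the parser itself never changes. Both functions and the
sample stanza are made up for illustration; this is not how apt wires
up its compressors, only the principle.

```python
def rev(text: str) -> str:
    """Reverse each line, like the Unix `rev` utility (its own inverse)."""
    return "\n".join(line[::-1] for line in text.split("\n"))


def parse_stanza(text: str) -> dict:
    """Toy 'Field: value' parser standing in for any existing parser.

    Handles single-line fields only; real stanzas also have
    continuation lines.
    """
    fields = {}
    for line in text.splitlines():
        if not line:
            continue
        key, _, value = line.partition(": ")
        fields[key] = value
    return fields


stanza = "Package: hello\nVersion: 2.9-1\n"
on_disk = rev(stanza)   # what the archive would serve
decoded = rev(on_disk)  # undone by the "decompression" layer
fields = parse_stanza(decoded)  # the parser sees the usual text format
```

Any smarter encoding could take the place of rev() here, and only the
layer that undoes it would need to know about it.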

Best regards

David Kalnischkies
