Re: Decreasing packaging overhead



Hi,

[just picking a few random bits]

On Sun, Nov 01, 2015 at 12:33:19PM -0800, Josh Triplett wrote:
> Files, Checksums-Sha1, and Checksums-Sha256 are clearly redundant; has
> it been long enough that we can drop the first two yet?

apt/jessie should be fine with that, but as mentioned the last few times
we had this dropping MD5/SHA1 discussion: It's not totally unrealistic
that there are still tools which need changes. If it hasn't changed
since then, jigdo would be an example. Using either of these hashes is
'no' problem if you take it just for intermediate steps and verify the
result at the end more heavily. It's how pdiffs work at the moment, for
example (but we are working on changing it [0]).
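
To make the "weak hash for intermediate steps, strong hash at the end"
idea concrete, here is a minimal Python sketch. It is not how apt, dak or
jigdo implement it; apply_steps() and the representation of a step as
a plain callable are assumptions made up for illustration.

```python
import hashlib


def digest(algo: str, data: bytes) -> str:
    """Hex digest of data with the named hashlib algorithm."""
    return hashlib.new(algo, data).hexdigest()


def apply_steps(base: bytes, steps, final_sha256: str) -> bytes:
    """Apply each step and verify it, then verify the end result strongly.

    steps is a list of (transform, expected_sha1) pairs, where transform
    is a hypothetical stand-in for "apply one patch" (bytes -> bytes).
    """
    data = base
    for transform, expected_sha1 in steps:
        data = transform(data)
        # Cheap sanity check: catches accidents early, not relied on
        # for security.
        if digest("sha1", data) != expected_sha1:
            raise ValueError("intermediate result does not match")
    # The check that actually matters.
    if digest("sha256", data) != final_sha256:
        raise ValueError("final result does not match")
    return data
```

Even if someone manages to forge an intermediate result, the final
SHA256 comparison still has to pass before the data is used.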

What is clearly missing here is someone working on getting this forward.
Just waiting isn't going to do it. apt waited >10 years before having
the radical idea of wanting to deprecate repositories without a Release
file. It took merely hours before the first complaints[1] trickled in.

[0] https://lists.debian.org/debian-dak/2015/10/msg00010.html
[1] No pointers, just the obvious xkcd#1172 reference


> Now that we use a secure hash, do we really need the sizes in those
> fields?

Once upon a time even MD5 was considered secure. Now it's relatively easy
to find collisions, a little harder to do a pre-image, but adding
a same-size requirement makes it harder. Also, checking if you got
"too much" data based on size is important to prevent denial-of-service
attacks, as an attacker can otherwise fill up your disk. Oh, and people
love progress reports.
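
A rough Python sketch of why the size is useful next to the hash: the
download can be aborted as soon as more data arrives than was announced,
long before a hash could be computed, and progress can be reported.
fetch_verified() and its parameters are hypothetical names, not an
existing apt interface, and a real tool would write to disk instead of
collecting chunks in memory.

```python
import hashlib
import io


def fetch_verified(stream, expected_size, expected_sha256,
                   chunk_size=65536, progress=None):
    """Read stream, enforcing the announced size and SHA256."""
    hasher = hashlib.sha256()
    chunks = []
    received = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        received += len(chunk)
        # Stop right away instead of letting the sender fill the disk.
        if received > expected_size:
            raise ValueError("got more data than announced, aborting")
        hasher.update(chunk)
        chunks.append(chunk)
        if progress is not None:
            progress(received, expected_size)
    if received != expected_size:
        raise ValueError("size mismatch")
    if hasher.hexdigest() != expected_sha256:
        raise ValueError("hash mismatch")
    return b"".join(chunks)
```

Usage would look like fetch_verified(io.BytesIO(data), len(data), sha256hex),
with any file-like object in place of the BytesIO.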


> Furthermore, we could generate the filenames from the source
> name and version.

Filenames with or without epoch? (yes, that is a trick question) There
are also v3 additional orig tarballs and other lovely things to worry
about. For binary packages it might make sense though to move the info
into the Release file with a field containing enough variables to make
that fly. I considered that briefly for Changelog: (see thread-start of
[0] above), but then decided that it is too complicated for that.

That could surely be done if someone would get behind this.
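
To spell out the trick in the epoch question: filenames in the archive
do not carry the epoch, so a naive "name + version" scheme has to strip
it, and two versions differing only in epoch then map to the same
filename. A small Python sketch, with dsc_filename() and strip_epoch()
as made-up helper names, and ignoring orig tarballs entirely:

```python
def strip_epoch(version: str) -> str:
    """Drop a leading "<digits>:" epoch from a Debian version string."""
    epoch, sep, rest = version.partition(":")
    if sep and epoch.isdigit():
        return rest
    return version


def dsc_filename(source: str, version: str) -> str:
    """Guess the .dsc filename for a source package (sketch only)."""
    return f"{source}_{strip_epoch(version)}.dsc"
```

With this, dsc_filename("foo", "1:2.0-1") and dsc_filename("foo", "2.0-1")
both give "foo_2.0-1.dsc", which is exactly the kind of ambiguity
a generated filename has to deal with.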


> In the Packages files for binaries, we could eliminate a *massive*
> amount of redundancy by having a dedicated Packages file for "all", to
> avoid duplicating entries into every architecture's Packages file.
> That should not significantly increase overhead for end-users, and for
> any user of multiarch it'll decrease overhead.  A quick check on amd64
> shows that splitting out "all" into a separate Packages file would not
> change the combined uncompressed size at all, should not change the
> pdiff size at all, and would increase the combined compressed
> full-download size by 94k, from 9957k to 10051k, an increase of less
> than 1%.  That seems reasonable in exchange for eliminating 12
> duplicate copies of the 4396k used for "all" Packages files, times
> suites (oldstable/stable/testing/unstable/experimental), and that
> doesn't even count unofficial architectures, or snapshot.debian.org.

You are a few days too late for suggesting that idea, as Johannes
already pointed out. Still, that will be a bunch of work, so if anyone
wants to help…


> Ditto for translated descriptions, except that there, we should share
> descriptions across architectures by default, even for arch-specific
> packages.  Almost no packages have descriptions that vary by
> architecture.

We already share descriptions, see i18n/ … or what do you mean?


> For translated descriptions, Package and Description-md5 seem redundant.

Well, Package + the md5 of the original description as identifier was
chosen because versions change way more often than descriptions do.
Only doing it based on package name is dangerous, as packages can
change greatly between versions, and if you are unlucky both versions
still exist in different architectures. A rarely noticed side effect of
having -md5 is btw that translations can be shared across repositories, so
that e.g. security.d.o (or experimental or your random bikeshed) uses the
translated descriptions of the main archive.  That isn't possible
anymore if you go for package name only.
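
As an illustration of the (Package, Description-md5) key, a Python
sketch of a translation lookup follows. The in-memory dict and the
function names are hypothetical, and hashing the English description
plus a trailing newline is an assumption here, so check the real
Translation-* files before relying on it.

```python
import hashlib

# (package, description_md5) -> {language: translated description}
translations = {}


def description_md5(english: str) -> str:
    # Assumption: the hash covers the English text plus a trailing newline.
    return hashlib.md5((english + "\n").encode("utf-8")).hexdigest()


def add_translation(package, english, lang, text):
    key = (package, description_md5(english))
    translations.setdefault(key, {})[lang] = text


def lookup(package, english, lang):
    """Return the translation, or fall back to the English text."""
    key = (package, description_md5(english))
    return translations.get(key, {}).get(lang, english)
```

Any repository shipping the same package with the same English
description hits the same key, regardless of version. If the
description changed, the key changes and the lookup falls back to
English instead of showing a stale translation.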

What could have been done back then would have been using a shorter hash
I guess. It seems a bit too late to change that now, but if someone
feels like working on it I am not going to complain…


Anyway, a giant list of things which could potentially be done isn't
going to change anything as the problem isn't that we have too few tasks
for the giant contributor armies working on the tools which need to be
changed for something to happen…


Best regards

David Kalnischkies
