
Re: Improving your archive and package system for small package



Hi!

On Thu, 2015-09-03 at 13:26:12 -0700, Josh Triplett wrote:
> Jonas Smedegaard wrote:
> > Seems Osamu Aoki is working on at least part of the puzzle:
> > https://bugs.debian.org/797045
> 
> Merging multiple sources *really* shouldn't be necessary.  And the
> metadata for those sources will vary, so that likely won't save that
> much space.

Well, there seem to be different kinds of overhead when it comes to
extremely tiny packages (those with dozens or hundreds of lines of code).

Metadata is one; the number of packages in the distribution and on
installed systems, and of files on the mirrors, is another.

All of the above involve, in one way or another, some overhead on at
least the number or size of source packages, binary packages, Sources
and Packages indices, and package manager databases, and possibly
increased dependency complexity, disk usage after installation, inodes
used on mirrors or installed systems, number of source VCS
repositories, etc.

This can have a cost on the mirror network, on the buildds, and on any
team doing distribution-wide work, such as the ftp-masters, release,
porter, QA or reproducible teams; on tools like lintian, autopkgtest,
DUCK, VCS or watch checkers, britney, botch, etc.; and on maintainers
having to maintain hundreds of similar tiny packages.


Doing package collections in Debian might reduce part of the above
overhead, but *if* this needs fixing, ideally it should be fixed
upstream. Having to package 100 new upstream release updates instead
of one is significant work, and that cannot be easily skipped if
upstreams do not do the conglomeration themselves.

> Perhaps we should add a few more things to common-licenses, or figure
> out if our packaging metadata could be further reduced or de-duplicated.
> It should be possible to package a 1kB library without several kB of
> overhead.

There are certain things that we could do to reduce overhead in some
places, but I don't think we can easily reduce most of it. For example,
each source and binary package contains a changelog, and that's usually
what takes the most space. Even if we went with my proposal to store
that and the copyright files in the dpkg database, that might only
reduce some overhead on installed systems.
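
A rough way to check the changelog claim for a given unpacked package
or doc directory is sketched below. This is only an illustration, not a
measurement of the archive: the function name metadata_share and the
rule "any file whose name starts with changelog or copyright counts as
metadata" are assumptions made for the example.

```python
import os

# Assumption for this sketch: files whose names start with one of these
# prefixes are counted as packaging metadata.
METADATA_PREFIXES = ("changelog", "copyright")


def metadata_share(root):
    """Return (metadata_bytes, total_bytes) for regular files under root."""
    metadata = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            total += size
            if name.lower().startswith(METADATA_PREFIXES):
                metadata += size
    return metadata, total
```

Calling metadata_share() on an unpacked binary package tree, or on a
package's directory under /usr/share/doc, gives the two byte counts to
compare. Note that the changelogs installed there are normally
gzip-compressed, so the figures reflect on-disk size.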

> But even if we have to pay that overhead, so be it; we have
> tens of thousands of packages already, what's a few hundred more tiny
> JavaScript packages as long as they're actually used?

If we were talking about a few hundred packages, I don't think anyone
would have much of an issue. I guess what people are worried about is
this setting a precedent and opening the floodgates. That's probably
one of the reasons people have not tried to inject much of CPAN or CRAN
or similar upstream archives into Debian, even though I don't think
their packages are as tiny as the ones proposed here, and most of the
work could be automated.

Thanks,
Guillem

