
Re: Reasonable maximum package size ?



On Tue, Jun 05, 2007 at 06:28:53PM +0900, Charles Plessy wrote:
> On Tue, Jun 05, 2007 at 10:09:07AM +0200, Michael Hanke wrote:
> > My question is now: Is it reasonable to provide this rather huge amount
> > of data in a package in the archive?
> > An alternative to a dedicated package would be to provide a
> > download/install script for the data (like the msttcorefonts package)
> > that is called at package postinst.
> I recently had a heretic idea that I did not dare to submit yet: we
> could port fink to Debian, and use it to build .debs from info files
> shipped in Debian packages in main, and sources downloaded from
> upstream's FTP sites.
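
For reference, the msttcorefonts-style approach Michael mentions boils
down to: the .deb itself is tiny, and postinst fetches, verifies and
unpacks the real data. A rough sketch -- the package name, paths and
checksum handling are all made up, and the "download" is faked with a
locally generated tarball so the sketch is self-contained:

```shell
#!/bin/sh
# sketch of the msttcorefonts-style approach: the .deb ships only a
# fetch/verify/unpack script run from postinst; the bulky data never
# enters the archive. names, paths and checksum handling are
# illustrative only.
set -e

workdir=$(mktemp -d)
data="$workdir/bigdata_1.0.tar.gz"

# a real postinst would do something like:
#   wget -O "$data" "$UPSTREAM_URL"
# here we fabricate a stand-in tarball so the sketch runs anywhere
mkdir -p "$workdir/payload"
echo "pretend this is 600MB of data" > "$workdir/payload/atlas.dat"
tar -czf "$data" -C "$workdir" payload

# verify against a checksum the package would ship hard-coded
sum=$(sha256sum "$data" | cut -d' ' -f1)
expected="$sum"   # stand-in; a real package pins the known-good value
[ "$sum" = "$expected" ] || { echo "checksum mismatch" >&2; exit 1; }

# unpack into the package's data directory (here kept under $workdir)
mkdir -p "$workdir/usr/share/bigdata"
tar -xzf "$data" -C "$workdir/usr/share/bigdata"
```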

Some thoughts on constraints:

	* it's better to have stuff distributed by Debian than sourced
	  elsewhere; we're a distribution, distributing is What We Do

	* it's better for users to have stuff in .deb's, so they don't
	  have to worry about different ways of managing different stuff
	  on their system

	* some large data sets are just "compiled" -- it can be good to
	  distribute a small amount of source in a .deb and compile
	  it on the user's machine.

	* some large data sets are "compiled" but it takes long enough that
	  we don't want to do it on users' machines, so we have the usual
	  source/deb situation here, and that's fairly easy too.

	* (***) many data sets don't fit those patterns, though; they're
	  just a bunch of data that needs to be shipped to users as-is.
	  doubling that by duplicating it in both a .orig.tar.gz and an
	  _all.deb is less than ideal

	* some data sets have large raw data and large compiled versions,
	  so need a large source _and_ a large .deb containing different
	  info. nothing much to be done in that case, though

	* (###) having .deb's generated on a user's system means they
	  can't use aptitude or apt-get to install them easily; having
	  .deb's generated on mirrors requires smart mirroring software
	  rather than just rsync or similar; having .deb's generated by
	  the maintainer or buildds requires both the source and .deb
	  to be mirrored separately; having .deb's be the source format
	  requires converting from the upstream source format, adding
	  complexity and making it harder to trace how the packaging
	  worked

For the ***'d case, it seems like having a debian.org mirror network
that distributes unprocessed data tarballs, which are then converted
into debs and installed on users' systems, would be workable.
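
The client side of that could be pretty simple -- something like the
following sketch, where the package name, version and mirror layout are
all made up, the "download" is faked with a locally generated tarball,
and the actual .deb build is skipped if dpkg-deb isn't around:

```shell
#!/bin/sh
# sketch: convert an unprocessed data tarball from a (hypothetical)
# debian.org data mirror into an installable _all.deb on the user's
# machine. package name, version and layout are made up.
set -e
pkg=foobar-data
ver=1.0.7
root=$(mktemp -d)

mkdir -p "$root/$pkg/DEBIAN" "$root/$pkg/usr/share/$pkg"

# stand-in for:  wget http://data.example.org/${pkg}_${ver}.tar.gz
tar_src="$root/${pkg}_${ver}.tar.gz"
mkdir -p "$root/raw"
echo "big data" > "$root/raw/blob.dat"
tar -czf "$tar_src" -C "$root/raw" .

# unpack the raw data into the package tree
tar -xzf "$tar_src" -C "$root/$pkg/usr/share/$pkg"

# minimal binary control file
cat > "$root/$pkg/DEBIAN/control" <<EOF
Package: $pkg
Version: $ver-1
Architecture: all
Maintainer: nobody <nobody@example.org>
Description: locally repackaged upstream data set
EOF

# build the .deb if the tooling is present
if command -v dpkg-deb >/dev/null 2>&1; then
    dpkg-deb --build "$root/$pkg" "$root/${pkg}_${ver}-1_all.deb"
fi
```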

I don't see how we could resolve that with the ###'d concern though.

If we were to resolve the ###'d concern by changing apt etc, we could
conceivably add foobar_1.0.7-1_data.tar.bz2 files to the archive in the
existing sections, for instance, and provide some form of "Packages.gz"
file for them.
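
For instance, an index stanza for such a file might look like this --
field names and values purely hypothetical, just to show the shape:

```
Package: foobar
Version: 1.0.7-1
Architecture: all
Data-Filename: pool/main/f/foobar/foobar_1.0.7-1_data.tar.bz2
Size: 734003200
SHA256: <checksum of the data tarball>
Description: large data set, shipped unprocessed
```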

I guess an evil solution to *** that doesn't cause problems with ###
would be to create a dummy source package that Build-Depends: on the
exact version of the package it builds, so that uploads include a
basically empty .tar.gz that just has instructions on how to download
new versions of the data, and an unprocessed copy of the actual data
converted to _all.deb form. That'd give the correct behaviour for all
the tools we've got, avoid unnecessarily duplicating the data, and maybe
not be *too* confusing.
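
To make the evil concrete, the dummy source package's debian/control
might look something like this (hypothetical, obviously -- note the
deliberately circular exact-version Build-Depends on the package it
nominally builds):

```
Source: foobar
Section: science
Priority: optional
Maintainer: A Maintainer <someone@debian.org>
Build-Depends: debhelper (>= 5), foobar (= 1.0.7-1)

Package: foobar
Architecture: all
Depends: ${misc:Depends}
Description: large upstream data set
 The source package is essentially empty; the real content is the
 pre-built _all.deb uploaded alongside it.
```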

Hrm, actually, I kind-of like that approach...

I'm not sure if avoiding duplicating the data (1G of data is bad, but
1G of the same data in a .orig.tar.gz _and_ a .deb is absurd) is enough
to just use the existing archive and mirror network, or if it'd still be
worth setting up a separate apt-able archive under debian.org somewhere
for _really_ big data.

Bug#38902 for hysterical interest, btw.

Cheers,
aj
