Re: Reasonable maximum package size ?

To: debian-devel@lists.debian.org
Subject: Re: Reasonable maximum package size ?
From: Yaroslav Halchenko <debian@onerussian.com>
Date: Tue, 24 Jul 2007 18:01:38 -0400
Message-id: <20070724220138.GA18270@washoe.onerussian.com>
Mail-followup-to: Yaroslav Halchenko <debian@onerussian.com>
In-reply-to: <20070605131431.GE8403@azure.humbug.org.au>
References: <20070605080907.GA3416@gloin> <20070605092853.GF19396@kunpuu.plessy.org> <20070605131431.GE8403@azure.humbug.org.au>

Hi,

I am sorry to resurrect this thread, but I wanted a simple
clarification and also to add my two cents.

> 	* it's better to have stuff distributed by Debian than sourced
> 	  elsewhere; we're a distribution, distributing is What We Do
> 	* it's better for users to have stuff in .deb's, so they don't
> 	  have to worry about different ways of managing different stuff
> 	  on their system
> 	* some large data sets are just "compiled" -- it can be good to
> 	  distribute a small amount of source in a .deb and compile
> 	  it on the user's machine.
2nd-ed all 3 points
> 	* some large data sets are "compiled" but it takes long enough that
> 	  we don't want to do it on user's machines, so we have the usual
> 	  source/deb situation here, and that's fairly easy too.
2nd-ed: just how subjective is the time/space tradeoff here?

> 	* (***) many data sets don't fit those patterns though, but
> 	  >...<
> 	* (###) having .deb's generated on a user's system means they
> 	  >...<
> 	  to be mirrored separately; having .deb's be the source format
> 	  requires converting from the upstream source format adding
> 	  complexity and making it harder to trace how the packaging
> 	  worked
Since most of the time we (programmers ;-)) hate doing things manually,
such repackaging should be automated anyway. Quite a few people use
dh_wraporig, which was devised for cases where a source package has to
be repackaged before entering Debian:
http://lists.debian.org/debian-mentors/2007/03/msg00268.html
So I see both the possibility of and the desire for a similar tool (or
maybe just further development of dh_wraporig) that could handle
automatic repackaging of datasets.

> I guess an evil solution to *** that doesn't cause problems with ###
> would be to create a dummy source package that Build-Depends: on the
> exact version of the package it builds, so that uploads include a
> >...<
Here is where I got stuck with such an approach: as usual, I just
fetched the sources with dget and tried to build the package with
dpkg-buildpackage. Of course I failed, since the Build-Depends weren't
satisfied... so it does seem to be confusing, or my brain is not
working right now...

My suggestion (I might be duplicating someone else's idea, please pardon
me) -- for arch 'all': automatically download (copy) the data during
the build of the package.  I know that somewhere I will run afoul of
Debian policy here, but for whatever it is worth.

Automate building of the package so that something like dh_wraporig
downloads the original dataset (on demand, via the debian/watch
mechanism) and debian/rules prepares the data and stuffs it into the
.deb packages. That way .orig.tar.gz doesn't contain any data, and the
diff contains all the scripts/instructions/verification (md5sum in
debian/README.Debian-source). I don't think this would require
Build-Depends on the data packages.
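
A minimal sketch of the download-and-verify step, in Python, just to
make the idea concrete. Everything named here (md5_of, fetch_dataset,
the URL and checksum arguments) is hypothetical and not part of
dh_wraporig or any existing Debian tool; a real build would take the
URL from debian/watch and the checksum from the packaging.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_dataset(url, dest, expected_md5):
    """Download url to dest; raise ValueError if the checksum differs.

    Hypothetical helper: expected_md5 stands in for the md5sum that the
    packaging would ship alongside the build scripts.
    """
    dest = Path(dest)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    actual = md5_of(dest)
    if actual != expected_md5:
        dest.unlink()  # do not leave unverified data for later build steps
        raise ValueError(
            f"checksum mismatch for {url}: got {actual}, expected {expected_md5}"
        )
    return dest
```

The build would call fetch_dataset() once per upstream file before
debian/rules unpacks anything, so a changed or corrupted upstream
dataset stops the build instead of silently ending up in the .deb.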

Since _all packages need not be rebuilt for each architecture, no buildd
box would have a problem, and anyone with internet access would be able
to rebuild the package if that needs to be done.

> I'm not sure if avoiding duplicating the data (1G of data is bad, but
> 1G of the same data in a .orig.tar.gz _and_ a .deb is absurd) is enough
> to just use the existing archive and mirror network, or if it'd still be
> worth setting up a separate apt-able archive under debian.org somewhere
> for _really_ big data.
2nd-ed. At first I thought about adding another major component (to
complement main, contrib, non-free), but that ruins orthogonality since
we would need main-data, contrib-data... So it was a bad idea. Thus, a
separate apt repository like data.debian.org with fine-grained sections
(science/med, science/bio, etc.) would allow easy and selective
mirroring. For that, debmirror's --exclude-deb-section=regex should be
complemented with --deb-section=regex, so that selecting the necessary
sections is easier than excluding everything else. I think that any
research group using Debian should have their own Debian mirror anyway
;) and now they would need just 1 more mirror, holding only the data
specific to their needs, taken from the global Debian mirror.
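
To illustrate why an include pattern is handier than exclusion alone,
here is a small Python sketch of the selection logic. It is only an
assumption about how the two options could combine, not debmirror's
actual matching code; select_sections and the section names are made up
for the example.

```python
import re


def select_sections(sections, include=None, exclude=None):
    """Keep sections matching `include` (if given) and not matching `exclude`.

    Hypothetical model: `include` plays the role of the proposed
    --deb-section=regex, `exclude` the role of --exclude-deb-section=regex.
    """
    inc = re.compile(include) if include else None
    exc = re.compile(exclude) if exclude else None
    selected = []
    for section in sections:
        if inc is not None and not inc.search(section):
            continue  # an include pattern was given and this section misses it
        if exc is not None and exc.search(section):
            continue  # explicitly excluded
        selected.append(section)
    return selected
```

With sections science/med, science/bio, science/math, games and doc, one
include pattern such as ^science/(med|bio)$ picks the two wanted
sections directly, whereas exclusion alone has to enumerate everything
else and must be updated whenever a new section appears.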

-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik
