
Re: Large data packages in the archive



Hi folks,

I'm quoting below the whole mail that went to debian-games@l.d.o[*] and
debian-devel@l.d.o. Now that I'm catching up on a pile of unread mail,
it looks like all the replies went to debian-devel@l.d.o, so I guess we
(pkg-games, minus the people already reading d-d) will have to read them
through the archives.

As a start, the following page should do the trick (though I can't say
for sure since I'm offline right now):
http://mid.gmane.org/87tzgm6yee.fsf@vorlon.ganneff.de

Mraw,
KiBi.

[*] The mail went to pkg-games-devel@l.a.d.o, but I don't know whether
    debian-games@l.d.o points there or whether the mail got bounced by
    someone reading debian-devel.

On 25/05/2008, Joerg Jaspert wrote:
> Hi,
> 
> one important question lately has been "What should we do with large
> packages containing data", like game data, huge icon/wallpaper sets,
> some science data sets, etc. Naturally, this is a decision ftpmaster has
> to take, so here are our thoughts on it, to facilitate discussion and to
> see if we missed important points. We do keep the right to have the last
> word on how it gets done. :)
> 
> 
> Basic Problem: "What to do with large data packages?"
> 
> That already has a problem: How to define "large"? One way, which we
> chose for now, is simply "everything > 50MB".
> 
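
A minimal sketch of the "everything > 50MB" rule quoted above. It assumes
50MB means 50 * 1024 * 1024 bytes (the mail does not say which unit is
meant), and the function name is hypothetical:

```python
# Assumption: "50MB" is read as 50 MiB; the mail does not define the unit.
LARGE_THRESHOLD_BYTES = 50 * 1024 * 1024


def is_large_package(size_bytes, threshold=LARGE_THRESHOLD_BYTES):
    """Return True if a package is strictly larger than the threshold."""
    return size_bytes > threshold


if __name__ == "__main__":
    # An 800 MB data package counts as large, a 5 MB one does not.
    print(is_large_package(800 * 1024 * 1024))
    print(is_large_package(5 * 1024 * 1024))
```

The comparison is strict, so a package of exactly 50MB would not count as
large under this reading.
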
> 
> While the archive software is written in Python, this problem sounds
> like a Perl one as "There is more than one way to do (solve) it":
> 
> a.) We can simply say that we don't want this in Debian and people
>     should use external hosting for such packages. After all, they are
>     usually of interest to a very small minority only.
> 
> b.) We can just add another component "data" besides
>     main/contrib/non-free.
> 
> c.) We can host a separate archive for it under control of ftpmaster.
> 
> 
> The first two seem to have grave problems:
> 
> a.) This is basically not a (good) option. It is our job to maintain
>     the archive, and if there is enough demand we should make it
>     possible to also host things like these data packages. Additionally,
>     it would require moving everything that needs those data packages
>     into contrib, as there wouldn't be a good base for a Policy
>     exception.
> 
> b.) While that would be the simplest solution, it has other problems,
>     large enough that we decided against it. The biggest one is the
>     principle of least surprise for our mirrors. We are discussing this
>     precisely to avoid bloating the main archive too much. If we just
>     add another component, the data will end up mirrored widely, even
>     if we send an announcement weeks before. Requiring every mirror
>     admin to decide whether they want to mirror or exclude it, and then
>     adjust their scripts, is a simple no-go for us.
> 
> So the way to go for us seems to be c.), hosting the archive ourselves
> (somewhere below data.debian.org probably).
> 
> 
> For all the rest of the mail I talk about solution c., unless otherwise
> stated.
> 
> 
> So, assuming we go for solution c. (which is what happens unless someone
> has a *very* strong reason not to, which I currently can't imagine), we
> will set up a separate archive for this. This will work the same way as
> our main archive does, with a few notable points:
> 
>  - It will be solely arch:all, not split per architecture. Or, if
>    someone presents *good* reasons why a data archive needs to be
>    architecture-aware, we will also offer this, but *NO* autobuilder
>    support will be provided.
>    This is meant as a place for large datasets, and those should be
>    arch independent. Building them per architecture would also kill
>    many autobuilders (think of binary packages having 500, 800 or more
>    megabytes!)
> 
>  - It is a separate archive, so it needs full source uploads to work:
>    every data package you create will be a full source package, and you
>    have to split the source between this archive and the rest that goes
>    into the normal Debian one.
> 
>  - We need to change policy. It currently forbids packages in main to
>    Depend/Recommend something outside of it (which is good). As that
>    would make the data archive less useful, I propose to change this to
>    something including the meaning of "Packages in main are allowed to
>    recommend packages in the data archive".
>    Dependencies should *not* be allowed, but read the next point.
> 
>  - Packages in main need to be installable and not cause their (indirect)
>    reverse build-depends to FTBFS in the absence of data.debian.org.
>    If the data is necessary for the package to work and there is a small
>    dataset (like 5 to 10 MB) that can be reasonably substituted for the
>    complete data package, the smaller dataset should be included in
>    main and the package then may depend on "foo-data | foo-data-small".
> 
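
The dependency rule in the last two points can be sketched as a small
check. This is only my reading of the proposal, not an existing tool: a
Depends clause in a main package is treated as a problem when every
alternative in it lives in the data archive, so "foo-data | foo-data-small"
passes while a bare "foo-data" does not. The function name and the
hand-written set of data packages are hypothetical:

```python
import re


def clauses_needing_data_archive(depends, data_packages):
    """Return the Depends clauses that only data-archive packages satisfy.

    depends:       a Depends field value, e.g. "libc6, foo-data | foo-data-small"
    data_packages: names of packages assumed to live in the data archive
    """
    problems = []
    for clause in depends.split(","):
        clause = clause.strip()
        if not clause:
            continue
        # Keep only the package name of each alternative, dropping version
        # constraints such as "(>= 1.0)" and qualifiers such as "[i386]".
        names = [
            re.sub(r"\s*[\(\[].*$", "", alternative.strip())
            for alternative in clause.split("|")
        ]
        if all(name in data_packages for name in names):
            problems.append(clause)
    return problems


if __name__ == "__main__":
    data = {"foo-data"}
    print(clauses_needing_data_archive("libc6, foo-data | foo-data-small", data))
    print(clauses_needing_data_archive("libc6, foo-data (>= 1.0)", data))
```

Recommends are not checked at all here, since the proposal would allow
packages in main to recommend packages in the data archive.
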
> 
> Any comments?
> 
> Timeframe for this? I expect it to be ready within 2 weeks.
> 
> -- 
> bye, Joerg
> Some AM after a mistake:
> Sigh.  One shouldn't AM in the early AM, as it were.  <grin>



> _______________________________________________
> Pkg-games-devel mailing list
> Pkg-games-devel@lists.alioth.debian.org
> http://lists.alioth.debian.org/mailman/listinfo/pkg-games-devel
