Re: Data does NOT belong in Debian (was: Stop Archive bloat)



On 18 Oct 1999 18:16:58 -0700, Philippe Troin <phil@fifi.org> wrote:
>I think we need a policy on "pure data" packages.

Start by defining what a "pure data" package is.  Here's my humble attempt:

"pure data packages are packages which consist of minimal debian/rules
which simply re-package a tarball in .deb format without modifying
individual files from the original sources, possibly rearranging files
to comply with Debian policy in the process, and possibly applying a 
reversible transformation to some or all of the files (e.g. compression)."

>"Pure data" packages are a problem because:
>  1) The way the Debian archive works requires the data to be stored
>     twice (source package and .deb).
>  2) There is NO packaging needed. It's just a tar ball.

There are data packages that are not just generated from tarballs.
Some packages have non-trivial debian/rules, especially multi-format
documentation packages generated from a single SGML source.  The Linux
HOWTOs, if regenerated from source in all possible target formats,
would expand to many times their size in binary packages.  What is your
opinion on those?

>  3) Where do we stop ? As someone says, there's nothing preventing
>     me from uploading as debian package every single .wav or .mov
>     file on the Internet just because it's useful.

This is the real problem.  Some things just don't belong in Debian,
even though they legally and technically can be distributed via Debian.

>This is what I believe are acceptable "pure data" packages:
>  1) Data which is absolutely required for a program to work.

Hmmm...what about theme packages for desktops?  Will Debian allow packages
of sound files, icons, patterns, and color selections for GNOME etc?

What about "pure data" packages that use formats that are understood by
only one program?  Replacements for the big non-executable file that
comes with Quake and other games come to mind--none of those would be
"absolutely required", but they'd be darned useful.

Another problem with the words "absolutely required" is that they encourage
minimal-size packages over higher-quality ones.  For example,
one might want to make a package out of better patches (sound data) for
Timidity, but the policy of "absolute requirement" would prefer that the
existing 11 megabytes be replaced with something smaller, e.g. by cutting
the sample rate by half.  We'd also lose the packages for optional
game sounds.

On the other hand, if we had WebDeb (I like that name ;-), then all we'd
have to do is fetch these packages from a different place.

>  2) Data historically present on all Unix systems (eg
>     /usr/dict/words).

or /usr/share/games/fortunes ;-)

>  3) Documentation (documentation packages should still remain).

Make that "documentation for other non-data packages in Debian."

What happens when people start releasing packages with MPEG format
training videos instead of text documentation?

>  4) Small examples or data sets.

"Illustrative" examples might be a better term.  "Small" is ill-defined;
a "small" MPEG-2 example file might be dozens of megabytes.

>  5) Linux-specific or debian-specific data (HOWTOs, FAQs,
>     debian-user-guide).

>Examples of data packages which does NOT belong to debian (IMHO):
>  1) Any kind of religious or political texts (bible-kjv, anarchism)

"...which are also not documentation for programs."  Otherwise, I think
the emacs and vi manuals or the GPL might count as "religious or political
texts" and be excluded.

>  2) Any kind of text easily findable on the web (RFCs (even though I
>     love to have RFCs around, but we have a draw a line))

RFCs are a kind of HOWTO for network administrators and engineers.
I think many RFCs qualify as program documentation.  Some others don't,
of course...but splitting them into packages based on their obsolescence,
utility, or humor content requires some editorial work.

>  3) Any datasets beyond examples or toy datasets
>     (gmc-coast-whatever).

I guess the coastline data in "xearth" qualifies as a "toy dataset."

>Pros of this policy:
>  1) Makes Debian smaller.
>  2) Avoids controversial materials (politics and religious texts)

I can see it now...

"Debian bans the bible but keeps all the foul language in the xscreensaver
sources.  What has this world come to?  Somebody, think of the children!"

	;-)

>Cons:
>  1) People which don't have access to the net find these packages
>     invaluable... 
>    Reply: Yes, then create a separate project "WebDeb" with the goal
>           of packaging anything in the .deb format.

I think this is by far the best solution, but I think Debian should be
broken up into smaller, more independent pieces anyway.  ;-)  

.deb is really just a tarball with extra information on the package and
some guidelines for what should be inside it.  It's much cleaner as a
"pure data" encapsulation format for distribution than some of the things
other people use, e.g. self-installing Win32 .EXE files.
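
To make the "tarball with extra information" point concrete, here is a
minimal Python sketch (an illustration only, not a dpkg tool).  Strictly
speaking, a .deb is an ar archive wrapping a small version file plus two
tarballs: one with the control information and one with the files to
install.  The helper names list_ar_members and make_ar are hypothetical,
and the demo archive is built in memory with placeholder payloads rather
than read from a real package.

```python
import io

AR_MAGIC = b"!<arch>\n"   # global header of every ar archive
HEADER_LEN = 60           # fixed size of each member header


def list_ar_members(data: bytes):
    """Return (name, size) for each member of an ar archive held in memory."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    pos = len(AR_MAGIC)
    while pos + HEADER_LEN <= len(data):
        header = data[pos:pos + HEADER_LEN]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header")
        # Name is 16 bytes, space padded; some ar variants append "/".
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        # Size is a 10-byte decimal field at offset 48.
        size = int(header[48:58].decode("ascii").strip())
        members.append((name, size))
        # Member data is padded to an even number of bytes.
        pos += HEADER_LEN + size + (size % 2)
    return members


def make_ar(entries):
    """Build a tiny ar archive from (name, payload) pairs, for the demo only."""
    out = io.BytesIO()
    out.write(AR_MAGIC)
    for name, payload in entries:
        # Fields: name, mtime, uid, gid, mode, size, end-of-header marker.
        header = "%-16s%-12d%-6d%-6d%-8s%-10d`\n" % (
            name, 0, 0, 0, "100644", len(payload))
        out.write(header.encode("ascii"))
        out.write(payload)
        if len(payload) % 2:
            out.write(b"\n")
    return out.getvalue()


# Placeholder payloads standing in for the real tarballs of a package.
demo = make_ar([
    ("debian-binary", b"2.0\n"),
    ("control.tar.gz", b"abc"),
    ("data.tar.gz", b"hello"),
])
for name, size in list_ar_members(demo):
    print(name, size)
```

Everything that makes a "pure data" package a package rather than a bare
tarball lives in the control member; the data member is just the files.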

As other people pointed out, there are other advantages to having pure
data in .deb format:  easy distribution via apt, and management of the
files when they're installed on the system.


-- 
I don't speak for Corel, I just work for them.  Use zygob@corel.ca for work, 
zblaxell@furryterror.org for play, and zblaxell@feedme.hungrycats.org for PGP.
PGP fingerprint: 01 94 0F B3 46 B7 71 C3  D4 98 39 99 1B 34 45 A1
PGP public key:  http://www.hungrycats.org/~zblaxell/pgp-public.txt

