Questions to candidates: what is source?
The Debian Free Software Guidelines state (DFSG #2) that "The program must
include source code".
1. How do you define "source code" yourself?
2. I think that people have different ideas of what "source code" means.
Do you agree? Are there significant disagreements regarding this
issue within the Debian Project?
3. (If you answered "yes" to 2) Is that a problem?
4. (If you answered "yes" to 3) Is it necessary to amend DFSG?
5. (If you answered "yes" to 4) How should it be amended?
6. Which of the following satisfies DFSG #2? What is the general
principle? Or should it be case-by-case?
* ELF binary without C source
* Java class file without Java source
(This is reasonably decompilable: cf. jad package)
* Python bytecode without Python source
(This is easily decompilable: cf. decompyle package)
* Binary firmware data
* configure script without configure.in
* C source generated by Bison without .y source
* In general, automatically generated source without a good way to
regenerate it
(But the generated file may include every line of the original source,
perhaps as comments like "This is generated from original line blah
blah")
* Prebuilt HTML file without LaTeX source
(cf. python-doc)
* Prebuilt CHM (Compiled HTML) file without the source HTML
(The HTML can be extracted, cf. chmlib, but perhaps not the indexing
information)
* TrueType font made by autotracing, without the original bitmaps
(cf. autotrace, potrace)
* Opening book for board games without editing tools
(The gnuchess-book and gnugo packages have opening books, but these
are in the well-known PGN and SGF formats, so this is a hypothetical
question)
* Binary-encoded data without source or encoding tools
(Word lists, thesauri, etc. cf. bug #241279)
* Automatically generated character set encoding table without the
tools originally used to generate it
(These rarely change, so it's possible that even upstream no longer
has the tools)
* Dump of neural network data without the training data, or without
the exact method to reproduce the network
* In general, statistical data gathered from a large number of samples
(I am not sure, but I think Mozilla's "Universal Charset Detection"
uses character distribution tables for East Asian languages gathered
from large samples)
* JPEG image without the higher-quality image from which it was compressed
(JPEG is lossy)
* Bitmap image merged from many layers, without the layer information
(e.g. GIMP's XCF format)
* Bitmap image without the corresponding vector image
(e.g. SVG)
* MP3-compressed sound without the original sound source
(Are MP3 encoders patent-encumbered? Also, MP3 is lossy)
* Ogg Vorbis-compressed sound without the original sound source
(Vorbis is lossy)
* FLAC-compressed sound without the original sound source
(FLAC is lossless)
* Offline version of documentation from a wiki, a FAQ CGI script, etc.,
downloaded with, say, wget, without the original wikitext or a dump
of the FAQ database
* Binary image of a programming environment used for bootstrapping
purposes, which does not exactly correspond to the environment being
bootstrapped
(Think Lisp, Smalltalk, etc.)
* What else?
Seo Sanghyeon