Documentation of original source tarballs
During my NM process I became aware that the project has almost no
documentation of how .orig.tar.gz's in source uploads are expected
to behave. In this particular case, it turned out that I had entirely
misunderstood what "pristine source" means in a Debian context, and
none of the recommended reading for new maintainers had set me right.
The only place in the project documentation where .orig.tar.gz is
mentioned seems to be Appendix C.3 of the policy manual, which turned
out to be woefully out of date when I started searching through the
list archives and the source for dpkg-source.
I have tried to condense my corrected understanding of the technical
requirements and current best practices into the following text, which
I intend to contribute to some appropriate collection of
documentation. For now, I solicit comments, especially on the
following questions:
1. Is my understanding correct at all?
2. I had to invent a term for .orig.tar.gz's that are not pristine -
at least I haven't found evidence of any term in common use.
Suggestions for better terms than "repackaged upstream source"
would be welcome.
3. Where should a text such as this go? At the moment, it is phrased
as if it is to be inserted into section 6.4.1 of the Developer's
Reference, but I am not sure that this is the best place.
4. Would it be better to replace the entire Appendix C of the policy
manual with a freshly written document that explains all about
source packages in general? (Yes, of course that would be "better",
but it would also be much more work, so the question is whether it
is so much better that documenting .orig.tar.gz should be postponed
until we have the real thing).
5. At the end of my draft I list some normative guidelines for
repackaged source. These are partly from the current C.3, and
partly my understanding of "common sense". Assuming that they are as
noncontroversial as I believe them to be, should they rather be
part of the policy manual?
-- draft follows --
There are two kinds of original source tarballs: Pristine source
and repackaged upstream source.
The defining characteristic of a pristine source tarball is that
the .orig.tar.gz file is byte-for-byte identical to a tarball
officially distributed by the upstream author.  This makes it
possible to use checksums to easily verify that all changes between
Debian's version and upstream's are contained in the Debian
diff. Also, if the original source is huge, upstream authors and
others who already have the upstream tarball can (in principle)
save download time if they want to inspect your packaging in
detail.
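The checksum comparison mentioned above can be sketched as follows.
This is only an illustration: the function names sha256_of and
is_byte_identical are made up for this example, and the choice of
SHA-256 is an assumption (any strong checksum serves the same purpose).

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so that huge tarballs need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_byte_identical(orig_tarball, upstream_tarball):
    """True if both files have the same SHA-256 digest."""
    return sha256_of(orig_tarball) == sha256_of(upstream_tarball)
```

If the digests agree, every difference between Debian's source and
upstream's must live in the Debian diff.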
There are no universally accepted guidelines that upstream authors
follow regarding the directory structure inside their tarball,
but dpkg-source is nevertheless able to deal with most upstream
tarballs as pristine source. Its strategy is equivalent to the
following:
1. Unpack the tarball in an empty temporary directory by doing
zcat path/to/<packagename>_<upstream-version>.orig.tar.gz | tar xf -
2. If, after this, the temporary directory contains nothing but one
directory and no other files, rename that directory to
<packagename>-<upstream-version>(.orig), and be done. The name
of the top-level directory in the tarball does not matter, and
is forgotten.
3. Otherwise, the upstream tarball must have been packaged without
a common top-level directory (shame on the upstream author!).
Rename the temporary directory *itself* to
<packagename>-<upstream-version>(.orig).
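The three steps above can be sketched in Python as follows. This is
not dpkg-source's actual code, only an equivalent illustration; the
function name unpack_orig and its arguments are made up for this
example, and target_dir is assumed not to exist yet.

```python
import os
import tarfile
import tempfile


def unpack_orig(orig_tarball, target_dir):
    """Unpack `orig_tarball` so that its contents end up in `target_dir`."""
    parent = os.path.dirname(os.path.abspath(target_dir))
    # Step 1: unpack into an empty temporary directory (created next to
    # the target so that the later rename stays on one filesystem).
    tmp = tempfile.mkdtemp(dir=parent)
    with tarfile.open(orig_tarball, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse members that would escape `tmp`.
            tar.extractall(tmp, filter="data")
        else:
            tar.extractall(tmp)
    entries = os.listdir(tmp)
    only = os.path.join(tmp, entries[0]) if len(entries) == 1 else None
    if only and os.path.isdir(only) and not os.path.islink(only):
        # Step 2: exactly one directory and nothing else; its original
        # name is discarded.
        os.rename(only, target_dir)
        os.rmdir(tmp)
    else:
        # Step 3: no common top-level directory; the temporary
        # directory itself becomes the source tree.
        os.rename(tmp, target_dir)
    return target_dir
```

Either way the caller ends up with the upstream files directly under
target_dir, whatever upstream chose to call its top-level directory.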
Repackaged upstream source
You SHOULD upload packages with a pristine source tarball if
possible, but there are various reasons why it might not be
possible. This is the case if upstream does not distribute the
source as gzipped tar at all, or if upstream's tarball contains
non-DFSG-free material that you must remove before uploading.
In these cases the developer must construct a suitable .orig.tar.gz
file himself. We refer to such a tarball as a "repackaged upstream
source". Note that this is different from a Debian-native package;
a repackaged source still comes with Debian-specific changes in a
separate .diff.gz and still has a version number composed of
<upstream-version> and <debian-revision>.
There may be cases where it is desirable to repackage the source
even though upstream distributes a .tar.gz that could in principle
be used in its pristine form. The most obvious is if *significant*
space savings can be achieved by recompressing the tar archive or
by removing genuinely useless crud from the upstream archive. Use
your own discretion here, but be prepared to defend your decision
if you repackage source that could have been pristine.
A repackaged .orig.tar.gz
1. MUST NOT contain any file that does not come from the upstream
author(s), or whose contents have been changed by you.
2. SHOULD, except where impossible for legal reasons, preserve the
entire building and portability infrastructure provided by the
upstream author. For example, it is not appropriate to omit
source files that are used only when building on MS-DOS, or to
omit a Makefile provided by upstream even if the first thing
your debian/rules does is to overwrite it by running a configure
script.
(Rationale: It is common for Debian users who need to build
software for non-Debian platforms to fetch the source from a
Debian mirror rather than trying to locate a canonical upstream
distribution point.)
3. SHOULD use <packagename>-<upstream-version>.orig as the name
of the top-level directory in its tarball. This makes it
possible to distinguish pristine tarballs from repackaged ones.
4. SHOULD be gzipped with maximal compression.
The canonical way to meet the latter two points is to let
"dpkg-source -b" construct the repackaged tarball from an unpacked
directory.
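For readers who want to see what points 3 and 4 amount to, here is a
minimal Python sketch. It does not reproduce what dpkg-source does
internally; the function name repack and its arguments are made up
for this example, and source_dir is assumed to hold the already
cleaned upstream tree.

```python
import os
import tarfile


def repack(source_dir, package, upstream_version, out_dir):
    """Write <package>_<upstream_version>.orig.tar.gz into `out_dir`."""
    # Point 3: the top-level directory inside the tarball ends in ".orig".
    top = f"{package}-{upstream_version}.orig"
    out = os.path.join(out_dir, f"{package}_{upstream_version}.orig.tar.gz")
    # Point 4: compresslevel=9 asks gzip for maximal compression.
    with tarfile.open(out, "w:gz", compresslevel=9) as tar:
        tar.add(source_dir, arcname=top)
    return out
```

For real uploads, letting "dpkg-source -b" do this remains the
canonical route.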
 We cannot prevent upstream authors from changing the tarball
they distribute without also upping the version number, so
there can be no guarantee that a pristine tarball is identical
to what upstream is *currently* distributing at any point in
time. All that can be expected is that it is identical to
something that upstream once *did* distribute.
If a difference arises later (say, if upstream notices that he
wasn't using maximal compression in his original distribution
and then re-gzips it), that's just too bad. Since there is no
good way to upload a new .orig.tar.gz for the same version,
there is not even any point in treating this situation as a bug.
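A small Python sketch of why re-gzipping breaks byte-for-byte identity
even though the contents are unchanged. The payload, the compression
levels and the mtime values are all made up for this illustration.

```python
import gzip

# Stand-in for an unchanged upstream tar stream.
payload = b"the same upstream tar stream\n" * 1000

# The "original" release and a later re-gzip of the very same data.
first = gzip.compress(payload, compresslevel=1, mtime=1)
second = gzip.compress(payload, compresslevel=9, mtime=2)

# The decompressed contents are identical, but the gzip files are not.
same_contents = gzip.decompress(first) == gzip.decompress(second)
same_bytes = first == second
```

Here same_contents is true while same_bytes is false, so a checksum
of the .tar.gz no longer matches although nothing inside it changed.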
 As a special exception, if the omission of non-free files would
lead to the source failing to build without assistance from the
Debian diff, it might be appropriate to instead edit the files,
omitting only the non-free parts of them, and/or explain the
situation in a README.Debian-source or similarly named file in
the root of the source tree. But in that case please also urge
the upstream author to make the non-free components more easily
separable from the rest of the source.
Henning Makholm "What a hideous colour khaki is."