
Documentation of original source tarballs


During my NM process I became aware that the project has almost no
documentation of how .orig.tar.gz's in source uploads are expected
to behave. In this particular case, it turned out that I had entirely
misunderstood what "pristine source" means in a Debian context, and
none of the recommended reading for new maintainers had set me right
about it.

The only place in the project documentation where .orig.tar.gz is
mentioned seems to be Appendix C.3 of the policy manual, which turned
out to be woefully out of date when I started searching through the
list archives and the source for dpkg-source.

I have tried to condense my corrected understanding of the technical
requirements and current best practices into the following text, which
I intend to contribute to some appropriate collection of
documentation. For now, I solicit comments, especially on the
following points:

1. Is my understanding correct at all?

2. I had to invent a term for .orig.tar.gz's that are not pristine -
   at least I haven't found evidence of any term in common use.
   Suggestions for better terms than "repackaged upstream source"
   would be welcome.

3. Where should a text such as this go? At the moment, it is phrased
   as if it is to be inserted into section 6.4.1 of the Developer's
   Reference, but I am not sure that this is the best place.

4. Would it be better to replace the entire Appendix C of the policy
   manual with a freshly written document that explains all about
   source packages in general? (Yes, of course that would be "better",
   but it would also be much more work, so the question is whether it
   is so much better that documenting .orig.tar.gz should be postponed
   until we have the real thing).

5. At the end of my draft I list some normative guidelines for
   repackaged source. These are partly from the current C.3, and
   partly my understanding of "common sense". Assuming that they are as
   noncontroversial as I believe them to be, should they rather be
   part of the policy manual?

-- draft follows --

   There are two kinds of original source tarballs: Pristine source
   and repackaged upstream source.

   Pristine source

   The defining characteristic of a pristine source tarball is that
   the .orig.tar.gz file is byte-for-byte identical to a tarball
   officially distributed by the upstream author. [1] This makes it
   possible to use checksums to easily verify that all changes between
   Debian's version and upstream's are contained in the Debian
   diff. Also, if the original source is huge, upstream authors and
   others who already have the upstream tarball can (in principle)
   save download time if they want to inspect your packaging in
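
   As a rough illustration of the checksum comparison mentioned above,
   the following sketch tests whether a local .orig.tar.gz matches a
   tarball fetched from upstream. The function name same_checksum is
   made up for this example, and the use of sha256sum (GNU coreutils)
   is an assumption; any checksum tool would serve equally well.

```shell
# Hypothetical helper, not part of any Debian tool.
# usage: same_checksum <file-a> <file-b>
# Returns 0 if both files have the same SHA-256 checksum,
# 1 if they differ, and 2 if either file cannot be read.
same_checksum() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    sum_a=$(sha256sum < "$1" | cut -d' ' -f1)
    sum_b=$(sha256sum < "$2" | cut -d' ' -f1)
    [ "$sum_a" = "$sum_b" ]
}
```

   It might be called as
   "same_checksum foo_1.0.orig.tar.gz foo-1.0.tar.gz", where both file
   names are placeholders.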

   There are no universally accepted guidelines that upstream authors
   follow regarding the directory structure inside their tarball,
   but dpkg-source is nevertheless able to deal with most upstream
   tarballs as pristine source. Its strategy is equivalent to the
   following:

   1. Unpack the tarball in an empty temporary directory by doing

      zcat path/to/<packagename>_<upstream-version>.orig.tar.gz | tar xf -

   2. If, after this, the temporary directory contains nothing but one
      directory and no other files, rename that directory to
      <packagename>-<upstream-version>(.orig), and be done. The name
      of the top-level directory in the tarball does not matter, and
      is forgotten.

   3. Otherwise, the upstream tarball must have been packaged without
      a common top-level directory (shame on the upstream author!).
      Rename the temporary directory *itself* to
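
   The three steps can be sketched as a small shell function. This is
   only an approximation of the strategy described here, not the code
   dpkg-source actually runs. The name unpack_orig is hypothetical,
   and the name of the resulting directory is passed in by the caller
   rather than computed, so the sketch makes no claim about what that
   name should be. It assumes a find that supports -mindepth and
   -maxdepth and a mktemp that accepts a template.

```shell
# Hypothetical sketch; not taken from dpkg-source.
# usage: unpack_orig <orig-tarball> <target-dir>
unpack_orig() {
    tarball=$1
    target=$2
    [ ! -e "$target" ] || return 1      # refuse to overwrite anything
    tmp=$(mktemp -d "${target}.tmp.XXXXXX") || return 1

    # Step 1: unpack into an empty temporary directory
    # (gzip -dc does the same job as zcat).
    if ! gzip -dc "$tarball" | (cd "$tmp" && tar xf -); then
        rm -rf "$tmp"
        return 1
    fi

    # Count everything at the top level, hidden entries included,
    # and look for real directories (symlinks do not count).
    count=$(find "$tmp" -mindepth 1 -maxdepth 1 | wc -l)
    only=$(find "$tmp" -mindepth 1 -maxdepth 1 -type d)

    if [ "$count" -eq 1 ] && [ -n "$only" ]; then
        # Step 2: exactly one top-level directory; its original
        # name is discarded.
        mv "$only" "$target" && rmdir "$tmp"
    else
        # Step 3: no common top-level directory; the temporary
        # directory itself becomes the source tree.
        mv "$tmp" "$target"
    fi
}
```

   Taking the target name as an argument keeps the renaming policy out
   of the sketch; only the "one directory or not" decision is shown.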

   Repackaged upstream source

   You SHOULD upload packages with a pristine source tarball if
   possible, but there are various reasons why it might not be
   possible. This is the case if upstream does not distribute the
   source as gzipped tar at all, or if upstream's tarball contains
   non-DFSG-free material that you must remove before uploading.

   In these cases the developer must construct a suitable .orig.tar.gz
   file himself. We refer to such a tarball as a "repackaged upstream
   source". Note that this is different from a Debian-native package;
   a repackaged source still comes with Debian-specific changes in a
   separate .diff.gz and still has a version number composed of
   <upstream-version> and <debian-revision>.

   There may be cases where it is desirable to repackage the source
   even though upstream distributes a .tar.gz that could in principle
   be used in its pristine form. The most obvious is if *significant*
   space savings can be achieved by recompressing the tar archive or
   by removing genuinely useless crud from the upstream archive. Use
   your own discretion here, but be prepared to defend your decision
   if you repackage source that could have been pristine.

   A repackaged .orig.tar.gz

   1. MUST NOT contain any file that does not come from the upstream
      author(s), or whose contents have been changed by you. [2]

   2. SHOULD, except where impossible for legal reasons, preserve the
      entire building and portability infrastructure provided by the
      upstream author. For example, it is not appropriate to omit
      source files that are used only when building on MS-DOS, or to
      omit a Makefile provided by upstream even if the first thing
      your debian/rules does is to overwrite it by running a configure

      (Rationale: It is common for Debian users who need to build
      software for non-Debian platforms to fetch the source from a
      Debian mirror rather than trying to locate a canonical upstream
      distribution point).

   3. SHOULD use <packagename>-<upstream-version>.orig as the name
      of the top-level directory in its tarball. This makes it
      possible to distinguish pristine tarballs from repackaged ones.

   4. SHOULD be gzipped with maximal compression.

   The canonical way to meet the latter two points is to let
   "dpkg-source -b" construct the repackaged tarball from an unpacked
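
   For illustration only, points 3 and 4 could also be met by hand
   along the following lines. This is a sketch and not the
   dpkg-source method; the function name repack_orig and its
   arguments are invented for the example, and it assumes a cp that
   supports -a. It does nothing to check points 1 and 2, which remain
   the packager's responsibility.

```shell
# Hypothetical sketch; not how dpkg-source does it.
# usage: repack_orig <unpacked-dir> <packagename> <upstream-version>
# Writes <packagename>_<upstream-version>.orig.tar.gz to the
# current directory.
repack_orig() {
    srcdir=$1
    pkg=$2
    ver=$3
    top="${pkg}-${ver}.orig"            # point 3: top-level name
    out="$PWD/${pkg}_${ver}.orig.tar.gz"
    stage=$(mktemp -d) || return 1

    # Copy the tree under the desired top-level name, then pack it
    # with maximal gzip compression (point 4).
    cp -a "$srcdir" "$stage/$top" &&
        (cd "$stage" && tar cf - "$top") | gzip -9 > "$out"
    status=$?

    rm -rf "$stage"
    return $status
}
```

   A call such as "repack_orig unpacked-tree foo 1.0" (placeholder
   names) would leave foo_1.0.orig.tar.gz in the current directory.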


   [1] We cannot prevent upstream authors from changing the tarball
       they distribute without also upping the version number, so
       there can be no guarantee that a pristine tarball is identical
       to what upstream is *currently* distributing at any point in
       time. All that can be expected is that it is identical to
       something that upstream once *did* distribute.

       If a difference arises later (say, if upstream notices that he
       wasn't using maximal compression in his original distribution
       and then re-gzips it), that's just too bad. Since there is no
       good way to upload a new .orig.tar.gz for the same version,
       there is not even any point in treating this situation as a bug.

   [2] As a special exception, if the omission of non-free files would
       lead to the source failing to build without assistance from the
       Debian diff, it might be appropriate to instead edit the files,
       omitting only the non-free parts of them, and/or explain the
       situation in a README.Debian-source or similarly named file in
       the root of the source tree. But in that case please also urge
       the upstream author to make the non-free components more easily
       severable from the rest of the source.

Henning Makholm                            "What a hideous colour khaki is."
