Request for help - faithful source format (related to dgit)



This is a call for help for one or two volunteers who:

 - are keen on gitish (or similar) workflows
 - have some time right now
 - can speak Perl (as found in dpkg-source)
 - are willing and able to do some negotiation as well as coding


Introduction:

One persistent difficulty with our current source package
representations, for non-native packages, is that none of them
represent deletions of files which are present in upstream tarball(s)
but which the maintainer deletes (or wishes to delete).

This causes especial trouble for dgit.  (See dgit(7).)

I would like to see a better solution to this, preferably in stretch.
I think this is still possible.  I started an attempt but I got
slightly discouraged by some unexpected qualms in some quarters.  And
I have substantial amounts of other dgit work to do (notably, sane
handling of gbp git trees), so I am not going to be able to put in the
time to fix this properly for stretch.

Is there anyone who would like to help with this?


Solution-neutral problem statement:

 * The primary objective is to be able to represent any tree (let us
   say, anything which is representable as a git "tree object") as a
   Debian source package, while still retaining the space/bandwidth
   and traceability advantages of basing the source package on
   upstream tarballs.

 * So there should be a non-native source format which is capable of
   accurately representing at least any git-representable tree, as
   `differences' from any source tarballs.

 * "Accurately" means that it must represent:
     - changes to executableness of files
     - files present in upstream but removed in Debian
     - replacement of a directory with a file or vice versa
     - symlinks
     - removal of "unusual" filesystem objects such as fifos,
       sockets, devices, or whatever (which some upstream
       tarballs ship)
     - creation, deletion and changes to binary files (ideally,
       efficiently representing small changes to big binaries)

 * This format should be available in a patch-stack-less version, at
   the very least.

 * This format should support multiple orig tarballs in the same way
   as `3.0 (quilt)'.

 * Ideally this would be in stretch and backportable to jessie or
   even earlier.

 * It is NOT a requirement that `the same changes' (as represented by
   whatever actual container file, .diff or .rsync or whatever, ends
   up being in the source package) can be applied to different
   upstream tarballs.  The assumption is that maintainers who use this
   format will use other means (eg, a version control system) to
   rebase the Debian changes onto a new upstream.
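
To make the kinds of change listed under "Accurately" concrete, here
is a minimal Python sketch which compares an unpacked upstream tree
with the desired Debian tree and labels each difference.  It is only
an illustration of what a faithful format has to be able to record,
not a proposal for the format itself.  The function names (scan,
kind, classify_changes) and the label strings are invented for this
example.

```python
import filecmp
import os
import stat


def scan(root):
    """Map each relative path under root to its lstat() result.

    lstat() is used so that symlinks are recorded as symlinks and
    never followed.
    """
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            entries[os.path.relpath(full, root)] = os.lstat(full)
    return entries


def kind(st):
    """Reduce a stat result to one of: file, dir, symlink, special."""
    mode = st.st_mode
    if stat.S_ISREG(mode):
        return "file"
    if stat.S_ISDIR(mode):
        return "dir"
    if stat.S_ISLNK(mode):
        return "symlink"
    return "special"  # fifo, socket, device, ...


def classify_changes(upstream, debian):
    """Return {relative path: [labels]} for every path that differs."""
    old, new = scan(upstream), scan(debian)
    changes = {}
    for path in sorted(set(old) | set(new)):
        if path not in new:
            changes[path] = ["deleted"]
            continue
        if path not in old:
            changes[path] = ["added"]
            continue
        k_old, k_new = kind(old[path]), kind(new[path])
        labels = []
        if k_old != k_new:
            labels.append(f"type-changed:{k_old}->{k_new}")
        elif k_old == "file":
            # Executableness is judged by the owner-execute bit only.
            x_old = bool(old[path].st_mode & stat.S_IXUSR)
            x_new = bool(new[path].st_mode & stat.S_IXUSR)
            if x_old != x_new:
                labels.append("exec-bit-changed")
            same = filecmp.cmp(os.path.join(upstream, path),
                               os.path.join(debian, path), shallow=False)
            if not same:
                labels.append("content-changed")
        elif k_old == "symlink":
            if (os.readlink(os.path.join(upstream, path))
                    != os.readlink(os.path.join(debian, path))):
                labels.append("symlink-retargeted")
        if labels:
            changes[path] = labels
    return changes
```

A format meeting the requirements above would need a way to encode
every label this sketch can produce, including "deleted" and
"type-changed", on top of the upstream tarballs.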


Stakeholders:

 * The dpkg maintainers (CC'd) obviously have a big stake in this,
   and their support will be essential.

 * ftpmaster will need to support the new format too.

 * Consultation with the wider community of Debian contributors, and
   derivatives, would be helpful, but should not be allowed to block.


Technical ideas:

 * rsync batch mode provides a stable binary format, and rsync is
   extremely reliable software of exceptional quality and stability.
   rsync has a very good history of backward-compatibility.

   A format a bit like 1.0 but supporting multiple tarballs and
   replacing the .diff.gz with a .rsync.gz (with a specified rsync
   protocol version, to ensure compatibility) would work.

   I proposed this some time last year.  ftpmaster were concerned that
   they would lose an easy way to view the diff between upstream and
   Debian; so, this approach would require some kind of update to the
   ftpmaster tools.

   If a patch stack version is wanted, the .debian.tar.gz could be
   replaced by a .rsync.gz which creates debian/patches/* along with
   whatever other changes are wanted.  This may be confusing, because
   not every Debian change would be in debian/patches/.

 * There has been talk of using advanced GNU diff/patch features.  I
   am rather sceptical of these because they are very new, and because
   diff and patch have a rather poor history in backward compatibility
   terms.

   Also it may be difficult to control exactly which diff/patch
   features end up being used.  This has already caused some lossage,
   where a new not-backward-compatible source sub-format was
   accidentally introduced into our archive (!)
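
As a rough illustration of the rsync batch mode idea above, here is
a small Python sketch which only builds the rsync command lines one
might use to record, and later replay, the differences between an
unpacked upstream tree and the Debian tree.  The rsync options used
(--archive, --delete, --only-write-batch, --read-batch, --protocol)
are real rsync options, but whether --protocol is the right way to
pin the batch format, and the value 30 shown in the usage note, are
assumptions that would need checking.  The function names and file
names are invented for this example.

```python
import subprocess


def with_slash(path):
    """rsync treats 'dir/' as 'the contents of dir', which is what we want."""
    return path.rstrip("/") + "/"


def write_batch_cmd(upstream_dir, debian_dir, batch_file, protocol=None):
    """Command to record what would turn upstream_dir into debian_dir.

    --only-write-batch records the update in batch_file without
    modifying upstream_dir itself.
    """
    cmd = ["rsync", "--archive", "--delete",
           f"--only-write-batch={batch_file}"]
    if protocol is not None:
        # Assumption: forcing a protocol version keeps the batch
        # readable by older rsync versions.
        cmd.append(f"--protocol={protocol}")
    cmd += [with_slash(debian_dir), with_slash(upstream_dir)]
    return cmd


def read_batch_cmd(unpacked_dir, batch_file):
    """Command to replay batch_file onto a freshly unpacked upstream tree."""
    return ["rsync", "--archive", "--delete",
            f"--read-batch={batch_file}", with_slash(unpacked_dir)]


def run(cmd):
    """Execute one of the commands above; raises if rsync fails."""
    subprocess.run(cmd, check=True)
```

For example, write_batch_cmd("upstream", "debian-tree", "pkg.rsync",
protocol=30) yields a command whose batch file could then be
compressed into the .rsync.gz, and read_batch_cmd("unpacked",
"pkg.rsync") yields the matching command for the unpack side.  Note
that rsync expects the tree the batch is replayed onto to be
identical to the one it was recorded against, which fits the
non-requirement above about applying changes to different tarballs.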

One part of the task I need help with is negotiating and selecting an
appropriate technical approach.


Thanks for your attention.

Ian.


-- 
Ian Jackson <ijackson@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.