
Re: Introducing dgit - git integration with the Debian archive

On 25 August 2013 12:04, Raphael Hertzog <hertzog@debian.org> wrote:
> Hello,
> On Thu, 22 Aug 2013, Ian Jackson wrote:
>> I'm pleased to announce that dgit 0.7, which is a version of dgit
>> suitable for alpha and beta testers, is available in unstable.
>> From the manpage:
>>        dgit [dgit-opts] clone [dgit-opts] package [suite] [./dir|/dir]
>>        dgit [dgit-opts] fetch|pull [dgit-opts] [suite]
>>        dgit [dgit-opts] build|sbuild [build-opts]
>>        dgit [dgit-opts] push [dgit-opts] [suite]
>>        dgit treats the Debian archive as a version control system, and
>>        bidirectionally gateways between the archive and git.  The git
> Basically, this is "Ubuntu Distributed Development" (UDD) but for Debian &
> Git (instead of Ubuntu & bzr).
> Have you looked at UDD? They have been doing this for multiple years and
> have much more experience than us here. I'm sure there are quite a few
> things to learn from what they did to not redo the same mistakes.

Things to learn from UDD:
1) Debian didn't have a _standardised_ VCS repository format, so for
the UDD workflow all Debian packages had to be imported, such that
lp:debian/package can be merged into lp:ubuntu/package.
2) 3.0 (quilt) causes problems:
- we had to commit the .pc directory in the unpacked tree, as
otherwise new patches end up at the start of the quilt series and can
cause the rest of the series to fail to apply
- debian/patches/series.$vendor is evil; often series.ubuntu was not
updated/refreshed/rebased, causing dpkg-source -x to fail with
- merging two quilt series is a pain, as there is no $ quilt merge. We
end up unapplying both quilt series, merging the branches, throwing
conflicts in debian/patches/series at the developer, and asking them to
figure out which patches to apply and to refresh the quilt series
themselves.
- versioning the .pc directory is a pain, especially when quilt is
updated. E.g. newer versions of quilt added pointless .timestamp files
in the .pc directory which were not present in the automatic
lp:ubuntu/* and lp:debian/* branches, which used an older quilt
- a valid git/bzr patch may not be a valid quilt patch, and in turn
may not be a valid "patch" as considered by dpkg. It's getting better
with patch(1) starting to support git format-patch style patches. Thus
cherry-picking from upstream becomes a pain: I have multiple times
applied a patch cherry-picked from upstream, only to find out later
that e.g. the +x flag was not preserved, or fuzz is generated, or files
are not
- tarball inside tarball packaging is evil & must die
3) The automatic importer is part of the UDD workflow only because there
was no standardised, developer-created, rich VCS history on the Debian
side which fully matched the archive state. And basing Ubuntu branches on
something that doesn't match Debian uploads into the archive was a
4) The automatic importer was necessary to import Debian history, and
it was not perfect: http://package-import.ubuntu.com/status/,
pristine-tar used to fail (the importer was running on stable; it is now
upgraded and much better), dpkg-source -x sometimes fails, operational
issues (timeouts, OOM, etc), irreconcilable history (a developer rebases
old tags, and the importer can no longer reconcile its state),
- history can be odd: UDD discovered cases where referenced uploads
didn't happen, or where experimental got ahead of & then behind sid, and
it has a really hard time figuring out when, if ever, experimental got
merged into sid. (sometimes it's just abandoned)
I think James can give more examples.

The best UDD workflow seems to work with native packages:
As a highlight I can give the example of debian-installer. All
debian-installer git repositories are homogeneous and follow the same
All of the d-i projects are imported into bzr branches.
And then the Ubuntu Installer team maintains branches where Ubuntu diverges, e.g.:
lp:~ubuntu-core-dev/apt-setup/ubuntu which frequently merges in debian
changes from lp:apt-setup

In a similar manner packages which use 1.0 format without a patch
system work really well with lp:ubuntu/* and lp:debian/* branches.

I have maintenance access to UDD & have filed a few bugs about it, and
all I can say is that dgit so far is getting a lot of things right:

1) round-trip tree guarantee
The same is required for UDD, and the automatic importer can fail to
get the state right when developers push a different tree to the VCS
from what dpkg-source -x produces.
Don't forget that e.g. git doesn't commit empty directories. I have
seen a case where bzr-git was used to push commits without empty
directories into lp:ubuntu/$pkg branches & then dpkg-source -x did not
match the state of the VCS, resulting in the automatic importer failing.
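
To illustrate the empty-directory problem, here is a minimal Python
sketch that lists directories in an unpacked source tree which git would
silently drop. It is only an illustration, not something dgit or UDD
does; the function name find_empty_dirs and the choice to skip .git are
my own assumptions.

```python
import os


def find_empty_dirs(root):
    """Return relative paths of directories under root with no entries.

    Hypothetical helper: git cannot record such directories, so a tree
    produced by dpkg-source -x that contains them cannot round-trip.
    """
    empty = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        if dirpath != root and not dirnames and not filenames:
            empty.append(os.path.relpath(dirpath, root))
    return sorted(empty)
```

This only reports leaf directories; a parent that holds nothing but such
directories would vanish from a git commit as well.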

2) removing automatic importer
Forcing all the checks onto the developer side & forcing the VCS commit
to match the source upload is a massive win. It means that one can
actually trust the archive & VCS commits. And they will always match.
(Well, one can even verify that by unpacking the .dsc and comparing it to
the Dgit: commit id.) After all, the archive will always be
authoritative, as that's what gets the GPG signature, is mirrored and
gets deployed to the users.
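
The verification mentioned above could be sketched roughly as follows,
assuming the .dsc has already been unpacked with dpkg-source -x into one
directory and the commit named in the Dgit: field has been checked out
into another. The helper names tree_manifest and tree_differences are
hypothetical, and this is not how dgit itself performs the check.

```python
import hashlib
import os


def tree_manifest(root, ignore=(".git",)):
    """Map each relative file path under root to a digest of its content."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip VCS metadata, which never appears in the source package.
        dirnames[:] = [d for d in dirnames if d not in ignore]
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.islink(path):
                # Compare symlinks by target rather than by content.
                manifest[rel] = "link:" + os.readlink(path)
            else:
                with open(path, "rb") as fh:
                    manifest[rel] = hashlib.sha256(fh.read()).hexdigest()
    return manifest


def tree_differences(unpacked, checkout):
    """Return sorted paths that are missing or differ between the two trees."""
    a, b = tree_manifest(unpacked), tree_manifest(checkout)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty result means the file contents match. File modes, empty
directories and symlinks to directories are not compared in this sketch.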

3) standard location, standard layout, standard naming
without that, it is impossible for derivatives to share VCS history

4) allowing free-for-all bolt-ons
UDD currently starts history with debian uploads, and forks/merges
ubuntu uploads consistently, such that lp:ubuntu/$pkg has full and
correct merge history dating back as far as possible. Whilst it is
very good, getting the history right is very tricky and UDD did find
many history anomalies based on version comparison only.
With dgit, it's awesome. The last commit is synthesised, thus only the
last upload is the one that needs to be $ dpkg-source -x safe. But
developers are free to merge git upstream history, or previous Vcs-Git
repository history, thus from the first dgit push, one can have as
rich a history as one wants/has.
And one is free to push pristine-tar (if it makes sense / is easy to
generate), and/or any other branches into the repository (git-dpm,
git-quilt, etc)
With UDD history being rigid, there was little room to allow history
rewriting/rebasing. It would have been awesome to allow people to use
bzr-pipes or bzr-looms, or to make the history of lp:debian/$pkg be
based on top of the upstream VCS import lp:$pkg. None of those things
are possible, as they can break the round-trip tree state. With dgit,
anything goes, as long as one doesn't fiddle with dgit/* branches
by hand =)

I am excited about dgit, as for the first time it will be possible for
derivatives to centrally share history with Debian.

> https://wiki.ubuntu.com/DistributedDevelopment
> http://developer.ubuntu.com/packaging/html/udd-intro.html
> https://launchpad.net/udd
> I'm putting James Westby in CC (as I believe he's one of the core UDD
> developers, and also a Debian contributor). He might want to review dgit
> and share his hindsights.
> Among notable differences there's that dgit contrary to UDD decentralizes
> the creation of the branch with all the archive uploads. But I never used
> UDD and don't know it well enough to comment much more than that.

Not really, UDD and dgit are both centralized where it matters:
predictable state (matches archive uploads), predictable location
(url, branch name, etc).
UDD pre-created debian history, because there was none we could use.
And without importing debian history, creating just ubuntu branches
would be useless for merges.
In practice one doesn't actually care how far back the history goes,
as the history that is interesting is where developers get to do
intermediate commits between two uploads to make the changes more
granular... and dgit totally allows for that, and even forces the
user to commit everything before attempting to create the .dsc &
.changes for the upload.


