
Packaging with Git (Was: Re: Package "ownership" per team, and the use of `mr' to handle this.)



[ Sorry, this is a long one! Read it only if you're interested. ;) ]

On Tuesday, 09.12.2008, at 20:56 +0100, David Paleino wrote:
> On Tue, 09 Dec 2008 14:48:12 +0100, Manuel Prinz wrote:
> >       * Excellent branch/merge support. That's fast and cheap and allows
> >         one to have several independent branches to work on. This is
> >         especially handy if you need to touch upstream's code, as it
> >         makes changes more visible than a huge quilt-patch.
> 
> So, consider a package I'm team-maintaining in Debian, john. I'm currently
> applying several patches, something like 17 or so. Should I keep 17 separate
> branches for this? And how to handle patches touching the same files but in
> different points? (i.e. what the "series", "00list" and kinda files do,
> establishing patch applying order)

It depends on the workflow you choose. Indeed, having 17 branches sounds
scary, but it has some advantages. One is that the branches are distinct,
so the commits on a branch all belong to one bugfix or feature you
develop. It also lets you drop a branch as soon as its changes are
integrated upstream. Also, if the changes are split into several commits
(for a reason), it is easy to send all of them upstream without having
to pick the right commits out of the commit log. (The alternative is to
send one large chunk, which in my experience upstream does not always
welcome.)

The obvious drawback is the integration work you need to do for every
new upstream release. You would use an integration branch into which
you'd first merge upstream, followed by all feature branches. Currently,
this can only be done semi-automatically. Of course, merge conflicts may
happen, as you mentioned, when branches touch the same file. These just
need to be resolved. But this is pretty much the same thing you do with
a patch series: You flatten the hierarchy, so that each patch depends on
the ones before it. The drawback of this is that you can't just take a
patch out of the series and apply it to the source if it modifies a file
that was already modified earlier or will be modified later. Resolving
merge conflicts from branch merges is no different from flattening patch
sets with quilt; but the former has the advantage that each branch stays
independent of the others, which a patch set can't provide.
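
To make the integration step a bit more concrete, here is a minimal
sketch in Python that only builds the list of Git commands for one round
of integration; it does not run them. The branch and tag names
("integration", "upstream/1.3", "fix-crash", "add-manpage") are made up
for illustration, and a real setup may well look different.

```python
from typing import List


def integration_plan(upstream_ref: str,
                     feature_branches: List[str],
                     integration_branch: str = "integration") -> List[List[str]]:
    """Build the git commands for one integration round (nothing is executed).

    All names passed in are hypothetical examples, not a fixed convention.
    """
    # Switch to the integration branch and merge the new upstream first.
    plan = [
        ["git", "checkout", integration_branch],
        ["git", "merge", upstream_ref],
    ]
    # Then merge every feature branch in turn. Any of these merges may stop
    # with a conflict that has to be resolved by hand before continuing.
    for branch in feature_branches:
        plan.append(["git", "merge", branch])
    return plan


if __name__ == "__main__":
    for cmd in integration_plan("upstream/1.3", ["fix-crash", "add-manpage"]):
        print(" ".join(cmd))
```

The order of the feature branches only matters when two of them touch
the same lines; otherwise each branch can be merged (or left out) on its
own, which is the point made above.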

Nevertheless, you can of course use a model in which you have one branch
in which you do all your changes to the upstream code. This is pretty
much like the SVN workflow. It's doable with Git but IMHO has drawbacks.

To mention pkg-vcs again: The aim is to experiment with ways to ease
cross-distro development, and DVCS are a big part of that. Think of
having a fix for a really nasty bug in your real-nasty-bug-fix branch: If
you keep it distinct, other distros (like Gentoo or Fedora or whatever)
can cherry-pick the changes in that branch and apply them to their
distribution. (Or the other way round: Debian can profit from them.) Of
course this is quite hypothetical at the moment, since not everyone uses
Git and there is no infrastructure that supports the exchange, but the
situation is not too bad. If you have a flattened patch in a quilt
series, grabbing and integrating that into a different distribution may
be hard or even impossible. So maintaining branches might benefit the
whole free software community. We already waste too much time fixing the
same stuff in every distribution. Making exchange easier is IMHO worth
achieving.

> >       * Support for cherry-picking. Importing changes from other
> >         developers easily can be very handy at times. (I sometimes even
> >         cherry-pick from my own branches.)
> 
> What do you mean by "importing changes from other developers"?
> I know that this might be related to the concept of DVCS, but isn't this prone
> to errors on merge/push?

No. Git uses the repository content to define its state. SVN uses a
revision number. Let's assume you apply commits in a different order than
upstream or another developer: everyone still has their repo in the same
state, since the content is identical. This is usually no problem with
Git at all; but it sometimes is with SVN, since r1234 in your repo might
not be equal to r1234 in the repo of a different developer.
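
A small illustration of what "content defines the state" means: Git
names a stored file (a "blob") by a hash of its content, so two
developers who end up with the same file content get the same object id,
no matter in which order they got there. The sketch below recomputes
such an id in plain Python. It assumes the default SHA-1 object format;
the function name git_blob_id is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object id Git assigns to a blob with this content.

    Assumes the SHA-1 object format: the id is the SHA-1 of the header
    "blob <size>" followed by a NUL byte and then the raw content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Two developers holding identical content compute identical ids,
    # independent of any revision counter.
    mine = git_blob_id(b"hello\n")
    yours = git_blob_id(b"hello\n")
    print(mine == yours)
    # Any difference in content gives a different id.
    print(git_blob_id(b"hello\n") == git_blob_id(b"hello world\n"))
```

This covers file content only. Commit ids additionally include the
parent commits and metadata, so they are not covered by this sketch.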

> >       * With TopGit: Automatic patch generation from branches. No real
> >         need to update a series of quilt patches; one can auto-create them.
> >         (This also has the nice side effect that it does not clutter the
> >         commit diff.)
> 
> Read my consideration at point 1 :)
> However, I must admit I don't know what TopGit is, /me looks info on it.

It's still under development but rocks already. I do not use it in
production yet, though. There's a lot of experimenting going on with it
on pkg-vcs.

> >       * "All in one" repo: Upstream code + Debian packaging as separate
> >         branch. With pristine-tar, I can recreate the upstream tarball
> >         when not available. svn-bp just does not work if you're on the
> >         road and don't have the tarball. (Keeping all tarballs for all
> >         versions costs disk space as well, which is limited on my EEE.)
> 
> I've had problems in the past with pristine-tar, but that was probably due to
> my lack of "experience" with git. (and no, don't ask me what, I don't really
> remember which problems I had :( )

When using "git-import-orig --pristine-tar" one should be fine in most
cases. It might be confusing that the content is stored in a branch that
is decoupled from the rest of the repo. Git allows that, SVN doesn't,
and this was quite confusing for me as well when I switched.

> About "not having the tarball", that's what "get-orig-source" targets are for.

... if a) this optional target exists and b) you have an internet
connection and c) you just want to get the latest version. I have an
upstream that removes old versions when providing new ones. There is no
chance to ever get them again. (And mailing upstream about old versions
is not very pleasant, really.)

> Also, if you keep the upstream code in the same repository, how can you
> "checkout" different versions?

Yes, of course. And pristine-tar enables you to recreate a tarball from
the upstream sources that is bit-identical to the one that is provided
upstream.

> Using different tags?

Yes, tags are the way to mark certain points in history, like the import
of new upstream sources or a Debian upload.

> I admit that, if everything is done via tags and diffs between
> revisions, you could save lot of space.

That's how it works. (On most setups, disk space is not often an issue;
but on others, it really is. And one is quite thankful then that one can
work nevertheless.)

> But I can't really see why one should work on different versions, at
> least more than two -- so the "disk space cost" argument isn't really a point,
> IMHO)

Well, consider that a year after the release of Lenny, you have to fix a
security issue in your package, but upstream has released several new
versions which are already in unstable. No big deal though, because you
can branch from the version in stable and merge the fix into that
branch, build and upload. In case you lost the upstream tarball, you can
simply recreate it. (Yes, well, you can also fetch it from the archive.
You got me there. Anyway, you still need to have your mirror around or an
internet connection.) When packaging diverges, as in backports or
security fixes, I think Git can deal with these situations better. I
have not been in this situation myself, so this is just speculation.
Dealing with diverging branches is what DVCS were developed for.

> >       * Offline commits. This is my personal favorite since I'm quite
> >         often lacking an internet connection. Committing regularly is
> >         IMHO important, especially if a revert is needed. (I used SVK
> >         for years but was never happy with it; merge conflicts were just
> >         a pain back then with SVN. Heard the new SVN version fixes it
> >         somewhat. But SVK was an improvement in that point.)
> 
> This is a big plus for git -- I hate having to connect to commit my changes.
> But, once again, how are merges handled? And if two developers change the same
> file in more-or-less the same point? Is git smart enough to handle this?

No. And that's the beauty of it. ;)

> >       * Git is stupid. It means that it does not try to guess anything
> >         if it can't decide what to do. The user has to be explicit in what
> >         he wants to do; you (almost) never get any "smart" behavior from
> >         Git. There is no unexpected behavior since you request Git to do
> >         what it should do. (I do not like unexpected behavior due to
> >         "smart" software. That's probably more a personal point.)
> 
> Err... I can't get it :)
> I once read that "git is like the factory giving you the pieces and the
> instructions sheet, and letting you build your own airplane", while "svn is a
> full featured airline that is more-or-less good for all pilots".

I would not put it this way. To me it's more like a full-featured
airline that allows pilots to add their own favorite aircraft and flight
plan. Or something like that.

What I wanted to say in my last mail was: If Git can't resolve a merge
conflict, it does not even try to do so. It just tells you about it and
leaves you with the pieces. You then have a look at it with your
favorite merge tool (or whatever you prefer to use); Git expects you to
resolve the conflict and tell it that it can happily go on merging. It
may sound like work but it really is the best way to handle it, since in
most cases brains are superior to algorithms. Git expects the brains to
resolve conflicts since developers are expected to know the best way to
handle them.
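
To show the idea of "merge what is unambiguous, hand the rest to the
developer", here is a heavily simplified three-way merge in Python. It
is not Git's actual algorithm: it assumes that base, ours and theirs all
have the same number of lines (edits only replace lines, never add or
remove them), and the function name merge_lines is made up. Lines
changed on only one side are taken over; lines changed differently on
both sides are reported as conflicts and left in the output with
conflict markers.

```python
from typing import List, Tuple


def merge_lines(base: List[str], ours: List[str],
                theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Line-by-line three-way merge (simplified sketch, not Git's algorithm).

    Returns the merged lines and the indices of conflicting lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles line replacements")

    merged: List[str] = []
    conflicts: List[int] = []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:
            merged.append(o)   # both sides agree (changed or not)
        elif o == b:
            merged.append(t)   # only their side changed this line
        elif t == b:
            merged.append(o)   # only our side changed this line
        else:
            # Both sides changed the line differently: do not guess,
            # leave both versions for a human to resolve.
            conflicts.append(i)
            merged.extend(["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"])
    return merged, conflicts


if __name__ == "__main__":
    base = ["a", "b", "c"]
    print(merge_lines(base, ["A", "b", "c"], ["a", "b", "C"]))
    print(merge_lines(base, ["a", "X", "c"], ["a", "Y", "c"]))
```

In the first call the changes touch different lines and merge cleanly;
in the second both sides rewrote the same line, so the sketch reports a
conflict instead of picking a winner.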

> >       * Import and export of patches from and to email. This is just
> >         great: Take a diff and export it to an email, add a comment and
> >         send it. Other Git users can simply apply it to their repo from
> >         their mail client. (Or save the mail to a file and apply that
> >         with the Git tools.) Once used to it, it's very handy and makes
> >         exchanging patches really easy.
> 
> This is fine, but it seems like we're having this functionality with "svn
> diff". Just a matter of attaching/including that output to a mail ;)

Well, sure. But it's more complicated. I do not see it as *the* feature
of Git but it is useful. If you can save 2 minutes sending such a patch
and you have 10 patches, just do the maths. ;)

> > As an example: In maintaining Open MPI, we have currently several issues
> > to solve. With different branches, I can address all of them separately
> > when I do find the time. Hacking at all in the trunk clutters history a
> > lot and causes confusion.
> 
> Fine. And (repeating my previous question) what if two different bugs involve
> the same code at, let's say, the same function, and you solve them in two
> different incompatible ways? The various branches will work separately, but
> once merged, you'd get unexpected behaviour

No, you won't, because it will lead to a merge conflict then. You need
to resolve this, so it's totally expected behavior, since you were the
one dealing with it. This is like flattening quilt patches: You'd get a
conflict, you do some editing, and refresh the patch. Same thing here.
Everything that does some auto-merging or auto-conflict-resolution in
this case is IMHO broken by design. As long as I have full control
over how to handle problems, I'd call this "expected behavior". Merge
resolutions are also commits in Git, so they are documented in the
history log.

One other point I forgot, because it's too much of a habit now: Every
Git repo is a full repository. It can easily be backed up. In fact, if
you have your repo online, you do have a backup since you can just clone
it. Or clone the repository of a fellow developer. They're all alike.
(Ignoring local changes.) This is surely a real nice thing: If you
destroy your repo, just get it back from somewhere.

I hope I could shed some light on why I like Git. If you have
questions, please ask. And I do not want to get everyone to use Git.
Everyone should use what they feel comfortable with. For me, switching
saved time and headaches. I can't guarantee that for everyone, though. ;)

Best regards
Manuel
