
Re: d-i git repo: sample conversion



Let me start with a request to other team members: PLEASE join this 
discussion. Think about how *you* use the repository; consider whether you 
are comfortable with git or not; think about the issues raised by both 
Joey and me; and *give your opinion*.

I hate the fact that this looks to be turning into an "fjp against joeyh" 
discussion. Does nobody else care?

On Tuesday 22 September 2009, Joey Hess wrote:
> > Debian Installer is *one* (upstream!) project with components that are
> > sometimes tightly related, which should have a *single* (either
> > central or decentral) repository
>
> It's more complex than that. We have important d-i components that are
> stored in other VCSs (ie, linux, libc).

I strongly disagree with that. I think linux and libc (and other libraries 
and other udebs from general packages, such as fdisk, beep, etc.) are 
totally irrelevant to this discussion. Sure, we compile udebs against them 
and changes in them may affect us, but that's no different from any other 
package that compiles against a library or uses general utilities.

> Other things are uncomfortably grafted into the svn repository, despite
> having broader applications to Debian as a whole (busybox in /people,
> live-installer being shared with Debian Live, cdebconf being planned to
> be part of Debian base, os-prober being used by grub2, flash-kernel
> producing a deb as well as a udeb, etc).    

busybox and mklibs and even debootstrap probably don't really belong in the 
D-I repo at all. They were only added there because it was convenient. 
They could just as easily be moved to for example collab-maint.

For cdebconf the reason it is in the D-I repo is mostly historic: it was 
started because of D-I and has been most actively developed for D-I. It 
was great having it tied in close, but maybe the time is ripe to split it 
out of the D-I repo and let it be maintained on its own, just like debconf 
is.

> d-i is generally loosely coupled, and melts into Debian around the edges.

The components that I'm talking about and that are relevant here are those 
that provide D-I specific functionality, i.e. those that have postinsts or 
D-I specific scripts. Almost all of those live in the D-I repository.

Sure, some things are more tightly coupled than others. But for those that 
are fairly tightly coupled there is IMO a huge benefit to having them in 
the same repository. I have already given some examples, but let me 
elaborate on them a bit more.

- partman components are very tightly coupled
  Very frequently a change, such as adding ext4 support, requires making
  related changes to multiple components. Being able to do so in a single
  branch in a single repository is of huge practical benefit.
- My "partman cleanup" in 2007 would have been a pain if components had all
  been in different repositories; I did major reshuffling across components
  by rebasing in my git-svn checkout while working on that.
- For the multi-CD support feature I needed strongly related changes in
  di-utils, apt-setup and pkgsel.
- My recent "recommends" change required related changes in di-utils,
  base-installer, pkgsel and preseed.

> > The way D-I is managed as a project and most work on D-I does IMO not
> > require distributed SCM
>
> We're using distributed SCM every day between git-svn and people posting
> patches to this list for review. What we lack is a unified distributed
> SCM platform, and I think we're suffering because of that in ways that
> may not be obvious.

I'd like to see more concrete examples before I accept that.

> > and for the situations where either distributed
> > or offline work is desired, git-svn is IMO a more than adequate
> > solution.
> >
> > Let me also make clear that I have very little against git, except
> > that it has a *lot* steeper learning curve than subversion.
>
> git-svn is harder to learn than either svn or git.

I disagree. git-svn is very straightforward as long as you only use it to 
track trunk, which is all that should be needed for distributed or 
off-line development. IMO much could be gained by simply documenting how 
to use a git-svn checkout for distributed development.

> > In the proposal a few things besides the manual and master PO files
> > are now weirdly kept in a "master" SVN repo:
> > - scripts
> > - kernel/massbuild*
> >
> > So for example when there is a kernel update, we'll now have to do an
> > svn commit to update the massbuild.versions file, followed by separate
> > git commits to update the various kernel-versions files.
>
> Of course these could be put into some git repo, but it's not clear to
> me what such a repo should be called.

> (It didn't help that kernel/massbuild is not located in scripts or
> kernel-wedge.) 

*shrug* It is in the most logical place for ease of use. Sure, the script 
could be moved to a "tools" package and installed on developer's systems, 
but that would still leave the massbuild.versions file, which IMO should 
be under version control and very simply is common to all kernel udebs.

> > Sure, 'mr' is being proposed as "the" solution to keep it all hanging
> > together. I somewhat remember the announcement of 'mr' as a useful
> > tool to keep repos for *unrelated* projects up-to-date.
>
> Emphasis on *tool*. Tools can be used for more than one purpose.

Sure, but first splitting something up and then artificially glueing it 
together again requiring an additional tool in the process is not what I 
call logical.

> gitk in a git-svn repository must look very boring, given the small,
> flat branch structure. Anyway, something seems off about using gitk as a
> reason *not* to switch to git.

Obviously it is boring for history. It's very much not boring for things 
that are under development.

> > I've had *many* occasions
> > where a change required interdependent changes in multiple components,
> > for example in multiple partman components, or in pkgsel and di-utils,
> > or in a component and in the D-I build system.
> > If D-I is a single repository I can simply create a new branch and
> > work there. If the components are split up, I will continually need to
> > create branches, make sure that I'm in the correct branch for the
> > current component when changing directories, etc.
>
> d-i's svn repo tends to get very few branches of the whole tree, and
> those branches tend to be hard to work with. The only active such branch
> is for kfreebsd. (Which actually only currently touches 5 packages,
> although it is hard to figure this out using svn!)

Agreed. SVN is not ideal for branching, especially since it does not 
support off-line commits.

But IMO we don't have that much need for branching, other than temporary
"work in progress" branches, which in most cases are fairly short-lived and 
private. That's why IMO git-svn is a relatively suitable solution.

> Local branches in git-svn repos cannot be published to others as
> VCS-accessible branches; one easy way to accomplish a similar thing with
> git is to copy the whole d-i tree, commit to master, and never push.

Although I haven't tried that, I think that should be possible, or rather, 
why wouldn't it be possible to publish the full repository? There are 
restrictions to the use of the "master" branch that's based on SVN, but I 
don't see why there should be restrictions to development branches.

Care to elaborate on why it's not possible?

The way I see this is roughly (untested):
- create git-svn checkout (master branch)
- create development branch
- commit changes in development branch
- periodically update master branch and merge central changes into the
  development branch
- when development work is ready to be committed, create new branch
  from development branch, rebase that on top of master and svn dcommit
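
As a rough sketch, the steps above could look like the following. It is as
untested against the real repository as the list itself. SVN_URL, the "d-i"
directory, the branch names "devel" and "to-commit" and the commit message
are placeholders, and the run wrapper with its DRY_RUN switch is only there
so that the script prints the commands by default instead of executing them.

```shell
#!/bin/sh
# Sketch only: SVN_URL, the "d-i" directory and the branch names are
# placeholders, not the real repository layout.
SVN_URL="${SVN_URL:-svn://svn.example.org/d-i/trunk}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != 1 ]; then "$@"; fi
}

workflow() {
    # create git-svn checkout of trunk; this is the master branch
    run git svn clone "$SVN_URL" d-i
    run cd d-i

    # create development branch and commit changes there
    run git checkout -b devel
    run git commit -a -m "work in progress"

    # periodically update master and merge central changes into devel
    run git checkout master
    run git svn rebase
    run git checkout devel
    run git merge master

    # when ready: new branch from devel, rebase on master, commit to SVN
    run git checkout -b to-commit devel
    run git rebase master
    run git svn dcommit
}

workflow
```

Setting DRY_RUN=0 and SVN_URL to a real trunk URL would execute the commands
instead of printing them. The rebase in the last step linearises the history
so that dcommit sends plain, sequential commits to SVN.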

> (Or, if you, like me, feel that keeping a local branch in
> git-svn is just asking to lose work when a disk crashes, you can push
> the resulting git repos to your ~/public_git/ on alioth.)

That's one way. I have automated backups of all .git dirs on a different 
machine on my home network.

> Git will, of course, make managing branches that only affect a single
> component much easier than svn does. It seems better to optimise for
> this common case than continue to notionally support the uncommon
> case of a real, published, whole-tree branch.

I disagree that working on multiple components is an uncommon case. In my 
experience it is a very common case as soon as you want to do any serious 
work.

> > And I will no longer be able to commit related changes in different
> > components as a single commit.
>
> You forgot to say why it is important to do this, FWIW.

It's not 100% required, but IMO it's logical to do so if changes are very 
strongly related. It makes it easier to see the context, for example if 
regressions are reported later.

> Of course, if 
> the components are not all included in d-i svn, you already can't do
> that. It's been not uncommon for me to need to make a commit to d-i and
> a commit to tasksel for a single logical change, for example.

Sure. tasksel, but also debian-cd, although neither has udebs, are 
strongly related to D-I. But you do have to draw the line somewhere 
(although it might even make sense to move tasksel into the D-I repo).

My contention remains that components which provide the actual D-I 
functionality through their postinsts and scripts belong in a single 
repository.

> > When the migration from CVS to SVN was done at least nothing was lost.
>
> (Aside from some screwed up branches and tags I noticed when looking
> at the CVS history when doing the git conversion...)

Sure, but that can be considered normal cleanup during a migration.

> > This migration simply drops a lot of work that was done in the past
> > (such as obsoleted components)
>
> All obsoleted d-i components are present in their own git repos in the
> attic subdirectory.

I don't see any attic directory, but maybe that's because it's not included 
in the .mrconfig file?

> > > On Thursday 13 August 2009, Joey Hess wrote:
> > > > Other whole-tree tags and branches are not included. These include
> > > > sarge, etch, and lenny tags and branches, and old tags like rc1,
> > > > rc2, rc3.
> >
> > As mentioned above: why not? I'd think especially the release branches
> > are essential.
>
> Prior discussion about this point seemed to come to the consensus that,
> in cases where we do currently use the whole-release branches in svn, we
> could just as easily branch from the tag for the stable release version
> in git. Eg, to work on stable's version of anna, `git checkout -b stable
> 1.33`

I've not been allowed to see that discussion. I can even agree with 
that, but not with losing the commit history that is currently embedded in 
those branches.

> > It also means you do not have tags for versions uploaded to testing
> > or stable.
>
> In git we have tags for every version of every package ever uploaded.
> Unless these packages were uploaded to stable without being properly
> tagged..

Hmm. Yes, I see something now, but *only* if I run 'gitk --all'.

So the reason I missed them is because they are not properly connected. In 
gitk (with --all) they show up as commits that are unconnected to the 
history of the package and all the actual commits where the functional 
changes were made that led up to the tag _are_ lost.

For example run gitk --all for base-installer. If you search hard you will 
indeed see a dangling tag commit for e.g. 1.76etch1 (it's not even listed 
near the correct date), but you will not find the functional changes from 
SVN that led up to that release:
51774  Correctly recognize systems with 'RS64-IV (sstar)' cpu as powerpc64.
51776  Fix and improve test to also recognize systems with 'RS64-II
       (northstar)' cpus.
51777  Remove now redundant (and broken) test for sstar
51778  Fix reference to cpuinfo file

Having only the tags without the commits is IMO meaningless and not 
acceptable for a repository migration. For a proper migration *at least* 
any versions released from branches should be properly connected in the 
revision history and visible in gitk as branches, but I'd personally 
prefer to see the full branch history we have in SVN preserved.
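
One way to check this without searching through gitk is to ask git whether
each tag shares any history with the main branch. This is a minimal sketch:
the function name list_unconnected_tags is made up for illustration, and it
assumes that "git merge-base" exits non-zero when two commits have no common
ancestor, which is the only thing it tests.

```shell
#!/bin/sh
# list_unconnected_tags BRANCH
# Print every tag in the current repository that has no common ancestor
# with BRANCH, i.e. tags that dangle outside the package history.
# (The function name is illustrative, not an existing tool.)
list_unconnected_tags() {
    branch="$1"
    git tag | while read -r tag; do
        # merge-base fails when the two commits share no history
        git merge-base "$tag" "$branch" >/dev/null 2>&1 || echo "$tag"
    done
}
```

Run inside one of the converted repositories as "list_unconnected_tags
master". A release tag made from a properly converted branch shares history
with master and would not be listed; a tag on an isolated commit would be.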

> That cleanup has still not been done after many months (and makes using
> git-svn painful in the meantime BTW). Meanwhile, the git conversion

Yes, and there's a good reason it has not yet happened:
http://lists.debian.org/debian-boot/2009/06/msg00907.html
https://alioth.debian.org/tracker/?func=detail&atid=200001&aid=311791&group_id=1

Things do look to be better now, so I'll do the cleanup ASAP.

> can easily deal with the issue. And the result of choice #2 is identical
> to the result of doing the svn cleanup first and, in that cleanup,
> deciding to not bloat the tags with the bad da.po files.

No, the result of #2 is very much *not* the same, as I have previously 
explained.


I remain very much not impressed with this whole plan and you've failed to 
convince me on any really relevant points.

P.S. Please fix your mail handling, you're still breaking threading.

