
Sample DPMT SVN-GIT migration



Many months ago I said "it should be pretty easy" to just migrate
everything from SVN to GIT, with high fidelity. I may have been a bit
optimistic :P But I also just procrastinated a lot...

Here's where I currently am (a migration of r32486):
https://anonscm.debian.org/cgit/python-modules/svn-migration/
Here are the scripts:
https://anonscm.debian.org/cgit/users/stefanor/dpmt-migration.git/

The route I took was:
1. Migrate svn -> git, but only the master branch. Migrating all
   branches gets really complex, because of the magic "upstream" branch,
   that comes from bad imports without -o.
   I really tried to do this, but it was getting uglier and uglier, so I
   gave up.
2. Clean up the tags in the converted repository. svn-buildpackage
   tagging copies the current working dir to the tag. If you haven't
   "svn upped" before tagging, the tag could be a mix of files from
   multiple revisions. This made the tags very octopussy, in git.
3. Import upload history, from snapshot.debian.org and Launchpad's
   archive of Debian upload history.
4. Link the two histories together. By making the upload history the
   canonical git history, I didn't have to worry about the fidelity of
   the svn->git conversion. Every tag will be what was uploaded.
   But whenever there are matching tags between the two repos, the svn
   history will be attached as a parent of the commit.
   This means that it's mostly blameable through to the svn history, but
   each svn commit doesn't have the full source, just the /debian/ dir.
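
To make step 4 concrete, here is a rough sketch of the linking rule. This
is not the code from the migration scripts; the function name, the
"upload/" ref naming, and the choice of the previous upload as the first
parent are all assumptions made for illustration.

```python
def plan_parents(upload_versions, svn_tags):
    """Decide the parents of each imported upload commit.

    upload_versions: versions in upload order, oldest first.
    svn_tags: dict mapping a version to the commit id of the (cleaned-up)
              tag in the converted svn repository.

    Hypothetical helper: the names and the "upload/<version>" convention
    are illustrative assumptions, not taken from the real scripts.
    """
    plan = []
    previous = None
    for version in upload_versions:
        parents = []
        # The upload history is canonical, so the previous upload comes first.
        if previous is not None:
            parents.append("upload/" + previous)
        # When svn has a matching tag, attach the svn history as an extra parent.
        if version in svn_tags:
            parents.append(svn_tags[version])
        plan.append((version, parents))
        previous = version
    return plan
```

An upload with no matching svn tag simply gets no svn parent, which is
why untagged uploads leave the two histories poorly linked.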


Here are the current known bugs:
* Packages and contributors added since r32347 won't be migrated
  properly. This is just because I forgot to re-run some scripts; I can
  fix those trivially. [python-guacamole, pysimplesoap]
* Many uploads weren't tagged in SVN, and so the two histories aren't
  being well linked. We could do some retroactive tagging, before
  running the migration again, if we care.
  Some examples of this, from near the beginning:
  - alembic
  - argparse
  - backports.ssl-match-hostname
* We have packages that have had upload history diverge from svn.
  Examples:
  - aafigure
  - basemap
* There are staged, un-uploaded changes in SVN, that end up not being at
  the HEAD of the git repo. Examples:
  - audioread
  - babelfish
* Some packages have been renamed at some point in their history. I
  think I have a list of these, and the revisions where they happened.
  So I can attempt to stitch them together during the migration. But I
  also don't care too much...
* A couple of source packages aren't liked by gbp import-dsc
  [pydb_1.01-1.dsc, python-mysqldb_0.1.1-1.dsc]. I didn't investigate
  too deeply.
* Backports and SPUs aren't really well migrated, because we don't have
  branches.
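
If we do want to tag retroactively, the uploads that need it could be
listed with something like the following. Again, this is only a sketch:
the function name and inputs are hypothetical, and it assumes svn tag
names can be compared directly with upload versions.

```python
def unlinked_uploads(upload_versions, svn_tag_names):
    """Return the uploaded versions that have no matching svn tag.

    Hypothetical helper: assumes tag names and upload versions use the
    same spelling, which may need normalising in practice.
    """
    tagged = set(svn_tag_names)
    # Keep upload order so the report reads chronologically.
    return [v for v in upload_versions if v not in tagged]
```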


Where we go from here is mostly up to team consensus. Does the migration
look vaguely sane? Do we care about the bugs?

What do we do about people who've already started using git? I guess
they've already shown that they don't care about the package's
history...

And then there's PAPT...

SR

-- 
Stefano Rivera
  http://tumbleweed.org.za/
  +1 415 683 3272

