
Re: Mass svn-to-git migration - Progress report



Hi,

On Friday, 27 January 2023 20:14:43 CEST Jelmer Vernooij wrote:
> I've been looking at how to do a mass conversion. There's about 375 packages
> still listed as being on alioth (~100 in SVN, ~267 in Git, the rest in
> something else).
> https://janitor.debian.net/cupboard/result-codes/hosted-on-alioth?campaign=unchanged&include_transient=off&include_historical=off

That page now returns 747 packages?

> What still needs to happen is:
> 
>  * The mapping still needs to be tied together with the import script, to
>    generate correct URLs to push to and set the Vcs-* headers
>    appropriately
> 
>    I'm not sure what to do with packages whether the owning user or team
>    is not on salsa. Add them to the "debian" group?
> 
>  * The import script supports just git right now, not svn. There's ~8
>    repositories in a VCS other than SVN or Git, which we could just
>    migrate manually.

After sufficient procrastination I started working on this again ;-)

The idea I had/have is to do the conversion and place them in a separate 
namespace/group, ready to be picked up by a prospective maintainer.

I've now created a group for that on Salsa under which the converted repos can 
be stored: https://salsa.debian.org/groups/alioth-to-salsa-migration-team (*)

That way a prospective maintainer can use it as *a* source, but can also use 
other sources, e.g. "gbp import-dscs", to create or rewrite a proper (git) 
history to their liking for the package being adopted before it gets placed in 
the 'normal' Salsa structure.

There's also a practical reason (for me): e.g. the id3lib repo is stored 
under `collab-maint/deb-maint/id3lib`, and having a group on Salsa allows me to 
create subgroups and subsubgroups under which to store the git repo(s).

I'm open to suggestions on how to structure the converted repos and any 
(git) repos created to support this mass migration.
I have attached the document I've written thus far wrt Subversion, but that 
really needs to be put under version control and possibly/likely split up (and 
linked from a README.md document?).

*) I've also added/invited Jelmer to that group as Owner, possibly for 
practical reasons, certainly for the bus-factor reason.

On Saturday, 4 February 2023 00:52:23 CEST Diederik de Haas wrote:
> If I want to trim down the result list ASAP so I could focus on the ones who
> actually do need a conversion, should/could I use that page or would it be
> better if I keep a local document (for now), where I'd remove the false
> positives. (Fixing things properly will be for another time)

I searched/filtered the link Jelmer shared above for 'svn' and that resulted in 
119 results and subsequently I downloaded the archives for all of them.
Then I noticed that 4 archives were empty and I wanted to write that down.
In the above quoted part I had identified another reason why I wanted to add 
'notes' to various repos.
So I figured I'd better create the local document, so I saved "Jelmer's" page, 
only to realize that I got all 747 of them. I cleaned that up and added some 
columns with URLs which I otherwise would have constructed manually for 
further research.

They're now in a LibreOffice Calc document (attached), but while it was useful 
for the initial construction, I doubt it would be the right way/format going 
forward. It would be better to put that under version control too, and I 
assume that git would see it as a binary document, which means the 'diffs' will 
be rather useless.
I could convert it to Markdown which would allow me to make the URLs actually 
clickable (and shorter) and would work great under git.
OTOH the LO document/spreadsheet allows me to easily sort/filter/etc which I 
_think_ doesn't work with Markdown?

I'm looking for ideas/tips/etc on how to best deal with this.

Cheers,
  Diederik
# Migration of Alioth's Subversion repos to Git

## Intro

I was looking into [adopting](https://bugs.debian.org/770255) the ``id3lib`` package ([tracker: id3lib3.8.3](https://tracker.debian.org/pkg/id3lib3.8.3) (current), [tracker: id3lib](https://tracker.debian.org/pkg/id3lib) (old), [id3lib snapshot.d.o](https://snapshot.debian.org/package/id3lib/), [id3lib3.8.3 snapshot.d.o](https://snapshot.debian.org/package/id3lib3.8.3/)) and saw that the *VCS* line lists *Subversion* ... and points to <http://anonscm.debian.org/viewvc/collab-maint/deb-maint/id3lib>. Oops.  
I clicked on the link even though I was pretty sure of the outcome: a ``404``.  

Via ``#debian-mentors`` I learned of <https://alioth-archive.debian.org/svn/> which contains archives of all (?) the old Subversion repos. For ``id3lib`` that meant I needed the ``collab-maint.tar.xz`` archive ... which was 866MB in size. This also meant I had to learn Subversion (again, as I had my own Subversion repo ... 15 (?) years ago), set up a repo and then learn how to convert that to ``git`` as I like git and have no intention to use Subversion. All that before starting the actual work of adopting the package.  

Then I realized that ``id3lib`` isn't the only package pointing to a no longer existing VCS repo and that means that anyone else looking to adopt a non-migrated package would have to go through that whole process too.  
So I sent an email to the *debian-qa* ML: <https://lists.debian.org/debian-qa/2023/01/msg00031.html>.  
TL;DR: Can't we do a mass *svn-to-git* migration of all the packages which aren't yet on *Salsa*, so we/I only have to suffer once and then we can all forget about it and just use salsa/git. And let's do that before every guide/tutorial on the internet about such migrations returns ``404`` too.  
And while I started with it, I figured I could just as well document it.

## Subversion

I'll describe the major things I did and found out about *Subversion* which were mostly needed for the migration. But if you want to learn (much) more, I can heartily recommend the *Version Control with Subversion* [VCwS] book. Via the [website](https://svnbook.red-bean.com/) you can view/download the single and/or multi-page HTML version, a PDF version or the [DocBook sources](https://sourceforge.net/p/svnbook/source/HEAD/tree/branches/1.7/en/book/) which are licensed under [CC BY 2.0](https://creativecommons.org/licenses/by/2.0/). From the sources you can also build those targets and also an epub version (a patch for python 3 compatibility has been [sent to the ML](http://www.red-bean.com/mailman/listinfo/svnbook-dev) ;-P).

To start off I extracted the ``collab-maint.tar.xz`` archive, which gave me a ``collab-maint`` directory. I recalled having used ``kdesvn`` before and that is still maintained, so I installed that. After starting ``kdesvn`` you can then *open* a repository by just pointing to the location on the filesystem. And that worked. And then I could *browse* around the repository, view commit history, etc; just like that.

### Subversion installation and configuration

Subversion supports 2 backends: ``BDB`` (Berkeley DB) and ``FSFS`` which is a (virtual) filesystem type storage. The latter is recommended and is also what the ``collab-maint`` repo is using. I think it's safe to assume that the other SVN archives will use that too, which is great.  
While you can look at the repo with the standard filesystem tools, it is (strongly) recommended not to use them for actually changing anything in it.  
The various configuration files *can* be modified with your favorite text editor.

While trying to figure out how to do the migration I was using 2 'systems':

1. Local repository (directory) on my laptop for exploring/experimentation
2. Remote repository on a 'server' (Rock64 SBC, headless, named ``cs21``) in my LAN for the *real* thing

To interact with and administer a local repository, use the ``svnadmin`` program which is packaged in the ``subversion`` package. This required no further configuration.  
On my server I had already installed ``apache2`` so I only needed to install the ``libapache2-mod-svn`` package and configured it as follows:

```sh
$ grep -v "#" /etc/apache2/mods-enabled/dav_svn.conf | grep .
<Location /svn>
  DAV svn
  SVNParentPath /srv/data/svn
  SVNListParentPath on
  <LimitExcept GET PROPFIND OPTIONS REPORT>
    Require valid-user
  </LimitExcept>
</Location>
```

That configuration file is very well documented and all I had to do was enable commented-out settings. There was only 1 setting not documented and I needed that to make it actually work, or [I'd get a 503 error](https://bugs.debian.org/1031229), and that was ``SVNListParentPath on``.  
Both my laptop and 'server' are running Bookworm, so both packages had version ``1.14.2-4+b2`` (they're built from the same source).

**Note**: Going to <http://cs21/svn> with a web browser works, but not with ``kdesvn``. But <http://cs21/svn/collab-maint> does work with ``kdesvn``.

### Learning about the SVN repo

Chapter 5 of VCwS is about *Repository Administration* and it begins with describing things to consider when setting up a new SVN repo. Most of it is not relevant for our situation as we already have a repo. In the *Creating and Configuring Your Repository* section the ``svnadmin`` and ``svnlook`` utilities are introduced. These are considered *server-side* utilities and expect a local path to the repo dir as an argument. They don't work across a network and you can't use a URL (including ``file://`` type URLs) with them.  
There are several more utilities (introduced), but these are the most interesting for us.  

The ``svnlook`` util seems to be about *viewing* details of transactions (``-t``) and revisions/commits (``-r``).  
The ``svnadmin`` util deals with repositories as a 'whole'.

Help about the various commands can be requested as follows: ``<command> help`` which (usually) also gives instructions how to get more help.

Let's run a couple of commands to get a feel for it (... = omitted output for brevity):

```sh
me@laptop:~/svn$ svnadmin help
general usage: svnadmin SUBCOMMAND REPOS_PATH  [ARGS & OPTIONS ...]
Subversion repository administration tool.
Type 'svnadmin help <subcommand>' for help on a specific subcommand.
Type 'svnadmin --version' to see the program version and FS modules.

Available subcommands:
   build-repcache
   ...
   verify
me@laptop:~/svn$ svnadmin --version
svnadmin, version 1.14.2 (r1899510)
   compiled Jan 31 2023, 16:48:28 on x86_64-pc-linux-gnu
   ...
```

This shows the help syntax and tells us we're using version 1.14.2 of Subversion.

```sh
me@laptop:~/svn$ svnadmin info collab-maint/
Path: collab-maint
UUID: 19660600-52fe-0310-9875-adc0d7a7b53c
Revisions: 27545
Repository Format: 3
Compatible With Version: 1.1.0
Filesystem Type: fsfs
Filesystem Format: 1
FSFS Sharded: no
FSFS Logical Addressing: no
Configuration File: collab-maint/db/fsfs.conf
me@laptop:~/svn$ svnlook info collab-maint/ -r 0

2005-08-14 22:57:51 +0200 (zo, 14 aug 2005)
0
```

This tells us that the repo was created on 2005-08-14 and uses the ``fsfs`` backend.  
It tells us more, but more on that later.

```sh
me@laptop:~/svn$ svnadmin help lslocks
lslocks: usage: svnadmin lslocks REPOS_PATH [PATH-IN-REPOS]

Print descriptions of all locks on or under PATH-IN-REPOS (which,
if not provided, is the root of the repository).
me@laptop:~/svn$ svnadmin lslocks collab-maint/
```

So there are no locks on our repo.

```sh
me@laptop:~/svn$ svnadmin help lstxns
lstxns: usage: svnadmin lstxns REPOS_PATH

Print the names of uncommitted transactions. ...
Transactions with base revisions much older than HEAD are likely
to have been abandoned and are candidates to be removed.
...
me@laptop:~/svn$ svnadmin lstxns collab-maint/
15965-1
409-1
7025-1
773-1
6672-1
```

But it turns out there are uncommitted transactions ...

```sh
me@laptop:~/svn$ svnlook info -t 409-1
Repository argument required
Type 'svnlook help' for usage.
me@laptop:~/svn$ svnlook info -t 409-1 collab-maint/
csmall
2006-06-02 14:40:52 +0200 (vr, 02 jun 2006)
49
[svn-inject] Installing original source of procps
me@laptop:~/svn$ svnlook info -r 410 collab-maint/
csmall
2006-06-02 14:41:29 +0200 (vr, 02 jun 2006)
49
[svn-inject] Installing original source of procps
```

... and here we can conclude that tx 409-1 can safely be deleted (and that SVN's sequential numbering can be useful).  
The list isn't that large and we could inspect them all and determine what to do with each of them. Practically speaking they're of no use anymore and should just be deleted, which can be done with ``svnadmin rmtxns collab-maint/ $(svnadmin lstxns collab-maint/)``
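Before deleting anything, the candidates can be narrowed down along the lines of the ``lstxns`` help text ("base revisions much older than HEAD"). Below is a minimal sketch of that idea; the helper name ``stale_txns`` and the age threshold are made up for illustration, and it assumes that FSFS transaction names have the form ``<base-revision>-<suffix>`` (which matches ``409-1`` above, but which I haven't verified in the Subversion documentation).

```sh
#!/bin/sh
# Sketch: flag transactions whose base revision lags far behind HEAD.
# Assumption: transaction names look like "<base-revision>-<suffix>",
# e.g. "409-1". stale_txns is a hypothetical helper, not an svn command.

# stale_txns HEAD MIN_AGE: read txn names on stdin, print the old ones.
stale_txns() {
    head_rev=$1
    min_age=$2
    while IFS= read -r txn; do
        base=${txn%%-*}
        # Skip names that do not start with a plain number.
        case $base in
            ''|*[!0-9]*) continue ;;
        esac
        if [ $((head_rev - base)) -ge "$min_age" ]; then
            printf '%s\n' "$txn"
        fi
    done
}

# Demo with the transaction names listed above and HEAD at r27545.
printf '%s\n' 15965-1 409-1 7025-1 773-1 6672-1 | stale_txns 27545 1000
```

With a real repository the input would come from ``svnadmin lstxns collab-maint/`` and the HEAD revision from ``svnlook youngest collab-maint/``. Each flagged name can then still be inspected with ``svnlook info -t`` before it is handed to ``svnadmin rmtxns``.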

This worked on my laptop, where the collab-maint archive was extracted with ``Ark``, but likely only because it was in my *home* directory.  
When I tried the same thing on my server, where I had extracted the archive on the command line under ``/srv/svn/``, I got a *Permission denied* error. The reason was that the files/directories under ``collab-maint`` were owned by numeric user and group ``3519``.  
After I created a (system) user+group with that ``uid/gid`` and performed the command as that user, it succeeded.
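To spot this kind of ownership mismatch up front, something like the following could be run right after extracting an archive. It is only a sketch: ``repo_owners`` is an illustrative name and it relies on the ``-printf`` option of GNU ``find``.

```sh
#!/bin/sh
# Sketch: list the distinct numeric owner:group pairs under a directory.
# Relies on GNU find's -printf; repo_owners is an illustrative helper name.
repo_owners() {
    find "$1" -printf '%U:%G\n' | sort -u
}

# Demo on a scratch directory, which is owned by whoever runs the script.
demo_dir=$(mktemp -d)
touch "$demo_dir/format"
repo_owners "$demo_dir"
rm -rf "$demo_dir"
```

On the extracted archive on my server this should print ``3519:3519``, going by the ownership described above. If the output differs from your own ``id -u``/``id -g``, write operations like ``rmtxns`` will probably fail in the same way.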

### Upgrading the SVN repo

So I had already encountered some issues, with only a cursory look/experimentation. Minor, but still.  
Earlier I mentioned that the output of ``svnadmin info <repo>`` told us more than what I covered at the time, and that is the following:

1. It doesn't use *sharding*
2. It doesn't use *logical addressing*
3. The *shards* aren't *packed* (naturally as sharding isn't used)

Subversion 1.5 added *sharding* (spread the data/files over several directories instead of just 2) and 1.6 added the option to *pack* shards (similar to ``git gc --aggressive``?), but our repo has/uses neither. This means that there is a performance penalty and that more disk space is being used than would be needed if those features *were* used.
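To check these points for other archives without reading the whole report each time, the relevant fields can be pulled out of the ``svnadmin info`` output. This is a rough sketch, with ``fsfs_summary`` being a name I made up; it reads the report on stdin so it can be tried without a repository, and it only knows the field names that appear in the ``svnadmin info`` output in this document.

```sh
#!/bin/sh
# Sketch: summarise the upgrade-relevant fields of `svnadmin info` output.
# Reads the report on stdin; fsfs_summary is a hypothetical helper name.
fsfs_summary() {
    awk -F': ' '
        $1 == "Filesystem Format"       { fmt = $2 }
        $1 == "FSFS Sharded"            { sharded = $2 }
        $1 == "FSFS Logical Addressing" { logical = $2 }
        $1 == "FSFS Shards Packed"      { packed = $2 }
        END {
            # Unsharded repos have no "Shards Packed" line at all.
            if (packed == "") packed = "n/a"
            printf "format=%s sharded=%s logical=%s packed=%s\n",
                   fmt, sharded, logical, packed
        }'
}

# Demo using the values reported for the original collab-maint repo.
fsfs_summary <<'EOF'
Filesystem Type: fsfs
Filesystem Format: 1
FSFS Sharded: no
FSFS Logical Addressing: no
EOF
```

For a real repository that would be ``svnadmin info collab-maint/ | fsfs_summary``, which makes it easy to loop over a directory full of extracted archives.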

There is a ``svnadmin upgrade <repo>`` command, but IIUC that just enables new features and doesn't *convert* the existing repo to (optimally) use those new features.  
I didn't look further into that as there is a far better process. It's (significantly?) lengthier, but as *downtime* is not relevant, I went with that.

The better process consists of the following steps:

1. ``svnadmin dump <repo>`` which exports the repository to a portable *dumpfile* format
2. ``svnadmin create <repo>`` which creates a new repository using settings/defaults from the current version, which is 1.14.2 in our case
3. ``svnadmin load <repo>`` which reads a *dumpfile* and creates all the commits 'described' in it
4. ``svnadmin pack <repo>`` which will *pack* the shards of the repository

The Subversion maintainers promised that the portable *dumpfile* format wouldn't change within releases from the same major version. So it ought to stay the same between all 1.x versions.  
An *educated* guess is that the *alioth* SVN repositories were created with Subversion version 1.2 (released on 2005-05-21) and when they were decommissioned in 2018 they were likely running a much newer version of Subversion. Judging by the output of ``svnadmin info <repo>``, a proper/full upgrade of the repository itself has likely never been done. But due to the portable *dumpfile*, that shouldn't matter.  
Let's show in code how I did it:

```sh
me@laptop:~/svn$ svnadmin dump collab-maint/ > ../collab-maint.svndump
* Dumped revision 0.
* Dumped revision 1.
* Dumped revision 2.
...
* Dumped revision 27543.
* Dumped revision 27544.
* Dumped revision 27545.
me@laptop:~/svn$ rm -rf collab-maint/
me@laptop:~/svn$ svnadmin create collab-maint/
me@laptop:~/svn$ svnadmin load collab-maint/ < ../collab-maint.svndump
<<< Started new transaction, based on original revision 1
     * editing path : lib ... done.
     * editing path : lib/archive2svn.pl ... done.

------- Committed revision 1 >>>

<<< Started new transaction, based on original revision 2
     * editing path : deb-maint ... done.
     * editing path : ext-maint ... done.
     * editing path : lib/archive2svn.pl ... done.
     * editing path : orphaned ... done.

------- Committed revision 2 >>>

<<< Started new transaction, based on original revision 3
     * editing path : orphaned/dvidvi ... done.
     * editing path : orphaned/dvidvi/branches ... done.
     * editing path : orphaned/dvidvi/branches/upstream ... done.
     * editing path : orphaned/dvidvi/branches/upstream/current ... done.
     * editing path : orphaned/dvidvi/branches/upstream/current/a5bookle.hlp ... done.
     * editing path : orphaned/dvidvi/branches/upstream/current/a5test.tex ... done.
     * editing path : orphaned/dvidvi/branches/upstream/current/dvidvi.1 ... done.
     * editing path : orphaned/dvidvi/branches/upstream/current/dvidvi.c ... done.
     * editing path : orphaned/dvidvi/tags ... done.

------- Committed revision 3 >>>
...
me@laptop:~/svn$ svnadmin pack collab-maint/
<yet to capture output of this command>
```

There were no errors or warnings when doing the ``svnadmin load`` so that looks good. If there are many commits, the log output is very long. If you only want to see errors, you can use the ``--quiet`` parameter (for both ``dump`` and ``load``).
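For the remaining archives a slightly more careful variant may be preferable, one that builds the new repository next to the old one instead of removing the original first. The following is only a sketch under that assumption: ``rebuild_repo`` and the three path arguments are names I made up, and it uses only the ``svnadmin`` subcommands and the ``--quiet`` option mentioned above.

```sh
#!/bin/sh
# Sketch: dump/create/load/pack into a NEW directory, leaving the original
# repository untouched. rebuild_repo is a hypothetical helper name.
# Usage: rebuild_repo SRC_REPO DST_REPO DUMPFILE
rebuild_repo() {
    src=$1
    dst=$2
    dumpfile=$3
    if [ -e "$dst" ]; then
        echo "refusing to overwrite $dst" >&2
        return 1
    fi
    # --quiet suppresses the per-revision progress output.
    svnadmin dump --quiet "$src" > "$dumpfile" || return 1
    svnadmin create "$dst" || return 1
    svnadmin load --quiet "$dst" < "$dumpfile" || return 1
    svnadmin pack "$dst"
}

# Example (not run here):
#   rebuild_repo collab-maint/ collab-maint-new/ ../collab-maint.svndump
```

Going via a dumpfile rather than a pipe costs disk space, but it means a failing ``dump`` is noticed before anything gets loaded. Once ``svnadmin info`` on the new directory looks right, the old one can be moved out of the way.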

Let's look at the *info* after these operations to see the result:

```sh
me@laptop:~/svn$ svnadmin info collab-maint/
Path: collab-maint
UUID: 19660600-52fe-0310-9875-adc0d7a7b53c
Revisions: 27545
Repository Format: 5
Compatible With Version: 1.10.0
Repository Capability: mergeinfo
Filesystem Type: fsfs
Filesystem Format: 8
FSFS Sharded: yes
FSFS Shard Size: 1000
FSFS Shards Packed: 27/27
FSFS Logical Addressing: yes
Configuration File: collab-maint/db/fsfs.conf
```

The filesystem 'statistics' of our repo were these:  
``1.9 GiB (2,001,792,956), 62,525 files, 22 sub-folders``

After doing the ``dump``, ``load`` and ``pack`` operations, they are like this:  
``1.1 GiB (1,182,990,091), 1,511 files, 65 sub-folders``

The size of the repo shrank significantly, and there are far fewer files, which means less IO; they're also spread over more folders.  
The only possible 'downside' is that it now requires Subversion version 1.10.0. But as Debian Buster (currently oldstable) has version ``1.10.4-1+deb10u3``, that doesn't look like an actual issue.
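To put a number on "significantly", the before/after figures above can be turned into percentages. A small sketch (``pct_saved`` is just an illustrative name):

```sh
#!/bin/sh
# Sketch: percentage saved going from a "before" to an "after" figure.
# pct_saved is a hypothetical helper; LC_ALL=C keeps the decimal point a dot.
pct_saved() {
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

pct_saved 2001792956 1182990091   # bytes: prints 40.9
pct_saved 62525 1511              # files: prints 97.6
```

So the repo is roughly 41% smaller on disk and has about 98% fewer files.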

Attachment: janitor-hosted-on-alioth.ods
Description: application/vnd.oasis.opendocument.spreadsheet

Attachment: signature.asc
Description: This is a digitally signed message part.

