Re: [RFC] IMPORTANT: Cleaning l10n-sync damage from D-I SVN repository

To: debian-boot@lists.debian.org
Subject: Re: [RFC] IMPORTANT: Cleaning l10n-sync damage from D-I SVN repository
From: Frans Pop <elendil@planet.nl>
Date: Thu, 4 Jun 2009 11:28:06 +0200
Message-id: <200906041128.06754.elendil@planet.nl>
In-reply-to: <20090604084438.GA19842@wavehammer.waldi.eu.org>
References: <200906032242.49605.elendil@planet.nl> <20090604084438.GA19842@wavehammer.waldi.eu.org>

Thanks a lot for the reply, Bastian.

On Thursday 04 June 2009, Bastian Blank wrote:
> On Wed, Jun 03, 2009 at 10:42:39PM +0200, Frans Pop wrote:
> > The way my cleanup works is that I remove all changes to the affected
> > files made between revisions 55934 and 57133 (both inclusive).
> > As a result of the cleanup the 'svnadmin dump' file shrinks by more
> > than 2GB (!) and the repository database shrinks from 2.4GB to 1.7GB.
>
> Which sizes did you compare? The d-i repo still includes plenty of

Current database versus reloaded cleaned database.

> vdelta revisions from repository format <= 3. A dump/load cycle should
> reduce the size anyway.

Ah, that is possible. The other advantages remain though.

> Working copies with references to these revisions get invalidated.

Hmm, yes that could be. Did not consider that.
But what risk is there that there _are_ (m)any working copies that
reference those revisions? The last commit I change dates from 08-01-2009, so
most users should have run 'svn update' by now.

Hmm. I guess some translators who worked on their translations in that 
period and haven't been active since could have such a checkout. 

OK. I'll test that and if it is a problem we'll have to warn about it.
I don't think it's a huge problem if such users would have to do a new 
checkout.

> > Because of the way tagging in subversion works, it is not possible to
> > do the cleanup and still keep the tagged versions exactly as they
> > were uploaded (see below for affected package versions).
>
> Please explain. A tag is just a copy, which can also include
> modifications.

A tag is a copy, but the files are not actually copied. So if I change the 
file in trunk in a revision before the tag, the tagged version of the 
file will automatically change as well.
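
To illustrate, here is a toy model in Python. It is not Subversion's real
storage format, only a sketch of the idea that a tag records where it was
copied from (in a dump file this shows up as Node-copyfrom-path and
Node-copyfrom-rev headers) rather than carrying its own file contents. The
file texts, the tag name and the helper names are made up for the example.

```python
# Toy model of "cheap copies"; all names and contents are hypothetical.

def build_history(trunk_contents):
    """Map revision number -> content of one trunk file at that revision."""
    return {rev: text for rev, text in enumerate(trunk_contents, start=1)}


def resolve_tag(history, tag):
    """A tag stores only a pointer to a trunk revision, not the content."""
    return history[tag["copyfrom_rev"]]


# Revision 2 stands in for a commit damaged by l10n-sync.
original = build_history(["text v1", "text v1 + sync noise", "text v2"])
tag = {"name": "1.0", "copyfrom_rev": 2}

# "Cleaning" revision 2 means replaying history without that change,
# so revision 2 now holds the same content as revision 1.
cleaned = dict(original)
cleaned[2] = cleaned[1]

before = resolve_tag(original, tag)
after = resolve_tag(cleaned, tag)
```

The tag itself was never touched, yet `after` differs from `before`
because the revision it points to was rewritten.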

> > Essentially: not.
>
> This is incorrect. The effects are outlined in the Subversion FAQ and
> reference materials[1].

There does not seem to be anything there other than what we've already covered.
We don't lose any revisions and all revisions + the state of HEAD remain 
completely identical to the current database.

> > If we are agreed, I will pick a day to do the actual cleanup. During
> > part of that day the repository will be blocked for commits.
>
> There is no need to block anything. You can only change intermediate
> revisions, so the top is not affected.

I don't see how I could manipulate intermediate revs without rebuilding 
the database from the bottom up. What exact procedure are you referring 
to?

Blocking the repo for a few hours shouldn't be a major inconvenience 
anyway. It's not like we have a high commit rate ATM.

> > BEGIN {
> > 	clean = 0
> > 	infile = 0
> > }
>
> [...]
>
> I think you want svndumpfilter.

I read about that, but I don't think it does what we need here: it only 
filters paths, not specific commits [1]. Anyway, my awk script is already 
there and I've tested that it does exactly what I want it to do.
My cleaned dump file loads without any problems and I've done fairly 
extensive checks with svnlook that the database is as it should be after 
the load.

Despite the warnings, the dumpfile format is relatively straightforward 
(and I did not use --incremental for my dump on purpose).
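
For readers who want to see the general idea of per-revision filtering, below
is a rough Python sketch. It is not the awk script discussed in this thread.
It assumes a full-text dump (no --deltas), treats every record as a header
block followed by Content-length bytes of body, and simply drops node records
for the given paths inside a revision range. The function names, the sample
path and the simplified blank-line handling on output are assumptions of the
sketch; empty revisions are left in place.

```python
import io


def read_records(stream):
    """Yield (headers, body) for each record in a binary dump stream."""
    while True:
        line = stream.readline()
        if not line:
            return
        if line == b"\n":
            continue  # blank lines between records
        headers = []
        while line and line != b"\n":
            key, _, value = line.rstrip(b"\n").partition(b": ")
            headers.append((key, value))
            line = stream.readline()
        # Records without a Content-length header carry no body.
        length = int(dict(headers).get(b"Content-length", b"0"))
        body = stream.read(length)
        yield headers, body


def filter_dump(src, dst, paths, first_rev, last_rev):
    """Copy src to dst, dropping node records for `paths` in the rev range."""
    current_rev = None
    for headers, body in read_records(src):
        fields = dict(headers)
        if b"Revision-number" in fields:
            current_rev = int(fields[b"Revision-number"])
        in_range = (current_rev is not None
                    and first_rev <= current_rev <= last_rev)
        if in_range and fields.get(b"Node-path") in paths:
            continue  # skip this change to an affected file
        for key, value in headers:
            dst.write(key + b": " + value + b"\n")
        dst.write(b"\n" + body + b"\n")


def _rev(n):
    # Minimal revision record with an empty property list.
    return (b"Revision-number: %d\nProp-content-length: 10\n"
            b"Content-length: 10\n\nPROPS-END\n\n" % n)


def _node(action, text):
    # Minimal file node record for a hypothetical path.
    return (b"Node-path: po/nl.po\nNode-kind: file\nNode-action: " + action
            + b"\nText-content-length: %d\nContent-length: %d\n\n"
            % (len(text), len(text)) + text + b"\n\n")


# Tiny hand-written sample; revision 2 plays the role of the damage.
SAMPLE = (b"SVN-fs-dump-format-version: 2\n\n"
          + _rev(1) + _node(b"add", b"v1\n")
          + _rev(2) + _node(b"change", b"noise\n")
          + _rev(3) + _node(b"change", b"v2\n"))

out = io.BytesIO()
filter_dump(io.BytesIO(SAMPLE), out, {b"po/nl.po"}, 2, 2)
result = out.getvalue()
```

On a real dump the same function would be called with files opened in binary
mode and the range 55934 to 57133; the result would still need to be checked
with a test load and svnlook.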

Thanks again,
FJP

[1] Hmm. Guess it could maybe be used, but I'd need to create a dumpfile 
for exactly the range to be cleaned and it would need to be run 
separately for each file to be excluded.
