
[RFC] IMPORTANT: Cleaning l10n-sync damage from D-I SVN repository



As you may remember, we had a problem before the release of Lenny with the 
l10n-sync script running wild and creating an insanely large Danish PO 
file for sublevel 4.
This was eventually corrected, but the commits that grew that da.po master 
file to eventually 250MB (and the same again spread out over the da.po 
files for several individual packages) are still there.

These commits waste space on alioth and will also continue to cause 
problems, for example when people create a git-svn checkout [1].

Today I've looked at options to clean up the worst of the mess and I think 
I've found something that will work, but has one important consequence 
that needs to be discussed.

At the bottom of this mail is a list of affected files and packages.

THE CLEANUP
===========
The way my cleanup works is that I remove all changes to the affected 
files made between revisions 55934 and 57133 (both inclusive).
As a result of the cleanup the 'svnadmin dump' file shrinks by more than 
2GB (!) and the repository database shrinks from 2.4GB to 1.7GB.

The cleanup starts _after_ the problems started, so the affected da.po 
files between the start of the problem (revision 55901) and the end of 
the cleanup are still not technically correct. However, they now remain 
only a little bit broken for the whole period instead of increasingly 
majorly broken.

As a result of the cleanup, some revisions (24 in total) become empty as 
no other files were changed in that commit, but subversion handles this 
without problems: a diff against the previous revision just shows empty. 
I'll modify the revision comment to explain this. I'll also modify the 
comments for revisions that caused the problem and the (now very small) 
cleanup commits to explain the issue.
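
The number of empty revisions can be cross-checked against the dump file 
itself. What follows is a minimal sketch, not part of the procedure 
described in this mail. It assumes that every changed path in the dump 
starts a record with a "Node-path: " line; the function name 
count_empty_revisions is made up for illustration. File content that 
happens to contain a line starting with "Revision-number: " or 
"Node-path: " would throw the count off.

```shell
# Hypothetical helper: count the revisions in an svnadmin dump file
# that contain no node records at all (i.e. empty commits).
count_empty_revisions() {
    awk '
        /^Revision-number: / {
            # Close the previous revision before starting a new one
            if (seen && nodes == 0) empty++
            seen = 1
            nodes = 0
        }
        /^Node-path: / { nodes++ }
        END {
            # Account for the last revision in the file
            if (seen && nodes == 0) empty++
            print empty + 0
        }
    ' "$1"
}
```

Running it on the dump before and after cleaning and comparing the two 
numbers should give the number of revisions emptied by the cleanup.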

The cleanup procedure is described below.

THE PROBLEM
===========
The issue occurred right around the release of D-I Lenny RC1. The Lenny 
branch was created in the middle of the period and all the affected 
packages were uploaded: first because of changes or an l10n upload series 
and later after the errors in the Danish translation were corrected in 
the Lenny branch.

Because of the way tagging in subversion works, it is not possible to do 
the cleanup and still keep the tagged versions exactly as they were 
uploaded (see below for affected package versions).
However, IMO the "damage" is acceptable, for the following reasons:
1) My cleanup stops _before_ the correction of the Danish translations
   in the Lenny branch by Christian. This means that the tags for the
   versions uploaded as a result of that, and also all versions released
   with Lenny, are 100% identical to what was uploaded.
2) For affected releases before that, the only file that is "incorrect"
   is the da.po file; the tagged version is still 100% correct for all
   other files in the packages.
3) The relevant versions are now no longer available anywhere [2]: they
   are no longer in the archive and we don't have a snapshot.d.n for that
   period.

HOW DOES IT AFFECT USERS
========================
Essentially: not.

During the cleanup the repository will be locked for commits. Users are 
advised not to run svn up while the cleanup is in progress: it should do 
no harm, except possibly during the short time I'll be moving the cleaned 
repo into place.

There is one minor effect for git-svn users who have the affected period 
in their history: their local git repository will no longer match the 
SVN repository. But in practice that can do absolutely no harm.

WHAT NOW?
=========
The main question is whether people agree with me that this cleanup is a 
good thing and that the problem described is not serious enough to block it.
So: comments welcome!

If we are agreed, I will pick a day to do the actual cleanup. During part 
of that day the repository will be blocked for commits.

Cheers,
FJP

[1] Phil Hands' git-svn checkout got buggered as a result of this.
[2] Not completely true: D-I Lenny RC1 images are still on the mirrors,
    but they will also disappear [3].
[3] BTW, looks like there are a number of old D-I releases in unstable
    that could be cleaned up. FTP masters will appreciate it.


Affected files/packages
-----------------------
po/sublevel4/da.po

cdebconf/debian/po/da.po
nobootloader/debian/po/da.po
flash-kernel/debian/po/da.po

partman/partman-prep/debian/po/da.po
partman/partman-newworld/debian/po/da.po
partman/partman-target/debian/po/da.po
partman/partman-palo/debian/po/da.po
partman/partman-ext2r0/debian/po/da.po
partman/partman-efi/debian/po/da.po

arch/sparc/silo-installer/debian/po/da.po
arch/powerpc/prep-installer/debian/po/da.po
arch/powerpc/quik-installer/debian/po/da.po
arch/powerpc/yaboot-installer/debian/po/da.po
arch/mips/sibyl-installer/debian/po/da.po
arch/mips/arcboot-installer/debian/po/da.po
arch/m68k/vmelilo-installer/debian/po/da.po

Package versions that will have tags not 100% equal to upload
-------------------------------------------------------------
r55973 cdebconf 0.136
r55975 partman-efi 0.11
r56062 nobootloader 1.23
r56074 partman-target 58
r56090 flash-kernel 2.11
r56092 silo-installer 1.15
r56094 partman-ext2r0 1.17
r56157 sibyl-installer 1.14
r56160 arcboot-installer 1.11
r56399 quik-installer 0.0.21
r56402 prep-installer 0.8
r56404 yaboot-installer 1.1.14
r56406 partman-prep 16
r56408 partman-newworld 20
r56411 cdebconf 0.137
r56825 cdebconf 0.138

Description of the cleanup procedure
------------------------------------
# backup $repo
$ svnadmin dump -r 55900:57350 $repo >d-i_svn.dump
# smart use of grep, head and tail to remove initial revision 55900
# use awk script (attached) to remove broken changes to da.po files
# add back dump file header
=> result: d-i_svn.dump.cleaned

$ svnadmin create $repo.new
$ svnadmin dump -r 0:55900 $repo | svnadmin load $repo.new
$ svnadmin load $repo.new <d-i_svn.dump.cleaned
$ svnadmin dump --incremental -r 57351:HEAD $repo | \
     svnadmin load $repo.new
# copy configuration files, hook scripts, etc.
# move $repo.new to $repo
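
The "grep, head and tail" step above could look roughly like the sketch 
below. It is written under assumptions and is not necessarily the exact 
set of commands used: the function name strip_first_revision is made up, 
and it keeps the dump file header in the same pass instead of adding it 
back afterwards. It relies on the first two lines starting with 
"Revision-number: " being real record headers. That should be checked by 
hand, because the first revision of a non-incremental dump contains the 
full tree. The -a option (treat binary data as text) is assumed to be 
available, as it is in GNU grep.

```shell
# Hypothetical helper: print a dump file without its first revision
# record, keeping the dump file header that precedes it.
strip_first_revision() {
    dump=$1
    # Line numbers of the first two "Revision-number:" lines
    lines=$(grep -a -n '^Revision-number: ' "$dump" | head -n 2 | cut -d: -f1)
    first=$(printf '%s\n' "$lines" | head -n 1)
    second=$(printf '%s\n' "$lines" | tail -n 1)
    # Dump file header: everything before the first revision record
    head -n $((first - 1)) "$dump"
    # Everything from the second revision record onwards
    tail -n +"$second" "$dump"
}
```

Usage would be along the lines of 
"strip_first_revision d-i_svn.dump >d-i_svn.dump.stripped", where the 
output file name is only an example.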

That, that, that's all folks!

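# clean:  1 while inside the revision range that is being cleaned
# infile: 1 while inside a node record for an affected da.po file
BEGIN {
	clean = 0
	infile = 0
}

# Cleaning covers revisions 55934 up to and including 57133
/^Revision-number: 55934/ {
	clean = 1
}
/^Revision-number: 57134/ {
	clean = 0
}

# Every new revision or node record ends the record being skipped
/^(Revision-number|Node-path):/ {
	infile = 0
}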

/^Node-path: trunk.*\/(po\/sublevel4|cdebconf|nobootloader|flash-kernel|partman-(prep|newworld|target|ext2r0|efi|palo)|(silo|prep|quik|yaboot|sibyl|arcboot|vmelilo)-installer)\/.*da\.po/ {
	infile = 1
}

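# Print every line, except those that belong to an affected node record
# inside the cleaned revision range
/.*/ {
	if (clean == 0 || infile == 0) {
		print $0
	}
}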
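
A quick sanity check on the output of the script above could be to count 
the node records for an affected file that remain inside the cleaned 
range. This is a sketch only and not part of the original procedure; the 
function name count_nodes_in_range and its arguments are made up for 
illustration. The path is matched as a plain substring, so a fragment 
such as po/sublevel4/da.po also matches paths outside trunk, which the 
filter leaves alone.

```shell
# Hypothetical helper: count node records whose path contains a given
# fragment, restricted to a revision range (both ends inclusive).
# $1 = dump file, $2 = first revision, $3 = last revision,
# $4 = path fragment (plain substring, not a regular expression)
count_nodes_in_range() {
    awk -v lo="$2" -v hi="$3" -v frag="$4" '
        /^Revision-number: / { rev = $2 + 0 }
        /^Node-path: / && rev >= lo + 0 && rev <= hi + 0 && index($0, frag) > 0 { n++ }
        END { print n + 0 }
    ' "$1"
}
```

For example, 
"count_nodes_in_range d-i_svn.dump.cleaned 55934 57133 trunk/" combined 
with a longer fragment for one of the affected files would be expected to 
print 0 after a successful cleanup, and a non-zero number on the original 
dump.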
