
Re: Splitting D-I translation in "sublevels": ready infrastructure



On Sunday 13 January 2008, Christian Perrier wrote:
> Quoting Frans Pop (elendil@planet.nl):
> > For now I'm mostly interested in discussion of the issues I mention in
> > the comments in the big new section (which basically replaces the old
> > section that follows it).
>
> > +		# We need the date of the last update of a sublevel PO file
>
> Yes.
>
> > +		# Preferably we should also determine the name of the person who
> > +		# did the last update to a sublevel (for changelogs)
>
> That would certainly be better. I fear it could complicate the code
> quite a lot and, indeed, the translation is mostly team work. I
> personally don't give much importance to Last-Translator.
>
> So, well, if we find a *not too complicated* way to allow for
> different last-translator, why not. But I don't think it's worth a
> great effort.

With some generalized functions it wasn't "too complicated" :-)
And I do think it's important to at least /try/ and get the correct 
translator into our changelogs.
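
As a rough illustration of the idea (not the code from the patch), picking
the sublevel file whose header carries the most recent date could be sketched
as below. The function name po_newest is hypothetical, and the sketch assumes
GNU date, whose -d option parses values such as "2008-01-12 00:53+0100".

```shell
#!/bin/sh
# Sketch: print the file whose header KEY holds the most recent date.
# po_newest is a hypothetical name; GNU date is assumed for "date -d".
# Note: plain POSIX sh has no "local", so these variables are global.
po_newest() {
	key=$1
	shift
	newest=""
	newest_ts=0
	for file in "$@"; do
		# Extract the header value, dropping the trailing \n" of the PO line
		value=$(sed -n "s/^\"$key: \(.*\)\\\\n\"\$/\1/p" "$file" | head -n 1)
		# Skip files that do not carry the header at all
		[ -n "$value" ] || continue
		# Convert to seconds since the epoch so time zones compare correctly
		ts=$(date -d "$value" +%s) || continue
		if [ "$ts" -gt "$newest_ts" ]; then
			newest_ts=$ts
			newest=$file
		fi
	done
	echo "$newest"
}
```

The Last-Translator of the file printed by po_newest "PO-Revision-Date" is
then the name that would end up in the changelog.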

Are there gettext alternatives to the po_print_header() and po_print_body() 
functions? If there are, I think that would be preferred, but note that my 
functions select/remove both leading comments and all headers.
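
To show what "leading comments and all headers" means in practice, here is a
small self-contained sketch built around the two awk functions from the
attached patch. The sample PO file and the translator name in it are made up
for illustration only.

```shell
#!/bin/sh
# Print everything up to the blank line that ends the header entry,
# including any comments that precede msgid "".
po_print_header() {
	awk '/^msgid ""/ {found = 1}
	     /^$/ {if (found == 1) exit}
	     {print $0}' "$1"
}

# Print everything from the blank line that ends the header entry onward
# (so the output starts with that blank separator line).
po_print_body() {
	awk '/^msgid ""/ {if (found == 0) found = 1}
	     /^$/ {if (found == 1) found = 2}
	     {if (found == 2) print $0}' "$1"
}

# Hypothetical sample file
sample=$(mktemp)
cat >"$sample" <<'EOF'
# Dutch translation
msgid ""
msgstr ""
"Last-Translator: Example Translator <translator@example.org>\n"

msgid "Continue"
msgstr "Doorgaan"
EOF

echo "--- header ---"
po_print_header "$sample"
echo "--- body ---"
po_print_body "$sample"
rm -f "$sample"
```

The header part keeps the "# Dutch translation" comment together with the
header fields; the body part contains only the "Continue" entry.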

> > +		# When updating a sublevel PO file, we should really retain
> > +		# all the old headers and only update the POT-Creation-Date...
>
> Yes, definitely. There may be specific comments, or whatever

Done.
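
A simplified sketch of updating a single header field while leaving the rest
of the file alone is shown below. It is not the function from the patch: the
name po_set_header is hypothetical, it only handles a header that fits on one
line, and it assumes the new value contains no "/" or "&" characters.

```shell
#!/bin/sh
# Sketch: replace the value of one single-line PO header in place.
# po_set_header is a hypothetical name; wrapped (multi-line) headers
# are not handled here.
po_set_header() {
	key=$1
	value=$2
	file=$3
	tmp=$(mktemp) || return 1
	# Match the whole header line, including its trailing \n", and rewrite it
	sed "s/^\"$key: .*\\\\n\"\$/\"$key: $value\\\\n\"/" "$file" >"$tmp" &&
		cat "$tmp" >"$file"
	rm -f "$tmp"
}

# Hypothetical sample header
sample=$(mktemp)
cat >"$sample" <<'EOF'
msgid ""
msgstr ""
"POT-Creation-Date: 2007-01-01 00:00+0000\n"
"Last-Translator: Example Translator <translator@example.org>\n"
EOF

po_set_header "POT-Creation-Date" "2008-01-12 00:53+0100" "$sample"
cat "$sample"
rm -f "$sample"
```

Only the POT-Creation-Date line changes; the Last-Translator line and any
other headers or comments are left as they were.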

> > +				# Do we really want to loose obsolete strings?
> > +				# Shouldn't that be up to the translator?
> > +				msgattrib --width=79 --no-obsolete sublevel${i}/${lang}.po.new >sublevel${i}/${lang}.po
>
> I should have put a comment when I added this. I know there was a
> reason..:-|

Some fancy footwork was needed, but it looks like I've got a working 
implementation for this. All obsolete strings are now gathered in the 
sublevel1 PO file.


Attached is a new version of the patch. I think this solves all the issues I 
spotted with the original implementation.
The current patch still supports the current system as well. For the final 
version I would suggest removing that (in practice: remove the "old" Phase 
III code). I'd also suggest removing the --split option and instead just 
hardcoding the number of levels in a variable in the script.

I've done a fair amount of testing and the results look good to me.
I would suggest delaying implementation of the patch until after the Beta1 
release, but it would be great if you could test this a bit too.

The way I have tested this is:
$ cd <d-i dir>
# Make sure there are no pending changes!
$ for i in 1 2 3 4 5; do mkdir packages/po/sublevel$i; done
# Prepare for conversion (repeat the following 3 commands to revert
# to the initial state):
$ svn revert -R packages/
$ cp packages/po/*.po packages/po/sublevel1/
$ rm -f packages/po/sublevel[2345]/*
# Initial conversion run:
$ <path>/l10n-sync --noupdatepo --force --split=5 --convert `pwd`
# Do translation updates etc, then do a "normal" run:
$ <path>/l10n-sync --noupdatepo --force --split=5 `pwd`
# Clean up after testing:
$ svn revert -R packages/
$ rm -f packages/po/sublevel*/*

Cheers,
FJP

commit 8b85f434878442c8307b8c5bb14baf7105b5393e
Author: Frans Pop <fjp@debian.org>
Date:   Sat Jan 12 00:53:59 2008 +0100

    Improve multi-level handling
    
    Main change is an improved method for updating from sublevel PO files.
    Characteristics of the new method:
    - preserve the headers in sublevel PO files (only POT-Creation-Date
      is updated)
    - the PO-Revision-Date and Last-Translator for the most recently updated
      sublevel PO file are used to update PO files for individual packages
    - if the level of a string changes, the existing translation from the
      old level is preserved
    - obsolete strings are preserved in the sublevel1 PO file in case strings
      are reintroduced later (translators should remove them occasionally)
    
    A temporary '--convert' option has been added to facilitate conversion
    from the current master PO files to multi-level PO files.
    
    Other changes
    - Introduce new functions to determine header values for most
      recent updated PO or POT file.
    - Remove custom PO file headers (X-*) from merged PO files
      before updating translations in packages directories to avoid
      cluttering them.

diff --git a/scripts/l10n/l10n-sync b/scripts/l10n/l10n-sync
index 9601588..0e573f0 100755
--- a/scripts/l10n/l10n-sync
+++ b/scripts/l10n/l10n-sync
@@ -18,6 +18,7 @@ NUMLEVELS=1
 UPDATEPO=Y
 SYNCPKGS=Y
 QUIET=N
+CONVERT=N
 svn=svn
 debconfupdatepo=debconf-updatepo
 
@@ -145,6 +146,73 @@ criticalerr() {
 	exit 3
 }
 
+
+po_last_updated() {
+	local key files file lastfile lastdate tdate
+	key=$1
+	shift
+	files="$*"
+
+	lastdate=0
+	for file in $files ; do
+		tdate=$(date -d "$(grep "^\"$key:" $file | \
+			sed 's/^.*: \(.*\)\\n.*$/\1/')" "+%s")
+		if [ $tdate -gt $lastdate ] ; then
+			lastdate=$tdate
+			lastfile=$file
+		fi
+	done
+	echo "$lastfile"
+}
+
+# Get the whole line
+# The --no-wrap is needed because translator can span more than one line
+# The last sed statement is needed to preserve the \n at the end
+po_get_header() {
+	local key=$1
+	local file=$2
+	msgattrib --no-wrap $file | grep "^\"$key:" | sed 's/^.*: \(.*\)\\n.*$/\1/'
+}
+
+# Replace a header with a new value
+# The complex sed expression is to allow for the fact that a header may span
+# two lines; spanning three lines is not supported
+po_replace_header() {
+	local key=$1
+	local value=$2
+	local file=$3
+	sed -i "/^\"$key:/ N; s/^\"$key.*\\\\n\"\(\n.*\|$\)/\"$key: $value\\\\n\"\1/" \
+		$file
+}
+
+# Print anything up to the first msgid (the header)
+po_print_header() {
+	awk 'BEGIN {found = 0}
+	     /^msgid ""/ {found = 1}
+	     /^$/ {if (found == 1) exit}
+	     {print $0}' $1
+}
+
+# Print anything after the first msgid (the header)
+po_print_body() {
+	awk 'BEGIN {found = 0}
+	     /^msgid ""/ {if (found == 0) found = 1}
+	     /^$/ {if (found == 1) found = 2}
+	     {if (found == 2) print $0}' $1
+}
+
+# Print obsolete strings
+po_print_obsolete() {
+#	# Old "manual" version
+#	awk 'BEGIN {found = 0; lead=""}
+#	     /^#~ msgid/ {if (found == 0) {found = 1; print lead}}
+#	     {if (found == 0) lead=lead"\n"$0}
+#	     /^$/ {if (found == 0) lead=""}
+#	     {if (found == 1) print $0}' $1
+
+	msgattrib --only-obsolete --width=79 $1 | po_print_body
+}
+
 ##  Command line parsing
 MORETODO=true
 while $MORETODO ; do
@@ -189,6 +257,9 @@ while $MORETODO ; do
 	"--nolog")
 		LOG=""
 		;;
+	"--convert")
+		CONVERT=Y
+		;;
 	"--"*)
 		echo "Illegal option: $1" >&2
 		usage
@@ -398,21 +469,19 @@ log "- Merge all package templates.pot files..."
 if ! msgcat ${pots} >/dev/null 2>&1 ; then
 	svnerr
 fi
-log_cmd --pass 	msgcat ${pots} | \
+log_cmd --pass 	msgcat $pots | \
 	sed 's/charset=CHARSET/charset=UTF-8/g' >$DI_COPY/packages/po/template.pot.new
 # Determine the most recent POT-Creation-Date for individual components
 # Include master templates.pot too so the timestamp will never be set back
-LASTDATE="$(
-	for j in ${pots} po/template.pot; do
-		date -ud "$(grep "POT-Creation-Date:" $j | sed 's/^.*: \(.*\)\\n.*$/\1/')" "+%F %R%z"
-	done | sort | tail -n 1)"
+LASTDATE="$(po_get_header "POT-Creation-Date" \
+		$(po_last_updated "POT-Creation-Date" $pots po/template.pot))"
+
 # We don't want all templates.pot files headers as we don't care about them
 # So we merge the generated file with a simple header.pot file
 if [ -f po/header.pot -a -s po/template.pot.new ] ; then
 	msgcat --use-first po/header.pot po/template.pot.new | \
-		sed 's/charset=UTF-8/charset=CHARSET/g' | \
-		sed "s/^.*POT-Creation-Date:.*$/\"POT-Creation-Date: $LASTDATE\\\n\"/" \
-		> po/template.pot
+		sed 's/charset=UTF-8/charset=CHARSET/g' > po/template.pot
+	po_replace_header "POT-Creation-Date" "$LASTDATE" po/template.pot
 	rm po/template.pot.new
 else
 	error "ERROR: no $DI_COPY/packages/po/header.pot file. Cannot continue."
@@ -465,14 +534,119 @@ if [ "$WITHLEVELS" = "Y" ] ; then
 fi
 log ""
 
+# Update PO files for sublevels:
+# 3a) Synchronize with D-I SVN
+# 3b) Merge the sublevel PO files into a master PO file
+# 3c) Update the master PO file from the master POT file as it will be used
+#     to update package PO files
+# 3d) Update the sublevel PO files from this master PO file and the sublevel POT file
+# 3e) commit back the changed file
+log "Phase III: update master translation files"
+if [ "$WITHLEVELS" = "Y" ] ; then
+	cd $DI_COPY/packages/po
+	languages=""
+	for po in sublevel1/*.po ; do
+		lang=$(basename $po .po)
+		# Next line is just for quicker testing
+		#[ $lang = nl ] || continue
+		log "- $lang"
+		if [ ! -r PROSPECTIVE ] || \
+		   ([ -r PROSPECTIVE ] && \
+		    ! grep -q "^$lang[[:space:]]*$" PROSPECTIVE); then
+			languages="${languages:+$languages }$lang"
+		fi
+
+		log "  - Merge sublevel PO files into master PO file and update..."
+		list=""
+		for i in $LEVELS; do
+			if [ -f sublevel$i/$lang.po ]; then
+				list="${list:+$list }sublevel$i/$lang.po"
+			fi
+		done
+		# Retain the date and translator of the last updated sublevel PO file
+		LASTFILE="$(po_last_updated "PO-Revision-Date" $list)"
+		LASTDATE="$(po_get_header "PO-Revision-Date" $LASTFILE)"
+		LASTTRANS="$(po_get_header "Last-Translator" $LASTFILE)"
+		msgcat --use-first $list >${lang}.po
+		po_replace_header "PO-Revision-Date" "$LASTDATE" $lang.po
+		po_replace_header "Last-Translator" "$LASTTRANS" $lang.po
+
+		# Update the master PO file (as it's used to update package PO files)
+		log_cmd --pass msgmerge --previous $lang.po template.pot >$lang.po.new || \
+			gettexterr
+
+		# Remember obsolete strings
+		OBSOLETE="$(po_print_obsolete $lang.po.new)"
+
+		# Optionally merge with PO files from a different source
+		# Strings from the other source are preferred!
+		# Should we disallow automatic commits for this?
+		# WARNING: NOT TESTED!!!
+		if [ -n "$MERGEDIR" ] && [ -r $MERGEDIR/$lang.po ]; then
+			log "  - Merge with $MERGEDIR/$lang.po !!"
+			msgcat --use-first "$MERGEDIR/$lang.po" $lang.po.new \
+				>$lang.po.merge || gettexterr
+			log_cmd --pass msgmerge --previous $lang.po.merge template.pot | \
+				msgattrib --no-obsolete	>$lang.po.new || gettexterr
+			rm $lang.po.merge
+		fi
+
+		# Clean up new master PO file
+		msgattrib --width=79 --no-obsolete $lang.po.new >$lang.po
+		rm $lang.po.new
+
+		# Update the sublevel PO files
+		# We keep its old header and only update the POT-Creation-Date
+		for i in $LEVELS; do
+			if [ -f sublevel$i/$lang.po ]; then
+				OLDHEADER="$(po_print_header sublevel$i/$lang.po)"
+			elif [ "$CONVERT" = Y ]; then
+				OLDHEADER="$(po_print_header $lang.po)"
+			fi
+			if [ -f sublevel$i/$lang.po ] || [ "$CONVERT" = Y ]; then
+				log_cmd --pass -m "  - Merge with template.pot for sublevel $i..." \
+					msgmerge --previous $lang.po \
+						sublevel$i/template.pot \
+					>sublevel$i/$lang.po.new || gettexterr
+				POTDATE="$(po_get_header "POT-Creation-Date" sublevel$i/$lang.po.new)"
+
+				# Combine old header and new content
+				( echo "$OLDHEADER"
+				  po_print_body sublevel$i/$lang.po.new	) | \
+					msgattrib --width=79 --no-obsolete \
+					>sublevel$i/$lang.po
+				po_replace_header "POT-Creation-Date" "$POTDATE" sublevel$i/$lang.po
+				# Append any obsolete strings to sublevel1 PO file
+				if [ $i -eq 1 ] && [ "$OBSOLETE" ]; then
+					echo "$OBSOLETE" >>sublevel$i/$lang.po
+				fi
+				rm sublevel$i/$lang.po.new
+			fi
+		done
+
+		# Remove all custom headers so they don't clutter the PO files in
+		# the packages directories
+		msgattrib --no-wrap $lang.po | \
+			grep -v "^\"X-.*: .*\\n\"$" | \
+			msgattrib --width=79 >$lang.po.new
+		mv $lang.po.new $lang.po
+	done
+
+	if [ "$COMMIT" = "Y" ] ; then
+		log_cmd -p "Commit all general PO/POT files to SVN..." \
+			$svn commit -m "$COMMIT_MARKER Updated packages/po/* against package templates" || svnerr
+	fi
+fi
+
 # For each PO file in packages/po/sublevel* or packages/po:
 # 3a) Synchronize with D-I SVN
 # 3b) Update with template.pot
 # 3c) Grab translations from the lower levels file(s)
 # 3d) commit back the changed file
-log "Phase III: update master translation files"
 for i in $LEVELS; do
 	if [ "$WITHLEVELS" = "Y" ] ; then
+		# Bail out; work has already been done in previous section
+		break
 		dir=po/sublevel$i
 		level="level $i "
 	else
@@ -540,26 +714,9 @@ for i in $LEVELS; do
 			$svn commit -m"${COMMIT_MARKER} Updated packages/$dir/* with general template.pot" *.po template.pot || svnerr
 	fi
 done
-
-# If we use levels, create a temporary general file
-# (which we won't commit) to make merging in individual packages
-# much faster
-if [ "$WITHLEVELS" = "Y" ] ; then
-	cd $DI_COPY/packages/po
-	for po in sublevel1/*.po ; do
-		lang=$(basename $po .po)
-		list=""
-		for i in `seq $NUMLEVELS -1 1`; do
-			if [ -f sublevel${i}/${lang}.po ]; then
-				list="$list sublevel${i}/${lang}.po"
-			fi
-		done
-		msgcat --use-first $list >${lang}.po
-	done
-fi
 log ""
 
 # Loop over D-I packages:
@@ -578,10 +735,10 @@ if [ "$SYNCPKGS" = "Y" ]; then
 		for lang in $languages ; do
 			logn "$lang "
 			cat >$lang.po.new <<EOF
-# THIS FILE IS AUTOMATICALLY GENERATED FROM THE MASTER FILE:
-# packages/po/$lang.po
+# THIS FILE IS GENERATED AUTOMATICALLY FROM THE D-I PO MASTER FILES
+# The master files can be found under packages/po/
 #
-# DO NOT MODIFY IT DIRECTLY: SUCH CHANGES WILL BE LOST
+# DO NOT MODIFY THIS FILE DIRECTLY: SUCH CHANGES WILL BE LOST
 # 
 EOF
 			log_cmd --pass msgmerge $DI_COPY/packages/po/$lang.po templates.pot | \
@@ -599,22 +756,21 @@ EOF
 				egrep -v "$filter" $lang.po >$oldfiltered
 				egrep -v "$filter" $lang.po.new >$newfiltered
 				if [ -z "$(diff $oldfiltered $newfiltered)" ] ; then
-					# Don't commit if the only chages are in filtered lines
+					# Don't commit if the only changes are in filtered lines
 					rm $lang.po.new
 				else
+					# Remember original PO-Revision-Date
+					LASTDATE="$(po_get_header "PO-Revision-Date" $lang.po)"
+					mv $lang.po.new $lang.po
 					# At least one unfiltered line changed
 					# Put the old Revision-Date back if asked for
-					if [ "$KEEP_REVISION" != "N" ] && [ "$KEEP_REVISION" = "$lang" ] ; then
-						# Grab back the PO-Revision-Date from the old file
-						old_revision=`grep -e "^\"PO-Revision-Date:" $lang.po | sed 's/\\\\n\"//g'`
-						# And replace the one from the new file with it
-						# then put all this as a result
-						sed "s/\"PO-Revision-Date:.*/$old_revision\\\\n\"/g" $lang.po.new >$lang.po
-						rm $lang.po.new
-						log_s1 "${package}/debian/po/${lang}.po" "CHANGED, revision kept"
+					if [ "$KEEP_REVISION" != "N" ] && \
+					   [ "$KEEP_REVISION" = "$lang" ] ; then
+						# Restore original PO-Revision-Date
+						po_replace_header "PO-Revision-Date" "$LASTDATE" $lang.po
+						log_s1 "$package/debian/po/$lang.po" "CHANGED, revision kept"
 					else
-						mv $lang.po.new $lang.po
-						log_s1 "${package}/debian/po/${lang}.po" "CHANGED"
+						log_s1 "$package/debian/po/$lang.po" "CHANGED"
 					fi
 				fi
 				# Remove temporary files
