Re: svn corruption: trunk/packages/po/sublevel4/da.po



Quoting Philip Hands (phil@hands.com):

> it looks like something is misinterpreting character codings and then
> stuffing them back in slightly more broken each time round the loop.


As discussed on IRC, the "something" was l10n-sync. The script behaves
badly when the sublevel files for a given language do not all use the
same encoding.

In such a situation, the files that are not UTF-8 are converted from
their original encoding to UTF-8, BUT the charset declared in the PO
file header is left at its original value.

As a consequence, every subsequent run of l10n-sync "converts" the
already converted file to UTF-8 again, which doubles the size of every
non-ASCII character in the file.
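
To make the doubling concrete, here is a minimal Python sketch of the
effect. It is not the l10n-sync code; the reconvert() helper is a
hypothetical stand-in for "decode with the charset the header still
declares, then write UTF-8".

```python
def reconvert(data: bytes, declared: str = "iso-8859-1") -> bytes:
    """Hypothetical stand-in for one sync run: trust the (stale) declared
    charset, decode with it, and write the result back as UTF-8."""
    return data.decode(declared).encode("utf-8")


# One Danish character, stored as a single ISO-8859-1 byte.
data = "æ".encode("iso-8859-1")
sizes = [len(data)]

# The first run is a legitimate conversion; every later run is a
# re-"conversion" of bytes that are already UTF-8.
for _ in range(4):
    data = reconvert(data)
    sizes.append(len(data))

print(sizes)  # [1, 2, 4, 8, 16]

# Plain ASCII is unaffected, so only the non-ASCII parts of the file grow.
assert reconvert(b"msgid") == b"msgid"
```

Every byte of a UTF-8 encoded non-ASCII character is 0x80 or above, so
ISO-8859-1 reads each one as a separate character and UTF-8 then needs
two bytes for it. The result is still well-formed UTF-8, only longer.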

This is what happened with that Danish translation. It claimed to be
ISO-8859-1 while all the other Danish translations were UTF-8. Moreover,
the file was indeed broken to begin with, but this could have happened
even with a file that was not broken.

Weirdly, this went unnoticed because the resulting files *still*
appeared to be valid to the gettext tools. That explains why the file
kept growing at each run between Dec 25th (when l10n-sync runs were
reactivated) and Jan 8th, when the file was so big that it filled up my
laptop's disk.

I'm currently working on a fix for l10n-sync so that it enforces UTF-8
on all files for a language when they use different encodings.
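
As an illustration only, the idea of converting a file and keeping its
header in step could be sketched in Python as below. This is not the
actual l10n-sync change: the function names, the regular expression and
the "ascii" fallback are all assumptions made for the example.

```python
import re

# Matches the charset declaration inside the PO header entry, e.g.
#   "Content-Type: text/plain; charset=ISO-8859-1\n"
CHARSET_RE = re.compile(
    rb"(Content-Type:\s*text/plain;\s*charset=)([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def declared_charset(po_bytes: bytes, default: str = "ascii") -> str:
    """Return the charset named in the PO header (assumed fallback: ascii)."""
    match = CHARSET_RE.search(po_bytes)
    return match.group(2).decode("ascii") if match else default


def to_utf8(po_bytes: bytes) -> bytes:
    """Convert PO content to UTF-8 AND update the header to say so."""
    charset = declared_charset(po_bytes)
    if charset.lower() in ("utf-8", "utf8"):
        return po_bytes  # already declared as UTF-8: leave it alone
    converted = po_bytes.decode(charset).encode("utf-8")
    # Rewriting the header is the step whose absence caused the growth.
    return CHARSET_RE.sub(rb"\g<1>UTF-8", converted, count=1)


sample = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=ISO-8859-1\\n"\n'
    "\n"
    'msgid "apple"\n'
    'msgstr "æble"\n'
).encode("iso-8859-1")

once = to_utf8(sample)
twice = to_utf8(once)
print(declared_charset(once), len(once) == len(twice))
```

Because the header now says UTF-8, a second pass leaves the file
untouched instead of converting it again. gettext's own msgconv tool
(with its --to-code option) is another way to do this kind of
conversion.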


