Quoting Philip Hands (phil@hands.com):

> it looks like something is misinterpreting character codings and then
> stuffing them back in slightly more broken each time round the loop.

As discussed on IRC, the "something" was l10n-sync.

The script behaves badly when the sublevel files for a given language do not all use the same encoding. In such a situation, the files that are not UTF-8 are converted from their original encoding to UTF-8, BUT the PO file header is left at its original value. As a consequence, every subsequent run of l10n-sync "converts" the file to UTF-8 again, treating the already-converted bytes as if they were still in the original encoding. Each run doubles the size of the non-ASCII characters in the file.

This is what happened with that Danish translation. It claimed to be ISO-8859-1 while all other Danish translations were UTF-8. Moreover, the file was indeed broken to begin with, but the same thing could have happened with a non-broken file.

Weirdly, this went unnoticed because the resulting files *still* appeared to be valid to the gettext tools.

That explains why the file grew at each run between Dec 25th (when l10n-sync runs were reactivated) and Jan 8th, by which point the file was so big that it filled up my laptop's disk.

I'm currently working on fixing l10n-sync so that it enforces UTF-8 on all files for a language when they use different encodings.
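For anyone who wants to see the growth mechanism in isolation, here is a minimal Python sketch. It is not the l10n-sync code: the function name reconvert and the sample string are made up for illustration, and it only assumes that each run decodes the bytes with the charset the header still claims (ISO-8859-1) and writes them back as UTF-8.

```python
def reconvert(data: bytes, claimed_charset: str = "iso-8859-1") -> bytes:
    """Hypothetical stand-in for one sync run: decode the bytes using the
    charset the (stale) header claims, then re-encode them as UTF-8."""
    return data.decode(claimed_charset).encode("utf-8")


# Made-up Danish sample: 13 ASCII characters and 2 non-ASCII ones.
text = "Søg efter æbler"
data = text.encode("iso-8859-1")

sizes = [len(data)]
for _ in range(4):
    data = reconvert(data)  # the header is never updated, so this repeats
    sizes.append(len(data))

# The output stays well-formed UTF-8 after every run, so no decoding
# error ever signals that the content has been mangled.
mangled = data.decode("utf-8")

print(sizes)
```

The first run is a legitimate conversion. Every later run doubles the bytes taken by each non-ASCII character while the ASCII part stays the same, and the result still decodes cleanly as UTF-8, which is consistent with the gettext tools not complaining.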