Re: Request for ideas how to fix #297074

On Tue, Aug 14, 2007 at 07:55:36PM +0100, Marcin Owsiany wrote:
> First, a short explanation of the use case:
> 1. User runs poedit (aka potooledit) on a partially translated po file.
> 2. Poedit retrieves only the untranslated messages from the file (by
>    filtering it through potool -fnt) and puts them into a temporary po
>    file
> 3. Poedit launches $EDITOR on that temporary po file
> 4. User does some translation, saves the file, exits the editor
> 5. Poedit merges the original and the temporary file back together
> Now, to reproduce the bug:
> 1. use an editor which can auto-detect the file encoding, e.g. vim
> 2. run poedit on a file which is in encoding A, while your locale is set
> to use encoding B. (where neither A nor B is a subset of the other. For
> example UTF-8 and Latin2)

Uhm, Latin2 _is_ a subset of UTF-8, at least as far as the character
repertoire goes; only the byte representations differ.

> What happens in step 3 is that vim looks at an ascii-only file (since
> msgids are in POSIX locale) and when the user inputs the translation in
> her own language, the editor decides to use encoding B (since it's the
> locale default).

Any non-broken editor _has_ to use encoding B and only encoding B, unless
the user very explicitly overrides it.  So it loads the file assuming it
uses encoding B (Latin2 in your case) and saves it the same way.

> Then in step 5 poedit merges the original (in encoding A) and the
> temporary (in encoding B) creating a broken and a difficult to fix file
> with different parts in differing encodings.
> Does anyone have any ideas on how to fix this properly, keeping in mind
> that poedit is editor-agnostic so it is hard to determine what encoding
> the editor has chosen to use for the temporary file.

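Just do this: iconv -f utf-8 <po >tmp; $EDITOR tmp; iconv -t utf-8 <tmp
(with the po file's actual charset in place of utf-8).  iconv falls back to
the locale's encoding for whichever side you don't specify, so the editor
gets the file in the local encoding.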
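
A slightly fuller sketch of the same round trip, for illustration only.
The function name edit_in_locale and its arguments are made up here, and
the file's charset has to be passed in by the caller:

```shell
# edit_in_locale FILE FILE_CHARSET [EDITOR_CHARSET]
# Hypothetical helper: let $EDITOR work on a copy of FILE converted to the
# locale's encoding, then convert the result back to FILE_CHARSET.
edit_in_locale() {
    file=$1
    file_charset=$2
    # Default to the encoding of the current locale.
    editor_charset=${3:-$(locale charmap)}
    tmp=$(mktemp) || return 1

    if iconv -f "$file_charset" -t "$editor_charset" <"$file" >"$tmp" &&
       ${EDITOR:-vi} "$tmp" &&
       iconv -f "$editor_charset" -t "$file_charset" <"$tmp" >"$file.new"
    then
        mv "$file.new" "$file"
        status=0
    else
        # Leave FILE untouched if any step failed.
        rm -f "$file.new"
        status=1
    fi
    rm -f "$tmp"
    return $status
}
```

Something like "edit_in_locale tmp.po UTF-8" would then replace the three
commands above.  If any of the steps exits non-zero, the original file is
left alone.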

> The only metadata available seems to be the Content-type field of the
> header in the original po file, but I can't see how to enforce it for
> the temporary file...

The only editor-agnostic way to force a particular encoding onto the editor
would be to change the locale and intercept all of the editor's input and
output.  That is doable on a tty, much harder in X, and a very bad idea
generally.  Few editors can handle one encoding for on-disk files and
another for their user interface, and they are not really supposed to
anyway.  As it's unwise to mess with the user's locale, the only thing you
can change is the on-disk encoding, and this is the way to go.
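
For the -f side, the charset can be read from the Content-Type line of the
original po file's header.  A rough sketch, with po_charset being a name
invented here, and assuming the usual "charset=NAME\n" header layout:

```shell
# po_charset FILE
# Hypothetical helper: print the charset declared in FILE's PO header,
# e.g. UTF-8 for a header line containing "charset=UTF-8\n".
po_charset() {
    # Capture everything after "charset=" up to a backslash, space or quote;
    # only the first matching line is used.
    sed -n 's/^"Content-[Tt]ype:.*charset=\([^\\ "]*\).*/\1/p' "$1" | head -n 1
}
```

The result can be given to iconv -f before the editor runs and to iconv -t
afterwards.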

