
Is it safe to let translators translate C format strings?



Hi all,

I found an ugly format string bug in hex-a-hop, which resulted from passing
a translator-supplied string (_("> Continue game 1 (1% complete) <"))
as the first argument of printf (I introduced the error myself).

It resulted in "> Continue game 1 (1�omplete) <" because "% c" was
interpreted as a format sequence. (I thought such a flag would result in a
leading space and not in truncated bytes, but let's ignore this ...)

The fix was simple: pass "%s" as the format string and the translated
string as the second argument.
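
A minimal sketch of that fix, for illustration only. The helper name
print_label is hypothetical and not taken from hex-a-hop, and the plain
string parameter stands in for the result of _("..."). It writes into a
buffer with snprintf so the result can be inspected:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies a translated label into buf without ever
 * treating the label itself as a format string. Because the format is
 * the constant "%s", a '%' inside the label is copied literally.
 * Returns what snprintf returns (the length the full output needs). */
int print_label(char *buf, size_t n, const char *label)
{
    assert(buf != NULL && label != NULL);
    return snprintf(buf, n, "%s", label);
}
```

Called as print_label(buf, sizeof buf, _("> Continue game 1 (1% complete) <")),
the "% c" in the label is no longer seen by the format parser.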

But now I wonder: Is it safe to write the following?
printf(_("Hello World: %s"), a_string);

The translation of "Hello World: %s" could contain multibyte characters.
Isn't it possible that there is an encoding in which the bytes of the
translation contain dangerous sequences such as %s (even if the translator
didn't use the 7-bit character %)?

I know that this cannot happen with UTF-8, as every byte of a multibyte
character has the 8th bit set. But there are other encodings ...
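
To illustrate the UTF-8 property, here is a rough sketch (the function
names are made up for this mail). It walks a UTF-8 string and counts '%'
bytes. Lead bytes of multibyte sequences are 0xC0 or above and
continuation bytes have the form 10xxxxxx, so the byte 0x25 can only ever
be a real ASCII '%'. The check is deliberately simple and does not reject
overlong encodings or surrogates:

```c
#include <assert.h>
#include <stddef.h>

/* Length (1-4) of the UTF-8 sequence introduced by lead byte b,
 * or 0 if b cannot start a sequence. */
size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;            /* plain ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                          /* stray continuation or invalid */
}

/* Counts ASCII '%' bytes in s. Returns -1 if s is malformed, which
 * includes the case where a byte like '%' (high bit clear) appears
 * where a continuation byte is required. */
int count_percent_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    int percents = 0;

    assert(s != NULL);
    while (*p != 0) {
        size_t len = utf8_seq_len(*p);
        if (len == 0)
            return -1;
        if (len == 1) {
            if (*p == '%')
                percents++;
            p++;
            continue;
        }
        /* Every continuation byte must be 10xxxxxx. The terminating
         * NUL fails this test too, so we never read past the end. */
        for (size_t i = 1; i < len; i++) {
            if ((p[i] & 0xC0) != 0x80)
                return -1;
        }
        p += len;
    }
    return percents;
}
```

For a well-formed UTF-8 translation, the count equals the number of '%'
characters the translator actually typed.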

I tried to work around this by using "%s" as the format specifier where
possible, but in the given example that is not possible (unless I parse all
the format flags myself instead of letting printf do so).
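
For what it's worth, parsing the flags myself would not have to be a full
printf parser. A rough sketch, assuming the only conversion the original
string uses is a single plain "%s" (both function names are hypothetical):
accept a translation only if it contains exactly one "%s" and otherwise
only "%%", and fall back to the untranslated format if it does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if fmt contains exactly one "%s" conversion and, apart
 * from that, only literal "%%" escapes; returns 0 otherwise. Flags,
 * widths and other conversions are rejected on purpose. */
int is_single_string_format(const char *fmt)
{
    int string_args = 0;

    assert(fmt != NULL);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;                 /* look at the character after '%' */
        if (*p == '%')
            continue;        /* "%%" is a literal percent sign */
        if (*p == 's') {
            string_args++;
            continue;
        }
        return 0;            /* other conversion, flag, or trailing '%' */
    }
    return string_args == 1;
}

/* Formats arg with the translated format if it passes the check,
 * otherwise with the fallback (the untranslated format, which the
 * programmer controls). Returns what snprintf returns. */
int format_translated(char *buf, size_t n, const char *translated,
                      const char *fallback, const char *arg)
{
    const char *fmt = is_single_string_format(translated) ? translated
                                                          : fallback;
    return snprintf(buf, n, fmt, arg);
}
```

This only looks at bytes, so it does not answer the question about
encodings where a '%' byte could hide inside a multibyte character.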

Jens


