Using UTF-8 (was Re: Debian Boot Floppies CVS: boot-floppies polish)
On Fri, Nov 19, 1999 at 01:26:44PM +0100, Stephane Bortzmeyer wrote:
> OK, here is my suggested way, using only packages found in potato:
> - all the XML files in UTF-8 (by law, every XML processor must recognize it,
> and it has upward compatibility with ASCII). If you cannot or don't want to
> edit UTF-8, use recode (here, I assume you edited in Latin-2):
> recode latin-2..utf-8 polish.xml
(I'd prefer iconv.) This seems reasonable, and it is what I have already done
for Russian (and by the time I finish this message, it will also be done for
> - langs.c must be in UTF-8 for another reason: it mixes characters from many
Sorry, I do not quite understand your reasoning.
> - conversion from UTF-8 to the choosen charset needs to be done dynamically in
> dbootstrap. librecode could help, but it would mean adding it to the rescue
Hmm... This means that we need to add the following items to the rescue disk:
-- messages for dbootstrap
-- all necessary fonts
-- acms for these fonts (actually these are tables for converting from local
charset to utf-8 in case LatArCyrHeb is used)
-- keymaps (in case we do want to enter anything localized in dbootstrap)
-- librecode0 for translating from utf-8...
I do not think that the conversion should be performed by dbootstrap. The reason
is quite simple: the data is static (with respect to dbootstrap) and the charset
is known at compile time. Why not make use of this knowledge? UTF-8 also has a
big drawback: size. For example, the UTF-8 version of the russian.xml file is 300
bytes larger than the KOI8-R one. I would not mind if everything we display
in dbootstrap were in UTF-8. But the messages in the *.po files use
their local charsets, and s-lang does not easily support UTF-8 (well, I do not
know for sure how well s-lang copes with UTF-8 :).
Hmm, (just found it) I do not know how to work correctly with character set
conversions in Python... Does anybody have any suggestions?
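
One possible answer to the question above, as a minimal sketch and not a tested
part of the boot-floppies build: it assumes a Python whose standard codecs
include "utf-8" and "koi8_r". The function name convert_for_build and the
sample string are hypothetical, chosen only for illustration. It converts
UTF-8 source text to the target charset ahead of time, so that nothing needs
converting at run time, and it prints the byte counts in both encodings to
show the size difference for Cyrillic text.

```python
def convert_for_build(utf8_bytes: bytes, target_charset: str) -> bytes:
    """Decode UTF-8 input and re-encode it in target_charset.

    Raises UnicodeEncodeError if a character cannot be represented in
    the target charset, so a bad mapping is caught at build time.
    """
    text = utf8_bytes.decode("utf-8")
    return text.encode(target_charset)


# Hypothetical sample message; it is not taken from russian.xml.
sample = "Привет, мир"
utf8 = sample.encode("utf-8")
koi8 = convert_for_build(utf8, "koi8_r")

# Each Cyrillic letter takes two bytes in UTF-8 and one byte in KOI8-R.
print("UTF-8 bytes: ", len(utf8))
print("KOI8-R bytes:", len(koi8))
```

The same call with "iso8859_2" as the target would cover the Latin-2 case
mentioned earlier; a character missing from the target charset raises an
error instead of being silently dropped.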