
Using UTF-8 (was Re: Debian Boot Floppies CVS: boot-floppies polish)

On Fri, Nov 19, 1999 at 01:26:44PM +0100, Stephane Bortzmeyer wrote:
> OK, here is my suggested way, using only packages found in potato:
> - all the XML files in UTF-8 (by law, every XML processor must recognize it, 
> and it has upward compatibility with ASCII). If you cannot or don't want to 
> edit UTF-8, use recode (here, I assume you edited in Latin-2):
>  recode latin-2..utf-8 polish.xml
(I'd prefer iconv.)  This seems reasonable, and it is what I have already done
for Russian (and by the time I finish this message, it will be done for
Polish too :).

> - langs.c must be in UTF-8 for another reason: it mixes characters from many 
> languages.
Sorry, I do not quite understand your reasoning.

> - conversion from UTF-8 to the chosen charset needs to be done dynamically in 
> dbootstrap. librecode could help, but it would mean adding it to the rescue 
> disk.
Hmm...  This means that we would need to add the following items to the rescue
disk:

    -- messages for dbootstrap
    -- all necessary fonts
    -- acms for these fonts (actually these are tables for converting from local
       charset to utf-8 in case LatArCyrHeb is used)
    -- keymaps (in case we do want to enter anything localized in dbootstrap)
    -- librecode0 for translating from utf-8...


I do not think that conversion should be performed by dbootstrap.  The reason
is quite simple: the data is static (wrt dbootstrap) and the charset is known
at compile time.  Why not make use of this knowledge?  Besides, UTF-8 has a big
drawback: the size.  For example, the UTF-8 version of russian.xml is 300
bytes larger than the KOI8-R version.  I would not mind if everything we are to
display in dbootstrap were in UTF-8.  But!  The messages in the *.po files use
their own local charsets, and s-lang does not easily support UTF-8 (well, I do
not know for sure how well s-lang copes with UTF-8 :).
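
To illustrate where the size difference comes from, here is a minimal sketch
in Python (Python 3 syntax assumed).  KOI8-R stores each Cyrillic letter in one
byte, while UTF-8 needs two, so the overhead grows with the number of Cyrillic
letters.  The helper name encoded_sizes and the sample string are made up for
the illustration; they are not taken from russian.xml.

```python
def encoded_sizes(text, encodings=("koi8_r", "utf-8")):
    """Return the encoded length in bytes of `text` for each encoding."""
    return {name: len(text.encode(name)) for name in encodings}


# Hypothetical sample message, not taken from the real russian.xml.
sample = "Добро пожаловать в Debian"

for name, size in encoded_sizes(sample).items():
    print(name, size, "bytes")
```

ASCII characters cost one byte in both encodings, so only the Cyrillic part of
a message contributes to the difference.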

Hmm (just found it), I do not know how to work correctly with character set
conversions in Python...  Does anybody have any suggestions?
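
One possible approach, as a sketch only: a Python with built-in Unicode
support (Python 3 syntax assumed here) can decode bytes from one charset and
encode them into another using its standard codecs.  The function names
transcode and convert_file are hypothetical; "iso8859_2" and "koi8_r" are the
standard Python codec names for Latin-2 and KOI8-R.

```python
from pathlib import Path


def transcode(data, src_encoding, dst_encoding):
    """Re-encode bytes from src_encoding to dst_encoding.

    Raises UnicodeDecodeError if `data` is not valid in src_encoding and
    UnicodeEncodeError if a character has no equivalent in dst_encoding.
    """
    return data.decode(src_encoding).encode(dst_encoding)


def convert_file(src_path, dst_path, src_encoding, dst_encoding):
    """Read src_path, transcode its contents and write them to dst_path."""
    raw = Path(src_path).read_bytes()
    Path(dst_path).write_bytes(transcode(raw, src_encoding, dst_encoding))


if __name__ == "__main__":
    # Roughly what "recode latin-2..utf-8" does, but to a separate file.
    # The file names are placeholders.
    convert_file("polish.xml", "polish.utf8.xml", "iso8859_2", "utf-8")
```

Going the other way (UTF-8 to a local charset at build time) uses the same
call with the arguments swapped.  The strict default makes a character missing
from the target charset fail loudly instead of being silently dropped.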

