Re: Language support
On Mon, Nov 15, 1999 at 08:35:28PM +0100, Hartmut Koptein wrote:
> > I have sent a proposal exactly about that.  What do you think
> > about it?
> 
>  Please resend it to this list.
OK.
Basically, I want to specify not only the language used for messages, but a
complete set of properties of a localized environment: language, locale,
keyboard, and font (possibly with an acm, in console-tools' terms).
The attached document is a first attempt to summarize the thoughts.
ALL comments are welcome.
--
Mike
                  Notes on localization of boot-floppies
                  --------------------------------------
                    Michael Sobolev, <mss@transas.com>
                                    0.1
-------------------------------------------------------------------------------
1. General considerations
-------------------------
     There are a few places where localization should occur:
        * right before loading the installation system (syslinux'[1]
          messages, help screens, etc.)
          [1]  Maybe it's a good idea to check other boot loaders.  What
               features do we need from syslinux?
        * the dbootstrap program (which, actually, is the most important
          part to be localized)
     What does localization mean here?  We have a user who wants to use her
     own language wherever possible.  This consists of three parts:
        * message catalogs (what the programs print out)
        * console font (the way that output is displayed)
        * keyboard map (the way user interacts with the program)
     To my knowledge, message catalogs and keyboard maps depend not only on
     the language the user speaks, but also on the user's location (e.g.,
     fr_FR, fr_CA, fr_CH).  Console fonts are a bit different: they usually
     depend on the taste of the user and nothing more.  As a result, the
     following scheme looks reasonable to me:
              Language
                  Variant
                      Variant
                  Variant
                  Variant
                      Variant
     For example
              French (fr)
                  Francais (France) (fr_FR)
                      ISO-8859-1
                      UTF-8
                  Francais (Swiss) (fr_CH)
                      ISO-8859-1
                      UTF-8
              Russian
                  KOI8-R
                  ISO-8859-5
                  UTF-8
     There could be one problem with keyboards: somebody should check all
     available keyboards and decide which keyboard is used for every
     possible variant.  These will be called standard keyboards.  Later,
     when the system is installed and console-data (or another similar
     package) is available, the user may change the standard keyboard to
     something else (whatever suits her very special tastes)[1].
     [1]  Hmm...  it looks like a set of standard keyboards has already
          been chosen.
     To summarize, a complete environment description consists of:
        * environment identifier
        * locale name
        * font name
        * application character map name[1]
          [1]  this may be an exotic name, so I believe we can deduce
               neither the locale name nor anything else from it
        * keyboard name[1]
          [1]  this is equal to an [almost] full path of the keymap file
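     As a rough sketch only (this is not code from boot-floppies), the
     description summarized above could be held in a C structure along the
     following lines.  The names `environment_description' and
     `environment_is_complete' are hypothetical and exist purely for
     illustration.

```c
#include <stddef.h>

/* Hypothetical structure: one field per item of the summary above. */
struct environment_description {
    const char *id;      /* environment identifier */
    const char *locale;  /* locale name, e.g. "fr_CH" */
    const char *font;    /* console font name */
    const char *acm;     /* application character map name */
    const char *keymap;  /* keyboard name: [almost] full keymap path */
};

/* Returns 1 when every field is set, 0 otherwise (including for NULL). */
int environment_is_complete(const struct environment_description *env)
{
    return env != NULL
        && env->id != NULL
        && env->locale != NULL
        && env->font != NULL
        && env->acm != NULL
        && env->keymap != NULL;
}
```

     Keeping the acm as its own field, rather than deriving it from the
     locale, follows the footnote above: the acm name may be exotic, so
     nothing else can be deduced from it.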
-------------------------------------------------------------------------------
2. A proposal
-------------
2.1. The main idea
------------------
     The main idea is to let the user choose her language as soon as
     `dbootstrap' starts.
     The appropriate part of the program should check a special file that
     says whether the user has already specified the language for her
     environment.  If this file exists, the step is skipped; otherwise, a
     menu of all supported languages is presented.  At this stage we do not
     differentiate localities (e.g.  ``English (US)'' and ``English (UK)''
     are the same and are represented by ``English'').  In this menu the
     user may use the up and down arrows to move between items[1].
     Whenever a different language is selected, the appropriate application
     character map (acm for short), or acm and font, may be reloaded to
     make it possible to see a hint written in that very language.  As soon
     as the user sees something she wants to use, she presses Enter and
     proceeds to selecting a suitable variant (yes, yes, this is where we
     start to differentiate ``English (US)'' and ``English (UK)'').  As
     soon as she finishes the selection, we receive a whole bunch of useful
     information:
        * environment name (English (US) or English (US) - UTF8, or
          something)
          [1]  I do not know what the best way is to tell that to the user
        * locale name
        * name of a font needed for this kind of environment
        * acm for this font (as we may want to use a Unicode font, and take
          only part of it)
        * keyboard name[1]
          [1]  this may be a starting point for later keyboard
               configuration
2.2. How it works
-----------------
     There is a file (in XML) that contains definitions for all possible
     environments[1].  This file is processed by a special program that
     produces a `.c' file.  The `.c' file contains only static data
     definitions and a single function (see Chapter 4, `Application
     Programming Interface').  A program that wants to make use of the
     environment definitions just links in the appropriate object file, and
     voila, the data is available.  For `dbootstrap', there is a windowed
     (newt) interface that allows a user to choose her environment.
     [1]  we want to put there
2.3. Fonts
----------
     To minimize the space needed for proper localization, I propose to use
     one Unicode font (one of LatArCyrHeb* from console-data) and a set of
     application character maps (*.acm files from console-data).  For
     languages whose characters are not in the LatArCyrHeb* fonts (like
     Japanese??), appropriate fonts should be added.
     But since I propose using Unicode fonts, it is vital to provide
     appropriate acm's and a program that knows how to cope with them.
2.4. Keyboard
-------------
     Even though this process yields the keyboard identifier, at this
     stage we do not try to load the corresponding keyboard mappings.
2.5. Locales
------------
     Locales, like keyboards, are only determined at this stage: they are
     not used!  If we later want to specify a default value for the system
     locale (e.g.  it could be put into `/etc/environment' or something
     similar), this value can be used.
-------------------------------------------------------------------------------
3. Summary of things to do
--------------------------
        * _done_ create a file format
        * _done_ create a program converting this file into a `.c' file
        * ...
-------------------------------------------------------------------------------
4. Application Programming Interface
------------------------------------
     This chapter describes the only available function and the structure
     that supports it.
4.1. Function prototype
-----------------------
     There is only one function.  It is called `available_languages' and
     returns a pointer to an _array_ of `language_definition's.
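     To make this concrete, here is a minimal sketch of what the generated
     `.c' file and a caller might look like.  Only the function name
     `available_languages' and the type name `language_definition' come
     from the text above.  The argument list, the fields of the structure,
     the NULL-name terminator, the helper `count_languages' and all string
     values are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: the real fields are not given in this document. */
typedef struct language_definition {
    const char *name;  /* language name shown in the menu */
    const char *hint;  /* hint written in that language */
    const char *font;  /* font needed to display the hint */
    const char *acm;   /* application character map for that font */
} language_definition;

/* Static data of the kind the converter might emit.  Ending the array
 * with an entry whose name is NULL is an assumption of this sketch. */
static const language_definition languages[] = {
    { "English", "Press Enter to continue in English",
      "placeholder-font", "placeholder-acm" },
    { "French", "Appuyez sur Entree pour continuer en francais",
      "placeholder-font", "placeholder-acm" },
    { NULL, NULL, NULL, NULL }
};

/* The single function: returns a pointer to the static array. */
const language_definition *available_languages(void)
{
    return languages;
}

/* Hypothetical helper: counts entries up to the NULL-name terminator. */
size_t count_languages(const language_definition *langs)
{
    size_t n = 0;

    assert(langs != NULL);
    while (langs[n].name != NULL)
        n++;
    return n;
}
```

     A caller such as `dbootstrap' would link in the object file, call
     `available_languages()' once, and walk the array to fill its menu.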
4.2. Structures
---------------
-------------------------------------------------------------------------------
5. File format
--------------
     The information about possible environments is stored as an XML
     document.  Section 5.1, `Document Type Definition' contains the
     complete document type definition.  Then every element is
     described[1].
     [1]  I would appreciate any comments on the DTD
5.1. Document Type Definition
-----------------------------
              <!ELEMENT languages (language+)>
          
              <!ELEMENT language (name, hint, list)>
              <!ATTLIST language
                  name    CDATA   #REQUIRED
                  font    CDATA   #REQUIRED
                  acm     CDATA   #REQUIRED>
          
              <!ELEMENT list (name, (item | list)+)>
          
              <!ELEMENT item (name)>
              <!ATTLIST item
                  locale  CDATA   #REQUIRED
                  font    CDATA   #REQUIRED
                  acm     CDATA   #REQUIRED
                  keymap  CDATA   #REQUIRED
                  msgcat  CDATA   #REQUIRED>
          
              <!ELEMENT name (#PCDATA)>
              <!ELEMENT hint (#PCDATA)>
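     The nesting allowed by `list' (a list may contain items or further
     lists) maps naturally onto a small tree.  The sketch below shows one
     possible in-memory form and a depth-first lookup by locale.  The names
     `node', `node_kind' and `find_item', as well as the sample data, are
     hypothetical; the font, acm, keymap and msgcat attributes are left out
     for brevity.

```c
#include <stddef.h>
#include <string.h>

enum node_kind { NODE_LIST, NODE_ITEM };

/* Hypothetical in-memory form of a <list> or <item> element. */
struct node {
    enum node_kind kind;
    const char *name;             /* text of the <name> child */
    const char *locale;           /* <item> only: locale attribute */
    const struct node *children;  /* <list> only: child nodes */
    size_t n_children;            /* <list> only: number of children */
};

/* Sample data mirroring the French example from chapter 1. */
const struct node fr_fr_items[] = {
    { NODE_ITEM, "ISO-8859-1", "fr_FR", NULL, 0 },
    { NODE_ITEM, "UTF-8", "fr_FR.UTF-8", NULL, 0 }
};
const struct node fr_variants[] = {
    { NODE_LIST, "Francais (France)", NULL, fr_fr_items, 2 },
    { NODE_ITEM, "Francais (Swiss)", "fr_CH", NULL, 0 }
};
const struct node french = { NODE_LIST, "French", NULL, fr_variants, 2 };

/* Depth-first search for the first <item> with the given locale.
 * Returns NULL when nothing matches. */
const struct node *find_item(const struct node *n, const char *locale)
{
    size_t i;

    if (n == NULL || locale == NULL)
        return NULL;
    if (n->kind == NODE_ITEM)
        return strcmp(n->locale, locale) == 0 ? n : NULL;
    for (i = 0; i < n->n_children; i++) {
        const struct node *hit = find_item(&n->children[i], locale);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}
```

     The same walk, stopped one level at a time instead of recursing to
     the leaves, is what a menu interface would do as the user descends
     from a language to a variant.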
-------------------------------------------------------------------------------