
Re: [Groff] Re: groff: radical re-implementation

Additional comments on my 'compromise' idea.

> One compromise is that:
>  - to use UCS-4 for internal processing, not wchar_t.
>  - a small part of input and output to be encoding-sensitive.

A small part of the Groff source code, the part related to I/O,
has to be encoding-sensitive.  This part can handle Latin-1,
EBCDIC, and UTF-8.  Additionally, if Groff is compiled on an
internationalized OS (i.e. setlocale(), iconv(), nl_langinfo(),
and so on are available), this part also provides locale-sensitive
file I/O.

The input routine reads text in the input encoding and passes it on
as UCS-4.  The output routine receives UCS-4 and writes it out in
the output encoding.

Almost all of the rest of the source code is written to handle UCS-4.

For example: declare "typedef long ucs4_t;" and replace char with
ucs4_t.
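
The hard-coded part of the input routine might look roughly like the
sketch below.  The function names latin1_to_ucs4() and utf8_to_ucs4()
are made up for illustration and are not existing Groff functions.
The UTF-8 decoder is deliberately minimal: it replaces malformed or
truncated sequences with U+FFFD, but it does not reject overlong
forms or surrogates, which a real implementation would need to do.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// The typedef suggested above.  long is at least 32 bits wide,
// which is enough for any UCS-4 value.
typedef long ucs4_t;

// Latin-1 maps one-to-one onto the first 256 UCS-4 code points.
std::vector<ucs4_t> latin1_to_ucs4(const std::string &in)
{
  std::vector<ucs4_t> out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); i++)
    out.push_back(static_cast<unsigned char>(in[i]));
  return out;
}

// Minimal UTF-8 decoder (1- to 4-byte sequences only).
// Malformed or truncated sequences become U+FFFD.
std::vector<ucs4_t> utf8_to_ucs4(const std::string &in)
{
  const ucs4_t replacement = 0xFFFD;
  std::vector<ucs4_t> out;
  std::size_t i = 0;
  while (i < in.size()) {
    unsigned char c = static_cast<unsigned char>(in[i++]);
    ucs4_t cp;
    int extra;  // number of continuation bytes expected
    if (c < 0x80) { cp = c; extra = 0; }
    else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
    else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
    else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
    else { out.push_back(replacement); continue; }
    bool ok = true;
    for (int k = 0; k < extra; k++) {
      if (i >= in.size()) { ok = false; break; }
      unsigned char cc = static_cast<unsigned char>(in[i]);
      if ((cc & 0xC0) != 0x80) { ok = false; break; }
      cp = (cp << 6) | (cc & 0x3F);
      i++;
    }
    out.push_back(ok ? cp : replacement);
  }
  return out;
}
```

Everything after this point in the program would then see only
vectors (or streams) of ucs4_t, never raw bytes.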

>  - command options for encodings of input and output to be added.

For example, --input-encoding and --output-encoding.
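
A minimal sketch of how these two options could be picked out of the
command line.  The struct encoding_options and both functions are
hypothetical names used only for this illustration.  The parsing is
done by hand so that the sketch does not depend on any particular
getopt implementation, and it accepts both "--opt=value" and
"--opt value".

```cpp
#include <cstddef>
#include <cstring>
#include <string>

struct encoding_options {
  std::string input;   // empty means "use the default encoding"
  std::string output;  // empty means "use the default encoding"
};

// If argv[i] is `name=value` or `name value`, store the value and
// return true.  In the second form, i is advanced past the value.
static bool match_option(const char *name, int argc, char **argv,
                         int &i, std::string &value)
{
  std::size_t n = std::strlen(name);
  const char *arg = argv[i];
  if (std::strncmp(arg, name, n) != 0)
    return false;
  if (arg[n] == '=') {
    value = arg + n + 1;
    return true;
  }
  if (arg[n] == '\0' && i + 1 < argc) {
    value = argv[++i];
    return true;
  }
  return false;
}

encoding_options parse_encoding_options(int argc, char **argv)
{
  encoding_options opts;
  for (int i = 1; i < argc; i++) {
    if (match_option("--input-encoding", argc, argv, i, opts.input))
      continue;
    if (match_option("--output-encoding", argc, argv, i, opts.output))
      continue;
    // All other arguments are ignored in this sketch.
  }
  return opts;
}
```

An empty field means that the option was not given, so the caller
falls back to the default encoding.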

>  - a compile-time option I18N to be introduced.

Autoconf may be used for this purpose.

>  - when I18N is off, default input is latin-1 and default output
>    is also latin-1.
>  - when I18N is on, default input and default output follow
>    the LC_CTYPE locale.
>  - Of course these default encodings can be overridden by command
>    options.

Thus, the valid encoding names for the --input-encoding and
--output-encoding command options are:
  - latin1, ascii, utf8, (ascii8)  [compiled without I18N]
  - latin1, ascii, utf8, (ascii8), and encodings supported by the OS
    [compiled with I18N]

>  - Groff can be compiled with I18N off for systems without 
>    internationalization functions such as setlocale().
>  - iconv(3) to be used for converting between input/output encodings
>    and internal UCS-4 encoding, if available (I18N=true).
>  - if I18N is false, conversion process to be hard-coded for
>    Latin-1, EBCDIC, and UTF-8.
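
A rough sketch of the conversion step with both code paths.  It
again assumes a preprocessor macro named I18N, and to_ucs4() is a
hypothetical name.  The iconv encoding name "UCS-4BE" is accepted by
glibc and GNU libiconv; other iconv implementations may spell it
differently.  The output buffer assumes that no input byte expands
to more than one code point.  For brevity, the hard-coded branch
covers only latin1 and ascii here; UTF-8 and EBCDIC would be added
in the same way.

```cpp
#include <cstddef>
#include <string>
#include <vector>
#ifdef I18N
#include <iconv.h>
#endif

typedef long ucs4_t;

// Convert `in` from `encoding` to UCS-4.  Returns false if the
// encoding is not supported or the input cannot be converted.
bool to_ucs4(const std::string &encoding, const std::string &in,
             std::vector<ucs4_t> &out)
{
  out.clear();
#ifdef I18N
  iconv_t cd = iconv_open("UCS-4BE", encoding.c_str());
  if (cd == (iconv_t)-1)
    return false;                    // encoding unknown to the OS
  std::vector<char> src(in.begin(), in.end());
  src.push_back('\0');               // keeps &src[0] valid if in is empty
  std::vector<char> buf(4 * in.size() + 16);
  char *inp = &src[0];
  std::size_t inleft = in.size();
  char *outp = &buf[0];
  std::size_t outleft = buf.size();
  // Convert, then flush any pending shift state.
  bool ok = iconv(cd, &inp, &inleft, &outp, &outleft) != (std::size_t)-1
         && iconv(cd, 0, 0, &outp, &outleft) != (std::size_t)-1;
  iconv_close(cd);
  if (!ok)
    return false;
  // Assemble big-endian 4-byte groups into ucs4_t values.
  std::size_t nbytes = buf.size() - outleft;
  for (std::size_t k = 0; k + 3 < nbytes; k += 4) {
    const unsigned char *p =
      reinterpret_cast<const unsigned char *>(&buf[k]);
    out.push_back((ucs4_t(p[0]) << 24) | (ucs4_t(p[1]) << 16)
                  | (ucs4_t(p[2]) << 8) | ucs4_t(p[3]));
  }
  return true;
#else
  // Hard-coded conversion; both encodings map bytes directly.
  const bool ascii = (encoding == "ascii");
  if (!ascii && encoding != "latin1")
    return false;
  for (std::size_t i = 0; i < in.size(); i++) {
    unsigned char c = static_cast<unsigned char>(in[i]);
    if (ascii && c > 0x7F) {
      out.clear();
      return false;                  // not valid ASCII
    }
    out.push_back(c);
  }
  return true;
#endif
}
```

The output direction would mirror this, with UCS-4 as the source
encoding and the requested output encoding as the target.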

Tomohiro KUBOTA <kubota@debian.org>
