Re: [Groff] Re: groff: radical re-implementation
Additional comments on my 'compromise' idea.
> One compromise is that:
> - to use UCS-4 for internal processing, not wchar_t.
> - a small part of input and output to be encoding-sensible.
A small part of the Groff source code, the part related to I/O,
has to be encoding-sensitive. This part can handle Latin-1,
EBCDIC, and UTF-8. Additionally, if Groff is compiled on an
internationalized OS (i.e. setlocale(), iconv(), nl_langinfo(),
and so on are available), this part is also locale-sensitive.
The input routine emits its contents in UCS-4 encoding.
The output routine takes its contents in UCS-4 encoding.
Almost all of the rest of the source code is written to handle UCS-4.
For example: typedef long ucs4_t; and replace char with ucs4_t.
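
To make the input side concrete, here is a minimal sketch of what the
hard-coded decoders might look like. Only the ucs4_t typedef comes from
the text above; the function names latin1_to_ucs4() and utf8_to_ucs4()
are hypothetical, and the UTF-8 decoder is simplified (it does not
reject overlong forms or surrogates).

```c
#include <stddef.h>

/* As suggested above: a type wide enough for any UCS-4 code point. */
typedef long ucs4_t;

/* Latin-1 maps byte-for-byte onto the first 256 code points.
 * (Hypothetical helper name.) Returns the number of code points written. */
size_t latin1_to_ucs4(const unsigned char *in, size_t n, ucs4_t *out)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = in[i];
    return n;
}

/* Decode UTF-8 into UCS-4 (hypothetical helper name).
 * out must have room for n entries. Returns the number of code points
 * written, or (size_t)-1 on a malformed or truncated sequence.
 * Simplified: overlong forms and surrogates are not rejected. */
size_t utf8_to_ucs4(const unsigned char *in, size_t n, ucs4_t *out)
{
    size_t i = 0, count = 0;

    while (i < n) {
        unsigned char b = in[i++];
        ucs4_t cp;
        int extra;                      /* continuation bytes expected */

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else
            return (size_t)-1;          /* stray continuation or invalid lead byte */

        while (extra-- > 0) {
            if (i >= n || (in[i] & 0xC0) != 0x80)
                return (size_t)-1;
            cp = (cp << 6) | (in[i++] & 0x3F);
        }
        out[count++] = cp;
    }
    return count;
}
```

The rest of the code would then see only arrays of ucs4_t and would
not need to know which encoding the input file used.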
> - command options for encodings of input and output to be added.
For example, --input-encoding and --output-encoding.
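
A rough sketch of how these two options could be parsed, assuming
getopt_long() is available (it is a GNU extension, also found on the
BSDs; elsewhere a hand-written parser would be needed). The struct
enc_options and the function parse_encoding_options() are hypothetical
names for illustration, not existing Groff code.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical holder for the two proposed options.
 * NULL means "not given, use the default encoding". */
struct enc_options {
    const char *input_encoding;
    const char *output_encoding;
};

/* Parse --input-encoding and --output-encoding.
 * Returns 0 on success, -1 if an unknown option or a missing
 * argument is encountered. */
int parse_encoding_options(int argc, char **argv, struct enc_options *opts)
{
    /* 'i' and 'o' are only internal return codes here; no short
     * options are defined because the optstring below is empty. */
    static const struct option long_opts[] = {
        { "input-encoding",  required_argument, NULL, 'i' },
        { "output-encoding", required_argument, NULL, 'o' },
        { NULL, 0, NULL, 0 }
    };
    int c;

    opts->input_encoding = NULL;
    opts->output_encoding = NULL;

    while ((c = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
        switch (c) {
        case 'i': opts->input_encoding = optarg;  break;
        case 'o': opts->output_encoding = optarg; break;
        default:  return -1;
        }
    }
    return 0;
}
```

Both "--input-encoding=utf8" and "--input-encoding utf8" are accepted
by getopt_long() in this form.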
> - a compile-time option I18N to be introduced.
Autoconf may be used for this purpose.
> - when I18N is off, default input is latin-1 and default output
> is also latin-1.
> - when I18N is on, default input and default output are sensitive
> to the LC_CTYPE locale.
> - Of course these default encodings can be overridden by command options.
Thus, valid encoding names for the --input-encoding and --output-encoding
command options are:
- latin1, ascii, utf8, (ascii8) [compiled without I18N]
- latin1, ascii, utf8, (ascii8), and encodings supported by the OS
  [compiled with I18N]
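
As an illustration of the two cases, here is a sketch of how the name
check and the default could be written around a compile-time I18N
macro. The function names are hypothetical. Probing the OS with
iconv_open() and the target name "UCS-4BE" are my assumptions; the
encoding names iconv accepts vary between implementations.

```c
#include <string.h>
#ifdef I18N
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#endif

/* Names that are always accepted, with or without I18N.
 * "ascii8" is the one listed in parentheses above. */
static const char *const builtin_encodings[] = {
    "latin1", "ascii", "utf8", "ascii8"
};

/* Returns 1 if name is one of the hard-coded encodings, else 0. */
int is_builtin_encoding(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof builtin_encodings / sizeof builtin_encodings[0]; i++)
        if (strcmp(name, builtin_encodings[i]) == 0)
            return 1;
    return 0;
}

/* Returns 1 if name may be given to --input-encoding or
 * --output-encoding, else 0. */
int is_valid_encoding(const char *name)
{
    if (is_builtin_encoding(name))
        return 1;
#ifdef I18N
    {
        /* Assumption: ask the OS whether it can convert this
         * encoding to UCS-4. */
        iconv_t cd = iconv_open("UCS-4BE", name);
        if (cd != (iconv_t)-1) {
            iconv_close(cd);
            return 1;
        }
    }
#endif
    return 0;
}

/* Default encoding when no command option is given. */
const char *default_encoding(void)
{
#ifdef I18N
    /* Follow the LC_CTYPE locale of the environment. */
    if (setlocale(LC_CTYPE, "") != NULL)
        return nl_langinfo(CODESET);
#endif
    return "latin1";
}
```

Without -DI18N, only the four built-in names pass the check and the
default is always latin1.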
> - Groff can be compiled with I18N off for systems without
> internationalization functions such as setlocale().
> - iconv(3) to be used for converting between input/output encodings
> and internal UCS-4 encoding, if available (I18N=true).
> - if I18N is false, conversion process to be hard-coded for
> Latin-1, EBCDIC, and UTF-8.
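
For the I18N=true case, a minimal sketch of the iconv(3) path might
look like the following. The function name to_ucs4() is hypothetical,
and "UCS-4BE" as the iconv target name is an assumption (glibc and
GNU libiconv accept it, but names differ between systems). The sketch
converts a whole buffer in one call and treats any conversion error,
including a too-small output buffer, as failure.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdlib.h>

typedef long ucs4_t;

/* Convert n bytes of text in encoding `from` into UCS-4 code points
 * (hypothetical helper). Returns the number of code points stored in
 * out, or (size_t)-1 on any error. out has room for out_cap entries. */
size_t to_ucs4(const char *from, const char *in, size_t n,
               ucs4_t *out, size_t out_cap)
{
    iconv_t cd;
    unsigned char *buf;
    char *src = (char *)in;     /* iconv() takes char **, input is not modified */
    char *dst;
    size_t src_left = n;
    size_t dst_left = out_cap * 4;
    size_t produced, i;

    if (out_cap == 0)
        return n == 0 ? 0 : (size_t)-1;

    cd = iconv_open("UCS-4BE", from);   /* assumed target name */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    buf = malloc(out_cap * 4);
    if (buf == NULL) {
        iconv_close(cd);
        return (size_t)-1;
    }
    dst = (char *)buf;

    if (iconv(cd, &src, &src_left, &dst, &dst_left) == (size_t)-1) {
        free(buf);
        iconv_close(cd);
        return (size_t)-1;
    }
    iconv_close(cd);

    /* Assemble big-endian 4-byte units into ucs4_t values. */
    produced = (out_cap * 4 - dst_left) / 4;
    for (i = 0; i < produced; i++)
        out[i] = ((ucs4_t)buf[4 * i]     << 24) |
                 ((ucs4_t)buf[4 * i + 1] << 16) |
                 ((ucs4_t)buf[4 * i + 2] <<  8) |
                  (ucs4_t)buf[4 * i + 3];
    free(buf);
    return produced;
}
```

When I18N is false, the same interface could be served by the
hard-coded Latin-1, EBCDIC, and UTF-8 converters instead.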
Tomohiro KUBOTA <firstname.lastname@example.org>