Re: groff: radical re-implementation
From: Werner LEMBERG <firstname.lastname@example.org>
Date: Mon, 16 Oct 2000 16:41:35 +0200 (CEST)
> From: Tomohiro KUBOTA <email@example.com>
> Subject: groff: radical re-implementation
> Date: Mon, 16 Oct 2000 11:35:20 +0900
> > Why are 'ascii' and 'latin1' treated as 'device types'? The device
> > type should be 'tty' or so. Because of this confusing design, we
> > have no way to treat, for example, Japanese X11 output or Korean
> > PostScript. You know, THIS IS NOT DUE TO LACK OF IMPLEMENTATION BUT
> > DUE TO CONFUSED DESIGN. Should we type 'groff -Tlatin1 -Tx75' for
> > X11 output with latin1 encoding? Entirely No!
> As you may know, this confusion has historical origins. I'm not
> willing to add new `devices' like `latin-2' or even `nippon' due to
> this currently.
> I plan to separate input encodings, output encodings, and character
> sets from devices. Then, we will have real devices like tty, ps, or
> dvi. Input characters will be converted to glyph names by troff, and
> these glyph names will be mapped to output encodings (for ttys)
> resp. fonts (for everything else) according to the device and font
That's good news!
> > The ideal implementation will be using 'wchar_t' for reading.
> But this will fail for some compilers...
Hmm, ISO C99 has now become the standard, but ...
> > Ukai has roughly surveyed the groff source code and posted a
> > brief but long list of the work needed (on the
> > firstname.lastname@example.org mailing list, in Japanese).
> > http://www.debian.or.jp/Lists-Archives/debian-devel/200010/msg00072.html
> > Fortunately, fgetwc(), putwchar(), wprintf(), swprintf(), and so on
> > are available in the new Glibc 2.2. mbstowcs() and so on have also
> > been available since older Glibc versions. These functions are
> > locale-sensitive and can handle any encoding. Note that they can
> > also handle UTF-8 under a UTF-8 locale, though the current Debian
> > locales package does not include any UTF-8 locales. We should not
> > give UTF-8 special treatment. Discussion of this new design of groff
> > is in progress on the email@example.com mailing list (in Japanese)
> > and in personal communication.
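To make the locale-sensitive approach concrete, here is a minimal
sketch in C. It assumes the caller has already selected a locale with
setlocale(LC_CTYPE, ...); decode_line is a hypothetical helper name,
not something that exists in groff.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Decode a NUL-terminated multibyte string, encoded according to the
   current LC_CTYPE locale, into wide characters.
   Returns the number of wide characters stored (not counting the
   terminator), or (size_t)-1 if the input contains an invalid
   sequence.  decode_line is a hypothetical name for illustration. */
size_t decode_line(const char *in, wchar_t *out, size_t outlen)
{
    assert(in != NULL && out != NULL && outlen > 0);

    /* mbstowcs() stores at most outlen - 1 wide characters here,
       leaving room for the terminator added below. */
    size_t n = mbstowcs(out, in, outlen - 1);
    if (n == (size_t)-1)
        return n;

    out[n] = L'\0';   /* mbstowcs() does not terminate on truncation */
    return n;
}
```

The same call would decode EUC-JP, Latin-1, or UTF-8 input, but only
if a matching locale is installed on the system, which is exactly the
limitation discussed in this thread.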
> Please bear in mind that groff shall work on non-GNU systems also! My
> idea is to only accept UTF8, ascii, latin1, and ebcdic as input
> encodings (the latter three for historical reasons only).
> Maybe on systems with a recent glibc, iconv() and friends can be used
> to do more, but generally I prefer an iconv preprocessor so that groff
> itself does not have to deal with encoding conversions.
There is a portable iconv implementation developed by Bruno Haible,
which is derived from glibc's iconv().
If iconv() is used, such a portable library may be helpful.
Another solution is to use GNU recode as an iconv-style preprocessor.
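As an illustration of what such a conversion step could look like,
here is a minimal sketch using the standard iconv_open(), iconv(), and
iconv_close() calls. The function name to_utf8 is hypothetical, and the
sketch handles a single in-memory string rather than a whole input
stream.

```c
#include <assert.h>
#include <iconv.h>
#include <string.h>

/* Convert the NUL-terminated string `in` from the encoding named
   `from` to UTF-8.  Returns the number of bytes written to `out`
   (not counting the terminator), or -1 on failure (unknown encoding,
   invalid input, or output buffer too small).
   to_utf8 is a hypothetical name for illustration. */
long to_utf8(const char *from, const char *in, char *out, size_t outlen)
{
    assert(from != NULL && in != NULL && out != NULL && outlen > 0);

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return -1;

    char *src = (char *)in;          /* iconv() wants char **, input is not modified */
    size_t srcleft = strlen(in);
    char *dst = out;
    size_t dstleft = outlen - 1;     /* keep one byte for the terminator */

    size_t rc = iconv(cd, &src, &srcleft, &dst, &dstleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &dst, &dstleft);  /* flush any shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;

    *dst = '\0';
    return (long)(dst - out);
}
```

For example, to_utf8("ISO-8859-1", text, buf, sizeof buf) converts
Latin-1 text. On systems where iconv is not part of libc, linking
against a separate iconv library (for example with -liconv) may be
needed.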
> > - Man-db to invoke Groff through iconv(1).
> > The problem with the latter idea is that the current version of the
> > locales package does not have any UTF-8 locale. How can UTF-8 -->
> > wchar_t conversion be achieved without a UTF-8 locale? WE MUST NOT
> > ASSUME THAT THE INTERNAL REPRESENTATION OF WCHAR_T IS UCS-4, though
> > this is true for Glibc.
No. A UTF-8 locale is easy to include in the locales package.
And what is the problem? Using iconv() does not require any UTF-8
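
A minimal sketch of that UTF-8 --> wchar_t conversion done through
iconv(), with no setlocale() call at all. It assumes the iconv
implementation accepts the encoding name "WCHAR_T", which glibc and
GNU libiconv do, though POSIX does not require it. The function name
utf8_to_wcs is hypothetical.

```c
#include <assert.h>
#include <iconv.h>
#include <string.h>
#include <wchar.h>

/* Decode the NUL-terminated UTF-8 string `in` into wide characters
   using iconv(), independent of the current locale.
   Returns the number of wide characters stored (not counting the
   terminator), or (size_t)-1 on failure.
   Assumption: the iconv implementation knows the name "WCHAR_T".
   utf8_to_wcs is a hypothetical name for illustration. */
size_t utf8_to_wcs(const char *in, wchar_t *out, size_t outlen)
{
    assert(in != NULL && out != NULL && outlen > 0);

    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;               /* input bytes are not modified */
    size_t srcleft = strlen(in);
    char *dst = (char *)out;              /* iconv() works on byte buffers */
    size_t cap = (outlen - 1) * sizeof(wchar_t);
    size_t dstleft = cap;

    size_t rc = iconv(cd, &src, &srcleft, &dst, &dstleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    /* Bytes consumed in the output buffer, expressed in wchar_t units. */
    size_t n = (cap - dstleft) / sizeof(wchar_t);
    out[n] = L'\0';
    return n;
}
```

Nothing here depends on wchar_t being UCS-4: the library produces
whatever representation the platform uses for wchar_t.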
-- GOTO Masanori