
Re: groff: radical re-implementation

Hi, All.

From: Werner LEMBERG <wl@gnu.org>
Date: Mon, 16 Oct 2000 16:41:35 +0200 (CEST)
> From: Tomohiro KUBOTA <tkubota@riken.go.jp>
> Subject: groff: radical re-implementation
> Date: Mon, 16 Oct 2000 11:35:20 +0900
> > Why 'ascii' and 'latin1' are treated as 'device type'?  The device
> > type should be 'tty' or so.  Because of this confusing design, we
> > have no way to treat, for example, Japanese X11 output or Korean
> > DUE TO CONFUSED DESIGN.  Should we type 'groff -Tlatin1 -Tx75' for
> > X11 output with latin1 encoding?  Entirely No!
> As you may know, this confusion has historical origins.  I'm not
> willing to add new `devices' like `latin-2' or even `nippon' due to
> this currently.
> I plan to separate input encodings, output encodings, and character
> sets from devices.  Then, we will have real devices like tty, ps, or
> dvi.  Input characters will be converted to glyph names by troff, and
> these glyph names will be mapped to output encodings (for ttys)
> resp. fonts (for everything else) according to the device and font
> data.

That's good news!

> > The ideal implementation will be using 'wchar_t' for reading.
> But this will fail for some compilers...

Hmm, ISO C99 is now a standard, but ...

> > Ukai has surveyed roughly the source code of groff and posted
> > a brief but long list of needed works (in debian-devel@debian.or.jp 
> > mailing list in Japanese).
> > 
> >   http://www.debian.or.jp/Lists-Archives/debian-devel/200010/msg00072.html
> > 
> > Fortunately, fgetwc(), putwchar(), wprintf(), swprintf(), and so on
> > are available in new Glibc 2.2.  mbstowcs() and so on are also
> > available since older Glibc.  These functions are locale-sensitive
> > and can handle any encodings.  Note that they can also treat UTF-8
> > under UTF-8 locale, though the current Debian locales package does
> > not include any UTF-8 locales.  We should not give UTF-8 special
> > treatment.  Discussion is in progress about this new design of groff
> > at debian-devel@debian.or.jp mailing list (in Japanese) and personal
> > communication.
> Please bear in mind that groff shall work on non-GNU systems also!  My
> idea is to only accept UTF8, ascii, latin1, and ebcdic as input
> encodings (the latter three for historical reasons only).
> Maybe on systems with a recent glibc, iconv() and friends can be used
> to do more, but generally I prefer an iconv-preprocessor so that groff
> itself has not to deal with encoding conversions.

There is a portable iconv implementation developed by Bruno Haible,
which is derived from glibc's iconv().
If iconv() is used, such a portable library may be helpful.
Another solution is to use GNU recode as the iconv preprocessor.
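
As a rough illustration of the preprocessor idea (only a sketch, not code
from groff, libiconv, or recode): a small filter re-encodes the input as
UTF-8 before groff ever sees it, so groff itself handles one encoding.
Python's built-in codecs stand in for iconv() here, and the function names
to_utf8 and filter_stream are hypothetical.

```python
def to_utf8(data, source_encoding):
    """Re-encode bytes from source_encoding into UTF-8.

    Raises UnicodeDecodeError when data is not valid in source_encoding,
    so malformed input is reported instead of being passed on silently.
    """
    return data.decode(source_encoding).encode("utf-8")


def filter_stream(src, dst, source_encoding):
    """Read all bytes from the binary stream src, write UTF-8 to dst.

    Reading the whole input at once keeps multibyte characters from
    being split across chunk boundaries.
    """
    dst.write(to_utf8(src.read(), source_encoding))
```

In a pipeline, filter_stream(sys.stdin.buffer, sys.stdout.buffer, "euc_jp")
would sit in front of groff, much as iconv(1) or recode would.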

> > - Man-db to invoke Groff through iconv(1).
> > 
> > Problem of the latter idea is that the current version of locales 
> > package does not have any UTF-8 locale.  How UTF-8 --> wchar_t 
> > conversion can be achieved without UTF-8 locale?  WE MUST NOT 
> > is true for Glibc.

No. A UTF-8 locale is easy to include in the locales package.
And what is the problem? Using iconv() does not require any UTF-8
locale.
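
To illustrate why no locale is involved: decoding UTF-8 is a fixed
byte-level algorithm. Below is a minimal sketch of such a decoder (an
illustration only, not glibc or libiconv code; the name utf8_to_codepoints
is hypothetical). It follows the modern definition of UTF-8 (at most four
bytes, up to U+10FFFF) and returns integer code points, roughly what one
would store in wchar_t on a system where wchar_t holds UCS-4.

```python
def utf8_to_codepoints(data):
    """Decode UTF-8 bytes into a list of Unicode code points.

    No locale is consulted; only the byte patterns matter.
    Raises ValueError on malformed input.
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        # Pick payload bits, continuation count, and the smallest code
        # point that legitimately needs this length (rejects overlongs).
        if lead < 0x80:                 # 0xxxxxxx: ASCII
            cp, extra, minimum = lead, 0, 0
        elif 0xC2 <= lead <= 0xDF:      # 110xxxxx: 2-byte sequence
            cp, extra, minimum = lead & 0x1F, 1, 0x80
        elif 0xE0 <= lead <= 0xEF:      # 1110xxxx: 3-byte sequence
            cp, extra, minimum = lead & 0x0F, 2, 0x800
        elif 0xF0 <= lead <= 0xF4:      # 11110xxx: 4-byte sequence
            cp, extra, minimum = lead & 0x07, 3, 0x10000
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        if i + extra >= n:
            raise ValueError("truncated sequence at offset %d" % i)
        for b in data[i + 1:i + 1 + extra]:
            if (b & 0xC0) != 0x80:      # continuation must be 10xxxxxx
                raise ValueError("bad continuation byte after offset %d" % i)
            cp = (cp << 6) | (b & 0x3F)
        if cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("overlong or out-of-range sequence at offset %d" % i)
        out.append(cp)
        i += 1 + extra
    return out
```

The result is the same whatever LANG or LC_ALL is set to, which is the
property a locale-independent converter such as iconv() relies on.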

-- GOTO Masanori
