Re: [Groff] Re: groff: radical re-implementation
At Fri, 20 Oct 2000 20:32:17 +0100 (BST),
(Ted Harding) <Ted.Harding@nessie.mcc.ac.uk> wrote:
> It does not really matter that these are interpreted, by default, as
> iso-latin-1. They could correspond to anything on your screen when you
> are typing, and you can set up translation macros in troff to make them
> correspond to anything else (using either the ".char" request or the
> traditional ".tr" or the new ".trnt" requests).
Since I am not familiar with the internals of troff, I don't know
whether this works well for multibyte encodings such as EUC-*, UTF-8,
and so on. However, there are also encodings with shift states
introduced by escape sequences, such as ISO-2022-JP, ISO-2022-CN, and
ISO-2022-INT-1. Such encodings cannot be handled by ".char". Do you
have any positive reason not to support them, when they can easily be
supported using standard "locale" technology?
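To illustrate why shift states defeat per-character translation, here
is a small sketch (Python and its standard codecs, purely for
illustration; this is not groff code). The escape sequences switch the
active charset mid-stream, so the meaning of a byte depends on the
decoder's current state, not on the byte alone:

```python
# ISO-2022-JP is stateful: escape sequences switch the active charset.
text = "ABC 日本語 ABC"
raw = text.encode("iso2022_jp")

# ESC $ B shifts into JIS X 0208; ESC ( B shifts back to ASCII.
assert b"\x1b$B" in raw   # enter the Japanese charset
assert b"\x1b(B" in raw   # return to ASCII

# A byte like 0x46 means "F" in the ASCII state but is half of a
# two-byte JIS character inside the shifted state -- no stateless
# per-character table can map it to one glyph.
assert raw.decode("iso2022_jp") == text
```

A ".char"-style table maps one input unit to one glyph; it has nowhere
to keep the shift state that these bytes require.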
Otherwise we have to prepare font files for every encoding in the
world, which is inefficient. Moreover, what would the mechanism be for
choosing the proper font file for the desired encoding?
I think the answer is "locale" technology. Groff can check the
LC_CTYPE category; thus, all a user has to do is set the LANG,
LC_CTYPE, or LC_ALL environment variable, just as for any other
software that must work in the needed encoding.
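As a sketch of the lookup such a program performs (the variable
precedence is the standard POSIX one; the function name is my own
invention for illustration):

```python
import os

def ctype_locale(env=None):
    """Return the locale name controlling LC_CTYPE, following POSIX
    precedence: LC_ALL overrides LC_CTYPE, which overrides LANG."""
    if env is None:
        env = os.environ
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"  # the POSIX default locale

print(ctype_locale({"LANG": "ja_JP.eucJP"}))                 # ja_JP.eucJP
print(ctype_locale({"LC_ALL": "en_US.UTF-8", "LANG": "C"}))  # en_US.UTF-8
```

The codeset suffix of the returned name (eucJP, UTF-8, ...) is what
would then select the input decoding, with no per-encoding code in the
program itself.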
However, I am interested in how Groff 1.16 works for UTF-8 input.
I could not find any code for UTF-8 input, though I found code for
UTF-8 output in src/devices/grotty/tty.cc. Am I missing something?
(Of course, /font/devutf8/* has no implementation of the UTF-8
encoding, though it seems to have a table mapping glyph names to
UCS-2.)
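For reference, input-side UTF-8 support essentially means a decoder
along the following lines (a minimal sketch that skips validation of
overlong and malformed sequences), turning byte sequences into the
code points that a glyph-name table like devutf8's could then resolve:

```python
def utf8_decode_one(buf: bytes):
    """Decode one UTF-8 sequence from buf; return (code point, length).
    Minimal sketch: continuation bytes and overlongs are not validated."""
    b0 = buf[0]
    if b0 < 0x80:                 # 0xxxxxxx -> 1-byte ASCII
        return b0, 1
    elif b0 < 0xE0:               # 110xxxxx -> 2-byte sequence
        n, cp = 2, b0 & 0x1F
    elif b0 < 0xF0:               # 1110xxxx -> 3-byte sequence
        n, cp = 3, b0 & 0x0F
    else:                         # 11110xxx -> 4-byte sequence
        n, cp = 4, b0 & 0x07
    for b in buf[1:n]:            # fold in 6 bits per continuation byte
        cp = (cp << 6) | (b & 0x3F)
    return cp, n

assert utf8_decode_one(b"A") == (0x41, 1)
assert utf8_decode_one("é".encode("utf-8")) == (0xE9, 2)
assert utf8_decode_one("あ".encode("utf-8")) == (0x3042, 3)
```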
> B: Troff should be able to cope with multi-lingual documents, where
> several different languages occur in the same document. I do NOT
> believe that the right way to do this is to extend troff's capacity
> to recognise thousands of different input encodings covering all the
> languages which it might be called upon to typeset (e.g. by Unicode or
> the like).
This confuses language with encoding. I think UTF-8 support can and
should be achieved via locale technology. By using locale technology,
software can be written in an encoding-independent way and can then
support any encoding, including UTF-8. Why should we hard-code UTF-8
when locale technology lets us support any encoding, UTF-8 included?
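The encoding-independent shape I mean can be sketched like this (again
Python for illustration; the codeset names and the function are my
own, not groff's):

```python
def decode_input(raw: bytes, codeset: str) -> str:
    # The codeset is whatever the user's LC_CTYPE locale announces
    # (e.g. "EUC-JP", "ISO-2022-JP", "UTF-8"); nothing is hard-coded.
    return raw.decode(codeset)

text = "日本語"
# One code path handles three different encodings of the same string:
for codeset in ("euc_jp", "iso2022_jp", "utf-8"):
    assert decode_input(text.encode(codeset), codeset) == text
```

Once input is decoded to abstract characters this way, the rest of the
formatter never needs to know which encoding the user typed in.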
I agree that we should not extend troff to recognize thousands of
encodings if that meant hard-coding every one of them, but it does
not. Again, do you have any positive reason not to support encodings
other than UTF-8?
We have many systems using encodings such as EUC-* and ISO-2022-*.
Some systems will migrate to UTF-8 soon; some will migrate after a
certain time; users of UTF-8 and EUC-* may share a single system; and
some will never migrate to UTF-8. Groff should support all of them.
I think you too will have trouble if you cannot use 8-bit encodings.
> C: Error messages and similar communications with the user (which
> have nothing directly to do with troff's real job) are irrelevant to
> the question of revising groff. If people would like these to appear in
> their own language then I'm sure it can be arranged in a way which
> would require no change whatever in the fundamental workings of troff.
This is a different topic, and you are right: we can use gettext to
handle message translation without any radical re-implementation of
troff or the Groff system. However, IMO, this topic is much less
important. You were reminded of it because I used the word "locale",
weren't you? True, locales are also used to change the language of
messages (the LC_MESSAGES category), but what I am discussing is the
LC_CTYPE category. Supporting "locale" technology (especially
LC_CTYPE) is important so that Groff can work well with other
software.
P.S. to CHOI Junho:
I think you'd better subscribe firstname.lastname@example.org or you may lose messages.
I'd like you not to leave this discussion because you are also a
Tomohiro KUBOTA <email@example.com>