
Re: [Groff] Re: groff: radical re-implementation

Hi Werner (and all)

Thanks for this clarifying explanation. I have a couple of comments,
one explanatory, the other which, I think, may point to the core
of the question.

On 21-Oct-00 Werner LEMBERG wrote:
>> Troff's multi-character naming convention means that anything you
>> could possibly need can be defined, and given a name in the troff
>> input "character set" whenever you really need it, so long as you
>> have the device resources to render the appropriate glyph.
> There are only 256 `multi-characters' named `charXXX'.  Everything
> else are glyph entities (even if they behave like a character in most
> cases).  The reality is that groff doesn't really make a difference
> between a character and a glyph, and it has high priority to me to
> implement this distinction.  I'll probably start with renaming a lot
> of troff internals.

1. Perhaps I should clarify: by "multi-character naming convention"
I mean the fact that you can decide to use the sequence of ASCII
characters, for instance, "\[O-ogonek]" as the name of a "character".

In passing: I see no _logical_ distinction between using a string
of ASCII characters to name a "character", and using a string of
bytes which implements a UTF-8 encoding.
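
A minimal sketch of this equivalence, in Python rather than troff. The
name table and both function names are hypothetical, chosen only for
illustration; they are not groff internals.

```python
import re

# Hypothetical name table; the names are illustrative, not groff's built-in list.
NAMED = {"O-ogonek": "\u01ea", "*a": "\u03b1"}

# Matches a bracketed name escape: backslash, "[", the name, "]".
ESCAPE = re.compile(r"\\\[([^\]]+)\]")

def tokens_from_ascii(text):
    """Split ASCII input into entities, resolving bracketed names via NAMED."""
    out, pos = [], 0
    for m in ESCAPE.finditer(text):
        out.extend(text[pos:m.start()])   # ordinary characters before the escape
        out.append(NAMED[m.group(1)])     # the named entity itself
        pos = m.end()
    out.extend(text[pos:])
    return out

def tokens_from_utf8(data):
    """Split UTF-8 encoded bytes into the same kind of entities."""
    return list(data.decode("utf-8"))
```

Either route yields the same sequence of entities; only the spelling of
the name in the input differs.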

2. Perhaps it is a good point of view to see troff (gtroff) as an
engine which handles _glyphs_, not characters, in a given context of
typographic style and layout. The current glyph is defined by the current
point size, the current font, and the name of the "character" which is to
be rendered, and troff necessarily takes account of the metric information
associated with this glyph.
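
One way to picture this, as a sketch only: a glyph is keyed by name, font
and point size, and its width comes from a metrics table. The class, the
table and the numbers below are all assumptions made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Glyph:
    name: str    # troff name of the "character"
    font: str    # current font
    size: float  # current point size

# Hypothetical metrics: (font, name) -> advance width in 1/1000 em.
METRICS = {("R", "a"): 444, ("B", "a"): 500}

def width(glyph):
    """Advance width in points: the font's metric scaled by the point size."""
    return METRICS[(glyph.font, glyph.name)] * glyph.size / 1000
```

The same name "a" gives a different glyph, with a different width, as
soon as the font or the point size changes.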

The fact that ASCII characters and the iso-latin-1 characters
corresponding to byte-values > 128 are (by default) the troff names of
"characters" in a group of European languages -- together with certain
other marks and symbols -- is logically (in my view) an irrelevant
coincidence which happens to be very convenient for people using these
languages; but it is not at all necessary. Nothing at all stops you from
defining

  .char a \(*a

so that "a" is the name of Greek "alpha", and so on, if you want to simplify
the typing of input in a passage of Greek using an ASCII interface.

Logically, therefore, troff could be "neutral" about what the byte "a"
stands for. From that point of view, a troff which makes no assumptions
of this kind, and which consults external tables about the meaning of
its input and about the characteristics of what output that input
implies, purely for the purpose of correct formatting, is perhaps the
pure ideal. And from that point of view, therefore, unifying the input
conventions on the basis of a comprehensive encoding (such as UTF-8
or Unicode is intended to become) would be a great step towards
attaining this neutrality.
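
A small sketch of such a "neutral" reader, again in Python. The two
tables and the function name are hypothetical; the point is only that
the meaning of a byte lives in the table, not in the engine.

```python
# Hypothetical external input tables: each maps an input byte to an entity name.
LATIN_TABLE = {ord("a"): "a", ord("b"): "b"}
GREEK_TABLE = {ord("a"): "*a", ord("b"): "*b"}  # "*a" names alpha, "*b" beta

def read_entities(data, table):
    """Translate raw input bytes into entity names using only the given table."""
    return [table[byte] for byte in data]
```

With LATIN_TABLE the bytes "ab" name the Latin letters; with GREEK_TABLE
the very same bytes name alpha and beta.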

However, I wish to think more about this issue.

Meanwhile, interested parties who have not yet studied it may find
the "UTF-8 and Unicode FAQ for Unix/Linux" by Markus Kuhn well worth
reading.


By the way, your comment that hyphenation, for instance, is not a "glyph
question" is, I think, not wholly correct. Certainly, hyphenation _rules_
are not a glyph question: as well as being language-dependent, there may
also be "house rules" about it; these come under "typographic style" as
above. But the size of a hyphen and associated spacing are glyph issues,
and these may interact with where a hyphenation occurs or whether it
occurs at all, according to the rules.
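
A sketch of that interaction, under simplifying assumptions: the rules
have already marked the permitted break points, all widths are in the
same units, and the function name is hypothetical.

```python
def best_break(piece_widths, hyphen_width, space_left):
    """Return how many leading pieces fit before a hyphen, or 0 for no break.

    piece_widths: widths of the fragments between permitted hyphenation points.
    """
    best, used = 0, 0
    # The last piece is excluded: after it the word is complete, no hyphen needed.
    for count, piece in enumerate(piece_widths[:-1], start=1):
        used += piece
        if used + hyphen_width <= space_left:
            best = count
    return best
```

With fragments of width 10, 8 and 12 and 22 units left on the line, a
hyphen of width 3 allows a break after the second fragment, while a
hyphen of width 5 pushes the break back to the first. The rules are
unchanged; only the glyph differs.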

An interesting debate!


E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 284 7749
Date: 21-Oct-00                                       Time: 23:47:03
------------------------------ XFMail ------------------------------
