Re: [Groff] Re: groff: radical re-implementation
> 2. Perhaps it is a good point of view to see troff (gtroff) as an
> engine which handles _glyphs_, not characters, in a given context of
> typographic style and layout. The current glyph is defined by the
> current point size, the current font, and the name of the
> "character" which is to be rendered, and troff necessarily takes
> account of the metric information associated with this glyph.
Exactly. But the current terminology in gtroff is more than
ambiguous, and I believe that we need a clear separation between
characters and glyphs.
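
To make the distinction concrete, here is a minimal sketch in Python.
It is not gtroff code: the names Glyph, WIDTHS, LIGATURES, and
chars_to_glyphs are hypothetical, and the width values are invented
for illustration. It only shows that a glyph is identified by font,
point size, and name, and that two input characters may become one
glyph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    font: str   # font name, e.g. "R" for roman
    size: int   # point size
    name: str   # glyph name, e.g. "a" or "fi"


# Hypothetical metrics table: width in 1/1000 em per (font, glyph name).
WIDTHS = {("R", "f"): 333, ("R", "i"): 278, ("R", "a"): 444, ("R", "fi"): 556}

# Hypothetical ligature table: character pair -> glyph name.
LIGATURES = {"fi": "fi"}


def glyph_width(glyph):
    """Scale the font's design width to the glyph's point size."""
    return WIDTHS[(glyph.font, glyph.name)] * glyph.size / 1000


def chars_to_glyphs(text, font, size):
    """Map input characters to glyphs, forming ligatures the font provides."""
    glyphs = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in LIGATURES and (font, LIGATURES[pair]) in WIDTHS:
            glyphs.append(Glyph(font, size, LIGATURES[pair]))
            i += 2
        else:
            glyphs.append(Glyph(font, size, text[i]))
            i += 1
    return glyphs
```

Here the three characters "f", "i", "a" yield only two glyphs, and
the metrics are looked up per glyph, never per character.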
> Logically, therefore, troff could be "neutral" about what the byte
> "a" stands for. From that point of view, a troff which makes no
> assumptions of this kind, and which consults external tables about
> the meaning of its input and about the characteristics of what
> output that input implies, purely for the purpose of correct
> formatting, is perhaps the pure ideal. And from that point of view,
> therefore, unifying the input conventions on the basis of a
> comprehensive encoding (such as UTF-8 or Unicode is intended to
> become) would be a great step towards attaining this neutrality.
I fully agree. A single input character set (as universal as
possible) is the right thing, and everything else should be handled
by preprocessors (and a postprocessor for tty).
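
As a rough illustration of what such a preprocessor would do, here
is a sketch in Python. The function name to_utf8 is hypothetical,
and UTF-8 as the single input encoding is an assumption taken from
the quoted paragraph, not a description of existing groff behaviour.

```python
def to_utf8(raw, source_encoding):
    """Re-encode input bytes from a legacy encoding into UTF-8.

    The formatter itself would then only ever see one encoding.
    """
    return raw.decode(source_encoding).encode("utf-8")
```

For example, the Latin-1 byte 0xE4 (a with diaeresis) becomes the
two-byte UTF-8 sequence 0xC3 0xA4.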
> Meanwhile, interested parties who have not yet studied it may find
> the "UTF-8 and Unicode FAQ for Unix/Linux" by Markus Kuhn well worth
Yes, Markus is doing an excellent job.
> By the way, your comment that hyphenation, for instance, is not a
> "glyph question" is, I think, not wholly correct. Certainly,
> hyphenation _rules_ are not a glyph question: as well as being
> language-dependent, there may also be "house rules" about it; these
> come under "typographic style" as above. But the size of a hyphen
> and associated spacing are glyph issues, and these may interact with
> where a hyphenation occurs or whether it occurs at all, according to
> the rules.
I meant the algorithm for finding possible breakpoints, which must be
based on input characters. The final decision about where a word will
be broken is of course a glyph issue.
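
The two stages can be sketched as follows, again in Python. This is
an illustration only: BREAKPOINTS stands in for whatever the
hyphenation algorithm computes from the input characters, and the
fixed CHAR_WIDTH and the default hyphen width are invented values
standing in for real glyph metrics.

```python
# Stage 1 (character level): hypothetical result of a hyphenation
# algorithm, giving the character counts after which a break is
# allowed. Here: hy-phen-a-tion.
BREAKPOINTS = {"hyphenation": [2, 6, 7]}

# Stage 2 (glyph level): assumed metrics, in arbitrary units.
CHAR_WIDTH = 5.0


def choose_break(word, room, hyphen_width=3.0):
    """Pick the last allowed break whose prefix plus hyphen fits in room.

    Returns the number of characters kept on the line, or None if no
    break fits.
    """
    best = None
    for k in BREAKPOINTS.get(word, []):
        if k * CHAR_WIDTH + hyphen_width <= room:
            best = k
    return best
```

With 35 units of room and a hyphen 3 units wide the word breaks
after "hyphen"; with a hyphen 6 units wide it can only break after
"hy". The candidate positions are the same in both cases; only the
glyph metrics change the outcome.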