Re: [Groff] Re: groff: radical re-implementation
> 1. Your 'charset' and 'encoding' are for troff or for preprocessor?
In general. I want to define terms completely independent of any
particular program. We have
> I thought both of them are for preprocessor. The preprocessor
> figures out the way to convert the input to UTF-8 from the
A groff preprocessor will work as you have described. Under the
assumption that you are talking about input characters, the term
`encoding' indeed implies the character set(s). After some thinking I
have to correct myself: It is better to say that `EUC' is an `encoding
scheme' which describes which character ranges and how many bytes are
used. Sorry for the confusion.
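
To make the `encoding scheme' idea concrete, here is a minimal Python
sketch, assuming EUC-JP as the example. The helper names
(euc_jp_sequence_length, split_euc_jp) are hypothetical and not part of
groff; the sketch only shows how the range of the first byte determines
how many bytes a character occupies.

```python
def euc_jp_sequence_length(lead_byte):
    """Return the number of bytes of an EUC-JP character, given its first byte."""
    if lead_byte <= 0x7F:
        return 1  # code set 0: ASCII
    if lead_byte == 0x8E:
        return 2  # code set 2: half-width katakana (single shift 2)
    if lead_byte == 0x8F:
        return 3  # code set 3: JIS X 0212 (single shift 3)
    if 0xA1 <= lead_byte <= 0xFE:
        return 2  # code set 1: JIS X 0208
    raise ValueError("not a valid EUC-JP lead byte: 0x%02X" % lead_byte)


def split_euc_jp(data):
    """Split an EUC-JP byte string into one byte string per character."""
    chunks = []
    i = 0
    while i < len(data):
        n = euc_jp_sequence_length(data[i])
        chunks.append(data[i:i + n])
        i += n
    return chunks
```

The scheme itself says nothing about which glyph a character is drawn
with; it only fixes the byte ranges and lengths for each character set
it carries.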
> 2. Which will the pre/postprocessors handle, characters or glyphs?
The preprocessor converts characters to characters (i.e. to Unicode);
grotty plus the postprocessor first convert glyph names back to
Unicode characters (using a hard-coded table) and then convert
characters to characters. I don't know yet whether it makes sense to
unify the latter two programs.
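
A rough Python sketch of the hard-coded table step might look as
follows. The table is an illustrative subset and the function name
glyphs_to_text is hypothetical; passing single-character names through
unchanged is an assumption made for this example, not a description of
how grotty actually works.

```python
# Illustrative subset only; a real table would cover every glyph name.
GLYPH_TO_UNICODE = {
    "co": "\u00a9",  # copyright sign
    "rg": "\u00ae",  # registered sign
    "em": "\u2014",  # em dash
    "bu": "\u2022",  # bullet
}


def glyphs_to_text(glyph_names):
    """Map a sequence of glyph names to a string of Unicode characters."""
    chars = []
    for name in glyph_names:
        if name in GLYPH_TO_UNICODE:
            chars.append(GLYPH_TO_UNICODE[name])
        elif len(name) == 1:
            # Assumption: a one-character name stands for that character.
            chars.append(name)
        else:
            raise KeyError("unknown glyph name: %r" % name)
    return "".join(chars)
```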
> 3. Your 'charset' is for glyph and 'encoding' is for character?
> I thought both of them are for character, since I thought both
> of them are for preprocessor.
My point was to make the distinction between `set' and `encoding'
clear. Maybe it is only of academic interest, but it (hopefully)
helps to clarify the terms used.
> 4. I thought we were discussing (tags in roff source for)
> preprocessor. Is that right?
> roff source in any encoding like '\(co' (character)
> | preprocessor
> UTF-8 stream like u+00a9 (character)
> | troff
> glyph expression like 'co' (glyph)
> | troff (continuing)
A step is missing here:
typeset output (glyph)
> UTF-8 stream like u+00a9 or '(C)' (character)
> | postprocessor
> formatted text in any encoding (character)
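
The character conversions at the two ends of this pipeline can be
sketched in Python. This is only an illustration under stated
assumptions: the function names and the fallback table are hypothetical,
and the per-character encoding in postprocess assumes a stateless output
encoding (it would not be suitable for something like ISO-2022-JP).

```python
def preprocess(raw, input_encoding):
    """Decode input given in any encoding and re-encode it as UTF-8."""
    return raw.decode(input_encoding).encode("utf-8")


# Hypothetical fallback table for characters the output encoding lacks.
ASCII_FALLBACKS = {"\u00a9": "(C)"}


def postprocess(text, output_encoding):
    """Encode Unicode text, substituting a fallback where encoding fails."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(output_encoding)
        except UnicodeEncodeError:
            # Use the fallback string, or '?' if none is defined.
            out += ASCII_FALLBACKS.get(ch, "?").encode(output_encoding)
    return bytes(out)
```

With these definitions, a Latin-1 copyright sign becomes the UTF-8
sequence for U+00A9 on input, and comes out as `(C)' when the output
encoding is plain ASCII.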