
Re: [Groff] Re: groff: radical re-implementation

> 1. Your 'charset' and 'encoding' are for troff or for preprocessor?

In general.  I want to define the terms completely independently of
any particular program.  We have

  character set
  character encoding
  glyph set
  glyph encoding

>    I thought both of them are for preprocessor.  The preprocessor
>    figures out the way to convert the input to UTF-8 from the
>    information.

A groff preprocessor will work as you have described.  Under the
assumption that you are talking about input characters, the term
`encoding' indeed implies the character set(s).  After some thinking I
have to correct myself: It is better to say that `EUC' is an `encoding
scheme' which describes which character ranges and how many bytes are
used.  Sorry for the confusion.
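
To make the `encoding scheme' idea concrete, here is a minimal
sketch.  It assumes that Python's built-in `euc_jp' codec is an
acceptable stand-in for `EUC' in general; the helper name
`euc_jp_layout' is made up for this illustration.  It only shows how
many bytes a character occupies and which byte values are used.

```python
def euc_jp_layout(ch: str) -> tuple[int, str]:
    """Return (number of bytes, hex dump) of one character in EUC-JP.

    Hypothetical helper for illustration only; it relies on Python's
    built-in "euc_jp" codec.
    """
    raw = ch.encode("euc_jp")
    return len(raw), raw.hex()


if __name__ == "__main__":
    # ASCII stays one byte; JIS X 0208 characters take two bytes,
    # both of which have the high bit set.
    for ch in ("A", "\u3042", "\u6f22"):
        size, dump = euc_jp_layout(ch)
        print(f"U+{ord(ch):04X}: {size} byte(s), {dump}")
```

The character set tells you which characters exist; the encoding
scheme tells you how the byte stream is laid out for them.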

> 2. Which will the pre/postprocessors handle, characters or glyphs?

The preprocessor converts from characters to characters (i.e. to
Unicode); grotty + postprocessor convert glyph names back to Unicode
characters (using a hard-coded table), and then again from characters
to characters (i.e. to the output encoding).  I don't know yet whether
it makes sense to unify the latter two programs.
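
The three conversions can be sketched roughly as follows.  This is
not groff code: the function names, the small glyph table, and the
ASCII fallbacks are all assumptions made up for the illustration, and
troff itself (characters to glyphs, typesetting) is left out.

```python
# Hypothetical hard-coded table: glyph name -> Unicode character.
GLYPH_TO_CHAR = {"co": "\u00a9", "rg": "\u00ae", "em": "\u2014"}

# Hypothetical replacements for output encodings that lack a character.
ASCII_FALLBACK = {"\u00a9": "(C)", "\u00ae": "(R)", "\u2014": "--"}


def preprocess(raw: bytes, input_encoding: str) -> str:
    """Preprocessor step: characters in any encoding -> Unicode."""
    return raw.decode(input_encoding)


def glyphs_to_text(glyph_names: list[str]) -> str:
    """grotty-like step: glyph names -> Unicode characters.

    Assumption: a name missing from the table stands for itself,
    which is only reasonable for single-character names like "a".
    """
    return "".join(GLYPH_TO_CHAR.get(name, name) for name in glyph_names)


def postprocess(text: str, output_encoding: str) -> bytes:
    """Postprocessor step: Unicode -> characters in the output encoding."""
    pieces = []
    for ch in text:
        try:
            ch.encode(output_encoding)  # probe: is ch representable?
            pieces.append(ch)
        except UnicodeEncodeError:
            pieces.append(ASCII_FALLBACK.get(ch, "?"))
    return "".join(pieces).encode(output_encoding)


if __name__ == "__main__":
    text = glyphs_to_text(["co", " ", "2", "0", "0", "0"])
    print(postprocess(text, "ascii"))
    print(postprocess(text, "utf-8"))
```

Only the middle function knows about glyph names; the other two deal
purely with characters.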

> 3. Your 'charset' is for glyph and 'encoding' is for character?
>    I thought both of them are for character, since I thought both 
>    of them are for preprocessor.

My point was to make the distinction clear between `set' and
`encoding'.  Maybe it is only of academic interest, but it (hopefully)
helps to clarify the terms used.
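
A small sketch of the distinction, using the copyright sign as an
example.  The helper name `encodings_of' is made up for the
illustration.  The character has one number in the character set
Unicode (U+00A9), but its byte sequence depends on the encoding;
Latin-1 is a different, smaller set which happens to contain the same
character.

```python
def encodings_of(ch: str, names: tuple[str, ...]) -> dict[str, str]:
    """Map each encoding name to the hex dump of ch in that encoding.

    Hypothetical helper for illustration only.
    """
    return {name: ch.encode(name).hex() for name in names}


if __name__ == "__main__":
    ch = "\u00a9"
    print(f"code point: U+{ord(ch):04X}")  # position within the set
    for name, dump in encodings_of(ch, ("utf-8", "utf-16-be", "latin-1")).items():
        print(f"{name}: {dump}")           # byte sequences differ
```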

> 4. I thought we were discussing (tags in roff source for)
>    preprocessor.  Is that right?


>    roff source in any encoding like '\(co'     (character)
>           |
>           |  preprocessor
>           V
>    UTF-8 stream like u+00a9                    (character)
>           |
>           |  troff
>           V
>    glyph expression like 'co'                  (glyph)
>           |
>           |  troff (continuing)
>           V

A step is missing here:

     typeset output                              (glyph)
            |
            |  grotty
            V

>    UTF-8 stream like u+00a9 or '(C)'           (character)
>           |
>           |  postprocessor
>           V
>    formatted text in any encoding              (character)

