
Re: [Groff] Re: groff: radical re-implementation



> JIS X 0213 has many characters which are also included in JIS X 0212.
> It is very confusing.  I guess JIS people think JIS X 0212 is
> obsolete.

Basically, only Emacs supports JIS X 0212...

> A few characters in JIS X 0213 are not included in the present
> Unicode.

AFAIK, this will be fixed (or has it already been fixed?) in the next
Unicode release, where more than 10000 CJK characters are added (in the
surrogate area).

> > > Then the 'font definition file' will be irrationally large.
> >
> > Right.  I think I've answered this problem in my last mail (regarding
> > a `glyphclass' directive in font description files).
> 
> Then all of these glyphs have to have the same width.

Why?  It is intended that `glyphclass' can occur multiple times.  Say,
one glyphclass command for full-width glyphs, and another one for
half-width glyphs.

> > Indeed, the default behaviour should be that the preprocessor adds
> > a
> > 
> >   .mso tmac.<charset>
> > 
> > line or something similar to the document, but there must be a
> > possibility to override it manually.
> 
> Good idea.  Thus '#ifdef I18N' part can be restricted in pre/post-
> processors.

Exactly.
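
As a rough illustration of that default, here is a minimal Python
sketch of a preprocessor step which prepends the `.mso' line.  The
function name and the `override' parameter are hypothetical; how a
real override would be passed in is still open.

```python
def add_charset_macros(lines, charset, override=None):
    # Build the macro file name from the charset unless the caller
    # supplies a manual override (a hypothetical parameter).
    macro_file = override if override is not None else "tmac." + charset
    # Put the .mso request in front of the unchanged document lines.
    return [".mso " + macro_file] + list(lines)
```

Calling it with charset "latin1" and no override yields a document
that starts with `.mso tmac.latin1'.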

> Overriding?  Well, the current Groff has '-a' option.  I think this
> can be used for this purpose.  (Anyway, we can provide substitution
> only for non-letter symbols like soft-hyphen, '(C)', circles,
> squares, and so on.  I think this is sufficient.)

The `-a' option is almost useless today IMHO.  It will show a tty
approximation of the typeset output:

  groff -a -man -Tdvi troff.man | less

It is *not* the right way to quickly select an ASCII device.  To
override the macros used for the output character set we need a new
option.

Using `-a' is comparable to dvi2tty or similar converters.

> We have to think about uniting my idea on design of preprocessor
> and Ukai's idea of '.encoding "encoding-name"' in roff source.
> 
>  - it is the preprocessor that handles the ".encoding" .
>  - priority is that
>    * --input-encodings wins.
>    * .encoding is next.
>    * then falling into the default (locale-sensible for i18n OS
>      and latin-1 for non-i18n OS).

Exactly.  Compare this to the Emacs model of `local variables'.

Note that such an encoding request has to determine the encoding *and*
character set of a document (similar to Emacs).
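
The priority order quoted above could be sketched like this in
Python.  This is only an illustration: the function and parameter
names are made up, and a missing locale value stands in for the
`non-i18n OS' case.

```python
def choose_input_encoding(cli_encoding=None, document_encoding=None,
                          locale_encoding=None):
    # 1. An encoding given on the command line wins.
    if cli_encoding:
        return cli_encoding
    # 2. Otherwise use the encoding declared in the document itself.
    if document_encoding:
        return document_encoding
    # 3. Otherwise fall back to the locale, or to latin-1 if the
    #    system provides no locale information.
    return locale_encoding or "latin-1"
```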

I suggest that we don't use `.encoding' but

  -*- charset-encoding: xxx -*-

in the first comment block (much as in Emacs).  troff itself shouldn't
have to deal with encoding issues at all; it should just accept UTF-8.

If really necessary, we can add two additional commands to select
encoding and character set separately:

  -*- charset: ...; encoding: ... -*-

Examples:

  .\" -*- charset: JIS-X-0208; encoding: EUC -*-

  .\" -*- charset: JIS-X-0208; encoding: ISO-2022 -*-
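
A preprocessor could pick up such a line roughly as follows.  This
is a minimal Python sketch, not a proposal for the final syntax: the
function name is hypothetical, and it assumes that variables are
separated by `;' and that names are case-insensitive.

```python
import re

# Matches the text between the two -*- markers.
_LOCAL_VARS = re.compile(r"-\*-(.*?)-\*-")

def parse_local_variables(line):
    # Only roff comment lines (starting with .\") are considered.
    if not line.startswith('.\\"'):
        return {}
    match = _LOCAL_VARS.search(line)
    if match is None:
        return {}
    variables = {}
    # Split "name: value; name: value" into separate pairs.
    for item in match.group(1).split(";"):
        name, sep, value = item.partition(":")
        if sep:
            variables[name.strip().lower()] = value.strip()
    return variables
```

For the first example above this gives `charset' = JIS-X-0208 and
`encoding' = EUC; a line without the markers gives an empty result.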


    Werner


