Re: [Groff] Re: groff: radical re-implementation

> However, thank you for explaining glyph.  I can also see that you
> understand the problems of Japanese character codes well.

Well, I'm the author of the CJK package for LaTeX, I've written a
ttf2pk converter, and I'm a member of the FreeType core team :-)

> Note that CJK ideographs also have a distinction between character
> and glyph.  The most famous example is the two variants of the
> 'tall or high' character.  Japanese people regard these two as the
> same in daily use, but as different when they are used in personal
> names and the like.

I know these problems too well -- AFAIK, in JIS X 0208 these two
variants are unified.  Do you know details about the new JIS X 0213?
> I don't know how Chinese and Korean people treat them.  It may be
> different.  However, IMHO, we should set this problem aside for now
> since there is so far no standard for treating these variants
> properly.  Though it is important, it is not in our scope.

If you are working on a terminal you need a character set which
distinguishes the two forms.

> > A `glyph code' is just an arbitrary registration number for a glyph
> > specified in the font definition file.
> Then the 'font definition file' will be unreasonably large.  I think
> at least CJK ideographs and Korean precomposed Hangul have to be
> treated in a different way.  (Ukai has already pointed out this
> problem.  jgroff uses 'wchar<EUCcode>' for glyph names of Japanese
> characters.)

Right.  I think I've answered this problem in my last mail (regarding
a `glyphclass' directive in font description files).

> A problem.  When compiled on an internationalized OS, the names of
> encodings (for iconv(3) and so on) are implementation-dependent (as
> you know, there are many implementation-dependent items in standard
> C/C++).  A solution is: we can have a hard-coded translation table
> between implementation-dependent encoding names and macro names for
> -m.  The table must be changed per OS (by the './configure' script
> or similar).  A minimal table would translate every
> implementation-dependent encoding name into the 'ascii' macro, since
> almost all encodings in the world are supersets of ASCII.  A full
> table for an OS would cover the list generated by 'iconv --list'.

I don't think so.  For example, we could restrict ourselves to MIME
character set tags, which are standardized.
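
To make this concrete, here is a rough sketch (in Python, only for
illustration) of folding implementation-dependent codeset names into
MIME charset tags.  The alias table and the function name
`mime_charset` are hypothetical and not part of groff; the table lists
only a few spellings I am aware of and falls back to US-ASCII for
anything unknown, as suggested in the quoted text.

```python
# Hypothetical alias table: normalized platform codeset name -> MIME tag.
# Keys are lowercased with all punctuation removed, so that "eucJP",
# "EUC-JP" and "euc_jp" all collapse to the same key.
ALIASES = {
    "eucjp": "EUC-JP",
    "ujis": "EUC-JP",
    "iso88591": "ISO-8859-1",
    "latin1": "ISO-8859-1",
    "ascii": "US-ASCII",
    "usascii": "US-ASCII",
    "646": "US-ASCII",
    "ansix341968": "US-ASCII",
}


def mime_charset(name, default="US-ASCII"):
    """Map a platform codeset name to a MIME charset tag.

    Unknown names fall back to `default`; this is an assumption of the
    sketch, not a rule taken from any standard.
    """
    key = "".join(ch for ch in name.lower() if ch.isalnum())
    return ALIASES.get(key, default)


if __name__ == "__main__":
    for raw in ("eucJP", "ISO8859-1", "ANSI_X3.4-1968", "something-else"):
        print(raw, "->", mime_charset(raw))
```

With a normalization step like this, only the small alias table would
depend on the platform, while the macro file names could follow the
standardized tags.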

> Since the '-m' option is generated by groff and passed to troff,
> groff has to have '#ifdef I18N' code.  (Or the code can be
> integrated into the preprocessor if we design the preprocessor to
> invoke troff.)

Indeed, the default behaviour should be that the preprocessor adds a

  .mso tmac.<charset>

line or something similar to the document, but it must be possible to
override it manually.
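
A minimal sketch of that default behaviour, written in Python only for
illustration: the function `add_charset_macro` and its `override`
parameter are hypothetical, and the way the charset name is spelled in
the macro file name is an assumption, since no naming scheme has been
fixed yet.

```python
def add_charset_macro(text, charset, override=None):
    """Prepend a '.mso tmac.<charset>' request to a document.

    `charset` is the name detected by the (hypothetical) preprocessor;
    `override`, if given, is a manually chosen name that wins over it.
    """
    name = override if override is not None else charset
    return f".mso tmac.{name}\n{text}"


if __name__ == "__main__":
    doc = ".TH TEST 1\nSome text.\n"
    # Default: use the detected charset.
    print(add_charset_macro(doc, "euc-jp"), end="")
    # Manual override: ignore the detected charset.
    print(add_charset_macro(doc, "euc-jp", override="latin1"), end="")
```

In a real implementation the override would more likely come from a
command-line option or from a request already present in the document.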

