
Re: [Groff] Re: groff: radical re-implementation

> A.1. At present troff accepts 8-bit input, i.e. recognises 256
> distinct entities in the input stream (with a small number of
> exceptions which are "illegal").

We need at least 21 bits (for the whole Unicode range, i.e. the BMP plus
the planes reachable via surrogates) and room for the special
characters.  A 32-bit wide number is thus the right choice, IMHO.
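To make the arithmetic concrete: Unicode ends at U+10FFFF, which needs 21 bits, so a 32-bit number leaves plenty of room above it for troff's special characters.  The following is only a sketch.  It assumes that special characters get consecutive codes starting at 0x110000, and the names `SPECIAL_BASE`, `encode_special` and the registry are hypothetical, not groff internals.

```python
# Hypothetical layout of a 32-bit input-character code, for illustration only.
UNICODE_MAX = 0x10FFFF           # last Unicode code point
SPECIAL_BASE = UNICODE_MAX + 1   # assumed first code for troff special characters
CODE_MAX = 0xFFFFFFFF            # largest 32-bit unsigned value

_specials = {}                   # hypothetical registry: special-character name -> code


def encode_special(name):
    """Return the code of a special character, assigning the next free one if new."""
    if name not in _specials:
        code = SPECIAL_BASE + len(_specials)
        if code > CODE_MAX:
            raise OverflowError("32-bit code space exhausted")
        _specials[name] = code
    return _specials[name]


def is_unicode_scalar(code):
    """True for Unicode code points that are not surrogates (U+D800..U+DFFF)."""
    return 0 <= code <= UNICODE_MAX and not 0xD800 <= code <= 0xDFFF


print(UNICODE_MAX.bit_length())       # 21
print(hex(encode_special("em")))      # 0x110000
print(is_unicode_scalar(ord("é")), is_unicode_scalar(encode_special("em")))  # True False
```

With this layout, more than four billion codes remain above U+10FFFF for special characters.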

> It does not really matter that these are interpreted, by default, as
> iso-latin-1.

I plan to remove the hard-coded `charXXX' values, moving them to macro
files.

> A.2. The direct correspondence between input bytes and characters is
> defined in the font files for the device.

But this isn't the right place: input character handling does not
belong in the font files at all.

> I don't see anything wrong (except possibly in ease of use) in
> creating an ASCII input stream in order to generate Japanese output.

Not everything can, or should, be handled at the glyph level;
hyphenation, for example.

> Preparation of an output stream to drive a device capable of
> rendering the output is the job of the post-processor (and, provided
> you have installed appropriate font definition files, I cannot think
> of anything that would be beyond the PostScript device "devps").

As mentioned in another mail, we have to extend the metric directives
to cope with the many CJK characters without making troff too slow.
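One way to keep metrics for tens of thousands of CJK glyphs cheap is to describe a run of code points sharing the same metrics with a single range entry instead of one line per glyph.  This sketch assumes a hypothetical range directive, represented here as `(first, last, width)` tuples; it is not an existing groff font-file keyword.  Lookup is a binary search over the sorted ranges, and individual entries take precedence.

```python
import bisect

# Hypothetical range entries: (first code point, last code point, width).
# Ranges are assumed to be sorted and non-overlapping.
RANGES = [
    (0x3040, 0x309F, 1000),   # Hiragana
    (0x30A0, 0x30FF, 1000),   # Katakana
    (0x4E00, 0x9FFF, 1000),   # CJK Unified Ideographs
    (0xFF61, 0xFF9F, 500),    # halfwidth Katakana
]
STARTS = [first for first, _, _ in RANGES]

# Glyphs with their own metrics override the ranges.
SINGLE = {0x3000: 1000, 0x0041: 722}


def width(code):
    """Return the width of a code point, or None if the font does not cover it."""
    if code in SINGLE:
        return SINGLE[code]
    i = bisect.bisect_right(STARTS, code) - 1   # last range starting at or before code
    if i >= 0:
        first, last, w = RANGES[i]
        if code <= last:
            return w
    return None


print(width(0x6F22), width(0xFF71), width(0x0041), width(0x00E9))  # 1000 500 722 None
```

The four range lines stand in for roughly 21,000 individual entries, and a lookup costs a logarithmic number of comparisons.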

> A: It follows that troff is already language-independent, for all
> languages whose typographic conventions can be achieved by the
> primitive mechanisms already present in troff. For such languages,
> there is no need to change troff at all. For some other languages,
> there are minor extra requirements which would require small
> extensions to troff which would not interact with existing
> mechanisms.

Correct.  The changes we are discussing only affect the input
character level and not the troff engine itself (except for additional
typesetting features for CJK and possibly other languages).

> I think that troff's hard-wired set of ligatures should be replaced
> by a user-definable set.
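A user-definable ligature set could be as simple as a table mapping character sequences to glyph names, applied by greedy longest match.  The table format and function below are a hypothetical sketch, not a proposed groff syntax.  The glyph names `ff`, `fi`, `fl`, `Fi` (ffi) and `Fl` (ffl) are troff's traditional ligature names.

```python
def apply_ligatures(text, table):
    """Split text into glyph names, replacing the longest matching sequence first."""
    longest = max((len(seq) for seq in table), default=1)
    glyphs, i = [], 0
    while i < len(text):
        # Try the longest candidate sequence first, down to two characters.
        for n in range(min(longest, len(text) - i), 1, -1):
            seq = text[i:i + n]
            if seq in table:
                glyphs.append(table[seq])
                i += n
                break
        else:
            glyphs.append(text[i])  # no ligature: the character maps to its own glyph
            i += 1
    return glyphs


# The traditional set; a user could extend or shrink it per font.
LIGATURES = {"ff": "ff", "fi": "fi", "fl": "fl", "ffi": "Fi", "ffl": "Fl"}

print(apply_ligatures("office", LIGATURES))   # ['o', 'Fi', 'c', 'e']
print(apply_ligatures("shuffle", LIGATURES))  # ['s', 'h', 'u', 'Fl', 'e']
print(apply_ligatures("office", {}))          # ['o', 'f', 'f', 'i', 'c', 'e']
```

Passing an empty table switches ligatures off, which a hard-wired set cannot do per document.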


> Some characters have different glyphs at the beginning, the middle,
> or the end of words; and so on.

Usually, such changes involve contextual analysis, which I won't
implement.  If a preprocessor does this, it has to send glyph
entities directly to troff, so this isn't a problem.
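As a small illustration of contextual analysis done outside troff, a preprocessor could choose the Greek terminal sigma at the end of a word and send the glyph entity directly.  This sketch uses troff's glyph names `\[*s]` (sigma) and `\[ts]` (terminal sigma).  Its word-boundary rule ("a sigma not followed by a letter") is a deliberate simplification; real contextual shaping, e.g. for Arabic, needs much more.

```python
import re

SIGMA = "\u03c3"  # GREEK SMALL LETTER SIGMA

# Simplified rule: a sigma not followed by a letter is word-final.
FINAL_SIGMA = re.compile(SIGMA + r"(?![^\W\d_])")


def mark_sigma(text):
    """Replace each small sigma by an explicit glyph escape chosen from its context."""
    text = FINAL_SIGMA.sub(lambda m: r"\[ts]", text)  # terminal sigma
    return text.replace(SIGMA, r"\[*s]")               # all remaining sigmas


print(mark_sigma("\u03c3\u03bf\u03c6\u03bf\u03c3"))  # \[*s]οφο\[ts]
```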

> B: Troff should be able to cope with multi-lingual documents, where
> several different languages occur in the same document. I do NOT
> believe that the right way to do this is to extend troff's capacity
> to recognise thousands of different input encodings covering all the
> languages which it might be called upon to typeset (e.g. by Unicode
> or the like).

This is done by a preprocessor and is not visible to troff itself;
troff will see Unicode only.
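For a multilingual document, the preprocessor might decode each part with its own input encoding and hand troff a single Unicode stream.  This sketch assumes the document arrives as a list of `(encoding, bytes)` segments.  How a real preprocessor would learn each segment's encoding (an option, a marker in the file, detection) is left open, and `to_unicode` is a hypothetical name.

```python
def to_unicode(segments):
    """Decode (encoding, raw bytes) segments and join them into one Unicode string."""
    return "".join(raw.decode(encoding) for encoding, raw in segments)


# Simulated input: three parts of one document, each stored in its own encoding.
doc = [
    ("latin-1", "Straße, ".encode("latin-1")),
    ("koi8-r", "улица, ".encode("koi8-r")),
    ("euc-jp", "通り".encode("euc-jp")),
]
print(to_unicode(doc))  # Straße, улица, 通り

# The same byte means different characters in different encodings,
# which is why decoding has to happen before troff sees the text.
print(to_unicode([("latin-1", b"\xc1")]), to_unicode([("koi8-r", b"\xc1")]))  # Á а
```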

> Troff's multi-character naming convention means that anything you
> could possibly need can be defined, and given a name in the troff
> input "character set" whenever you really need it, so long as you
> have the device resources to render the appropriate glyph.

There are only 256 `multi-characters' named `charXXX'.  Everything
else is a glyph entity (even if it behaves like a character in most
cases).  In reality, groff doesn't really distinguish between a
character and a glyph, and implementing this distinction has high
priority for me.  I'll probably start by renaming a lot of troff
internals.
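The character/glyph distinction can be pictured as two separate types: characters are what input processing and hyphenation see, glyphs are what the fonts and the output device see, and an explicit mapping connects them.  The classes and the per-font table below are a hypothetical sketch, not groff's internal data structures.  The glyph names `A`, `'e` and `em` are troff glyph names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Char:
    """An input character: a Unicode code point (or a special-character code)."""
    code: int


@dataclass(frozen=True)
class Glyph:
    """A font glyph, identified by its name in the device's font files."""
    name: str


# Hypothetical per-font mapping; several characters could share one glyph,
# and a glyph (e.g. a ligature) need not correspond to any single character.
CHAR_TO_GLYPH = {Char(0x41): Glyph("A"), Char(0xE9): Glyph("'e"), Char(0x2014): Glyph("em")}


def glyphs_for(text):
    """Map each input character to a glyph, or None if the font lacks it."""
    return [CHAR_TO_GLYPH.get(Char(ord(c))) for c in text]


print(glyphs_for("A\u00e9"))  # [Glyph(name='A'), Glyph(name="'e")]
```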

> If you want to use a multi-byte encoding in your input-preparation
> software, you can pre-process this with a suitable filter to
> generate the troff input-sequences you need (I have done this with
> WordPerfect multinational characters, for instance, which are
> two-byte entities).

This filter will be the yet-to-come preprocessor.
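A minimal sketch of such a filter decodes the input from a given encoding, passes ASCII through unchanged, and writes every other character as a `\[uXXXX]` escape.  That escape form is the one GNU troff later adopted, and groff's `preconv` does essentially this job.  The function below is an illustrative stand-in, not preconv's code, and it ignores details such as input characters that are special to troff (e.g. the backslash).

```python
def to_troff(raw, encoding):
    """Decode raw input and escape every non-ASCII character as \\[uXXXX]."""
    out = []
    for ch in raw.decode(encoding):
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                 # ASCII passes through unchanged
        else:
            out.append(f"\\[u{cp:04X}]")  # at least four uppercase hex digits
    return "".join(out)


print(to_troff("café".encode("utf-8"), "utf-8"))          # caf\[u00E9]
print(to_troff("日本".encode("shift_jis"), "shift_jis"))  # \[u65E5]\[u672C]
```

Because the output is plain ASCII, troff itself never needs to know which encoding the input used.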

> CONCLUSION: Troff certainly needs some extensions to cope with the
> typesetting demands of some languages (of which the major ones that
> I can think of have been mentioned above). I also believe that there
> are some features of troff which need to be changed in any case, but
> these have nothing to do with language or "locale".

Locale support only affects pre- and postprocessors.

