Re: [Cjk] Is CJK-4.4.0 out? (TeXLive6-CJK.tar.bz2?)

To: foka@debian.org
Cc: cjk@ffii.org, debian-chinese-gb@lists.debian.org
Subject: Re: [Cjk] Is CJK-4.4.0 out? (TeXLive6-CJK.tar.bz2?)
From: Werner LEMBERG <wl@gnu.org>
Date: Wed, 16 May 2001 12:59:19 +0200 (CEST)
Message-id: <20010516.125919.35013737.wl@gnu.org>
In-reply-to: <20010515141753.B6384@lovelife.olvc.ab.ca>
References: <20010514162822.A11819@lovelife.olvc.ab.ca> <20010515.065650.30184624.wl@gnu.org> <20010515141753.B6384@lovelife.olvc.ab.ca>

> Speaking of cjk-4.4.0, will it include gbklatex and bg5platex sh
> scripts?

utils/extconv/{bg5+latex,gbklatex}

> > I'm wondering why you need chpfb at all.  The
> > ttf2pt1-chinese-3.3.2.tar.gz archive from ttf2pt1.sourceforge.net
> > already contains these mapping tables.
> 
> Hehe.  :-) It is just that ttf2pt1 currently names all the glyphs
> /c0 /c1 /c2.... /a /b /c /d /e /f /g etc. instead of the nice
> /cjk????  that ttf2pfb (and chpfb's modified ttf2pfb) uses.  (But
> that's probably just cosmetic.  :-)

You should contact the ttf2pt1 team to make your suggestions.  The
best solution would be to have AGL compliant glyph names,
i.e. /uni???? except for already defined glyph names.  For details see

  http://partners.adobe.com/asn/developer/typeforum/unicodegn.html
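A minimal Python sketch of the naming rule suggested above, assuming the
usual AGL conventions: `uni' plus four uppercase hex digits for BMP code
points, `u' plus five or six hex digits beyond the BMP, and the existing
AGL name wherever one is defined.  AGL_EXCERPT and glyph_name are
hypothetical names used only for illustration, and the three-entry table
stands in for the real, much longer glyph list.

```python
# Hypothetical three-entry excerpt of the Adobe Glyph List (illustration only).
AGL_EXCERPT = {0x0020: "space", 0x0041: "A", 0x00E9: "eacute"}


def glyph_name(codepoint, agl=AGL_EXCERPT):
    """Return an AGL-style glyph name for a Unicode code point."""
    if codepoint in agl:
        # Already defined glyph names take precedence.
        return agl[codepoint]
    if codepoint <= 0xFFFF:
        # BMP code points: "uni" followed by four uppercase hex digits.
        return "uni%04X" % codepoint
    # Beyond the BMP: "u" followed by five or six uppercase hex digits.
    return "u%X" % codepoint


if __name__ == "__main__":
    for cp in (0x0041, 0x4E2D, 0x20000):
        print("/" + glyph_name(cp))
```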

> Also, ttf2pt1 seems to be generating a whole lot of warning
> messages.

They can be disabled.  I've used `-W 2'.

> > I don't know.  While dvips would benefit from this conversion,
> > pdftex doesn't need it basically.  There were some discussions
> > recently on the pdftex mailing list, showing how to include a CJK
> > ttf into a PDF document, making it searchable also.  Nothing for
> > the faint-hearted currently...
> 
> Wow, that would be wunderbar!  Do you know if they would implement
> this soon?

It is already implemented; below is a mail from Otfried Cheong
<otfried@cs.uu.nl>, describing how to do it.  Another mail from Petr
Sojka (below also) describes how to add a proper CID Cmap to make the
file searchable -- this is the part which is a bit complicated.

> And, OTOH, is it technically possible for dvips to eventually do the
> same thing, i.e. convert and embed TTF font data to a PostScript
> file on-the-fly?

If you find someone who is willing to implement this...

    Werner

======================================================================

Here's a very quick tour of what I did.  I wanted to make 'yoonm.ttf'
accessible from TeX using Unicode encoding (Unicode.sfd). So I ran

ttf2tfm yoonm.ttf yoonm@Unicode@

This generates yoonm00.tfm, yoonm01.tfm, ... yoonmff.tfm.  For each
subfont you also need an *.enc file, which simply maps the code points
in the subfont to glyph indices in yoonm.ttf.  E.g., yoonmac.enc looks
like this:

/YoonmacEncoding [
  /index123 /index124 /index728 /index248
  ...
] def

As you may have guessed /index<n> refers to the glyph with glyph index <n>.
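A rough Python sketch of how such an encoding file could be written,
following the layout shown above.  The function name enc_vector is
hypothetical, and two points are assumptions rather than something stated
in this mail: that the vector should hold 256 entries, and that slots
without a glyph are filled with /.notdef.

```python
def enc_vector(font, subfont, glyph_indices):
    """Return the text of an encoding file such as yoonmac.enc.

    glyph_indices is a list of 256 entries, one per code point of the
    subfont; None marks a slot without a glyph (assumed to be /.notdef).
    """
    if len(glyph_indices) != 256:
        raise ValueError("expected 256 entries, got %d" % len(glyph_indices))
    names = ["/.notdef" if g is None else "/index%d" % g for g in glyph_indices]
    # "yoonm" + "ac" gives /YoonmacEncoding, as in the example above.
    lines = ["/%s%sEncoding [" % (font.capitalize(), subfont)]
    for start in range(0, 256, 4):
        # Four names per line, matching the layout of the example.
        lines.append("  " + " ".join(names[start:start + 4]))
    lines.append("] def")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Only the first four indices come from the example; the rest are empty.
    indices = [123, 124, 728, 248] + [None] * 252
    print(enc_vector("yoonm", "ac", indices))
```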

Now the only missing item is the map file 'yoonm.map':

yoonm00 <yoonm.ttf <yoonm00.enc
yoonm01 <yoonm.ttf <yoonm01.enc
yoonm02 <yoonm.ttf <yoonm02.enc
...
yoonmff <yoonm.ttf <yoonmff.enc

Add yoonm.map to pdftex.cfg (or insert it into one of the existing
maps), put the .tfm and .enc files in the pdftex path, and you are
ready to use 'yoonm.ttf' using CJK.
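The map file is regular enough to generate.  A small Python sketch,
assuming all 256 subfonts 00..ff exist as in the listing above (in
practice one would probably list only the subfonts for which a .tfm file
was written); map_lines is a hypothetical helper name.

```python
def map_lines(font, ttf):
    """Return one map line per subfont 00..ff, e.g. for yoonm.map."""
    return [
        "%s%02x <%s <%s%02x.enc" % (font, i, ttf, font, i)
        for i in range(256)
    ]


if __name__ == "__main__":
    with open("yoonm.map", "w") as out:
        out.write("\n".join(map_lines("yoonm", "yoonm.ttf")) + "\n")
```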

This is really all -- no subfonts need to be generated.  Whenever the
TeX code refers to a raw TeX font yoonmXX, pdftex synthesizes an
embedded TTF font by subsetting 'yoonm.ttf' (and so there will be
quite a few disjoint subsets of the same original font in the PDF
output).

I'm really happy with this system; it makes working with TTF fonts so
much easier.  I still need to figure out the best way for adding
/ToUnicode maps -- right now I'm thinking it has to go in the .fd
files -- but then it'll be perfect.

 > > The .enc files contain the same information as the .sfd file, but
 > > using glyph indices instead of char codes.  I've just extended
 > > ttf2tfm so that together with the .tfm files it writes the .enc
 > > files.  I guess one could make it create the /ToUnicode cmap files
 > > and the .map file for pdftex as well.
 > 
 > Please send me the patch.

I can bring it from home, but it seems it's broken.  I'm sure I didn't
do the vertical variant handling right, because I didn't even notice
it was there.  Probably you can do it better starting without my
patch...

 > I fear this won't be possible.  For example, the current version of
 > ttf2tfm uses some OpenType features to get vertical glyph
 > representation forms for pseudo-vertical typesetting (i.e. rotated
 > glyphs) from a CJK font if such an OpenType table is available.  The
 > character codes are the same, but you'll get different glyph indices.

Aha, I didn't realize this.  Well, I'm happy to live with the system
as it is if I know why :-)

Otfried

======================================================================

> Certainly one can use Werner's CJK package, or hlatex etc.,
> using several Type1 8bit fonts to cover a single 16bit font.  But the
> resulting PDF files are essentially encrypted - it is possible to view
> and print them, but one cannot copy and paste, or search for text in
> the file, because the viewer has no idea what the character codes mean.

It has been possible since Jul 2000; take your CMAP resources, or Adobe's from
http://partners.adobe.com/asn/developer/technotes/acrobatpdf.html
and put them in the PDF file using the \pdffontattr command
(I don't know whether Thanh has already managed to put a note
about it into the manual :-( ):

From thanh@informatics.muni.cz Sat Jul  8 16:48 MET 2000
Subject: tounicode
To: sojka@anxur.fi.muni.cz (Petr Sojka),

I have added a new primitive \pdffontattr, which makes it possible to add
/ToUnicode to the Font dict. Example of use:

==================== cut here ==========================
\font\f=ptmr8r\f % must *not* be a virtual font

% contents of the CMap object (copied from t1.pdf):
\immediate\pdfobj{%
/CIDInit /ProcSet findresource begin 12 dict begin begincmap /CIDSystemInfo <<
/Registry (NewsSerifEE-Roman+0) /Ordering (T1UV) /Supplement 0 >> def
/CMapName /NewsSerifEE-Roman+0 def
1 begincodespacerange <01> <0a> endcodespacerange
10 beginbfrange
<01> <01> <010D>
<02> <02> <0078>
<03> <03> <00E9>
<04> <04> <011B>
<05> <05> <0161>
<06> <06> <0159>
<07> <07> <017E>
<08> <08> <00FD>
<09> <09> <00E1>
<0a> <0a> <00ED>
endbfrange
endcmap CMapName currentdict /CMap defineresource pop end end}

\pdffontattr\f{/ToUnicode \the\pdflastobj\space 0 R} % add /ToUnicode
==================== cut here ==========================

I don't know any font handling macropackage that supports it, so you should
specify \pdffontattr for every "cut&pasteable&searchable" raw font
manually, or write macros for it.
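Instead of writing the CMap by hand for every raw font, its text can be
generated.  The Python sketch below reproduces the layout of the example
above; tounicode_cmap is a hypothetical helper, not part of pdftex or
CJK.  Splitting the entries into blocks of at most 100 follows my
understanding of the CMap conventions and should be checked against
Adobe's documentation.  The result would still have to be pasted into
\immediate\pdfobj{...} and attached with \pdffontattr as shown.

```python
def tounicode_cmap(cmap_name, mapping, ordering="T1UV"):
    """Return CMap text mapping one-byte font codes to Unicode values.

    mapping is a dict {font code (0..255): Unicode code point (BMP)}.
    """
    if not mapping:
        raise ValueError("mapping must not be empty")
    codes = sorted(mapping)
    lines = [
        "/CIDInit /ProcSet findresource begin 12 dict begin begincmap"
        " /CIDSystemInfo <<",
        "/Registry (%s) /Ordering (%s) /Supplement 0 >> def"
        % (cmap_name, ordering),
        "/CMapName /%s def" % cmap_name,
        "1 begincodespacerange <%02x> <%02x> endcodespacerange"
        % (codes[0], codes[-1]),
    ]
    # Assumption: at most 100 entries per beginbfrange block.
    for start in range(0, len(codes), 100):
        chunk = codes[start:start + 100]
        lines.append("%d beginbfrange" % len(chunk))
        for code in chunk:
            lines.append("<%02x> <%02x> <%04X>" % (code, code, mapping[code]))
        lines.append("endbfrange")
    lines.append("endcmap CMapName currentdict /CMap defineresource pop end end")
    return "\n".join(lines)


# The ten Czech characters from the example above.
EXAMPLE = {
    0x01: 0x010D, 0x02: 0x0078, 0x03: 0x00E9, 0x04: 0x011B, 0x05: 0x0161,
    0x06: 0x0159, 0x07: 0x017E, 0x08: 0x00FD, 0x09: 0x00E1, 0x0A: 0x00ED,
}

if __name__ == "__main__":
    print(tounicode_cmap("NewsSerifEE-Roman+0", EXAMPLE))
```

For a Unicode subfont such as yoonmac one would presumably pass a mapping
from each byte 00..ff to the code point U+ACxx, assuming the subfont
covers 256 consecutive code points as laid out in Unicode.sfd.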

> Here is a possible design: Extend the syntax of font map files with
> the same subsetting implemented by ttf2tfm:
> 
> ntukai@<subsetting spec file>@ ntukai.ttf

\pdffontattr seems more flexible.

> There is a much simpler route to enabling copy-and-paste and text
> searching, at least in theory: one can add a "ToUnicode" character map
> to each of the subfonts.  Viewers that correctly implement the PDF
> specification should then be able to provide search and copy.  But
> does this really work in practice?

Yes, searching does work; cut&paste works under the Windows platform only
(tested under win2000 with Czech fonts [not all Czech characters
are in AdobeStandardEncoding, so we need it]), as there is no
clipboard equivalent under X-windows AFAIK :-(.

Hope this helps.

--ps

-- 
| This message was re-posted from debian-chinese-big5@lists.debian.org
| and converted from big5 to gb2312 by an automatic gateway.
