
Re: XLC_LOCALE for Big5HKSCS



On Wed, May 31, 2000 at 09:59:19AM -0400, Thomas Chan wrote:
> On Wed, 31 May 2000, Roger So wrote:
> 
> > Although I have an XLC_LOCALE for Big5HKSCS that works for me at the
> > moment, it requires users to have a single font that includes both
> > standard Big5 and HKSCS extension characters.  I have been lucky to
> > have such a "unified" font; however I don't think it would be useful
> > for other people since generally HKSCS fonts are distributed 
> > separately from standard Big5 fonts.
> > 
> > I'm now working on an XLC_LOCALE that lets X use standard Big5 fonts
> > for "standard" characters (0xA1xx-0xF9xx), and extended fonts for
> > other characters.  However I don't know how to get X to distinguish
> > between these two character ranges.
> 
> I don't know either, but maybe you are looking for something like
> the "font-linking" in Mozilla or Java, if one is to use more than two
> font files.

I wish it was that simple... *sigh*

From what I gather in the "X Locale Database Definition", each "font
set" corresponds to a "character set".  Each "character set" is
distinguished from the others by bitwise-ANDing the character with
the wc_encoding_mask.  For example, the XLC_LOCALE for zh_TW.Big5
has a wc_encoding_mask of 0x00008000.
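
A minimal sketch of the mask test described above, not taken from any
actual XLC_LOCALE file.  Only the mask value 0x00008000 comes from the
zh_TW.Big5 example; the "cs0"/"cs1" assignments and the idea of treating
the wide character as the raw two-byte code are assumptions made for
illustration.

```python
# Mask quoted above for zh_TW.Big5.
WC_ENCODING_MASK = 0x00008000

# Assumed mapping from masked value to character set (hypothetical labels).
CHARSET_BY_ENCODING = {
    0x00000000: "cs0",  # assumed: single-byte (ASCII) characters
    0x00008000: "cs1",  # assumed: two-byte Big5 characters
}


def classify_by_mask(wc: int) -> str:
    """Pick a character set by AND-ing the wide character with the mask."""
    return CHARSET_BY_ENCODING[wc & WC_ENCODING_MASK]
```

With a single mask bit there are only two possible outcomes, which is why
this scheme alone cannot separate standard Big5 from the extension area.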

However, from a quick glance through XFree86's source, it seems that
different character sets can also be classified directly by byte
ranges.  That's what I've done in my zh_HK.Big5HKSCS, although I'm
not sure whether it works or not.  If it does, then the rest should
be pretty simple.
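
The byte-range idea could look roughly like the sketch below.  It is not
the X11 implementation, just the classification rule written out: lead
bytes 0xA1-0xF9 count as "standard" (the range given earlier in the
thread), everything else two-byte goes to the extended set.  The set
labels and the function name are hypothetical.

```python
# Lead bytes of "standard" Big5 as given in the thread: 0xA1..0xF9.
STANDARD_LEAD_BYTES = range(0xA1, 0xFA)


def font_set_by_lead_byte(code: int) -> str:
    """Choose a font set for a one- or two-byte code by its lead byte."""
    if code < 0x80:
        return "ascii"          # hypothetical label for single-byte text
    lead = code >> 8
    if lead in STANDARD_LEAD_BYTES:
        return "big5"           # hypothetical label: standard Big5 font
    return "big5hkscs"          # hypothetical label: extension font
```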

> Perhaps the Big5 + HKSCS situation can be handled like how the Japanese
> EUC-JP encoding works, which includes the JIS X 0208 and JIS X 0212
> character sets, but there are separate font files for those two character
> sets (with different XLFD names).  Or the Taiwanese EUC-TW encoding, which
> has the CNS 11643 character set, but each plane is a separate font file
> with its own XLFD.  In other words, keep the Big5 font as one font with a
> "big5" XLFD and the HKSCS font as a second font with a "hkscs" XLFD or
> something, and define a new character set like "big5hkscs" which has to
> know to get glyphs from those two fonts/XLFDs.  Just as one wouldn't use
> JIS X 0212 alone or CNS 11643 plane 5 alone, one isn't going to use HKSCS
> by itself.

I think it would be best to keep Big5 fonts with a "big5" XLFD, and 
the Big5HKSCS as "big5hkscs". (note that it's Big5HKSCS, not just HKSCS, 
as there is also an HKSCS for ISO-10646/Unicode 3.0.  Confused? :p)

> And a question of my own...
> 
> I do have a question about the ranges, though--do the basic Big5 ranges
> include the ETen extensions in C6A1-C8D3 (kana and other symbols), and
> F9D6-F9FE (seven hanzi and boxes)?  It seems like a de facto extension,
> and some "Big5" fonts out there include some or part of it.  (Ones from
> Microsoft don't include the C6A1-C8D3 part, but only the latter.)  Other
> fonts include it but are marked with "ETen" in the XLFD somewhere, and
> even some marked that way are actually missing them.

Well, Big5 was never a proper standard anyway; everyone just adds bits
here and there.  It seems that the ETen "extensions" are the most widely
used, and the current zh_TW.Big5 XLC_LOCALE definition actually includes
the two ranges you mentioned above.  Whether the fonts include them
seems to be based on pure luck though. :p  I do know that most TrueType
fonts follow Microsoft/Dynalab in not including the kana range.

> HKSCS in http://www.digital21.gov.hk/eng/hkscs/download/e_hkscs.pdf
> apparently includes all of the ETen extensions (see pages 2-41 to 2-44)
> as part of its definition, as well as adding new stuff to the C8 row.
> I recall that Big5+ also included the ETen extensions as well.

If you look at it carefully, you'll notice that Big5HKSCS's definition
of the C6-C7 range actually differs from ETen's.  This confused me
badly, and led me to produce broken iconv modules that can't do
round-trip conversion for characters in that range.
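
A round-trip check of the kind mentioned above can be sketched as follows.
It uses Python's built-in codecs as a stand-in for the iconv modules, so
it says nothing about how those modules behave; the function names are
hypothetical.

```python
def round_trips(raw: bytes, codec: str) -> bool:
    """True if decoding and re-encoding yields the original bytes."""
    try:
        return raw.decode(codec).encode(codec) == raw
    except UnicodeError:
        # Undecodable or unencodable input counts as a failure.
        return False


def failing_codes(lead_lo: int, lead_hi: int, codec: str) -> list[int]:
    """List two-byte codes in the given lead-byte rows that fail."""
    # Usual Big5 trail bytes: 0x40-0x7E and 0xA1-0xFE.
    trails = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))
    return [
        (lead << 8) | trail
        for lead in range(lead_lo, lead_hi + 1)
        for trail in trails
        if not round_trips(bytes([lead, trail]), codec)
    ]
```

Running `failing_codes(0xC6, 0xC7, "big5hkscs")` would list the codes in
the disputed rows that do not survive the trip under that codec.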

> So, does this mean that the ETen extensions are a "de facto" standard,
> reinforced by post-Big5 standards that include it as a subset, and we
> assume "Big5" fonts without them are "broken"?

Certainly not...

> What if one is using one
> of these "Big5" fonts that don't have all of the ETen extensions, in
> conjunction with a HKSCS font, but because of the way the
> coderanges are defined, one can't make use of the available glyphs in
> the latter?

That would be bad; yes, I shall modify my Big5HKSCS XLC_LOCALE definition
to map C6-C8 to Big5HKSCS.  It should be defined this way anyway since
Big5HKSCS practically redefines the C6-C8 range.

I'm not sure what to do with F9D6-F9FE though; it would be hard/costly
to tell them apart from other Big5 characters.
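
The revised split can be sketched like this, taking "C6-C8" literally as
whole lead-byte rows (an assumption) and using hypothetical set labels.
It also shows why F9D6-F9FE is awkward: the lead byte 0xF9 is shared with
ordinary Big5 characters, so separating them needs a comparison on the
full code rather than on the lead byte alone.

```python
def font_set_with_c6_c8(code: int) -> str:
    """Lead-byte split with rows C6-C8 handed to the extension font."""
    lead = code >> 8
    if 0xC6 <= lead <= 0xC8:
        return "big5hkscs"      # rows redefined by Big5HKSCS
    if 0xA1 <= lead <= 0xF9:
        return "big5"           # remaining standard rows
    return "big5hkscs"          # everything outside 0xA1-0xF9


def in_f9_extension(code: int) -> bool:
    """True for F9D6-F9FE; needs the trail byte, not just the lead byte."""
    return 0xF9D6 <= code <= 0xF9FE
```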


-- 
  Roger So                                            telnet://e-fever.org
  spacehunt at e-fever dot org                          SysOp, e-Fever BBS
  GnuPG  1024D/98FAA0AD  F2C3 4136 8FB1 7502 0C0C 01B1 0E59 37AC 98FA A0AD


