Re: XLC_LOCALE for Big5HKSCS
On Wed, May 31, 2000 at 09:59:19AM -0400, Thomas Chan wrote:
> On Wed, 31 May 2000, Roger So wrote:
>
> > Although I have an XLC_LOCALE for Big5HKSCS that works for me at the
> > moment, it requires users to have a single font that includes both
> > standard Big5 and HKSCS extension characters. I have been lucky to
> > have such a "unified" font; however I don't think it would be useful
> > for other people since generally HKSCS fonts are distributed
> > separately from standard Big5 fonts.
> >
> > I'm now working on an XLC_LOCALE that lets X use standard Big5 fonts
> > for "standard" characters (0xA1xx-0xF9xx), and extended fonts for
> > other characters. However I don't know how to get X to distinguish
> > between these two character ranges.
>
> I don't know either, but maybe you are looking for something like
> the "font-linking" in Mozilla or Java, if one is to use more than two
> font files.
I wish it was that simple... *sigh*
From what I gather in the "X Locale Database Definition", each "font
set" corresponds to a "character set". Each "character set" is
distinguished from each other by bitwise-anding the character with
the wc_encoding_mask. For example, the XLC_LOCALE for zh_TW.Big5
has a wc_encoding_mask of 0x00008000.
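
To make the mask test concrete, here is a minimal Python sketch. Only the
mask value 0x00008000 comes from the text above. The per-charset encoding
values, the charset names, and the names CHARSETS and charset_of are
assumptions made for illustration, and the wide character is treated as the
raw two-byte code, which is a simplification of what Xlib does internally.

```python
# Sketch: pick a character set by AND-ing the wide character with a mask.
WC_ENCODING_MASK = 0x00008000  # value quoted for zh_TW.Big5

# Assumed mapping from masked value to character set (illustrative only).
CHARSETS = {
    0x00000000: "ISO8859-1:GL",
    0x00008000: "BIG5-0",
}


def charset_of(wc: int) -> str:
    """Return the character set whose encoding matches the masked bits."""
    return CHARSETS[wc & WC_ENCODING_MASK]


if __name__ == "__main__":
    for wc in (0x0041, 0xA440, 0x8840):
        print(hex(wc), charset_of(wc))
```

Under these assumptions a one-bit mask can separate only two sets: 0xA440
and 0x8840 both have that bit set, so the mask alone cannot tell a standard
Big5 character from an extension character.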
However, from a quick glance through XFree86's source, it seems that
different character sets can also be classified directly by byte
ranges. That's what I've done in my zh_HK.Big5HKSCS, although I'm
not sure whether it works or not. If it does, then the rest should
be pretty simple.
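
A rough Python sketch of classifying by byte range instead of by mask. The
standard range 0xA1xx-0xF9xx is taken from earlier in this thread; the
function name font_set_for and the labels "big5" and "big5hkscs" are
hypothetical, and this is not how the locale file itself is written.

```python
# Sketch: choose a font set from the lead byte of a two-byte character.
STANDARD_BIG5_LEAD = range(0xA1, 0xF9 + 1)  # 0xA1xx-0xF9xx


def font_set_for(code: int) -> str:
    """Classify a two-byte code by its lead (high) byte only."""
    lead = code >> 8
    if lead in STANDARD_BIG5_LEAD:
        return "big5"
    # Everything outside the standard range is assumed to be an extension.
    return "big5hkscs"


if __name__ == "__main__":
    for code in (0xA440, 0x8840, 0xFA40):
        print(hex(code), font_set_for(code))
```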
> Perhaps the Big5 + HKSCS situation can be handled like how the Japanese
> EUC-JP encoding works, which includes the JIS X 0208 and JIS X 0212
> character sets, but there are separate font files for those two character
> sets (with different XLFD names). Or the Taiwanese EUC-TW encoding, which
> has the CNS 11643 character set, but each plane is a separate font file
> with its own XLFD. In other words, keep the Big5 font as one font with a
> "big5" XLFD and the HKSCS font as a second font with a "hkscs" XLFD or
> something, and define a new character set like "big5hkscs" which has to
> know to get glyphs from those two fonts/XLFDs. Just as one wouldn't use
> JIS X 0212 alone or CNS 11643 plane 5 alone, one isn't going to use HKSCS
> by itself.
I think it would be best to keep Big5 fonts with a "big5" XLFD, and
Big5HKSCS fonts with "big5hkscs". (Note that it's Big5HKSCS, not just HKSCS,
as there is also an HKSCS for ISO-10646/Unicode 3.0. Confused? :p)
> And a question of my own...
>
> I do have a question about the ranges, though--do the basic Big5 ranges
> include the ETen extensions in C6A1-C8D3 (kana and other symbols), and
> F9D6-F9FE (seven hanzi and boxes)? It seems like a de facto extension,
> and some "Big5" fonts out there include some or part of it. (Ones from
> Microsoft don't include the C6A1-C8D3 part, but only the latter.) Other
> fonts include it but are marked with "ETen" in the XLFD somewhere, and
> even some marked that way are actually missing them.
Well, Big5 was never a proper standard anyway; everyone just adds bits
here and there. It seems that the ETen "extensions" are the most widely
used, and the current zh_TW.Big5 XLC_LOCALE definition actually includes
the two ranges you mentioned above. Whether the fonts include them
seems to be pure luck though. :p I do know that most TrueType fonts
follow Microsoft/Dynalab in not including the kana range.
> HKSCS in http://www.digital21.gov.hk/eng/hkscs/download/e_hkscs.pdf
> apparently includes all of the ETen extensions (see pages 2-41 to 2-44)
> as part of its definition, as well as adding new stuff to the C8 row.
> I recall that Big5+ also included the ETen extensions as well.
If you look at it carefully, you'll notice that Big5HKSCS's definition
of the C6-C7 range actually differs from ETen's. This confused me
badly, and led me to produce broken iconv modules that can't
round-trip characters in that range.
> So, does this mean that the ETen extensions are a "de facto" standard,
> reinforced by post-Big5 standards that include it as a subset, and we
> assume "Big5" fonts without them are "broken"?
Certainly not...
> What if one is using one
> of these "Big5" fonts that don't have all of the ETen extensions, in
> conjunction with a HKSCS font, but because of the way the
> coderanges are defined, one can't make use of the available glyphs in
> the latter?
That would be bad; yes, I shall modify my Big5HKSCS XLC_LOCALE definition
to map C6-C8 to Big5HKSCS. It should be defined this way anyway since
Big5HKSCS practically redefines the C6-C8 range.
I'm not sure what to do with F9D6-F9FE though; it would be hard or
costly to tell them apart from other Big5 characters.
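
A hedged Python sketch of the revised split. Rows C6-C8 go to Big5HKSCS
(taken literally as whole rows, as stated above), and F9D6-F9FE is shown as
an optional extra check, since that decision is still open. The name
font_set_for, the split_f9 flag, and the labels are all hypothetical.

```python
# Sketch: lead-byte classification with the C6-C8 rows moved to Big5HKSCS.
def font_set_for(code: int, split_f9: bool = False) -> str:
    """Classify a two-byte code; split_f9 also moves F9D6-F9FE."""
    lead, trail = code >> 8, code & 0xFF
    if 0xC6 <= lead <= 0xC8:
        return "big5hkscs"  # rows redefined by Big5HKSCS
    if split_f9 and lead == 0xF9 and 0xD6 <= trail <= 0xFE:
        return "big5hkscs"  # needs the second byte, not just the lead byte
    if 0xA1 <= lead <= 0xF9:
        return "big5"
    return "big5hkscs"


if __name__ == "__main__":
    print(font_set_for(0xC6A1))
    print(font_set_for(0xF9D6), font_set_for(0xF9D6, split_f9=True))
```

The F9D6-F9FE case is the awkward one because lead byte 0xF9 is shared with
standard Big5 characters, so the lead byte alone no longer decides the font
set and the second byte has to be examined as well.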
--
Roger So telnet://e-fever.org
spacehunt at e-fever dot org SysOp, e-Fever BBS
GnuPG 1024D/98FAA0AD F2C3 4136 8FB1 7502 0C0C 01B1 0E59 37AC 98FA A0AD