[thomas@atlas.datexx.com: Re: comments...]

To: Debian-Chinese <debian-chinese@lists.debian.org>
Subject: [thomas@atlas.datexx.com: Re: comments...]
From: Anthony Wong <hajime@asunaro.dhs.org>
Date: Fri, 2 Jul 1999 16:05:03 +0800
Message-id: <19990702160503.B9787@asunaro.dhs.org>
Hi all,

Thomas sent this mail to me; I'm now forwarding it to the list...

----- Forwarded message from Thomas Chan <thomas@atlas.datexx.com> -----

From: Thomas Chan <thomas@atlas.datexx.com>
Reply-To: Thomas Chan <tc31@cornell.edu>
To: Anthony Wong <hajime@asunaro.dhs.org>
Subject: Re: comments...

On Thu, 1 Jul 1999, Anthony Wong wrote:
 
> |     Debian and/or Linux-related:
> |     - Debian only has GB2312 fonts, and the GB input methods are probably
> |       only geared towards GB2312; existing software may be hardcoded to GB2312
> |       coderanges
> |     - a GBK font could probably be made by remapping the 24x24 Big5+ font in the
> |       xfntbig5p-cmex24m package (stable/main/binary-i386/x11), but mainland
> |       China (gov't) is picky about glyph design, so we should get a proper font
> |       and input methods, preferably one that is approved with a certificate
> |       like ??? (I found one on the www.dynalab.com.hk site before, but can't find it now)
> |
> |     Conclusions:
> |     - we're behind the world in this area! (high priority)
> 
> IMHO the major problem for GBK support is the lack of fonts (actually,
> this is true of any encoding). Unfortunately this problem is very
> difficult to solve because DFSG-free fonts are very rare, and most people
> lack the skill to make them. Software support for GBK should not be
> difficult to fix though.

I agree with the font problem--there are a lot of fonts (for different
character sets) at places like ftp.ifcss.org, but with license problems.
(I'm not sure if some of them weren't just taken right out of DOS-era
Chinese software.)

I hope there is a reference font that can be used, like the Big5+ font
from CMEX that Anthony Fok packaged.  Maybe some of the mainlanders on
debian-chinese will be more familiar with possibilities.  (Frankly, I
never paid much attention to GB 2312 or GBK until I discovered GBK had as
many characters as Unicode.)

It doesn't help that mainland China mandates the exact design of glyphs,
so just remapping the CMEX Big5+ font would work, but may not be
acceptable.  (I hope there aren't offshore software development
legal issues, either!  Nadine Kano's book says Microsoft worked on
simplified Win 3.1 without first contacting the government, and ran into
trouble; then they later put out a government-approved Win 3.2.)  Ken
Lunde's book mentions some mainland bitmap font glyph standards, and
somewhere on the DynaLab HK site I saw a scan of a certificate.

The CMEX Big5+ .zip files included a mapping table to CNS 11643, GBK, and
Unicode, so maybe I'll try this sometime...
 

[Big5+]
> |     Conclusions:         
> |     - it'd be cool to be among the first OS's to support this fully
> 
> I think we should pursue this. Do you know what kinds of characters
> have been added to Big5+? (Sigh, all the documents from CMEX are in
> Word 7.0...)

Basically whatever is in Unicode that isn't already in Big5.  The
easiest way to get an idea is to do "xfd -fn cmex24m", if you have the
xfntbig5p-cmex24m .deb installed.

Word 7.0?  I thought I was able to read them with an earlier version of
Word...

Does mswordview work on them?
 

[GCCS]
> |     Conclusions:
> |     - we need this if we want Debian to do well in Hong Kong
> 
> Agree (as I'm a Hongkonger :)

I made a BDF font out of the stuff at http://www.info.gov.hk/gccs/, which
seems to work for the most part (aside from a minor snag--I have to invert
the pixels).  It works fine in crxvt, but Netscape and cxterm don't grok it
because of the codepoint ranges.  However, it would have to be used to
patch existing Big5 fonts--maybe taipei16 and taipei24?  (This wouldn't
hurt anyone--no one seems to use user-defined characters on Unix, and the
Taiwan users who just use the core Big5+ETen extensions wouldn't notice
the addition, just like no one has noticed the extra characters in the
CMEX Big5+ font.  This does get in the way of expanding Big5 fonts
to Big5+, though.)

I think there are licensing/redistribution problems with the stuff at the
above URL, though. :/  I wonder if DynaLab HK would be willing to
contribute anything--they have quite a bit of stuff on GCCS (well, they
are the only ones I know of who even seem to be aware of it), and they
provided the glyphs for the CMEX Big5+ font and the font used to print the
CJK pages in the hardcopy Unicode book.


> |4) ime-other
> |   Contains less commonly used IME's.  e.g., Wade-Giles for inputting
> |   Mandarin (some non-Chinese might still use this), dictionary
> |   indices ("Kangxi page 545, third character..."), 4 corner, etc.                        
> |
> |For the naming, I avoided "gb" and "big5" because these may change
> |in the future (e.g., unicode).  I also avoided "jianti" and "fanti"
> |because GBK, Big5+, and Unicode can do both jianti and fanti.
> |(GB = jianti and Big5 = fanti are both no longer true.)  I also avoided
> |listing "chinese" anywhere, because one can use a character set/encoding
> |for other languages.  e.g., a hypothetical ime-ja package with a Pinyin IME
> |for EUC-JP, for people in Japan (this already exists in some third-party
> |products for Windows).
> 
> However, the fact is that the data formats of the input methods that
> different programs use are not the same. Cxterm uses its own format,
> and xcin uses another one. The best thing we can do is to make a
> centralized repository of 'raw' input method data, and suggest that authors
> of Chinese software, like cxterm and xcin, refer to our repository.

Well, the compiled input files are different, but the uncompiled,
human-readable text files are rather similar.  We could either have, say, a
cangjie.deb that contains precompiled files for cxterm and xcin, or a
cangjie.deb that contains an uncompiled text file and then bundle
scripts in the cxterm and xcin packages to grab whatever's in the
repository and compile it for their native format.

My concern really was about "forked" versions of input methods.

 
> |cedict{b5,gb} - Any way to download updates/additions to the dictionary
> |without downloading it all over again in entirety?  i.e., patches.
> |Currently it is small (409K), but if it grows in the future like its
> |inspiration, the Japanese->English EDICT dictionary (I believe packaged
> |for Debian-JP), it could become very huge (EDICT is several megs at this
> |point).
> 
> This is a very interesting point. It applies not only to cedict, but also
> to other large packages like xfonts. It would involve modifying Debian's
> current upload/download and ftp infrastructure, but it seems
> very useful. This will take some time to accomplish.

Would distributing patch files with scripts work?  Updating a font (well,
BDF format, not compiled PCF) is simply adding or replacing "records", as
is for CEDICT.  Both of those have unique values that could be used as
keys (the BDF fonts' "STARTCHAR" line; the term in CEDICT), unlike program
source or binaries, which are context-dependent.

e.g., if taipei16 were distributed in source BDF form, then one could get
the upgrade-taipei16-with-gccs package which would via a script append
~3000+ new records, and then compile a new PCF font.
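
The keyed-record idea above could be sketched roughly as follows.  This is
only an illustration in Python, not an existing Debian tool: the function
names split_bdf and merge_bdf are made up, the patch is assumed to be a BDF
fragment made of STARTCHAR...ENDCHAR blocks, and glyphs are keyed on the
STARTCHAR name as suggested above.  Only the CHARS count is recomputed;
other header properties are left alone.

```python
def split_bdf(text):
    """Split BDF text into (header_lines, {startchar_name: glyph_lines}).

    Assumes well-formed input: every STARTCHAR has a name and an ENDCHAR.
    """
    header, glyphs = [], {}
    current, name = None, None
    for line in text.splitlines():
        if line.startswith("STARTCHAR"):
            name = line.split(None, 1)[1]
            current = [line]
        elif line.startswith("ENDCHAR") and current is not None:
            current.append(line)
            glyphs[name] = current
            current = None
        elif current is not None:
            # Inside a glyph record (ENCODING, BBX, BITMAP rows, ...).
            current.append(line)
        elif not line.startswith(("CHARS ", "ENDFONT")):
            # Header line; CHARS and ENDFONT are regenerated on output.
            header.append(line)
    return header, glyphs


def merge_bdf(base_text, patch_text):
    """Replace or append glyph records from patch_text into base_text."""
    header, glyphs = split_bdf(base_text)
    _, new_glyphs = split_bdf(patch_text)
    # Existing keys are replaced in place, new keys are appended at the end.
    glyphs.update(new_glyphs)
    lines = header + ["CHARS %d" % len(glyphs)]
    for block in glyphs.values():
        lines.extend(block)
    lines.append("ENDFONT")
    return "\n".join(lines) + "\n"
```

The merged text would then be compiled to PCF in the usual way (e.g. with
bdftopcf).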

What do you think?

 
> |tcs - tcs is a character set/encoding converter from Plan9.  The Big5
> |support is for the erroneous "HKU standard"; this should be fixed.
> 
> I tried tcs before. It had problems converting GB to and from Big5. I
> hope I did nothing wrong at that time. Have you tried ccf? It's not
> packaged probably due to license problem, but it can do the conversion
> very well. I frequently use ccf to do GB/Big5/HZ conversion.

Yes, I tried tcs before to do GB<->Big5.  It was a mess.  Maybe tcs is a
dead end, since it apparently is old stuff from a dead Plan 9 system.

I'll look into ccf.

 
> |xmbdfed - xmbdfed is a font editor by Mark Leisher.  It can handle HBF
> |fonts, but the Debian package does not include it because of licensing
> |problems with the HBF code (written by someone other than Leisher).
> 
> Do you know of any way to convert between HBF and BDF? If not, then maybe
> we should patch xmbdfed to support HBF. As the format of HBF is
> open, I think it can be done.

HBF is not really too different from BDF (ver 2.0, which is what X uses).
It's just the BDF head, with pointers to a binary file (like distributed
with DOS-era Chinese software).  A lot of redundant information is
excised, like the size of each character.  (BDF 2.1 remedies this.)

Another thing is that HBF can point to various files, so it is possible
for two HBF files to point to different files for the hanzi, but the same
file for the kana/cyrillic/greek/etc.  (Kind of interesting--it's like
those .ttc TrueType fonts which contain a version with monospaced Roman
letters and a version with proportional Roman letters--e.g., mingli.ttc)
Plus with just a different HBF file, you can map the codepoints in the
binary font files differently--very interesting, since BDF font files
hardcode all of that instead of providing different character maps.  (A
cool trick that can be done in Netscape--at least on Windows--is to have a
Big5 TTF font, and view a webpage in Shift-JIS.  Via the magic of the 
Unicode cmap, the appropriate glyphs from the Big5 font are fetched, so
you can kind of read the page.)
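
To make the "header plus pointers into a binary file" idea concrete, here
is a rough Python sketch of how a glyph's byte offset in the bitmap file
could be computed.  It assumes fixed-size bitmaps stored in code order, with
only the valid second bytes counted; that is a simplified reading of HBF and
should be checked against the spec.  The function name and its parameters
are hypothetical.

```python
# Valid second-byte ranges for Big5, used here as example input.
BIG5_BYTE2_RANGES = [(0x40, 0x7E), (0xA1, 0xFE)]


def glyph_offset(code, range_start, base_offset, byte2_ranges, width, height):
    """Return the byte offset of a two-byte code's bitmap in the font file.

    code         -- two-byte character code, e.g. 0xA140
    range_start  -- first code of the code range the header declares
    base_offset  -- file offset where that code range begins
    byte2_ranges -- list of (low, high) valid second-byte ranges
    width/height -- bitmap size in pixels; rows are padded to whole bytes
    """
    # All valid second bytes, in storage order.
    valid = [b for low, high in byte2_ranges for b in range(low, high + 1)]

    def position(c):
        # Running index of a code, counting only valid second bytes.
        return (c >> 8) * len(valid) + valid.index(c & 0xFF)

    glyph_bytes = ((width + 7) // 8) * height
    return base_offset + (position(code) - position(range_start)) * glyph_bytes
```

Under these assumptions, pointing a second header at the same binary file
with a different range_start or different byte-2 ranges is all it takes to
remap the same bitmaps to other codepoints.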

There are specs at ftp.ifcss.org .

I wonder what the priority of this is, though--release 4.0 of XFree86 is
supposed to add built-in Truetype support, isn't it?
 

> |yudit - yudit is a Unicode-based multi-language editor.  Upstream author
> |needs help--he doesn't speak all those languages it supports.
> 
> Wow, authors of yudit are quite capable. 'Helping yudit' should be a
> wishlist item in our TODO list.
> 
> |
> |xemacs (mule) - Chinese support could be better too.
> 
> Can you be more specific? I don't use mule so I have no idea...

Hmm, another priority question--do Chinese users even use this stuff, or
is everyone a fan of xcin and cxterm? :) xemacs w/ mule only provides
Cangjie and Pinyin, I think, and only for Big5 and GB, I think.  Via ISO
2022 (the 7 bit email encoding with all those escape sequences to shift to
different character sets) it provides support for even CNS 11643... I
wonder how you are supposed to input, though.  There is some mule
information at: http://www.kanji.com/kc/emacs/emacs.html

(I think mule was a fork of emacs or xemacs--I don't know which one, and
then it was absorbed into xemacs.)

 
> |dates - Support for ROC year?
> 
> May not be necessary; as far as I can see, the demand is not very large.

Also complicates the Y2K problem. :)  "What, my computer runs on ROC
time--it doesn't have a Y2K problem!"  Yeah, right.
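
For reference, the ROC (Minguo) year is just the Gregorian year minus 1911,
so a two-digit ROC year does not escape rollover; it only moves it to 2011.
A tiny Python sketch (the function names are illustrative only):

```python
ROC_EPOCH = 1911  # ROC year 1 corresponds to Gregorian 1912


def roc_year(gregorian_year):
    """Convert a Gregorian year to an ROC (Minguo) year."""
    if gregorian_year <= ROC_EPOCH:
        raise ValueError("ROC years start at Gregorian 1912")
    return gregorian_year - ROC_EPOCH


def gregorian_year(roc):
    """Convert an ROC (Minguo) year back to a Gregorian year."""
    if roc < 1:
        raise ValueError("ROC years start at 1")
    return roc + ROC_EPOCH
```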

 
> |dynadoc - DynaDoc is a PDF-like format by DynaLab.  It's used by HKSAR
> |and perhaps also mainland China and Taiwan to publish government
> |and industry publications.  Supposedly it has built-in CJK support, including
> |GCCS.  A reader for this on Linux would be nice.
> |http://www.dynalab.com.hk/internet/index.htm
> 
> I can't find the DynaDoc format on DynaLab's website. This makes it
> very difficult for us to create a Linux reader, because we would have to
> reverse engineer the DynaDoc format.

I was afraid that would be the case.  It's like the problem with Word .doc
proliferation... Linux people bend over backwards to read them.
 
 
> |mtv - mtv is a VCD player.  Unofficial .deb's are available from
> |http://www.mpegtv.com/ .  Uses the XForms library.  I believe it would
> |be a "non-free".  Given the popularity of VCD's in Asia, it would be nice
> |if Debian could include one (there are questions on newsgroups about where
> |to get a VCD for Linux).
> 
> Yes, VCD is popular. However on their homepage it doesn't mention that
> we can re-distribute the program. I need to send them an email for
> clarification. It would be cool if we have a VCD player :)

I'd want it for the same reason I'd want Netscape or Mozilla "out of the box"
with Debian.  IE gained a lot of ground through bundling ("hey, why should
I spend hours downloading Netscape when I can install IE off
cdrom--they're pretty much the same thing, aren't they?").

Maybe something can be done with ordinary mpeg video players, if they
could be gotten to give control over the left and right audio channels
selectively.  (Otherwise, multilingual babble.)

SMPEG looks interesting:
http://www.lokigames.com/opensource/opensource.html


> |tools - It'd be nice to have tools to help users create their own input
> |methods, and convert amongst xcin/cxterm/twinbridge/windows/etc formats.
> |Support for user-defined extensions (EUDC, "end-user defined fonts", "gaiji")
> |would be nice too.  Also conversion utilities, both strict (per codepoint),
> |and between traditional<->simplified cultural standards (Office 2000 does the
> |latter, according to the ads), would be nice.
> 
> It would be nice for our users if we had those input method widgets.
> As to the conversion utilities you mentioned, are you talking about
> the tools like tcs that convert between different encodings?

Well, a bit of both.  A very enlightening article is:
http://www.basistech.com/articles/c2c.html
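
The "strict (per codepoint)" kind of conversion can be sketched with
Python's standard codecs, going through Unicode.  This is only an
illustration (the function name is made up, and it is not how tcs or ccf
work): characters with no counterpart in the target set are reported rather
than silently mangled, which is exactly where a strict converter stops and
a traditional<->simplified "cultural" converter would have to take over.

```python
from typing import List, Tuple


def big5_to_gb(data: bytes, target: str = "gb2312") -> Tuple[bytes, List[str]]:
    """Convert Big5 bytes to `target` one character at a time.

    Returns the converted bytes plus the list of characters that have no
    codepoint in the target encoding (each replaced by b"?" in the output).
    """
    text = data.decode("big5")  # Big5 -> Unicode
    out, missing = [], []
    for ch in text:
        try:
            out.append(ch.encode(target))  # Unicode -> target encoding
        except UnicodeEncodeError:
            missing.append(ch)
            out.append(b"?")
    return b"".join(out), missing


if __name__ == "__main__":
    # U+9AD4 is a traditional form: present in Big5 and GBK, absent from GB 2312.
    sample = "\u4e2d\u6587\u9ad4".encode("big5")
    print(big5_to_gb(sample, "gb2312")[1])
    print(big5_to_gb(sample, "gbk")[1])
```

With GB 2312 as the target the traditional character is reported as missing;
with GBK it converts, but it is still the traditional form, not its
simplified equivalent.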


Now that I think about it, I also left off another interesting thing I'd
like to see in Debian--a xiangqi program!  There already are Go and Shogi
games...

Have you tried xabacus?  I think the beads start off in the wrong
position...


> |Meyer, Dirk.  "Dealing With Hong Kong Specific Characters".
> |  Multilingual, vol. 9, issue 3 (April 1998), pp. 35-38.

If you want, I can scan the four pages of this article for you to read.



Thomas Chan
tc31@cornell.edu

----- End forwarded message -----

-- 
Anthony Wong.   [ E-mail: hajime@asunaro.dhs.org / ypwong@debian.org ]