Re: comments...

To: Debian-Chinese <debian-chinese@lists.debian.org>
Subject: Re: comments...
From: Anthony Wong <hajime@asunaro.dhs.org>
Date: Fri, 2 Jul 1999 16:10:35 +0800
Message-id: <19990702161035.C9787@asunaro.dhs.org>
In-reply-to: <19990702160503.B9787@asunaro.dhs.org>; from Anthony Wong on Fri, Jul 02, 1999 at 04:05:03PM +0800

Thomas Chan wrote:
|
|> IMHO the major problem for GBK support is the lack of fonts. (Actually,
|> this is true of any encoding.) Unluckily this problem is very
|> difficult to solve because DFSG-free fonts are very rare, and most people
|> lack the skill to make them. Software support for GBK should not be
|> difficult to fix, though.
|
|I agree with the font problem--there's a lot of fonts (for different
|character sets) at places like ftp.ifcss.org, but with license problems.
|(I'm not sure if some of them weren't just taken right out of DOS-era
|Chinese software.)

I have done extensive research on the licenses of these fonts, and
found that most of them don't have clear licenses. I guess that the
'taipei' fonts were actually converted from the ETen fonts. They can't be
uploaded to Debian, but I'll package them anyway and put them on my
homepage for people to download. I'm somewhat pragmatic, and I don't
think anyone will sue me :)


|[GCCS]
|> |     Conclusions:
|> |     - we need this if we want Debian to do well in Hong Kong
|> 
|> Agree (as I'm a Hongkonger :)
|
|I made a BDF font out of the stuff at http://www.info.gov.hk/gccs/, which
|seems to work for the most part (aside from a minor snag--I have to invert
|the pixels).  It works fine in crxvt, but Netscape and cxterm grok it
|because of the codepoint ranges.  However, it would have to be used to
You mean Netscape and cxterm _don't_ grok it?

|patch existing Big5 fonts--maybe taipei16 and taipei24?  (This wouldn't
|hurt anyone--no one seems to use user-defined characters on Unix, and the
|Taiwan users who just use the core Big5+ETen extensions wouldn't notice
|the addition, just like no one has noticed the extra characters in the
|CMEX Big5+ font.  This does get in the way of expanding Big5 fonts
|to Big5+, though.)
|
|I think there are licensing/redistribution problems with the stuff at the
|above URL, though. :/  I wonder if DynaLab HK would be willing to
|contribute anything--they have quite a bit of stuff on GCCS (well, they
|are the only ones I know of who even seem to be aware of it), and they
|provided the glyphs for the CMEX Big5+ font and the font used to print the
|CJK pages in the hardcopy Unicode book.
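
For reference, the pixel inversion mentioned above for the GCCS-derived
BDF font is mechanical. A minimal sketch in Python, assuming a
well-formed BDF file in which each glyph has a BBX line before its BITMAP
section; the function names are made up for illustration:

```python
def invert_bitmap_row(hex_row: str, width: int) -> str:
    """Invert the leftmost `width` pixels of one BDF BITMAP row.

    BDF pads every row with zero bits up to a whole byte, so only the
    real pixels are flipped and the padding bits stay zero.
    """
    nbits = len(hex_row) * 4                      # bits in the padded row
    if not 0 <= width <= nbits:
        raise ValueError("glyph width does not fit the bitmap row")
    mask = ((1 << width) - 1) << (nbits - width)  # 1s over the real pixels
    value = int(hex_row, 16) ^ mask
    return f"{value:0{len(hex_row)}X}"


def invert_bdf(text: str) -> str:
    """Return BDF source text with the pixels of every glyph inverted."""
    out, width, in_bitmap = [], 0, False
    for line in text.splitlines():
        word = line.split(" ", 1)[0]
        if word == "BBX":                 # BBX <width> <height> <xoff> <yoff>
            width = int(line.split()[1])
        elif word == "BITMAP":
            in_bitmap = True
        elif word == "ENDCHAR":
            in_bitmap = False
        elif in_bitmap and line.strip():  # a hex row inside BITMAP
            line = invert_bitmap_row(line.strip(), width)
        out.append(line)
    return "\n".join(out) + "\n"
```

This only touches the bitmap rows; it does nothing about the codepoint
ranges or the licensing question.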

I also think that it may not be legal to convert the GCCS fonts to BDF and
then redistribute them, or to just take the "HK glyphs" from the BDF font
and add them to existing fonts.
BTW, do you know this site? http://www.5c.org/

DynaLab HK may be willing to contribute something; we won't know unless
we ask them. On the CLE mailing list I learned that DynaLab is
willing to contribute one or two Chinese TrueType fonts, but it's very
likely that they don't contain the GCCS characters.

[IME]
|> However, the fact is that the input method data formats that
|> different programs use are not the same. Cxterm uses its own format,
|> and xcin uses another one. The best thing we can do is to make a
|> centralized repository of 'raw' input method data, and suggest that
|> authors of Chinese software, like cxterm and xcin, refer to our repository.
|
|Well, the compiled input files are different, but the human-readable,
|uncompiled text files are rather similar.  We could either have say a
|cangjie.deb that contains precompiled files for cxterm and xcin, or a
|cangjie.deb that contains an uncompiled text file and then bundle
|scripts in the cxterm and xcin packages to grab whatever's in the
|repository and compile it for their native format.
|
|My concern really was about "forked" versions of input methods.

Personally I have not run into the "forked" version problem, but it may
happen. The best way to prevent it, as I said, is to set up a
repository (web page or FTP site) to collect, combine, and refine the
input methods that we find in various places.
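
To make the "collect, combine, refine" step a bit more concrete, here is
a rough sketch in Python. The raw format is purely an assumption made up
for this example (one "<keys> <char> [<char> ...]" entry per line, with
'#' comments); it is not the real cxterm or xcin source format, and the
function names are invented too. It merges tables from several sources
and reports the keys on which the sources disagree, which is where a
"forked" input method would show up:

```python
from collections import defaultdict


def parse_table(text):
    """Parse the hypothetical raw format into {keys: [chars in order]}."""
    table = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        keys, *chars = line.split()
        for ch in chars:
            if ch not in table[keys]:
                table[keys].append(ch)
    return dict(table)


def merge_tables(sources):
    """Merge {source name: table}; return (merged table, differences).

    `differences` maps each key sequence on which the sources disagree
    (or which some source lacks) to what each source says about it.
    """
    merged = defaultdict(list)
    seen = defaultdict(dict)
    for name, table in sources.items():
        for keys, chars in table.items():
            seen[keys][name] = chars
            for ch in chars:
                if ch not in merged[keys]:
                    merged[keys].append(ch)
    differences = {
        keys: by_source
        for keys, by_source in seen.items()
        if len(by_source) < len(sources)
        or len({tuple(chars) for chars in by_source.values()}) > 1
    }
    return dict(merged), differences
```

Someone would still have to review the reported differences by hand;
the script only finds them.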

|> |cedict{b5,gb} - Any way to download updates/additions to the dictionary
|> |without downloading it all over again in entirety?  i.e., patches.
|> |Currently it is small (409K), but if it grows in the future like its
|> |inspiration, the Japanese->English EDICT dictionary (I believe packaged
|> |for Debian-JP), it could become very huge (EDICT is several megs at this
|> |point).
|> 
|> This is a very interesting point. It applies not only to cedict, but
|> also to any other large package, like xfonts. It involves modifying
|> Debian's current upload/download and FTP infrastructure, but it seems
|> very useful. It will take some time to accomplish.
|
|Would distributing patch files with scripts work?  Updating a font (well,
|BDF format, not compiled PCF) is simply adding or replacing "records", as
|is for CEDICT.  Both of those have unique values that could be used as
|keys (the BDF fonts' "STARTCHAR" line; the term in CEDICT), unlike program
|source or binaries, which are context-dependent.
|
|e.g., if taipei16 were distributed in source BDF form, then one could get
|the upgrade-taipei16-with-gccs package which would via a script append
|~3000+ new records, and then compile a new PCF font.
|
|What do you think?
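
For concreteness, the record-keyed update described above could be
sketched like this in Python. It keys glyph records on the STARTCHAR
name, as suggested, and assumes well-formed BDF text with a "CHARS n"
line; the function names are made up, and it only transforms text that
is handed to it rather than touching any installed file:

```python
def split_bdf(text):
    """Split BDF source into (header lines, {STARTCHAR name: record lines})."""
    header, glyphs, current = [], {}, None
    for line in text.splitlines():
        if line.startswith("ENDFONT"):
            break
        if line.startswith("STARTCHAR"):
            current = []
            glyphs[line.split(None, 1)[1]] = current
        (header if current is None else current).append(line)
        if line.startswith("ENDCHAR"):
            current = None
    return header, glyphs


def merge_bdf(base: str, patch: str) -> str:
    """Replace or append the glyph records of `patch` in `base`."""
    header, glyphs = split_bdf(base)
    _, new_glyphs = split_bdf(patch)   # patch may be bare glyph records
    glyphs.update(new_glyphs)          # same name replaces, new name appends
    header = [
        f"CHARS {len(glyphs)}" if line.startswith("CHARS ") else line
        for line in header
    ]
    body = [line for record in glyphs.values() for line in record]
    return "\n".join(header + body + ["ENDFONT"]) + "\n"
```

The merged text would still have to be compiled to PCF afterwards.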

Doing so would violate Debian policy, because a package cannot
modify files owned by other packages. So, the 'update-taipei16'
package cannot modify the taipei16 file owned by 'intlfonts-chinese'.
Modifying the current FTP infrastructure may be unavoidable, but I
think it's worth doing.

What I have in mind is that Debian's FTP server would carry 'patches'
of packages that let you upgrade from one version to the next, just
like the kernel patches.
For example, on the FTP server you would find the file
'intlfonts-chinese_patch_1.1-3_1.2-1'. By using this file, you could
upgrade intlfonts-chinese from 1.1-3 to 1.2-1. This can be done with
xdelta, which can do 'binary' patching, unlike the program 'patch',
which can only patch text files.
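
To illustrate what 'binary' patching means, here is a toy sketch in
Python. It is NOT xdelta's algorithm or file format, only the general
copy/insert idea, built on the standard difflib module; the function
names are made up. The delta says "copy this range from the old file"
or "insert these new bytes", so it stays small when most of the package
is unchanged:

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes):
    """Return a list of operations that rebuilds `new` from `old`."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse bytes of the old file
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # ship only the new bytes
        # "delete" needs no operation: those old bytes are simply not copied
    return ops


def apply_delta(old: bytes, ops) -> bytes:
    """Rebuild the new file from the old file and a delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)
```

A real tool is far more efficient on large files, but the upgrade step
on the user's side is the same: old package plus delta gives the new
package.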

|> |tcs - tcs is a character set/encoding converter from Plan9.  The Big5
|> |support is for the erroneous "HKU standard"; this should be fixed.
|> 
|> I tried tcs before. It had problems converting GB to and from Big5. I
|> hope I did nothing wrong at that time. Have you tried ccf? It's not
|> packaged probably due to license problem, but it can do the conversion
|> very well. I frequently use ccf to do GB/Big5/HZ conversion.
|
|Yes, I tried tcs before to do GB<->Big5.  It was a mess.  Maybe tcs is a
|dead end, since it apparently is old stuff from a dead Plan 9 system.
|
|I'll look into ccf.

I remember that Anthony Fok, another Debian developer, told me that he
also had problems using tcs. I have already packaged ccf, which was
originally part of the BeTTY program. If you need the deb I can mail
it to you.
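
As a side note on why GB<->Big5 conversion gets messy, here is a small
sketch in Python. It has nothing to do with how tcs or ccf work
internally; it uses the 'gb2312', 'big5' and 'hz' codecs from the Python
standard library, and the function names are made up. It only re-encodes
the same characters, going through Unicode, so a character that exists
in only one of the two character sets cannot be converted this way:

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from encoding `src` to `dst`, going through Unicode.

    Raises UnicodeError if a character cannot be represented in `dst`.
    """
    return data.decode(src).encode(dst)


def unconvertible(text: str, dst: str) -> list:
    """List the characters of `text` that the encoding `dst` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(dst)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing
```

Mapping a simplified character to its traditional form (or back) is a
separate, dictionary-driven step that this sketch does not attempt.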

|I wonder what the priority of this is, though--release 4.0 of XFree86 is
|supposed to add built-in Truetype support, isn't it?

It's rumored that 4.0 will have TrueType support, but I'm not sure.
XFree86 lags too far behind the 'real world', as we still don't
have TrueType support in X. I'm quite happy with the Japanese X-TT; it
supports Chinese TTFs quite well. The drawback is that it's very
troublesome to edit the fonts.alias file, as new capabilities have been
introduced. The TrueType technology in X-TT should be merged into
XFree86; the reason it hasn't been done may be the language barrier.

|> |xemacs (mule) - Chinese support could be better too.
|> 
|> Can you be more specific? I don't use mule so I have no idea...
|
|Hmm, another priority question--do Chinese users even use this stuff, or
|is everyone a fan of xcin and cxterm? :) xemacs w/ mule only provides
|Cangjie and Pinyin, I think, and only for Big5 and GB, I think.  Via ISO
|2022 (the 7 bit email encoding with all those escape sequences to shift to
|different character sets) it provides support for even CNS 11643... I
|wonder how you are supposed to input, though.  There is some mule
|information at: http://www.kanji.com/kc/emacs/emacs.html
|
|(I think mule was a fork of emacs or xemacs--I don't know which one, and
|then it was absorbed into xemacs.)

You are quite right. Few people use mule. My logic is that anyone who
uses mule is probably an advanced user, and so has the ability to
enhance it and contribute :)

|Maybe something can be done with ordinary mpeg video players, if they
|could be gotten to give control over the left and right audio channels
|selectively.  (Otherwise, multilingual babble.)
|
|SMPEG looks interesting:
|http://www.lokigames.com/opensource/opensource.html

Haha, actually this is on my to-be-packaged list, but I haven't tried it
out yet.

|> It would be nice to our users if we have those input method widgets.
|> As to the conversion utilities you mentioned, are you talking about
|> the tools like tcs that convert between different encodings?
|
|Well, a bit of both.  A very enlightening article is:
|http://www.basistech.com/articles/c2c.html

Very informative indeed. This article will be very useful when we have
to enhance our existing conversion tools. Thanks for the link!

|Now that I think about it, I also left off another interesting thing I'd
|like to see in Debian--a xiangqi program!  There already are Go and Shogi
|games...

But I haven't heard of a free xiangqi program before; it would be very
nice if we had one.

|Have you tried xabacus?  I think the beads start off in the wrong
|position...

Really? I think they are correct... But I may be wrong, as I have not
used an abacus for a very long time.


|> |Meyer, Dirk.  "Dealing With Hong Kong Specific Characters".
|> |  Multilingual, vol. 9, issue 3 (April 1998), pp. 35-38.
|
|If you want, I can scan the four pages of this article for you to read.

Yes, sure. It'd be nice if you could scan them and send them to me :)

-- 
Anthony Wong.   [ E-mail: hajime@asunaro.dhs.org / ypwong@debian.org ]