announce of patch to support CJK in AW (fwd)
Please test it... :)
---------- Forwarded message ----------
Date: Sat, 28 Oct 2000 00:33:43 +0500 (SAMST)
From: Vlad Harchev <hvv@hippo.ru>
To: abiword-dev@abisource.com
Cc: Belcon <rainfall@yeah.net>, hj <huangj@citiz.net>,
Chih-Wei Huang <cwhuang@linux.org.tw>
Subject: announce of patch to support CJK in AW
Here is the location of the patch:
http://www.hippo.ru/~hvv/abiword/aw-cjk.diff.gz
This patch applies cleanly over vanilla 0.7.11 once the following patches
have been applied (I hope they are still there):
ftp://seviorpc.ph.unimelb.edu.au/pub/abi-oct24-cvs.patch.gz
ftp://seviorpc.ph.unimelb.edu.au/pub/wv-oct24-cvs.patch.gz
What's there:
100% of HJ's patch logic is there. The code and logic were greatly cleaned
up compared to the original patch and should compile (I didn't test) on any
platform (HJ's patch used glib in xp code). Also, HJ's logic is disabled if
the current locale is not a CJK one. That was tested extensively with Dom's
German document and (in full, all aspects) with Russian.
The added functionality:
* UT_Wctomb and UT_Mbtowc are now used for converting between various
charsets. They use iconv internally (so they now work, are portable, and
also allow choosing the input/output encoding). Every use of iconv
(except in wv) correctly swaps the bytes of UCS (the correct order is
detected at runtime).
* Thanks to HJ, AW emits only the necessary fonts to the .ps when printing.
This reduced the size of the .ps file generated by AW by a factor of 5 for
one font-rich document of mine (a title for some paper).
* Fixed a bug in the spellchecker (the "replace" button didn't work).
* AW now looks for more files besides
${prefix}/AbiSuite/AbiWord/system.profile: it also looks for
${prefix}/AbiSuite/AbiWord/system.profile-${SUFFIX}, for the following
values of ${SUFFIX}:
'language', 'charset', 'language-Country', 'language-Country.charset'
This allows shipping language-specific defaults (e.g. the metric system or
the name of the spellchecker dictionary).
* As for fonts, AW now tries to load fonts from the following subdirectories
of ${prefix}/AbiSuite/fonts:
'language', 'charset', 'language-Country', 'language-Country.charset'
This should solve CJK people's problems (before this patch, AW looked
only in the 'charset' subdirectory).
Fonts in 'fonts.dir' format should be placed in them. Under CJK locales,
the file with the list of fonts is also named 'fonts.dir', but it has the
same extended format as HJ's 'fonts.hj'. Keep this in mind when testing.
* If GNOME_XML2 is undefined, UnknownEncodingHandler is set on the XML parser
in ie_imp_XML.cpp
* Some translators' names were added to CREDITS.TXT
* Just to underline: support for the "wrap-at-any-CJK-letter" layout logic
is already there too (thanks to HJ).
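As a rough illustration of the suffix lookup described in the list above
(system.profile-${SUFFIX} and the subdirectories of fonts), here is a small
Python sketch. It is not AbiWord code: the function names, the example
prefix, and the assumption that the four suffixes are derived from a
POSIX-style locale name such as zh_TW.Big5 are all hypothetical, and the
real implementation may normalize names or order the lookups differently.

```python
import posixpath


def locale_suffixes(locale_name):
    """Return the suffix forms 'language', 'charset', 'language-Country'
    and 'language-Country.charset' for a name like 'zh_TW.Big5'.

    Assumption: the locale name has the shape language[_Country][.charset];
    modifiers such as '@euro' are not handled in this sketch.
    """
    base, _, charset = locale_name.partition(".")
    language, _, country = base.partition("_")
    suffixes = [language]
    if charset:
        suffixes.append(charset)
    if country:
        suffixes.append(f"{language}-{country}")
        if charset:
            suffixes.append(f"{language}-{country}.{charset}")
    return suffixes


def profile_candidates(prefix, locale_name):
    """List system.profile plus its locale-specific variants."""
    base = posixpath.join(prefix, "AbiSuite", "AbiWord", "system.profile")
    return [base] + [f"{base}-{s}" for s in locale_suffixes(locale_name)]


def font_dir_candidates(prefix, locale_name):
    """List the fonts subdirectories that would be tried."""
    fonts = posixpath.join(prefix, "AbiSuite", "fonts")
    return [posixpath.join(fonts, s) for s in locale_suffixes(locale_name)]


if __name__ == "__main__":
    # '/usr/local' is only an example prefix.
    for path in profile_candidates("/usr/local", "zh_TW.Big5"):
        print(path)
    for path in font_dir_candidates("/usr/local", "zh_TW.Big5"):
        print(path)
```

A locale without a country or charset part (e.g. plain "ru") simply yields
fewer candidates in this sketch.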
I think that this patch can be committed since it doesn't break anything
under non-CJK locales (at least if you did 'make clean' after applying).
I ask CJK people to test the following, in order of priority:
* Input of CJK letters in various charsets. It should work. Make doubly sure
that gtk+ is installed correctly, that the fonts are in the font path, etc.
Currently all CJK fonts AW uses must be available in the font path
before the AbiWord wrapper starts (it has not yet been updated to look in
all the subdirectories AW now looks in).
* Cutting and pasting within AW itself. If this doesn't work, then test the
RTF importer and exporter (AW uses them internally for cut/paste). If they
don't work, only very minor tweaks should be needed to fix them. If
cutting/pasting works, try exporting/importing RTF and testing it with
other apps (e.g. Word).
* Cutting/pasting to/from other apps and saving/reading plain text files.
It should just work.
* Saving and loading of the native AW file format. Should work. If import of
.abw doesn't work, then try removing "encoding=FOO" from its first line.
* Printing. Since 100% of HJ's logic is there, it should just work.
* Export to HTML. Should work. If it doesn't, report what changes are needed
to make browsers understand it (keep in mind that the XHTML importer should
be able to read the produced file).
* Export of CJK texts to LaTeX. Check that it works if the correct prologue
is added to the exported document. That prologue should then be added to
the tables in xap_EncodingManager.cpp
* Import of CJK .doc files (most probably it won't work due to wv's
single-byte encoding limitations). wv would have to be hacked to allow
importing such .doc files.
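The .abw workaround mentioned in the list above (removing "encoding=FOO"
from the first line) can be scripted. Below is a minimal Python sketch,
assuming the first line of the file is an XML declaration; the function name
is made up for illustration, and nothing but that one attribute is changed.

```python
import re

# Matches an encoding attribute such as  encoding="GB2312"  (either quote style).
_ENCODING_ATTR = re.compile(rb'''\s+encoding\s*=\s*(?:"[^"]*"|'[^']*')''')


def strip_xml_encoding(data: bytes) -> bytes:
    """Remove the encoding attribute from the XML declaration on line 1.

    Works on raw bytes so the document body is never decoded or re-encoded.
    Input without an XML declaration on the first line is returned unchanged.
    """
    first, sep, rest = data.partition(b"\n")
    if first.lstrip().startswith(b"<?xml"):
        first = _ENCODING_ATTR.sub(b"", first, count=1)
    return first + sep + rest


if __name__ == "__main__":
    sample = b'<?xml version="1.0" encoding="GB2312"?>\n<abiword>\n</abiword>\n'
    print(strip_xml_encoding(sample).decode("ascii"))
```

Note that with the attribute gone an XML parser falls back to its default
encoding (UTF-8 unless a byte-order mark says otherwise), so this is a
diagnostic step rather than a conversion.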
What won't work with CJK text:
* export to WML, DocBook. I just don't know how to specify charset name in
these formats.
* import of XHTML (html importer assumes UTF8) and DocBook.
* export to Word. It doesn't work for Latin1 yet, so forget it.
* No code specific to platforms other than Unix was touched, so CJK support
on those platforms is in the same state as before.
* Spellchecking of CJK texts. Does it even make sense? English words can be
spellchecked inside CJK text.
Donations/fees are appreciated.
If anybody needs it, I can provide commercial support and extension of this
work.
Enjoy.
I'm going to bed, so I won't be able to read/post/hack for the next 11 hours.
Best regards,
-Vlad