
announce of patch to support CJK in AW (fwd)



Please test it... :)

---------- Forwarded message ----------
Date: Sat, 28 Oct 2000 00:33:43 +0500 (SAMST)
From: Vlad Harchev <hvv@hippo.ru>
To: abiword-dev@abisource.com
Cc: Belcon <rainfall@yeah.net>, hj <huangj@citiz.net>,
     Chih-Wei Huang <cwhuang@linux.org.tw>
Subject: announce of patch to support CJK in AW

The patch is located at:
    http://www.hippo.ru/~hvv/abiword/aw-cjk.diff.gz
    
This patch applies cleanly to vanilla 0.7.11 once the following patches have
been applied (I hope they are still there):
    ftp://seviorpc.ph.unimelb.edu.au/pub/abi-oct24-cvs.patch.gz
    ftp://seviorpc.ph.unimelb.edu.au/pub/wv-oct24-cvs.patch.gz
    
What's there:
100% of HJ's patch logic is there. The code and logic were greatly cleaned
up compared to the original patch and should compile (I didn't test) on any
platform (HJ's patch made use of glib in xp code). Also, HJ's logic is
disabled if the current locale is not a CJK one. That was extensively tested
with Dom's German document and (in full, all aspects) with Russian.

The added functionality:
* UT_Wctomb and UT_Mbtowc are now used for converting between various
 charsets. They use iconv internally (so they now work, are portable, and
 also allow choosing the input/output encoding). All usage of iconv
 everywhere (except wv) correctly swaps the bytes of UCS (the correct order
 is detected at runtime).

* Thanks to HJ, AW emits only the necessary fonts to the .ps when printing.
 This reduced the size of the .ps file generated by AW by a factor of 5 for
 one font-rich document of mine (the title for some paper).

* Fixed a bug in the spellchecker (the "replace" button didn't work).

* AW now looks for more files like ${prefix}/AbiSuite/AbiWord/system.profile
  - also ${prefix}/AbiSuite/AbiWord/system.profile-${SUFFIX}, for the following
  values of ${SUFFIX}:
      'language', 'charset', 'language-Country', 'language-Country.charset'
  This allows shipping language-specific defaults (e.g. the metric system or
  the name of the spellchecker dictionary).
  
* As for fonts, AW now tries to load fonts from the following subdirectories
  of ${prefix}/AbiSuite/fonts:
      'language', 'charset', 'language-Country', 'language-Country.charset'
  This should solve CJK people's problems (before this patch, AW looked only
  in the 'charset' subdirectory).
    Fonts in 'fonts.dir' format should be placed in them. Under CJK locales,
  the file with the list of fonts is also named 'fonts.dir', but it has the
  same extended format as HJ's 'fonts.hj'. Keep this in mind when testing.
  
* If GNOME_XML2 is undefined, an UnknownEncodingHandler is set on the XML
  parser in ie_imp_XML.cpp
  
* Some translators' names were added to CREDITS.TXT

* Just to underline: support for "wrap-at-any-CJK-letter" logic of layout 
  is already there too (thanks to HJ).
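
  The suffix lookup described above for system.profile and the fonts
  directory can be sketched roughly as follows. This is only an illustration,
  not the code from the patch: the function names are hypothetical, and the
  order in which AW actually tries the candidates is an assumption (here it
  simply follows the order in which the suffixes are listed above).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not from the patch): builds the four suffixes
// 'language', 'charset', 'language-Country', 'language-Country.charset'.
// Components that are empty are skipped.
std::vector<std::string> localeSuffixes(const std::string& language,
                                        const std::string& country,
                                        const std::string& charset)
{
    std::vector<std::string> out;
    if (!language.empty()) out.push_back(language);
    if (!charset.empty()) out.push_back(charset);
    if (!language.empty() && !country.empty()) {
        const std::string langCountry = language + "-" + country;
        out.push_back(langCountry);
        if (!charset.empty()) out.push_back(langCountry + "." + charset);
    }
    return out;
}

// Candidate profile files: the plain system.profile first, then one
// system.profile-${SUFFIX} per suffix (ordering is an assumption).
std::vector<std::string> profileCandidates(const std::string& prefix,
                                           const std::vector<std::string>& suffixes)
{
    assert(!prefix.empty());
    const std::string base = prefix + "/AbiSuite/AbiWord/system.profile";
    std::vector<std::string> out;
    out.push_back(base);
    for (const std::string& s : suffixes) out.push_back(base + "-" + s);
    return out;
}

// Candidate font subdirectories under ${prefix}/AbiSuite/fonts.
std::vector<std::string> fontDirCandidates(const std::string& prefix,
                                           const std::vector<std::string>& suffixes)
{
    assert(!prefix.empty());
    const std::string base = prefix + "/AbiSuite/fonts";
    std::vector<std::string> out;
    for (const std::string& s : suffixes) out.push_back(base + "/" + s);
    return out;
}
```

  For example, with a made-up prefix of /usr/local and a zh-TW.Big5 locale,
  localeSuffixes("zh", "TW", "Big5") yields zh, Big5, zh-TW and zh-TW.Big5,
  so the last profile candidate would be
  /usr/local/AbiSuite/AbiWord/system.profile-zh-TW.Big5.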

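  As for the runtime detection of UCS byte order mentioned for the iconv
  usage: the announcement doesn't say how the patch does it, so the following
  is just one common way to do such a check, with hypothetical function names.
  It assumes 16-bit UCS-2 code units delivered in big-endian byte order.

```cpp
#include <cstdint>
#include <cstring>

// Returns true if this machine stores the least significant byte first.
// The check runs at runtime, so no build-time configuration is needed.
bool hostIsLittleEndian()
{
    const std::uint16_t probe = 0x0102;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x02;
}

// Swaps the two bytes of a 16-bit code unit.
std::uint16_t swapBytes(std::uint16_t c)
{
    return static_cast<std::uint16_t>((c << 8) | (c >> 8));
}

// Turns two bytes holding a big-endian UCS-2 code unit into a value in
// host byte order, swapping only when the host is little-endian.
std::uint16_t ucs2FromBigEndian(const unsigned char b[2])
{
    std::uint16_t raw;
    std::memcpy(&raw, b, sizeof raw);
    return hostIsLittleEndian() ? swapBytes(raw) : raw;
}
```

  With this, the bytes 0x4E 0x2D decode to U+4E2D on both little-endian and
  big-endian machines.
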
   I think this patch can be committed, since it doesn't break anything
under non-CJK locales (at least if you run 'make clean' after applying it).
   
   I ask CJK people to test the following, in order of priority:
* Input of CJK letters in various charsets. It should work. Double-check that
  gtk+ is installed correctly, that the fonts are in the font path, etc.
  Currently all CJK fonts AW uses must be available in the font path before
  the AbiWord wrapper starts (it's not yet updated to look into all the
  subdirectories AW now looks in).
* Cutting and pasting within AW itself. If this doesn't work, then test the
  RTF importers and exporters (AW uses them internally for cut/paste). Only
  very minor tweaks should be needed to make it work if it doesn't. If
  cutting/pasting works, try exporting/importing RTF and testing it with
  other apps (e.g. Word).
* Cutting/pasting to/from other apps and saving/reading plain text files.
  It should just work.
* Saving and loading the native AW file format. Should work. If import of .abw
  doesn't work, try removing "encoding=FOO" from its first line.
* Printing. Since 100% of HJ's logic is there, it should just work.
* Export to HTML. Should work. If it doesn't, tell me what changes are needed
  to make browsers understand it (keep in mind that the XHTML importer should
  be able to read the produced file).
* Checking that export of CJK texts to LaTeX works if correct prologue is 
  added to exported document. That prologue should be added to the tables in 
  xap_EncodingManager.cpp
* Import of CJK .doc files (most probably it won't work due to wv's
  single-byte encoding limitations). wv would have to be hacked to allow
  importing such .doc files.
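
  For the .abw workaround mentioned above (removing "encoding=FOO" from the
  first line), here is a small sketch of doing it programmatically instead of
  by hand. The function name is hypothetical and this is not part of the
  patch; it assumes the first line is an XML declaration with the encoding
  value in single or double quotes.

```cpp
#include <string>

// Removes an encoding="..." (or encoding='...') attribute from an XML
// declaration line. Returns the line unchanged if it is not an XML
// declaration or carries no quoted encoding attribute.
std::string stripXmlEncoding(const std::string& firstLine)
{
    if (firstLine.compare(0, 5, "<?xml") != 0) return firstLine;

    const std::string key = "encoding=";
    const std::string::size_type start = firstLine.find(key);
    if (start == std::string::npos) return firstLine;

    const std::string::size_type valuePos = start + key.size();
    if (valuePos >= firstLine.size()) return firstLine;

    const char quote = firstLine[valuePos];
    if (quote != '"' && quote != '\'') return firstLine;

    const std::string::size_type closing = firstLine.find(quote, valuePos + 1);
    if (closing == std::string::npos) return firstLine;

    // Also drop one space before the attribute so no double space remains.
    std::string::size_type from = start;
    if (from > 0 && firstLine[from - 1] == ' ') --from;

    return firstLine.substr(0, from) + firstLine.substr(closing + 1);
}
```

  Applied to <?xml version="1.0" encoding="FOO"?> this gives
  <?xml version="1.0"?>; the rest of the file is left untouched.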

 What won't work with CJK text:
* Export to WML and DocBook. I just don't know how to specify the charset
    name in these formats.
* Import of XHTML (the HTML importer assumes UTF-8) and DocBook.
* Export to Word. It doesn't work even for Latin1 yet, so forget it.
* No code specific to platforms other than Unix was touched, so CJK support
    on those platforms is in the same state as before.
* Spellchecking of CJK texts. Does it even make sense? English words inside
 CJK text can be spellchecked.

 Donations/fees are appreciated. 
 If anybody needs it, I can provide commercial support and extension of this 
 work. 
 
 Enjoy.

 I'm going to bed, so I won't be able to read/post/hack for the next 11 hours.

 Best regards,
  -Vlad


