
Bug#476957: (fwd) Bug#476957: texlive-xetex: Sinhala language support



On 21 Apr 2008, at 11:55 am, Anuradha Ratnaweera wrote:

On Sun, Apr 20, 2008 at 11:08 PM, Norbert Preining <preining@logic.at> wrote:

 If you have the *smallest* doubts let me know...

Adding Harshula to the CC list.

If you are looking for doubtful areas in the patch, check the
following.  The rest of the patch is *adding* Sinhala related
variables, functions and switch conditions.  Even this change only
makes sure ZWJ is not discarded, which shouldn't be a problem for
other scripts that don't use it.

--- layout/LEFontInstance.cpp
+++ layout/LEFontInstance.cpp
@@ -75,7 +75,7 @@
         return 0xFFFF;
     }

-    if (mappedChar == 0x200C || mappedChar == 0x200D) {
+    if (mappedChar == 0x200C) {
         return 1;
     }


        Anuradha
--
http://www.sayura.net/anuradha/

Yes, I realize the patch touches very little existing functionality, as it is adding support for a new script rather than modifying an existing one. One other part that might interact somehow would be the change to the state table:

--- texlive-bin-2007.orig/build/source/libs/icu-xetex/layout/IndicReordering.cpp
+++ texlive-bin-2007/build/source/libs/icu-xetex/layout/IndicReordering.cpp
@@ -326,14 +346,15 @@
     { 1, 1, 1, 5, 8, 3, 2, 1, 5, 9, 5, 1, 1, 1}, // 0 - ground state
     {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1}, // 1 - exit state
     {-1, 6, 1, -1, -1, -1, -1, -1, 5, 9, 5, 5, 4, -1}, // 2 - consonant with nukta
-    {-1, 6, 1, -1, -1, -1, -1, 2, 5, 9, 5, 5, 4, -1}, // 3 - consonant
+    {-1, 6, 1, -1, -1, -1, -1, 2, 5, 9, 5, 5, 4, 11}, // 3 - consonant
     {-1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, 7}, // 4 - consonant virama
     {-1, 6, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1}, // 5 - dependent vowels
     {-1, -1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1}, // 6 - vowel mark
     {-1, -1, -1, -1, -1, 3, 2, -1, -1, -1, -1, -1, -1, -1}, // 7 - ZWJ, ZWNJ
     {-1, 6, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 4, -1}, // 8 - independent vowels that can take a virama
     {-1, 6, 1, -1, -1, -1, -1, -1, -1, -1, 10, 5, -1, -1}, // 9 - first part of split vowel
-    {-1, 6, 1, -1, -1, -1, -1, -1, -1, -1, -1, 5, -1, -1} // 10 - second part of split vowel
+    {-1, 6, 1, -1, -1, -1, -1, -1, -1, -1, -1, 5, -1, -1}, // 10 - second part of split vowel
+    {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 7, -1} // 11 - <ct> <zwj>

 };

This adds the new state 11, and also changes one of the transitions in the existing state 3 (in order to use the new state). So it presumably could affect the processing of certain sequences in any Indic script. I'm not saying this is wrong, or even that it would make any difference to the actual results for other scripts; it's probably fine. I simply haven't studied it in order to understand what is really happening here. I notice that it has similarities to the newer version in ICU 3.8.1, but is not identical to that (3.8.1 has transitions from both states 2 and 3 to this new state, and also inserts another new state related to vowels; it also extensively revises the ground state row of the table). But to feel really confident about all this, I'll need to understand the Indic shaping engine better.
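
To make the changed transition concrete, here is a minimal sketch that walks a sequence of character classes through the table with and without the patched row 3. It is not the ICU code: the walker assumes the engine steps through the table one class at a time and stops at the first -1 entry, and the names syllableLength, kPatched, kOldRow3 and the CLS_* constants are hypothetical. The three column positions are only inferred from the row comments in the diff (column 5 leads to "consonant", column 12 to "consonant virama", column 13 to "<ct> <zwj>").

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for the three columns used here. The positions are
// inferred from the row comments in the diff, not taken from the ICU headers.
enum CharClass { CLS_CONSONANT = 5, CLS_VIRAMA = 12, CLS_ZERO_WIDTH = 13 };

const int kClassCount = 14;
const int kStateCount = 12;

// State table as it stands after the patch (rows copied from the diff).
static const signed char kPatched[kStateCount][kClassCount] = {
    { 1,  1,  1,  5,  8,  3,  2,  1,  5,  9,  5,  1,  1,  1}, //  0 - ground state
    {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1}, //  1 - exit state
    {-1,  6,  1, -1, -1, -1, -1, -1,  5,  9,  5,  5,  4, -1}, //  2 - consonant with nukta
    {-1,  6,  1, -1, -1, -1, -1,  2,  5,  9,  5,  5,  4, 11}, //  3 - consonant
    {-1, -1, -1, -1, -1,  3,  2, -1, -1, -1, -1, -1, -1,  7}, //  4 - consonant virama
    {-1,  6,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1}, //  5 - dependent vowels
    {-1, -1,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1}, //  6 - vowel mark
    {-1, -1, -1, -1, -1,  3,  2, -1, -1, -1, -1, -1, -1, -1}, //  7 - ZWJ, ZWNJ
    {-1,  6,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  4, -1}, //  8 - independent vowels that can take a virama
    {-1,  6,  1, -1, -1, -1, -1, -1, -1, -1, 10,  5, -1, -1}, //  9 - first part of split vowel
    {-1,  6,  1, -1, -1, -1, -1, -1, -1, -1, -1,  5, -1, -1}, // 10 - second part of split vowel
    {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  7, -1}  // 11 - <ct> <zwj>
};

// Row 3 as it was before the patch (last entry -1 instead of 11).
static const signed char kOldRow3[kClassCount] =
    {-1,  6,  1, -1, -1, -1, -1,  2,  5,  9,  5,  5,  4, -1};

// Walk the class sequence from the ground state and return how many entries
// are consumed before the table yields -1. With patched == false, the old
// row 3 is substituted, so state 11 can never be reached.
std::size_t syllableLength(const std::vector<int>& classes, bool patched) {
    int state = 0;
    std::size_t cursor = 0;
    while (cursor < classes.size()) {
        int cls = classes[cursor];
        assert(cls >= 0 && cls < kClassCount);
        const signed char* row =
            (!patched && state == 3) ? kOldRow3 : kPatched[state];
        int next = row[cls];
        if (next < 0) {
            break; // no transition: the run ends before this class
        }
        state = next;
        ++cursor;
    }
    return cursor;
}
```

Under those assumptions, syllableLength({CLS_CONSONANT, CLS_ZERO_WIDTH, CLS_VIRAMA, CLS_CONSONANT}, true) consumes all four classes, while the same call with false stops after the first consonant. A sequence with no zero-width mark directly after the consonant, such as {CLS_CONSONANT, CLS_VIRAMA, CLS_CONSONANT}, is consumed identically either way. The pair {CLS_CONSONANT, CLS_ZERO_WIDTH} on its own does differ (two classes instead of one), and since the table is shared, that difference is not limited to Sinhala. Whether it changes the final glyph output for other scripts cannot be read off this sketch.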

Jonathan



