
Re: Hebrew: General question about location of shortcut indicator "&" and reading direction



Dear Stephen,

thanks a lot. Your explanation helped me a bit.

How is this represented as bytes on disk?

As a simple example, let's assume "ABC" is a word in a left-to-right script.
As a Hebrew word (e.g. after translation) it would be written "CBA",
because it's read from right to left, starting with "A", then "B", with "C" at the end.

Am I right so far?

Now let's add such a shortcut indicator to the first letter (the "A").
Weblate and Qt seem to apply the BIDI algorithm correctly and display it like this:

    "CB&A" (or an underlined "A" in a Qt GUI)

But a terminal that does not apply the BIDI algorithm correctly shows it like this:

    "&CBA"

I am aware that a Unicode character can consist of multiple bytes. Usually it starts with 2 bytes, and then additional characters can be attached to it. I remember the emoji example of a black astronaut: human+rocket+black (or something like this). But please let's keep it simple and not open the Unicode box too much. I assume there is a hidden control character indicating the reading direction?
So what is in the file?

    &ABC

or

    &CBA

My guess is that it is the first (&ABC), right? Is it encoded in Unicode that the A, the B and the C need to be read the "other way around"?
So the I/O algorithm reads something like this?

    & reverted-A reverted-B reverted-C

Or even in Python:

    myletters = ['&', 'A', 'B', 'C']

    # but myletters[1:] are somehow coded as "other way around"

OK?
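
A minimal sketch of how this could be checked in Python, assuming the Hebrew letters alef, bet and gimel stand in for "A", "B" and "C" (these stand-ins and the name `label` are only for illustration). As far as I know, Unicode text is normally stored in logical (reading) order, and each character carries a bidirectional class that a renderer uses to decide the visual order, so the sketch just prints the code points, their classes and the UTF-8 bytes:

```python
import unicodedata

# Assumption: alef, bet, gimel are stand-ins for "A", "B", "C" from the example.
ALEF, BET, GIMEL = "\u05d0", "\u05d1", "\u05d2"

# Logical (reading) order: the shortcut indicator precedes the first letter read.
label = "&" + ALEF + BET + GIMEL

for ch in label:
    # bidirectional() returns the bidi class, e.g. "R" for Hebrew letters.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch):22}  bidi={unicodedata.bidirectional(ch)}")

# UTF-8 bytes as they would be written to a file, in the same logical order.
print(label.encode("utf-8"))  # b'&\xd7\x90\xd7\x91\xd7\x92'
```

In this sketch the "&" comes first in memory and in the bytes, each Hebrew letter takes 2 bytes in UTF-8, and no extra control character is present. The letters themselves are classed "R" (right-to-left) and the "&" is classed "ON" (a neutral).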

Kind
Christian

