
Re: Using .XCompose



On Tue 14 Jul 2020 at 11:19:30 (+0000), Ajith R wrote:
> On Sun 12 Jul 2020 at 22:50:23 (-0500), David Wright wrote:
> 
> > OK. I wonder whether the problem you're having with using XCompose
> > is that although those three characters <U0D19> <U0D4D> <U0D19>
> > look independent of each other in the file, the keystrokes that
> > generate them might not be. Not having your layout, I don't think
> > I can test whether you get the behaviour I think you do, that
> > when you put the cursor at the beginning of a *typed* line that
> > looks like the next one and press Delete once:
> > ങ്ങ
> > you get
> > ങ
> > whereas I get
> > ്ങ
> > Is that right?
> 
> I am sorry if I didn't explain properly, and for not attaching the keyboard layout (I assumed no one would want to go through the rather boring lines). I am attaching my .XCompose, the layout file "in" (the variant I wrote is named mal_puthuniraA), and the keyboard file.
> 
> To get the ligated conjunct using this layout, I type L while holding down Shift, followed by f (or j) without Shift, and then L while holding down Shift. When these characters are typed, the program will show the ligated conjunct form if its font supports the form, or else it displays the three characters separately. So, when you go to the beginning of ങ്ങ and press Delete once, you should get ്ങ or the entire ങ്ങ is deleted (based on how the program treats the ligated conjunct form).

I haven't found a font that displays three characters corresponding to
the three code points. Even a font that prints blobs only prints two.
And I don't think Delete should ever delete both characters (three
code points) with one press. Even though ligated, there are two
characters there.

> There are issues in displaying ligated forms by various programs and I assume that by extension there will be problems while deleting also. Moreover, the < ് > is a combining mark. So, some programs will treat the character preceding < ് > and the < ് > as one character.

Yes, it does seem that the "system" knows that, even where it has no
appropriate glyphs available for displaying.

In what follows, you have to bear in mind that I can't actually type
any of these characters, so everything in the command line ultimately
originated as copy/paste from your emails or a code block printout.

I've found it useful to copy this line into the command line for
experimenting:

$ hexdump -C <<<' ങ്ങ '

> In Konsole, the ligated conjunct is formed correctly, but the width calculated for display is slightly off and so the cursor is placed over the character. When I use the Home key to go to the beginning of the line and then press Delete, I get ങ്. When I go to the end and press Backspace, I get the same result. If I press Backspace a second time, the ങ് is deleted. Note that both characters are deleted with one Backspace. If I move to the beginning of ങ്ങ, press the right arrow once and then press space once, I get ങ് ങ.

Yes, I can't fully explain what's going on when you press Delete once
when at the beginning of the line. What's immediately reflected is the
same as for you, but when I recall the line, I get the consonant ങ alone.
I don't know, of course, which is right. Should it preserve the
"no vowel attached" mark on a solitary consonant or not? Here:

$ hexdump -C <<<' ങ്ങ '             ← constructed with copy/paste
00000000  20 e0 b4 99 e0 b5 8d e0  b4 99 20 0a              | ......... .|
0000000c
$ 
$ hexdump -C <<<' ങ് '              ← recalled the line above, and deleted 1st char
00000000  20 e0 b4 99 20 0a                                 | ... .|
00000006
$ 
$ hexdump -C <<<' ങ '              ← recalled the line above: it's different!
00000000  20 e0 b4 99 20 0a                                 | ... .|
00000006
$ 
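
To look at code points rather than UTF-8 bytes, something like the
following sketch might help. It assumes iconv, od and fold are
installed; the function name code_points is made up for illustration,
and the octal escapes are just the nine bytes from the first hexdump
above, without the surrounding spaces and newline.

```sh
# Hypothetical helper: read UTF-8 on stdin and print each code point
# as eight hex digits, one per line.
code_points() {
    # UTF-32BE stores every code point in exactly four bytes, no BOM.
    iconv -f UTF-8 -t UTF-32BE | od -An -v -tx1 | tr -d ' \n' | fold -w 8
    echo
}

# e0 b4 99  e0 b5 8d  e0 b4 99, written as octal escapes so that any
# POSIX printf can produce them
printf '\340\264\231\340\265\215\340\264\231' | code_points
```

That should list 00000d19, 00000d4d and 00000d19, i.e. the three code
points <U0D19> <U0D4D> <U0D19>, however many glyphs a font shows.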

> In Kate, deleting from the beginning deletes the entire ങ്ങ; backspace from the end deletes one whole character at a time, giving ങ് followed by ങ. Moving to the beginning and inserting a space adds the space after ങ്ങ.

In text mode, emacs treats the string as three characters, though two
are displayed, and the cursor behaves as if there were just two.
So if I put the cursor between them and press Backspace, the first
character, both consonant and attached "vowel", gets erased.
However, if I press Delete before the first character, I see  ്ങ
with an exposed "vowel".

In terms of the underlying code points, this seems entirely logical.
The only limitation is that you can't erase (with Backspace) an
attached "vowel" without also erasing the consonant it's attached to.

But in terms of writing running text, it seems to be the wrong way
round, compared with what I've read. If you type a consonant, and then
type the wrong vowel, you ought to be able to erase the vowel and type
another in its place.

> I don't think these variations in handling Indic scripts are related to the problem of Composing. But, please do check the keyboard layout I am attaching.

As I say, I can't really test *input* properly without the layout and
locales set. And I must emphasise again that I haven't modified, or
claimed to understand, the files in /usr/share/X11/xkb/. I don't know
how the odd behaviour above is generated.

BTW rather than posting your mal_puthuniraA file, you could post its
differences instead, which are much smaller, 8½KB rather than 102KB:

$ diff -u mal_puthuniraA /usr/share/X11/xkb/symbols/in > /tmp/mal_puthuniraA.diff
$ ls -Glg mal_puthuniraA /usr/share/X11/xkb/symbols/in /tmp/mal_puthuniraA.diff
-rw-r----- 1   8401 Jul 14 17:12 /tmp/mal_puthuniraA.diff
-rw-r--r-- 1  94750 Feb 11  2019 /usr/share/X11/xkb/symbols/in
-rw------- 1 101993 Jul 14 11:21 mal_puthuniraA

> If the reason why the single line <W> : "a long sentence" in .XCompose is not working as expected is found out, I think my problem would be solved.

As I said, I'm not familiar with how you make sure that the system
rereads the file, particularly with a DE, whether you need
udevadm trigger --subsystem-match=input --action=change
or somesuch. I just restart X when I'm testing.

> > Note that the whitespace in your *attached* file (mixed tabs and
> > spaces) matched my own, whereas the file here in your post does not.
> > That suggests that the 0xc2 0xa0 sequences may be a result of your
> > copy/paste operation.
> 
> By "the file here in your post" do you mean the text of the email per se?

Yes, where you pasted the contents of the file from the screen display
into the text of the message, just as I did with those hexdump
commands above.
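
A quick way to check a file for pasted no-break spaces, as a sketch
only: the function name find_nbsp is hypothetical, and the octal
escapes \302\240 are the same two bytes as 0xc2 0xa0, written that way
because not every shell's printf understands \x.

```sh
# Hypothetical helper: list the lines (with line numbers) of a file
# that contain a UTF-8 encoded NO-BREAK SPACE (bytes c2 a0).
find_nbsp() {
    nbsp=$(printf '\302\240')
    grep -n "$nbsp" "$1"
}

# Usage: prints nothing, and returns non-zero, when the file is clean.
# find_nbsp ~/.XCompose
```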

> > again just reflects the additions in my /e/d/k.
> 
> Good to know that there are problems in my files.

No problems, I presume.

> > … because it doesn't contain any lines with \x00A0 in them.
> Ok. I understood my mistake. Now, grep $'\xc2\xa0' .XCompose doesn't return the line.
> 
> > \xHH  the eight-bit character whose value is the hexadecimal value
> >        HH (one or two hex digits)
> >  \uHHHH the Unicode (ISO/IEC 10646) character whose value is the
> >        hexadecimal value HHHH (one to four hex digits)
> >
> > So you were mixing up those two constructions, perhaps.
> 
> Yes, I am. Also, I don't understand the conversion between the two. The \x tells that the two characters that follow it are hexadecimal digits, and \u tells that the following four hexadecimal digits are to be interpreted as a Unicode value. But how do you derive \xc2\xa0 for \u00A0?

Character   ' '
Character name  NO-BREAK SPACE
Hex code point  00A0
Hex UTF-8 bytes  C2 A0
UTF-8 bytes as Latin-1 characters  Â <A0>

https://en.wikipedia.org/wiki/UTF-8#Description
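
The arithmetic behind that table can be sketched in the shell. This is
only an illustration of the bit-packing described on that page, limited
to code points up to FFFF (what \uHHHH can express); the function name
utf8_bytes is made up, and surrogates are not rejected.

```sh
# Hypothetical helper: print the UTF-8 bytes (in hex) for a code point
# given as one to four hex digits, e.g. 00A0.
utf8_bytes() {
    cp=$(( 0x$1 ))
    if [ "$cp" -lt 128 ]; then
        # 0xxxxxxx: ASCII is a single byte, unchanged
        printf '%02x\n' "$cp"
    elif [ "$cp" -lt 2048 ]; then
        # 110xxxxx 10xxxxxx: top 5 bits, then low 6 bits
        printf '%02x %02x\n' \
            $(( 0xC0 | ($cp >> 6) )) $(( 0x80 | ($cp & 0x3F) ))
    else
        # 1110xxxx 10xxxxxx 10xxxxxx: top 4, middle 6, low 6 bits
        printf '%02x %02x %02x\n' \
            $(( 0xE0 | ($cp >> 12) )) \
            $(( 0x80 | (($cp >> 6) & 0x3F) )) \
            $(( 0x80 | ($cp & 0x3F) ))
    fi
}

utf8_bytes 00A0    # c2 a0
utf8_bytes 0D19    # e0 b4 99
utf8_bytes 0D4D    # e0 b5 8d
```

So 00A0 is binary 00010 100000; the first five bits go after 110 to
make C2, and the last six go after 10 to make A0. The other two lines
match the bytes in the hexdump of the conjunct further up.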

> > hexdump -C filename  will reveal exactly what's
> > in a file, hex to the left, and corresponding characters to the right.
> 
> hexdump -C .XCompose gives
> 00000000  3c 57 3e 20 3a 20 22 54  68 69 73 20 72 65 70 6c  |<W> : "This repl|
> 00000010  61 63 65 73 20 57 22 0a                           |aces W".|
> 00000018
> 
> I am attaching the .XCompose file, so I am not redirecting the output to a file and attaching that.

[files are attached to later emails]

Cheers,
David.

