
Re: Using .XCompose



> Hi All,
>
> I will start with what I did, in sequence.
>
> Using Konsole from my home directory:
>
> 1) executed setxkbmap -layout us

Naturally, since it is after all the topic of your thread, you are
primarily interested in troubleshooting the XCompose mechanism.

And so you have set up known environmental conditions for subsequent
tests of that mechanism.

Because I am dim/one-track-minded, it took me a while to understand
that this is the purpose of step (1) above.

> 2) executed xmodmap, which gave the following output:
>
> xmodmap:  up to 4 keys per modifier, (keycodes in parentheses):
>
> shift       Shift_L (0x32),  Shift_R (0x3e)
> lock        Caps_Lock (0x42)
> control     Control_L (0x25),  Control_R (0x69)
> mod1        Alt_L (0x40),  Alt_R (0x6c),  Meta_L (0xcd)
> mod2        Num_Lock (0x4d)
> mod3
> mod4        Super_L (0x85),  Super_R (0x86),  Super_L (0xce),  Hyper_L (0xcf)
> mod5        ISO_Level3_Shift (0x5c),  Mode_switch (0xcb)

I notice in passing that your modifier key setup here is identical to
my own, with one exception:

  lock        ISO_Next_Group (0x42)

which permits me to toggle my keyboard layout between two alternatives
("us" and "ru") by striking capslock.

I also notice that clobbering my "ru" alternate layout with the
command you issued in step (1) above does not change the output I get
when I issue the command in step (2).

So I am prone to conclude that whatever your layout may have been
prior to step (1), it was not a dual-layout setup.

(This relates more to my own curiosity than to your primary concern.)

> 3) executed xmodmap -pk > xmm
>
> The file named xmm is attached.

In that keymap table I can find no keys that enter nonbreaking spaces.

And though it does not have to do with the test you are conducting
here, I do remain curious about whether your day-to-day keymap table
*does* include such mappings. It is entirely possible that it does.

I imagine at this point you may be able to work this out for yourself,
if it interests you.

> 4) executed setxkbmap -print, which gave the following:
>
> xkb_keymap {
>         xkb_keycodes  { include "evdev+aliases(qwerty)" };
>         xkb_types     { include "complete"      };
>         xkb_compat    { include "complete"      };
>         xkb_symbols   { include "pc+us+inet(evdev)"     };
>         xkb_geometry  { include "pc(pc104)"     };
> };
>
> 5) executed grep $'\x00A0' .XCompose. All lines from the .XCompose file
> were listed.

Others have pointed out that here you were simply printing *all* lines
in the file.

> So, I replaced the .XCompose file and retyped the three lines (with
> only spaces typed using the space bar of the keyboard between the
> letters) and executed the grep command. Again it returned all
> lines. So, I replaced the .XCompose file with just the W line. Again
> that line was reported by grep. So, I abandoned Kate and built one
> using cat > .XCompose followed by the line <W> : "This replaces W",
> followed by Enter and Ctrl+D. The grep command returned the line.

The ineffective grep command in step (5) seems to be a creation of
your own. Perhaps it is a synthesis of

 $ grep $'\xc2\xa0' sometextfile

together with various information independently learned elsewhere.

As others pointed out, it was *nearly* (but crucially not quite) the
effective replacement

 $ grep $'\u00a0' sometextfile

In APPENDICES at the bottom of this message, I make a few observations
about the two (effective) grep commands immediately above.
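For the curious, here is why step (5) listed every line. Inside $'...', a \xHH escape takes at most two hex digits. So $'\x00A0' is read as \x00 (a NUL byte) followed by the text "A0". bash ties off the string at the NUL, which means grep receives an empty pattern, and an empty pattern matches every line. Here is a minimal sketch, assuming bash and GNU grep. The throwaway file and its two lines are made up for the demo:

```shell
#!/usr/bin/env bash
# Throwaway sample file (contents invented): line 1 holds a nonbreaking
# space, written here in octal as \302\240 (= bytes C2 A0); line 2 holds
# only an ordinary space.
f=$(mktemp)
printf 'nbsp:\302\240here\nplain: here\n' > "$f"

# \x00 is a NUL byte; bash ends the $'...' string there, so "A0" is lost.
bad=$'\x00A0'
echo "pattern length: ${#bad}"

# An empty pattern matches every line of the file (2 here) ...
all=$(grep -c -- "$bad" "$f")
# ... while the two UTF-8 octets of U+00A0 match only line 1.
nbsp=$(grep -c -- $'\xc2\xa0' "$f")
echo "empty pattern: $all lines; nbsp pattern: $nbsp line"
rm -f -- "$f"
```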

> I noticed that there is a $ sign before the search string, which I
> couldn't understand. I removed it and re-executed the new grep
> command grep '\x00A0' .XCompose. Now it doesn't return the line.

Others have given concise explanations. See appendices below for a
longer one.
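On the experiment with the $ removed: plain single quotes hand grep the six characters \x00A0 exactly as typed. grep has no \xHH escape of its own, so that pattern never means "nonbreaking space". With $'\xc2\xa0', bash itself builds the two octets before grep runs. Here is a small sketch, assuming bash plus od and tr from coreutils, that prints what grep would receive in each case:

```shell
#!/usr/bin/env bash
# Plain single quotes: bash passes these 6 characters on verbatim.
plain='\x00A0'
printf 'plain quotes pass: %s (%d chars)\n' "$plain" "${#plain}"

# $'...': bash decodes the escapes and passes the two octets C2 A0.
nbsp=$'\xc2\xa0'
bytes=$(printf '%s' "$nbsp" | od -An -tx1 | tr -d ' \n')
echo "dollar-quoted form passes the bytes: $bytes"    # c2a0
```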

It is cool that you experiment with what you don't understand. That is
how mistakes are made, and mistakes are the best teacher of all.

This is probably a good spot to recommend a coherent and
comprehensive, well-curated beginner's guide to bash:

 BashGuide - Greg's Wiki
 http://mywiki.wooledge.org/BashGuide

It is a shame if good learning resources are not used.

> 6) The command grep "W" .XCompose | tr $'\xc2\xa0' \! returns
> grep "W" .XCompose | tr $'\xc2\xa0' \!

This does not seem to make any sense. Where is the output?

It looks to me like you may have pasted a copy of the command, where
you meant to paste its output.

What you seem to be *showing* us, is that the command produced as
output a copy of itself. But you don't seem to be *saying* this, or
reasoning as if that were so, since whatever the output was, further
below you seem inclined to think that it indicated the file contained
no nonbreaking spaces.

Anyways, for a number of reasons that particular command line is a
poor method for rendering nonbreaking spaces visible. Greg explained
one of the reasons: tr maps bytes (octets of bits), and doesn't
understand that some characters are composed of more than one octet.
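Here is a short sketch of the byte-by-byte problem, assuming bash and GNU coreutils tr (which pads a shorter SET2 by repeating its last character). The cent sign U+00A2 is just an arbitrary example of another character whose UTF-8 form also starts with the octet C2:

```shell
#!/usr/bin/env bash
# Both octets of a nonbreaking space (C2 A0) are mapped to '!' separately,
# so one character turns into two exclamation marks.
nb=$(printf 'a\302\240b' | tr $'\xc2\xa0' '!')
echo "$nb"        # a!!b

# An unrelated character sharing the lead octet C2 gets mangled too:
# the cent sign (C2 A2) becomes '!' followed by a stray A2 byte.
cent=$(printf 'x\302\242y' | tr $'\xc2\xa0' '!' | od -An -tx1 | tr -d ' \n')
echo "$cent"      # 7821a279  (x, '!', stray A2, y)
```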

It is good to know your preferred text editor can display nonbreaking
spaces distinctively, and that you know how to turn on that option.

If you ever want to do a mass translation of nonbreaking spaces to
some other character (let's say '%' this time) from the shell command
line, then doing

 $ sed 'y/\xc2\xa0/%/' somefile

should print contents of somefile, but with each nonbreaking space
(which is represented in that example as a pair of C-style
byte-constants, \xc2 and \xa0, which sed correctly understands denote
together a single character) replaced with a '%' character.
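Here is a runnable variant of that translation, assuming GNU sed. Note that it swaps y/// for s/\xc2\xa0/%/g and forces LC_ALL=C. In the C locale sed treats the two octets as two separate characters, so y/// would be expected to reject strings of unequal length. s///g, by contrast, simply replaces the two-octet sequence. The pattern sits in plain single quotes, so sed, not bash, decodes the \xHH escapes:

```shell
#!/usr/bin/env bash
# Made-up sample text containing two nonbreaking spaces (bytes C2 A0).
text=$(printf 'one\302\240two\302\240three')

# Plain single quotes: sed itself decodes \xc2 and \xa0.
# LC_ALL=C makes sed compare raw octets, independent of the user's locale.
out=$(printf '%s\n' "$text" | LC_ALL=C sed 's/\xc2\xa0/%/g')
echo "$out"    # one%two%three
```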

Note also that in that command, you can just substitute a literal
nonbreaking space for the sequence '\xc2\xa0' if you know (or can
figure out) how to type a literal nonbreaking space with your
keyboard.

(However, one advantage of using the C-style byte-constants (\xHH)
instead is that it is easy for everyone to see what they are, the web
archive won't replace them with normal spaces, etc.)

Anyways, it is good to know that sed seems to understand multi-octet
characters (at least in a UTF-8 locale), unlike tr.

Also, notice that sed itself --unlike tr and grep-- can interpret a
\xHH byte-constant. We don't need to arrange for the shell to do this
on sed's behalf. See the discussion in the appendices below, if that
sounds mysterious. (sed does not, however, understand character
constants of the form \uHHHH, as far as I can tell from local
experiment and reading its documentation.)

If you have both the package called "info" and sed installed, like I do:

 $ dpkg-query -l info sed texinfo-doc-nonfree
 Desired=Unknown/Install/Remove/Purge/Hold
 | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
 |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
 ||/ Name                Version           Architecture Description
 +++-===================-=================-============-=================================================
 ii  info                6.5.0.dfsg.1-4+b1 amd64        Standalone GNU Info documentation browser
 ii  sed                 4.7-1             amd64        GNU stream editor for filtering/transforming text
 ii  texinfo-doc-nonfree 6.5.0-1           all          texinfo and info documentation that is non-free

then you can do

 $ info sed

to read some documentation about sed. Do

 $ info sed "sed scripts" "other commands"

to jump straight to a brief description of the 'y///' sed operation
used in that example.

However ---and believe me I realise this sounds completely ludicrous
(because it is indeed completely ludicrous)--- comfortable use of the
info command itself is somewhat of an acquired skill, and in order to
read locally the full documentation meant to go with info, one must
install the package

 texinfo-doc-nonfree - texinfo and info documentation that is non-free
   This package provides documentation in info and html format for the
   texinfo and info packages. See the respective packages for more
   explanation.

from the "non-free" component of Debian.

Once that package is installed, one can do

 $ info info # Without texinfo-doc-nonfree, a nearly pointless man page.

And by this point, we are truly in Douglas Adams' Hitchhiker's Guide
to the Galaxy territory:

 "But the plans were on display..."
 "On display? I eventually had to go down to the cellar to find them."
 "That's the display department."
 "With a flashlight."
 "Ah, well, the lights had probably gone."
 "So had the stairs."
 "But look, you found the notice, didn't you?"
 "Yes," said Arthur, "yes I did. It was on display in the bottom of a
 locked filing cabinet stuck in a disused lavatory with a sign on the
 door saying 'Beware of the Leopard'."

So, maybe you would like to forget all of that and just read the sed
info pages in html form on the web:

 https://www.gnu.org/software/sed/manual/
 https://www.gnu.org/software/sed/manual/html_node/index.html

Direct link to synopsis of what y/// does:

 https://www.gnu.org/software/sed/manual/html_node/Other-Commands.html#Other-Commands

Also, with sed installed, you will have on your system the directory
/usr/share/doc/sed, which contains at least the compressed text
document "sedfaq.txt.gz", which you can read with

  $ zless /usr/share/doc/sed/sedfaq.txt.gz # note the 'z' in "zless"

Near the top of that file is probably a url for an online html version
of the same document. You might find it more comfortable to navigate
as html in a web browser.

Regarding the "info" tool: many GNU project tools use it for user
documentation, and if you are going to use those tools you may find it
worthwhile spending --at some point in time-- the time necessary to
get acquainted with use of the info browser.

But it is hilarious --truly Hitchhiker's Guide level hilarity-- how
baroque a gatekeeper the info browser seems to be, in the beginning,
when one merely wants to get on with reading the documentation
published in info pages about grub, or sed, or bash, etc.

> 7) I typed W (holding down shift and pressing w). The Konsole (the
> instance from which I issued the command as well as a new instance)
> did nothing except display the W. I typed W in Kate with the same
> result.

Interesting. I wonder what's going on.

> Is it safe to say that the .XCompose file doesn't have the nonbreaking
> spaces?

You have more evidence than we do, regarding this. You saw output that
somehow did not make it into your message to the list.

But more importantly, you have learned how to tell your preferred text
editor to show you nonbreaking spaces. So you can check for yourself.
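For the record, you can also check a file from the command line. Combine the grep $'\xc2\xa0' command from this thread (with -n) and sed: this prints the line number of each offending line and marks each nonbreaking space with '!'. Here is a sketch assuming bash, GNU grep and GNU sed. The temporary file stands in for ~/.XCompose, and its second line is invented for the demo:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ~/.XCompose; the second line has a
# nonbreaking space (octal \302\240 = C2 A0) between ':' and the string.
f=$(mktemp)
printf '<W> : "This replaces W"\n<w> :\302\240"lower case"\n' > "$f"

# grep -n prints "LINENO:" before each line containing U+00A0;
# sed then replaces each nonbreaking space with a visible '!'.
hits=$(grep -n -- $'\xc2\xa0' "$f" | LC_ALL=C sed 's/\xc2\xa0/!/g')
echo "$hits"    # 2:<w> :!"lower case"
rm -f -- "$f"
```

To check the real file, replace "$f" with ~/.XCompose and drop the mktemp, printf and rm lines.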

> > You can print all lines of sometextfile which contain them by doing
> > this:
> >
> > $ grep $'\xc2\xa0' sometextfile

> Is the $ just before the search string in single quotes in the grep
> command intentional? If so, what is its significance?

Covered this in Appendix B below.

> > '!' marks the spot of nonbreaking spaces that made it into OP's first
> > report of odd behavior, upon testing the white scissors XCompose rule:

> Could you please see if any of the lines in this message also behave
> similarly?

I'll send a second reply illuminating them, sure. (To have shoehorned
that into this message would have been non-consensually psychedelic.)

> If so, isn't it possible that the mail system adds them?

Anything is possible. Nothing wrong with keeping an open mind until
the facts are illuminated. Healthier, probably.

> BTW, I understand that OP refers to me; but what exactly does it
> stand for? Original Petitioner?

David Wright, I believe, explained this one already. But I do agree with
you that Original Petitioner would be much funnier.

  Wednesday, 6 July 2020. Now comes A****, Original Petitioner from
  K****a, with a humble request that The Committee hear that one's
  plea --for a limited waiver of The Keymap Hygiene Decree of 1972--
  pertaining to use of a personal device for strictly non-public
  purpose.

As a post-hoc justification of the practice: from the point of view of
a reader of the mailing list, an OP is frequently a newer personality.
What is most salient about an OP in a given discussion, and about them
alone, is their maximal proximity (in some sense at least) to the
subject of that discussion.

In other words, when you know fifty other things about somebody, their
name begins to take on meaning. Until then, it's just a handle.

OP, on the other hand, always conveys meaning in this sort of forum.

> So those examples, fi and ½, illustrate the difference between
> modification on output and input in English. I can't judge how the
> OP views this, nor whether they are contravening some conventions
> in their own computer culture by trying to make their changes.

But OP, wHaT WoUlD tHe CoMmUnItY tHiNk about your keyboard layout?

Have you petitioned your local Keyboard Zoning Board for a custom
keymap easement and filed the necessary declarations?

[snip]
> > May I ask how you happened to find the post about providing linux
> > support for the Breton keyboard
> > https://dominiko.livejournal.com/20206.html

> I found it by searching the net.

Well, cool. All this has helped me make a new stab at reading the
Unicode Standard. Abstract material can become a more compelling
subject when you have concrete examples in mind.

> > Try again, for firefox-esr (and with a ~/.XCompose file that is not
> > befouled with nonbreaking spaces).
> >
> > But make one change to the procedure. When you launch firefox-esr,
> > do so like this:
> >
> > $ env GTK_IM_MODULE=xim firefox-esr
> >
> > Let us know how that goes.

> Yet to do this. Will inform.

Cool.


APPENDICES

TLDR: I waste some space here and point out a few things for fun:

  A. grep uses regexes, but you can tell it not to
  B. any command (grep included) gets its arguments from the shell
  C. the two equivalent grep command lines above
        i.  rely on the shell's interpretation of their first arg
        ii. need not interpret their first arg as a regex to work

My aim here is to clarify a few specific things about how the grep
command

 $ grep $'\xc2\xa0' sometextfile

works, since OP seems intent on knowing that sort of thing.

[begin overly looong exposition]

%% A. grep uses Regular Expressions, but you can tell it not to

(TLDR: If you do not intend the pattern in grep's first argument to be
a Regular Expression, then use the -F option to avoid surprises.)

I do not know whether you are familiar with what are called "Regular
Expressions", often abbreviated as regex.

If you *are*, then good: grep typically uses them as a pattern to
filter a file line-by-line, usually so that it can print only those
lines that match the expression.

grep will assume that its first argument is such an expression, unless
you tell it otherwise.

But if you are *not* familiar with Regular Expressions, until you do
become familiar with them you should know that while *most* characters
that can appear in a regular expression do stand for themselves, not
all characters do (and the precise set of these last tends to vary
from regex-handling tool to regex-handling tool). Rather, a select few
are interpreted as special operators. (For example, the character '.'
stands for any character.)

You can tell grep *not* to treat its first (non-option) argument as a
regular expression by giving it the option flag "-F" (which grep's
manual suggests is mnemonic for "Fixed-string mode"). For example:

 $ grep abc.efg somefile # get lines that contain "abc" + any character + "efg"

 $ grep -F abc.efg somefile # get lines that contain the string "abc.efg"
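The difference is easy to see with two sample lines, only one of which contains a literal '.'. Here is a sketch assuming bash and GNU grep, with made-up input:

```shell
#!/usr/bin/env bash
# Two invented lines: only the second contains a literal '.'.
sample=$'abcXefg\nabc.efg'

# As a regex, '.' matches any character, so both lines match.
regex_count=$(printf '%s\n' "$sample" | grep -c 'abc.efg')
# With -F the pattern is a fixed string, so only the literal line matches.
fixed_count=$(printf '%s\n' "$sample" | grep -cF 'abc.efg')
echo "regex: $regex_count, fixed: $fixed_count"    # regex: 2, fixed: 1
```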

I have refrained from explaining here much of anything about regexes,
except how to tell grep you don't intend to give it one. But I
recommend learning more about them, if you have not already. They are
a useful abstraction when processing text. Many tools use them.


%% B. Any command (grep included) gets its arguments from the shell

That is to say, the shell gets the command line you enter first
--before grep ever sees it-- and will replace any parts that have
special meaning to the shell before it hands any arguments over to
grep.

In this command we rely on that very fact

 $ grep $'\xc2\xa0' sometextfile

to pass grep a single nonbreaking space as its filtering pattern.

Basically, when bash encounters

 $'somestuff'

it puts on its C costume and performs "C-style backslash
expansion". It replaces inside the sequence

 somestuff

certain backslash escape sequences with a corresponding
character:

 $ echo $'wello\n\thurled' # Expand \n to newline, \t to tab
 wello
 	hurled
 $ echo $'wello\x0a\x09hurled' # Same octets as above
 wello
 	hurled
 $

And if it hits a null byte (like '\x00' in C-style hex notation or
'\000' in octal) before encountering the single-quote that delimits
the end of somestuff, it considers the sequence complete, and ties off
the string.

 $ echo $'well\0\x0a\x09hurled' # \0 expands here to null octet/byte
 well
 $

You now ask: *Which* backslash escape sequences? *Which* corresponding
characters?

And all that is above my pay grade. So I'll simply quote here from
bash(1), under the section QUOTING, this passage:

  Words of the form $'string' are treated specially.  The word expands
  to string, with backslash-escaped characters replaced as specified
  by the ANSI C standard.  Backslash escape sequences, if present, are
  decoded as follows:

    \a     alert (bell)
    \b     backspace
    \e
    \E     an escape character
    \f     form feed
    \n     new line
    \r     carriage return
    \t     horizontal tab
    \v     vertical tab
    \\     backslash
    \'     single quote
    \"     double quote
    \?     question mark
    \nnn   the eight-bit character whose value is the octal value nnn (one
         to three octal digits)
    \xHH   the eight-bit character whose value is the hexadecimal value HH
         (one or two hex digits)
    \uHHHH the Unicode (ISO/IEC 10646) character whose value is the
         hexadecimal value HHHH (one to four hex digits)
    \UHHHHHHHH
         the Unicode (ISO/IEC 10646) character whose value is the
         hexadecimal value HHHHHHHH (one to eight hex digits)
    \cx    a control-x character

  The expanded result is single-quoted, as if the dollar sign had not been present.

And that *last* sentence means that aside from the backslash
expansions listed above, bash performs no further interpretation of
the enclosed string.

Why does it mean that? Because a sequence enclosed in bare single
quotes, with no prefixed '$' character, is protected from *any*
interpretation by bash.
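Here is a quick check of both points, assuming bash. The octal (\nnn) and hex (\xHH) escapes from the table build the same two octets. Plain single quotes leave a backslash sequence alone, while $'...' expands it. The variable names are arbitrary:

```shell
#!/usr/bin/env bash
# Octal \302\240 and hex \xc2\xa0 denote the same two octets (C2 A0).
if [ $'\302\240' = $'\xc2\xa0' ]; then same=yes; else same=no; fi
echo "octal and hex forms identical: $same"

# Plain single quotes: 4 characters (a, backslash, t, b).
raw='a\tb'
# $'...': \t becomes a single tab, so 3 characters (a, TAB, b).
expanded=$'a\tb'
echo "raw: ${#raw} chars, expanded: ${#expanded} chars"
```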

So if you want bash to pass --as an argument to a command-- something
you enter on the command line without it doing any interpretation
first, enclose that something in single quotes.

 $ grep -F '&amp;' somefile.txt  # without quotes, '&' and ';' chars would have special meaning to shell

I believe you've already been pointed to

 BashGuide/SpecialCharacters - Greg's Wiki
 http://mywiki.wooledge.org/BashGuide/SpecialCharacters

which discusses single quotes, among other characters bash treats
specially.


%% C. Tying (A) and (B) together with the present example

Notice that since nonbreaking space is just a normal character and not
a special operator inside a regex, these two amount to the same thing:

  $ grep -F $'\xc2\xa0' sometextfile
  $ grep $'\xc2\xa0' sometextfile
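Both forms can be compared directly on a few sample lines (invented for the demo), assuming bash and GNU grep:

```shell
#!/usr/bin/env bash
# Three invented lines; only the middle one contains a nonbreaking space.
sample=$'alpha\nbeta\xc2\xa0gamma\ndelta'

# Fixed-string search and regex search for the same two octets.
with_F=$(printf '%s\n' "$sample" | grep -F $'\xc2\xa0')
without_F=$(printf '%s\n' "$sample" | grep $'\xc2\xa0')
[ "$with_F" = "$without_F" ] && echo "both forms matched: $with_F"
```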

When you *know* you don't want (or need) grep to interpret a pattern
as a regex (for whatever reason), -F is a sensible option to use. In
this case, it was not necessary, but wouldn't have hurt anything
either.

[end of overly looong exposition]


--
What is important is rarely urgent
and what is urgent is rarely important
-- Dwight David Eisenhower
