On Wed, Sep 19, 2018 at 08:17:54PM +1000, Zenaan Harkness wrote:
> On Mon, Sep 17, 2018 at 04:47:28AM -0400, Thomas Dickey wrote:
> > On Wed, Aug 22, 2018 at 08:05:51PM +1000, Zenaan Harkness wrote:
> > > Create a text file containing e.g. the musical natural symbol, and
> > > the mathematical function symbol, e.g. "ƒƒƒ ♮♮♮" (three function
> > > symbols, a space, and three natural symbols, inside plain quotes).
> > >
> > > Now in an xterm -lc instance, with a UTF-8 locale, cat the file.
> > >
> > > xterm displays the function and the natural symbols.
> > >
> > > Now start the utf-8 compatible gui editor Geany, and open the same
> > > file in Geany.
> > >
> > > Copy and paste those characters from Geany, into Geany - works.
> > >
> > > Copy from Geany, paste to xterm - this also works.
> > >
> > > Select/copy from xterm, middle-click paste into Geany - only the
> > > natural symbols, and not the function symbols, are pasted, also
> > > pasting to xterm (from copying from xterm) does not work.
> > >
> > > SO, xterm is not properly copying some UTF-8 Unicode characters.
> >
> > This update is unrelated to the original report, which deals with
> > characters past BMP (the example uses U+0192 and U+266E).
> >
> > I have not been able to reproduce the problem.
> >
> > See also:
> > > https://lists.debian.org/debian-user/2017/09/msg00518.html
> > > https://lists.debian.org/debian-user/2017/09/msg00527.html
> > >
> > > Should I file a different bug for this, or just leave this here?
> >
> > It might be related to #901249, but I cannot say. The other client
> > (Geany) seems to be a factor - if you can reproduce the problem with
> > xsel, that would be helpful. copy and paste rely on the source to
> > provide the data in different formats, and the target to request
> > what's appropriate.
>
> OK, so I've tested just using xsel:
>
> The string I start with is "# ƒƒ ♮♮" without the quotes, and that
> should appear as:

None of your comments mention the locale you're using.
The manual page description of "-lc" and the associated "locale"
resource goes into some detail, mentioning that it uses LC_CTYPE.
If you have some unexpected value for that, you'll get unexpected
results.

> hash space function function space natural natural
>
> In vim in xfce4-terminal (to write this email), that sequence pastes
> correctly.
>
> Now, in xfce4-terminal, after selecting those chars, xsel -o
> correctly dumps them.
>
> Jumping immediately to xterm -lc, then:
>
> xsel -o -also- correctly dumps those chars to the xterm.
>
> That's good.
>
> Next, select those chars in xterm, and xsel -o no longer dumps the
> function symbols;
>
> That's not good.
>
> xfce4-terminal now has the same problem with xsel -o NOT dumping the
> function symbols, as does middle click pasting into geany -
> SO, in my setup at least, the problem is copying the function symbol
> -from- xterm (copying from other apps, such as geany and from vim in
> xfce4-terminal, and straight from xfce4-terminal, all works
> correctly for xsel -o (in both xfce4-terminal and xterm -lc).

I made a shell script to check 901249, and adapted it to this report.
Running that, I don't see any discrepancies (that exercises both
primary and clipboard).

> According to https://en.wikipedia.org/wiki/%C6%91 this "function
> symbol" is actually called the "florin sign", but in any case has the
> code U+0192 which seems well within the 16-bit code plane.
>
>
> Here's what a little test run looks like in xterm -l (I've bound the
> function symbol to my keyboard so I can type it successfully):
>
> $ echo ƒƒƒƒ
> ƒƒƒƒ
> $ # select above string, and:
> $ xsel -o
> $
> $ # now middle click:
> $ ?????^C
> $ # now select from xfce4-terminal, then come back here:
> $ xsel -o
> ƒƒƒƒ$
> $ # now middle click:
> $ ƒƒƒƒ

I tried this also, without seeing a problem.

> Thomas is there any other test I can run on Debian stable?

fwiw "locale" says

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

and "env|grep -E '(LANG|LC_)'" says

LANG=en_US.UTF-8
GDM_LANG=en_US.UTF-8

-- 
Thomas E. Dickey <dickey@invisible-island.net>
https://invisible-island.net
ftp://ftp.invisible-island.net