
Bug#872778: xterm -lc (with UTF-8 locale) cannot properly copy some utf-8 unicode chars



On Wed, Sep 19, 2018 at 08:17:54PM +1000, Zenaan Harkness wrote:
> On Mon, Sep 17, 2018 at 04:47:28AM -0400, Thomas Dickey wrote:
> > On Wed, Aug 22, 2018 at 08:05:51PM +1000, Zenaan Harkness wrote:
> > > Create a text file containing e.g. the musical natural symbol, and
> > > the mathematical function symbol, e.g. "ƒƒƒ ♮♮♮" (three function
> > > symbols, a space, and three natural symbols, inside plain quotes).
> > > 
> > > Now in an xterm -lc instance, with a UTF-8 locale, cat the file.
> > > 
> > > xterm displays the function and the natural symbols.
> > > 
> > > Now start the utf-8 compatible gui editor Geany, and open the same
> > > file in Geany.
> > > 
> > > Copy and paste those characters from Geany, into Geany - works.
> > > 
> > > Copy from Geany, paste to xterm - this also works.
> > > 
> > > Select/copy from xterm, middle-click paste into Geany - only the
> > > natural symbols, and not the function symbols, are pasted. Pasting
> > > into xterm (after copying from xterm) does not work either.
> > > 
> > > SO, xterm is not properly copying some UTF-8 Unicode characters.
> > 
> > This update is unrelated to the original report, which deals with
> > characters past BMP (the example here uses U+0192 and U+266E, which
> > are both within the BMP).
> > 
> > I have not been able to reproduce the problem.
> >  
> >  See also:
> > > https://lists.debian.org/debian-user/2017/09/msg00518.html
> > > https://lists.debian.org/debian-user/2017/09/msg00527.html
> > > 
> > > Should I file a different bug for this, or just leave this here?
> > 
> > It might be related to #901249, but I cannot say.  The other client
> > (Geany) seems to be a factor - if you can reproduce the problem with
> > xsel, that would be helpful.  copy and paste rely on the source to
> > provide the data in different formats, and the target to request
> > what's appropriate.
> 
> OK, so I've tested just using xsel:
> 
> The string I start with is "# ƒƒ ♮♮" without the quotes, and that
> should appear as:

None of your comments mention the locale you're using.  The manual page
description of "-lc" and the associated "locale" resource goes into some
detail, mentioning that it uses LC_CTYPE.  If you have some unexpected
value for that, you'll get unexpected results.
> 
> hash space function function space natural natural
> 
> In vim in xfce4-terminal (to write this email), that sequence pastes
> correctly.
> 
> Now, in xfce4-terminal, after selecting those chars, xsel -o
> correctly dumps them.
> 
> Jumping immediately to xterm -lc, then:
> 
>   xsel -o -also- correctly dumps those chars to the xterm.
> 
> That's good.
> 
> Next, select those chars in xterm, and xsel -o no longer dumps the
> function symbols;
> 
> That's not good.
> 
> xfce4-terminal now has the same problem with xsel -o NOT dumping the
> function symbols, as does middle click pasting into geany.
> SO, in my setup at least, the problem is copying the function symbol
> -from- xterm (copying from other apps, such as geany and from vim in
> xfce4-terminal, and straight from xfce4-terminal, all works
> correctly for xsel -o in both xfce4-terminal and xterm -lc).

I made a shell script to check 901249, and adapted it to this report.
Running that, I don't see any discrepancies (that exercises both primary
and clipboard).
 
> According to https://en.wikipedia.org/wiki/%C6%91 this "function
> symbol" is actually called the "florin sign", but in any case has the
> code U+0192 which seems well within the 16-bit code plane.
> 
> 
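For reference, the code points quoted above can be checked
independently of any terminal.  The following is a minimal sketch in
Python (not part of either poster's test setup; the helper name
"describe" is made up for illustration).  It prints the code point,
Unicode name, BMP membership, and UTF-8 bytes of each character:

```python
import unicodedata

def describe(ch):
    """Return code point, Unicode name, BMP membership and UTF-8 bytes."""
    cp = ord(ch)
    return {
        "code_point": f"U+{cp:04X}",
        "name": unicodedata.name(ch),
        # The Basic Multilingual Plane covers U+0000 through U+FFFF.
        "in_bmp": cp <= 0xFFFF,
        "utf8": " ".join(f"{b:02x}" for b in ch.encode("utf-8")),
    }

# The two characters from the report: U+0192 and U+266E.
for ch in "\u0192\u266e":
    print(describe(ch))
```

Both characters are reported as being inside the BMP; U+0192 takes two
bytes in UTF-8 and U+266E takes three.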
> Here's what a little test run looks like in xterm -l (I've bound the
> function symbol to my keyboard so I can type it successfully):
> 
> $ echo ƒƒƒƒ
> ƒƒƒƒ
> $ # select above string, and:
> $ xsel -o
> $ 
> $ # now middle click:
> $ ?????^C
> $ # now select from xfce4-terminal, then come back here:
> $ xsel -o
> ƒƒƒƒ$ 
> $ # now middle click:
> $ ƒƒƒƒ

I tried this also, without seeing a problem.

> Thomas is there any other test I can run on Debian stable?

fwiw "locale" says

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

and "env|grep -E '(LANG|LC_)'" says

LANG=en_US.UTF-8
GDM_LANG=en_US.UTF-8
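
Since the behavior of "-lc" depends on LC_CTYPE, it may help to see
which variable actually supplies that category.  The sketch below is
in Python and assumes only the usual POSIX precedence (LC_ALL, then
LC_CTYPE, then LANG, with empty values treated as unset).  The
function names are hypothetical, and the check on the codeset suffix
is a rough heuristic rather than what xterm itself does:

```python
import os

def effective_ctype(env):
    """Return (variable, value) that determines LC_CTYPE for this environment."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # an empty string counts as unset
            return var, value
    return None, "C"  # nothing set: the POSIX "C" locale applies

def looks_utf8(locale_name):
    """Heuristic: does the codeset part of the locale name say UTF-8?"""
    codeset = locale_name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"

var, value = effective_ctype(os.environ)
print(var, value, looks_utf8(value))
```

With only LANG=en_US.UTF-8 set, as in the "env" output above, this
reports LANG as the source and a UTF-8 codeset.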

-- 
Thomas E. Dickey <dickey@invisible-island.net>
https://invisible-island.net
ftp://ftp.invisible-island.net
