Bug#988554: xterm does not render U+A7BA ... U+A7F6

To: 988554@bugs.debian.org
Cc: 988554-submitter@bugs.debian.org
Subject: Bug#988554: xterm does not render U+A7BA ... U+A7F6
From: Thomas Dickey <dickey@his.com>
Date: Mon, 17 May 2021 18:58:20 -0400
Message-id: <20210517225820.GA12075@prl-debianold-64.jexium-island.net>
Reply-to: dickey@his.com, 988554@bugs.debian.org
In-reply-to: <4b758010f06ef262fb1653b4400ec858@mpi-inf.mpg.de>
References: <162108787265.31107.1822025995177652612.reportbug@lap-21-81> <162108787265.31107.1822025995177652612.reportbug@lap-21-81> <20210516232925.GA4665@prl-debianold-64.jexium-island.net> <4b758010f06ef262fb1653b4400ec858@mpi-inf.mpg.de> <162108787265.31107.1822025995177652612.reportbug@lap-21-81>

On Mon, May 17, 2021 at 09:22:35AM +0200, Uwe Waldmann wrote:
> > According to https://unicodeplus.com/U+A7BA
> > 
> > 	The character Ꞻ (Latin Capital Letter Glottal A) is represented by the
> > 	Unicode codepoint U+A7BA.  It is encoded in the Latin Extended-D block,
> > 	which belongs to the Basic Multilingual Plane.  It was added to Unicode
> > 	in version 12.0 (March, 2019).  It is HTML encoded as &#xA7BA;.
> > 
> > xterm #344 is a little earlier than that.  Its fallback copy of wcwidth
> > doesn't list that range (I updated the table to Unicode 12 in #345,
> > and added a test-driver around that time).
> > 
> > The system wcwidth doesn't cover that range either.
> > 
> > Characters which aren't known to wcwidth are treated as nonprinting...
> 
> > > (In Debian 9.1, it still worked correctly.)
> > 
> > hmm - which version of xterm was that?
> > 
> > I'm guessing that it was #327
> 
> yes.
> 
> > (it should not have worked, but there's always the possibility that I
> > fixed a bug which was making it appear to work)
> 
> OK, that's possible. Thanks for the explanation.

In #327, xterm's wcwidth checked whether a code was a combining character
(using a table) or a control character, and otherwise (as in this case)
matched it against some ranges of double-width characters.  If it was none
of those, it assumed single-width.
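
As a rough illustration of that older lookup order, here is a minimal
sketch.  It is not xterm's actual source: the function name
sketch_wcwidth_327, the helper in_table, and the tiny table excerpts are
all assumptions made for the example (the real tables are much longer).

```c
#include <stddef.h>

/* Inclusive range of code points. */
struct interval {
    unsigned long first;
    unsigned long last;
};

/* Illustrative excerpts only, sorted by code point (assumed, not xterm's). */
static const struct interval combining[] = {
    { 0x0300, 0x036F },  /* Combining Diacritical Marks */
    { 0x20D0, 0x20F0 },  /* Combining Diacritical Marks for Symbols */
};

static const struct interval doublewidth[] = {
    { 0x1100, 0x115F },  /* Hangul Jamo initial consonants */
    { 0x4E00, 0x9FFF },  /* CJK Unified Ideographs */
    { 0xAC00, 0xD7A3 },  /* Hangul Syllables */
};

/* Binary search of a sorted, non-overlapping interval table. */
static int in_table(unsigned long ucs, const struct interval *table, size_t n)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (ucs > table[mid].last)
            lo = mid + 1;
        else if (ucs < table[mid].first)
            hi = mid;
        else
            return 1;
    }
    return 0;
}

/* Hypothetical stand-in for the pre-#330 behavior described above. */
int sketch_wcwidth_327(unsigned long ucs)
{
    if (ucs == 0)
        return 0;
    if (ucs < 0x20 || (ucs >= 0x7F && ucs < 0xA0))
        return -1;                      /* control characters */
    if (in_table(ucs, combining, sizeof(combining) / sizeof(combining[0])))
        return 0;                       /* combining characters */
    if (in_table(ucs, doublewidth, sizeof(doublewidth) / sizeof(doublewidth[0])))
        return 2;                       /* double-width ranges */
    return 1;                           /* everything else: single-width */
}
```

With a lookup like this, a code such as U+A7BA falls through every test
and gets width 1, whether or not the tables know about it.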

Starting in #330, I added another table "unknowns" to account for
codes which had no specific width:

Patch #330 - 2017/06/20
     * modify wcwidth.c to return -1 for non-Unicode values, and adjust a
       couple of blocks to better match assumptions about ambiguous-width
       characters in other implementations.  Also modify wcwidth.c to
       support configurable soft-hyphen, so there is no drawback to using
       this version rather than a system wcwidth.
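
A minimal sketch of how such an extra table changes the result, under
stated assumptions: the names sketch_wcwidth_330, is_unknown and
legacy_width are hypothetical, the single table entry is simply the range
from this bug's title (unassigned before Unicode 12), and the older
combining/double-width checks are reduced to a stub.

```c
#include <stddef.h>

/* Inclusive range of code points. */
struct interval {
    unsigned long first;
    unsigned long last;
};

/* Hypothetical one-entry excerpt: the range from the bug title. */
static const struct interval unknowns[] = {
    { 0xA7BA, 0xA7F6 },
};

/* Linear scan is enough for a one-entry illustration. */
static int is_unknown(unsigned long ucs)
{
    size_t i;

    for (i = 0; i < sizeof(unknowns) / sizeof(unknowns[0]); i++) {
        if (ucs >= unknowns[i].first && ucs <= unknowns[i].last)
            return 1;
    }
    return 0;
}

/* Stub for the older lookup; combining and double-width tables omitted. */
static int legacy_width(unsigned long ucs)
{
    if (ucs == 0)
        return 0;
    if (ucs < 0x20 || (ucs >= 0x7F && ucs < 0xA0))
        return -1;
    return 1;
}

/* Hypothetical stand-in for the behavior from #330 onward. */
int sketch_wcwidth_330(unsigned long ucs)
{
    if (ucs > 0x10FFFF)
        return -1;          /* not a Unicode value */
    if (is_unknown(ucs))
        return -1;          /* no specific width: treated as nonprinting */
    return legacy_width(ucs);
}
```

Here U+A7BA no longer falls through to single-width; it gets -1 until
the table is updated to a Unicode version which assigns that code.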

-- 
Thomas E. Dickey <dickey@invisible-island.net>
https://invisible-island.net
ftp://ftp.invisible-island.net

