On Mon, May 17, 2021 at 09:22:35AM +0200, Uwe Waldmann wrote:
> > According to https://unicodeplus.com/U+A7BA
> >
> > The character Ꞻ (Latin Capital Letter Glottal A) is represented by the
> > Unicode codepoint U+A7BA. It is encoded in the Latin Extended-D block,
> > which belongs to the Basic Multilingual Plane. It was added to Unicode
> > in version 12.0 (March, 2019). It is HTML encoded as &#42938;.
> >
> > xterm #344 is a little earlier than that. Its fallback copy of wcwidth
> > doesn't list that range (I updated the table to Unicode 12 in #345,
> > and added a test-driver around that time).
> >
> > The system wcwidth doesn't cover that range either.
> >
> > Characters which aren't known to wcwidth are treated as nonprinting...
>
> > > (In Debian 9.1, it still worked correctly.)
> >
> > hmm - which version of xterm was that?
> >
> > I'm guessing that it was #327
>
> yes.
>
> > (it should not have worked, but there's always the possibility that I
> > fixed a bug which was making it appear to work)
>
> OK, that's possible. Thanks for the explanation.

In #327, xterm's wcwidth checked whether a code was a combining character
(using a table) or a control character; otherwise (as in this case) it
matched the code against some ranges of double-width characters. If the
code was none of those, it assumed single-width.

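
A rough sketch of that older lookup order, in Python rather than xterm's C.
The tables here are tiny illustrative stand-ins, not xterm's actual data, and
the names (COMBINING, WIDE, wcwidth_pre330) are made up for this example. It
only shows why an unassigned code such as U+A7BA fell through to width 1:

```python
# Illustrative ranges only; these are NOT the tables shipped with xterm.
COMBINING = [(0x0300, 0x036F)]                  # combining diacritical marks
WIDE = [(0x1100, 0x115F),                       # Hangul Jamo initial consonants
        (0x4E00, 0x9FFF),                       # CJK unified ideographs
        (0xAC00, 0xD7A3)]                       # Hangul syllables


def in_ranges(code, ranges):
    """Return True if code lies inside any (low, high) inclusive range."""
    return any(low <= code <= high for low, high in ranges)


def wcwidth_pre330(code):
    """Hypothetical model of the lookup order described for patch #327."""
    if code == 0:
        return 0
    if code < 0x20 or 0x7F <= code < 0xA0:      # C0/C1 control characters
        return -1
    if in_ranges(code, COMBINING):
        return 0
    if in_ranges(code, WIDE):
        return 2
    return 1                                    # everything else: single-width


print(wcwidth_pre330(0xA7BA))
```

With no notion of "unknown", the last line of the function is what made the
character appear to work before it was assigned.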
Starting in #330, I added another table "unknowns" to account for
codes which had no specific width:

    Patch #330 - 2017/06/20

     * modify wcwidth.c to return -1 for non-Unicode values, and adjust a
       couple of blocks to better match assumptions about ambiguous-width
       characters in other implementations. Also modify wcwidth.c to
       support configurable soft-hyphen, so there is no drawback to using
       this version rather than a system wcwidth.

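
To illustrate the effect of an "unknowns" table, here is a small Python
sketch (again, not xterm's C code). The UNKNOWN range is a hypothetical
stand-in for codes that were unassigned before Unicode 12, and all names are
invented for the example. It assumes that unknown codes are reported as
nonprinting (-1), as are values outside the Unicode range:

```python
# Illustrative ranges only; these are NOT the tables shipped with xterm.
COMBINING = [(0x0300, 0x036F)]                  # combining diacritical marks
WIDE = [(0x1100, 0x115F), (0x4E00, 0x9FFF), (0xAC00, 0xD7A3)]
UNKNOWN = [(0xA7BA, 0xA7F6)]                    # hypothetical unassigned gap


def in_ranges(code, ranges):
    """Return True if code lies inside any (low, high) inclusive range."""
    return any(low <= code <= high for low, high in ranges)


def wcwidth_with_unknowns(code):
    """Hypothetical model of a lookup that knows about unassigned codes."""
    if code < 0 or code > 0x10FFFF:             # not a Unicode value
        return -1
    if code == 0:
        return 0
    if code < 0x20 or 0x7F <= code < 0xA0:      # C0/C1 control characters
        return -1
    if in_ranges(code, UNKNOWN):                # no specific width known
        return -1
    if in_ranges(code, COMBINING):
        return 0
    if in_ranges(code, WIDE):
        return 2
    return 1


print(wcwidth_with_unknowns(0xA7BA))
```

Once the table is updated for a newer Unicode version, the range covering
U+A7BA would be dropped from UNKNOWN and the code would get width 1 again.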
--
Thomas E. Dickey <dickey@invisible-island.net>
https://invisible-island.net
ftp://ftp.invisible-island.net