Bug#1123509: xterm: various issues with the U+FE0F VARIATION SELECTOR-16 (VS16) character after emoji
On 2025-12-17 04:32:37 -0500, Thomas Dickey wrote:
> On Wed, Dec 17, 2025 at 04:50:45AM +0100, Vincent Lefevre wrote:
> > With xterm 405, the use of the U+FE0F VARIATION SELECTOR-16 (VS16)
> > character after an emoji can completely corrupt the display with
> > Mutt. GNU Screen also gets broken with the command below (issues
> > with the last line of the terminal). I suspect that this is due
> > to an inconsistency between the xterm behavior and wcwidth(),
> > which may affect various applications that rely on wcwidth().
>
> Without the Emoji width feature (which as I mentioned, I see should be
> configurable), xterm's wcwidth is a close match for glibc's wcwidth.
> The few differences which I noticed in testing appear to be problems with
> glibc.
But if xterm's Emoji width feature is enabled, I don't see how this
can be fixed with wcwidth(), since the width is contextual. It is
the spec that would be broken. And there should be a clean way to
query the terminal about its behavior.
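To make "contextual" concrete, here is a minimal Python sketch. The function toy_wcwidth is a hypothetical stand-in for wcwidth(), and its values are assumptions chosen for illustration, not taken from glibc or xterm. Promoting every narrow character followed by VS16 is also a simplification; a real implementation would restrict this to valid emoji variation sequences.

```python
VS16 = "\ufe0f"  # U+FE0F VARIATION SELECTOR-16


def toy_wcwidth(ch):
    """Hypothetical stand-in for wcwidth(): one code point in, one width out."""
    if ch == VS16:
        return 0  # assumed: the selector itself occupies no cell
    return 1      # assumed: everything else in this toy model is narrow


def per_char_width(s):
    """What an application gets by summing a per-character width function."""
    return sum(toy_wcwidth(ch) for ch in s)


def vs16_aware_width(s):
    """Width when a narrow character followed by VS16 is promoted to 2 cells."""
    total = 0
    prev_width = None  # width of the preceding base character, if any
    for ch in s:
        if ch == VS16:
            if prev_width == 1:
                total += 1     # promote the previous narrow character
            prev_width = None  # a repeated VS16 changes nothing further
            continue
        prev_width = toy_wcwidth(ch)
        total += prev_width
    return total
```

With these assumptions, per_char_width("\u2642\ufe0f") gives 1 while vs16_aware_width("\u2642\ufe0f") gives 2. No function that looks at one code point at a time can return the second result, because the extra cell is only known once the following character has been seen.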
> Checking now, mutt has a wcwidth.c, which is not often used (since it's
> compile-time), which is just as well because its tables are very old.
> It has a wrapper for wcwidth which makes some assumptions about iswprint
> that make its behavior problematic except with glibc.
There was the same issue with GNU Screen in the past, which has
now been fixed by using the wcwidth function from the C library,
following my bug reports:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1027733
https://savannah.gnu.org/bugs/?63634
But I did not have any issue with Mutt itself over the years.
Well... Actually it seems that Mutt uses its own implementation
only if such functions are not present in the C library.
In configure.ac:
if test "$wc_funcs" != yes && test "$wc_funcs" != no; then
  AC_CACHE_CHECK([for wchar_t functions], mutt_cv_wc_funcs,
    mutt_cv_wc_funcs=no
    AC_LINK_IFELSE([AC_LANG_PROGRAM([[
#define _XOPEN_SOURCE 600
#include <stddef.h>
#include <stdlib.h>
#ifdef HAVE_WCHAR_H
#include <wchar.h>
#endif
#ifdef HAVE_WCTYPE_H
#include <wctype.h>
#endif]], [[mbrtowc(0, 0, 0, 0); wctomb(0, 0); wcwidth(0);
        iswprint(0); iswspace(0); towlower(0); towupper(0); iswalnum(0)]])],[mutt_cv_wc_funcs=yes],[]))
  wc_funcs=$mutt_cv_wc_funcs
fi
if test $wc_funcs = yes; then
  AC_DEFINE(HAVE_WC_FUNCS,1,[ Define if you are using the system's wchar_t functions. ])
else
  MUTT_LIB_OBJECTS="$MUTT_LIB_OBJECTS utf8.o wcwidth.o"
fi
So, in practice, under Linux (and most systems?), Mutt uses the
wcwidth function from the C library.
> > I have not checked wcswidth().
>
> nor I - actually I don't believe it is often used.
> mutt imitates it by repeatedly calling wcwidth, and doesn't account for VS16.
>
> Because mutt isn't accounting for VS16, that's an issue for which xterm
> "should" be configurable, so we can accommodate programs which pass through
> VS15 and VS16 without accounting for their behavior.
Note that if Mutt accounted for VS16, this would break its display
on most terminals (including GNOME Terminal, which seems commonly
used under Linux).
BTW, the behavior of GNOME Terminal has the advantage of avoiding
display breakage: the emoji takes 2 cells, but wcwidth is honored.
So it seems that it is up to the application to advance the cursor
to avoid partial overlap.
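One possible way for an application to do this is sketched below in Python. It is only an illustration, not something Mutt or any other program is known to do. The function place_cells is hypothetical. It assumes a toy model in which every character is 1 cell wide except a character followed by VS16, which the application chooses to treat as 2 cells. After each such pair it emits CSI n G (cursor horizontal absolute), so the cursor ends up in the same column whether the terminal advanced by 1 or by 2. Line wrapping is not handled.

```python
VS16 = "\ufe0f"  # U+FE0F VARIATION SELECTOR-16


def place_cells(s, start_col=1):
    """Return s with an absolute cursor move after each <char, VS16> pair.

    Columns are 1-based, as in the CSI n G control sequence.
    """
    out = []
    col = start_col
    i = 0
    while i < len(s):
        ch = s[i]
        if i + 1 < len(s) and s[i + 1] == VS16:
            out.append(ch + VS16)
            col += 2                      # the application reserves 2 cells
            out.append("\x1b[%dG" % col)  # force the cursor to that column
            i += 2
        else:
            out.append(ch)
            col += 1                      # toy assumption: 1 cell per character
            i += 1
    return "".join(out)
```

For example, place_cells("\u2642\ufe0fa") yields the emoji pair, then ESC [ 3 G, then "a", so the "a" lands in column 3 on a terminal that advanced the cursor by one cell as well as on one that advanced it by two.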
> (I haven't investigated "neomutt", which may provide improvements, though
> the "neo" cult appears to rely heavily upon hard-coding).
There would still be the question of behavior with terminals
like xterm 403 and earlier, and GNOME Terminal.
> > But there are issues even with simple output. In a 80-column terminal:
> >
> > perl -C -e 'print "\x{2642}\x{FE0F}"x60, "\n"'
>
> perl's yet another pitfall. In developing #404, I looked into the wcwidth
> data used in NetBSD/OpenBSD, which reportedly is tied to perl. That ignores
> the East Asian stuff entirely, and doesn't match glibc very well.
Note that I'm using perl here only for its "x" feature to repeat
a string. There is exactly the same issue with printf (with the
sequence repeated manually), which has the same output.
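For reference, a short Python sketch of the same test data, to show that nothing perl-specific is involved. It builds the byte stream that the command above sends to the terminal and works out how many 80-column lines it needs under the two width models. The helper lines_needed is hypothetical, and the 1-cell and 2-cell widths are the two assumptions being compared, not measurements of any terminal.

```python
SEQ = "\u2642\ufe0f"  # MALE SIGN followed by VS16
REPEAT = 60

# Same content as the perl one-liner: the pair repeated 60 times, then newline.
payload = SEQ * REPEAT + "\n"
data = payload.encode("utf-8")  # 3 bytes per code point, so 6 bytes per pair


def lines_needed(cells, columns=80):
    """Number of terminal lines needed for the given number of cells."""
    return -(-cells // columns)  # ceiling division


# Model 1: each pair takes 1 cell (the per-character sum, VS16 counted as 0).
cells_narrow = REPEAT * 1
# Model 2: each pair takes 2 cells (VS16 promotes the character to wide).
cells_wide = REPEAT * 2
```

Under the first model the 60 pairs fit on a single 80-column line. Under the second they need 120 cells and therefore wrap onto a second line, which is why this particular count makes the difference visible.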
> For your example, perl's irrelevant though - this is just bits...
>
> > I get "♂♂" in the last two columns, which is inconsistent with what
> > is output before. And in case of scrolling, the spaces are missing
> > in the second line.
>
> xterm's handling fullwidth characters by putting a non-character in the
> second cell. In handling VS16, I may have overlooked some path for doing
> that (something to investigate). But the behavior in mutt was consistent
> with my expectation: an extra "blank" cell.
BTW, there was another issue I hadn't reported here: this extra
"blank" cell does not get the correct background when the current
background differs from the default one.
--
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)