
Re: Man pages and UTF-8



Adam Borowski <kilobyte@angband.pl> writes:

> Any such description file would work only as long as you hard-code any
> fonts, and somehow provide them for any potential reader.  Without this,
> wcwidth() is as good as you can get for fixed-width fonts.  For
> comparison, Red Hat makes a wild assumption that everything u0800..uFFFF
> is doublewide.

The correct thing to do is to use the information from the latest version
of the Unicode East Asian Width property table:

    http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt

For more information about this area of Unicode, see:

    http://unicode.org/reports/tr11/
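
As a rough illustration of what using the real table involves, here is a
minimal Python sketch that parses lines in the EastAsianWidth.txt format
(a code point or first..last range, a semicolon, the property value, and
an optional # comment).  The function names and the three-line SAMPLE
excerpt are assumptions made for this example, not part of any existing
tool, and the exact ranges in the real file vary between Unicode
versions.

```python
import bisect

# Illustrative excerpt in the file's format, not the full table.
SAMPLE = """\
# code point or range ; property  # comment
0020;Na
1100..115F;W
3000           ; F  # spacing around the fields is tolerated
"""

def parse_east_asian_width(lines):
    """Return a sorted list of (first, last, property) tuples."""
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop any trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        code_field, prop = fields[0], fields[1]
        if ".." in code_field:
            first, last = code_field.split("..")
        else:
            first = last = code_field
        ranges.append((int(first, 16), int(last, 16), prop))
    ranges.sort()
    return ranges

def width_property(ranges, ch):
    """Look up the property for ch, or None if no parsed line covers it."""
    cp = ord(ch)
    # Find the last range whose start is <= cp.
    i = bisect.bisect_right(ranges, (cp, 0x110000, "")) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return None

def is_wide(ranges, ch):
    """Treat Wide (W) and Fullwidth (F) as occupying two columns."""
    return width_property(ranges, ch) in ("W", "F")
```

In practice the lines would come from a downloaded copy of the file
rather than from SAMPLE.  How to treat code points the file does not
list, and the Ambiguous (A) class, is a policy choice that this sketch
leaves open.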

u0800..uFFFF is a bad approximation that misses several ranges and is
actually wrong for most of the range up to u1100.  For another
application, I use the approximation of:

our @WIDE = qw(\x{2E80}-\x{303E} \x{3041}-\x{33FF} \x{4E00}-\x{9FBB}
               \x{AC00}-\x{D7A3} \x{FF01}-\x{FF60});

but even that is not a particularly good approximation compared to using
the real table.
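
To see where the approximation and the table part ways, here is a small
Python sketch (not the Perl from my application) that copies the ranges
above and compares them against unicodedata.east_asian_width() from the
standard library.  That function reports the property from whatever
Unicode version the running Python bundles, so the exact list of
disagreements depends on the interpreter.  The helper names are made up
for this example.

```python
import unicodedata

# The ranges from the @WIDE approximation, as inclusive code point pairs.
APPROX_WIDE = [
    (0x2E80, 0x303E),
    (0x3041, 0x33FF),
    (0x4E00, 0x9FBB),
    (0xAC00, 0xD7A3),
    (0xFF01, 0xFF60),
]

def approx_is_wide(ch):
    """True if ch falls inside one of the approximate ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in APPROX_WIDE)

def table_is_wide(ch):
    """True if the bundled Unicode data marks ch as Wide or Fullwidth."""
    return unicodedata.east_asian_width(ch) in ("W", "F")

def disagreements(first=0x0800, last=0xFFFF):
    """Code points in [first, last] where the two answers differ."""
    return [
        cp
        for cp in range(first, last + 1)
        if approx_is_wide(chr(cp)) != table_is_wide(chr(cp))
    ]

if __name__ == "__main__":
    diffs = disagreements()
    print(len(diffs), "code points differ; first few:",
          [f"U+{cp:04X}" for cp in diffs[:5]])
```

For example, U+1100 (a Hangul Jamo) is Wide in the table but falls
outside every range in the approximation, so it shows up in the
disagreement list.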

My guess is that wcwidth's answer is based on the latest version of that
table at the time that glibc was released, although I'd have to
double-check to be sure.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>


