
Re: [Kind of OT] Why's this look like gibberish to me?



"Douglas A. Tutty" <dtutty@porchlight.ca> writes:
>
> What gets me is when a man page is written in english and "'" gets
> translated as "?", as in can?t or "'" is a square white blob (on a
> regular VT).  Why couldn't whoever wrote it in english have used the
> standard english "'" glyph instead of a UTF thingy?

The problem isn't the manpage author; it's your setup.

Specifically, you're using a locale that sports UTF-8 encoding, but
you're using a terminal/font combination that is not capable of
correctly rendering UTF-8-encoded common typographical symbols used
for English language text, like the right single quote / apostrophe.
If you use a locale based on ASCII encoding instead, those manpages
will render more correctly (for example, substituting the unsightly
ASCII vertical apostrophe for its more urbane cousin or writing (C) in
place of the copyright symbol).  See the bottom of this post if LANG=C
isn't good enough for you.
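
To make the "can?t" symptom concrete, here is a minimal Python sketch
of what happens to the typographic apostrophe (U+2019) when something
downstream can only cope with ASCII.  The FALLBACK table and the
to_ascii() helper are made up for illustration; they are not what
"man" or groff use internally.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, the typographic apostrophe.
text = "can\u2019t"

# In UTF-8 the apostrophe is encoded as three bytes: e2 80 99.
utf8_bytes = text.encode("utf-8")

# A program that must emit ASCII and has no better idea substitutes "?".
replaced = text.encode("ascii", errors="replace").decode("ascii")

# A friendlier fallback: a tiny hand-written substitution table
# (hypothetical, for illustration only).
FALLBACK = {"\u2018": "`", "\u2019": "'", "\u00a9": "(C)"}


def to_ascii(s):
    """Replace known typographic characters with ASCII stand-ins."""
    return "".join(FALLBACK.get(ch, ch) for ch in s)


print(utf8_bytes)
print(replaced)
print(to_ascii(text))
```

The second line printed is the "?" rendering complained about above;
the third is roughly what an ASCII locale should get you instead.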

Unlike some people here, I couldn't give a σθιτ if you, S. Keeling, or
anyone else wants to use UTF-8 or not---I'm not on any crusade---but
an environment variable setting of "LANG=en_US.UTF-8" is basically an
announcement to applications that your terminal is UTF-8 capable.  You
don't have to run a UTF-8-capable terminal if you don't want to, but
you shouldn't lie to your applications and then whine about those damn
foreigners writing manpages incorrectly (just a joke, just a joke).

In truth, if you look at the manpage source, you'll probably find that
the manpage authors *have* used the ASCII "'" character for
apostrophes and right single quotes.  That's because this is the
encoding convention used in the typesetting language "roff" in which
manpages are written.  You write `stuff like this' knowing that a
correctly configured manpage rendering pipeline will convert those
ASCII backticks and apostrophes into the correct English typographical
symbols (if the manpage is being printed or being displayed on a
sophisticated terminal) or at least do the best it can (if it's being
delivered to an ASCII-only terminal).  If manpage writers were really
on the ball, they'd use \(lqleft and right double-quotes\(rq too, but
you don't see too much of that.
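
As a rough illustration of that conversion step, here is a toy model
in Python.  The escape names \(lq and \(rq are real roff escapes, but
the two tables and the render() function are hypothetical and only
mimic the idea; this is not how groff is implemented.

```python
# Toy model only: maps a few roff input conventions to output glyphs.
UTF8_GLYPHS = {
    r"\(lq": "\u201c",  # left double quotation mark
    r"\(rq": "\u201d",  # right double quotation mark
    "`": "\u2018",      # left single quotation mark
    "'": "\u2019",      # right single quotation mark / apostrophe
}
ASCII_GLYPHS = {
    r"\(lq": '"',
    r"\(rq": '"',
    "`": "`",
    "'": "'",
}


def render(source, utf8_capable):
    """Pick a glyph table based on what the output device can show."""
    table = UTF8_GLYPHS if utf8_capable else ASCII_GLYPHS
    out = source
    for token, glyph in table.items():
        out = out.replace(token, glyph)
    return out


# ASCII-only output: do the best we can with 7 bits.
print(render(r"\(lqleft and right double-quotes\(rq", utf8_capable=False))
print(render("`stuff like this'", utf8_capable=False))
```

The same source, rendered with utf8_capable=True, comes out with the
directional quotes instead.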

To clarify further, there's nothing English about "'".  If it's
anything, it's ASCII, not English.  I'm not sure that the ASCII
standard actually specifies what printable characters, including "'",
are supposed to look like, but in most fonts with ASCII-compatible
encoding, the "'" character is rendered as an undirected,
typewriter-style apostrophe, like a vertical tickmark, and I believe
this is pretty much universally accepted as the "correct" rendering of
this character, among those who care about these things.  In
particular, it is *not* the character used in typeset English text as
an apostrophe or right single quote.  It's rarely used in English text
at all, except in historically ASCII contexts like email and computer
plain text files.  It's about as un-English as you can get.  It's very
ASCII, though.

Anyway, to really take a stand on this UTF-8 crap and announce to the
world that 7 bits were good enough for cavemen so, by God, they're
good enough for you too, you can simply use a preexisting ASCII-only
locale (like LANG=C) or you can generate one.  Add this line to
"/etc/locale.gen":

        en_US ANSI_X3.4-1968

run "/usr/sbin/locale-gen" as root, and find some way to set
"LANG=en_US" or "LC_ALL=en_US".  ANSI_X3.4-1968 is another name for
ASCII, so your new "en_US" locale shouldn't bother you with heretical
characters.  Some applications will still give up and print a "?" for
non-ASCII characters, but "man" should do an excellent job displaying
a pure ASCII rendering of your manpages for you.
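
If you want to check what your locale is actually announcing, a small
Python sketch like the following may help.  It assumes a Unix-like
system (locale.nl_langinfo is not available everywhere), and the
function names are my own invention.

```python
import codecs
import locale


def is_ascii_codeset(name):
    """True if Python's codec registry resolves `name` to ASCII."""
    try:
        return codecs.lookup(name).name == "ascii"
    except LookupError:
        return False


def current_codeset():
    """Codeset announced by the environment's locale (Unix only)."""
    # Adopt whatever LANG / LC_* say; raises locale.Error if the
    # requested locale has not been generated on this system.
    locale.setlocale(locale.LC_CTYPE, "")
    return locale.nl_langinfo(locale.CODESET)


# ANSI_X3.4-1968 is just another spelling of ASCII; UTF-8 is not.
print(is_ascii_codeset("ANSI_X3.4-1968"))
print(is_ascii_codeset("UTF-8"))
```

Calling is_ascii_codeset(current_codeset()) after setting
"LANG=en_US" should then tell you whether the new locale took effect.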

-- 
Kevin Buhr <buhr+debian@asaurus.net>

