Re: manpage character cleanup for UTF-8 compatibility
On Tue, Mar 25, 2003 at 04:01:51PM -0800, Vineet Kumar wrote:
> First of all, '-' renders as a hyphen (U+2010) instead of as ASCII 0x2D.
> The correct groff escape to use in things like command-line options is
> '\-', which renders as the 0x2D minus sign in both UTF-8 and ASCII
> locales. Hyphenated words such as "read-only" or "command-line" should
> properly be printed with actual hyphens instead of minus signs, and do
> not need to be changed. For clarity, though, I recommend that
> intentional hyphens be specified with the escape \(hy, to emphasize that
> they are actually intended to be hyphens and not mistakenly unescaped
> minus signs.
I find '-' much clearer to read than '\(hy' myself, but I don't think it's
too important either way; leave that one up to the author of the page.
Replacing '-' with '\-' when a literal dash is desired is the important part.
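To find the bare '-' signs that probably want to be '\-', a rough lint can help. This is a minimal Python sketch, not an existing tool. It assumes an option starts at the beginning of a line, after whitespace, or straight after a font escape such as \fB. The pattern and function names are made up for illustration. It will miss some cases and flag others wrongly, so its output still needs a human eye.

```python
import re

# Hypothetical pattern: a bare '-' or '--' that starts an option-like token.
# Assumed option positions: start of line, after whitespace, or straight
# after a font escape such as \fB or \fI.  An already-escaped '\-' is skipped
# because its '-' is preceded by a backslash, which none of these allow.
OPTION_DASH_RE = re.compile(r"(?:^|(?<=\s)|(?<=\\f[BIRP]))(-{1,2})(?=[A-Za-z])")


def find_bare_option_dashes(source: str) -> list[tuple[int, int]]:
    """Return (line, column) pairs, both 1-based, for each suspicious '-'."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for m in OPTION_DASH_RE.finditer(line):
            hits.append((lineno, m.start(1) + 1))
    return hits


def escape_option_dashes(line: str) -> str:
    """Replace each bare option dash with the groff escape '\\-'."""
    return OPTION_DASH_RE.sub(lambda m: "\\-" * len(m.group(1)), line)


# Options get escaped; ordinary hyphenated words such as "read-only" do not.
print(escape_option_dashes(r".B \fB-v\fR, \fB--verbose\fR"))
# .B \fB\-v\fR, \fB\-\-verbose\fR
```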
> Also, the use of '`' for left quotes, and sometimes '``' for left double
> quotes. The current situation with quotes is unclear, since groff
> doesn't really do what groff(7) says it should for ASCII 0x27 and 0x60
> (apostrophe and grave accent, respectively). The man page indicates
> that 0x27 should be rendered as U+0027, but it is errantly rendered as
> U+2019 (right single quote). Similarly, 0x60 should be rendered as
> U+0060, but is instead rendered as U+2018. This is probably to make the
> obsolete use of single quotes like `this' look pretty.
groff_char(7) is usually more helpful on these matters. You're quite
correct that \(oq and \(cq are better for balanced single quotes.
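Converting existing pages from `this' style quoting to the balanced escapes can be partly automated. This is a minimal Python sketch that uses \(oq/\(cq for single quotes and \(lq/\(rq (left/right double quote in groff_char(7)) for ``this'' style double quotes. It assumes each quotation sits on one line and contains no backquote or apostrophe of its own. The function and pattern names are illustrative only.

```python
import re

# Hypothetical rewrite of old-style quotes into groff_char(7) escapes:
#   ``text''  ->  \(lqtext\(rq   (left/right double quote)
#   `text'    ->  \(oqtext\(cq   (left/right single quote)
# Assumption: one quotation per match, on a single line, with no backquote
# or apostrophe inside it; anything fancier needs a human.
DOUBLE_RE = re.compile(r"``([^`'\n]*)''")
SINGLE_RE = re.compile(r"`([^`'\n]*)'")


def use_quote_escapes(line: str) -> str:
    # Doubles first, otherwise ``text'' would be half-matched as `text'.
    line = DOUBLE_RE.sub(r"\\(lq\1\\(rq", line)
    return SINGLE_RE.sub(r"\\(oq\1\\(cq", line)


print(use_quote_escapes("see `foo' and ``bar''"))
# see \(oqfoo\(cq and \(lqbar\(rq
```

A lone apostrophe, as in "don't", has no preceding backquote and is left alone.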
> Accents: grave (U+0060) and acute (U+00B4) should be given as \` and \'
> respectively. According to groff(7), a bare, unescaped ` should also
> render as "left quote, backquote (ASCII 0x60)". The left quote (U+2018)
> is different from the backquote (ASCII 0x60), so I think that "left
> quote" should be deleted from the groff manpage, and groff should be
> changed to display ` as ` (U+0060) and not as U+2018.
I'm not sure I agree. I think groff(7) is simply unduly ASCII-centric.
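For reference, here is a sketch collecting the escapes mentioned in this thread and the characters they are meant to produce in a UTF-8 locale. The table is hand-written for illustration and is not taken from groff. In particular, the '\-' entry follows the behaviour described in the quoted message, and other groff versions or output devices may map it differently.

```python
import re

# Hand-written table (illustrative, not generated from groff) mapping the
# escapes discussed in this thread to the characters they are intended to
# produce in a UTF-8 locale.
GROFF_ESCAPES = {
    r"\-": "\u002d",    # minus as ASCII hyphen-minus (per the thread)
    r"\(hy": "\u2010",  # HYPHEN
    r"\(oq": "\u2018",  # LEFT SINGLE QUOTATION MARK
    r"\(cq": "\u2019",  # RIGHT SINGLE QUOTATION MARK
    r"\`": "\u0060",    # GRAVE ACCENT
    r"\'": "\u00b4",    # ACUTE ACCENT
}

# One alternation, longest escapes first, applied in a single pass so that
# a replacement can never be re-read as part of another escape.
ESCAPE_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(GROFF_ESCAPES, key=len, reverse=True))
)


def decode(text: str) -> str:
    """Expand the escapes above; everything else is left untouched."""
    return ESCAPE_RE.sub(lambda m: GROFF_ESCAPES[m.group(0)], text)


print(decode(r"\(oqread\(hyonly\(cq, \-v"))
```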
> Most of these things don't make any difference in ASCII locales, but
> break in UTF-8 locales in which the special characters are actually
> rendered specially. For example, searching for a particular
> command-line option is unnecessarily difficult if it is incorrectly
> specified with a hyphen instead of a minus sign.
The last time this came up in detail on the upstream groff mailing list,
it was pointed out that Unicode-capable pagers really ought to start
regarding different types of spaces and dashes as similar for searching
purposes. I've not seen much evidence of this yet, but I think this
would be a good time for such support to start happening.
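Pager-side folding could look something like the following minimal Python sketch. Before matching, every character that Unicode classes as dash punctuation (general category Pd), plus U+2212 MINUS SIGN, is compared as '-'. Every space separator (category Zs) is compared as ' '. Which characters to fold is an assumption made for illustration, not something any particular pager is known to do. Because each character maps to exactly one character, an offset found in the folded text is also a valid offset into the original, so highlighting the match still works.

```python
import unicodedata


def fold(text: str) -> str:
    """Map dash and space variants to ASCII, one character for one character."""
    out = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Pd" or ch == "\u2212":  # hyphens, dashes, MINUS SIGN
            out.append("-")
        elif category == "Zs":                  # NO-BREAK SPACE, EN SPACE, ...
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)


def search(haystack: str, needle: str) -> int:
    """Index of needle in haystack ignoring dash/space variants, or -1."""
    return fold(haystack).find(fold(needle))


# An option rendered with U+2010 HYPHEN still matches what the user typed.
print(search("use \u2010\u2010verbose here", "--verbose"))  # 4
```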
> I'd like to be able to just point to this message/thread in the archives
> in the bug reports, rather than spelling it all out time and time again
> in each bug report. (Just patching the pages is tedious enough...).
/usr/share/doc/man-db/examples/manpage.example already mentions the '-'
versus '\-' problem. I'm happy to extend it if need be.
Colin Watson [firstname.lastname@example.org]