
Re: manpage character cleanup for UTF-8 compatibility



On Tue, Mar 25, 2003 at 04:01:51PM -0800, Vineet Kumar wrote:
> First of all, '-' renders as a hyphen (U+2010) instead of as ASCII 0x2D.
> The correct groff escape to use in things like command-line options is
> '\-', which renders as the 0x2D minus sign in both UTF-8 and ASCII
> locales.  Hyphenated words such as "read-only" or "command-line" should
> properly be printed with actual hyphens instead of minus signs, and do
> not need to be changed.  For clarity, though, I recommend that
> intentional hyphens be specified with the escape \(hy, to emphasize that
> they are actually intended to be hyphens and not mistakenly-unescaped
> minus signs.

I find '-' much clearer to read myself, but I don't think it's too
important either way; leave that one up to the author of the page.
Replacing '-' with '\-' when a literal dash is desired is the important
part.
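
A minimal sketch of that replacement, assuming it is acceptable to treat
any dash that starts an option-like token as a minus sign. The function
name escape_option_dashes and the regular expression are hypothetical
illustrations, not part of groff or man-db, and the heuristic will miss
some cases, so the output still wants a read-through by hand:

```python
import re

# Heuristic (an assumption, not a groff rule): one or two '-' that begin
# an option-like token. The dash must sit at the start of the text, after
# a character that is not a word character, backslash, or dash, or
# directly after a font escape such as \fB.
OPTION_DASHES = re.compile(
    r"(?:(?<![\\\w-])|(?<=\\f[BIRP]))(-{1,2})(?=[A-Za-z0-9])"
)


def escape_option_dashes(source: str) -> str:
    """Rewrite leading option dashes as the groff escape \\-."""
    # A callable replacement is used so the backslash is inserted literally.
    return OPTION_DASHES.sub(lambda m: r"\-" * len(m.group(1)), source)


if __name__ == "__main__":
    for line in (".B -v", "use --help here", "a read-only file"):
        print(escape_option_dashes(line))
```

Hyphens inside words such as "read-only" are left alone, as are dashes
that are already escaped.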

> Another problem is the use of '`' for left quotes, and sometimes '``' for
> left double quotes.  The current situation with quotes is unclear, since groff
> doesn't really do what groff(7) says it should for ASCII 0x27 and 0x60
> (apostrophe and grave accent, respectively).  The man page indicates
> that 0x27 should be rendered as U+0027, but it is errantly rendered as
> U+2019 (right single quote).  Similarly, 0x60 should be rendered as
> U+0060, but is instead rendered as U+2018.  This is probably to make the
> obsolete use of single quotes like `this' look pretty.

groff_char(7) is usually more helpful on these matters. You're quite
correct that \(oq and \(cq are better for balanced single quotes.
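
As an illustration only, the old `this' style could be converted
mechanically along these lines. The name balance_quotes is hypothetical,
and the use of \(lq and \(rq for the doubled form is an assumption about
what a page author would want. The sketch skips the accent escapes \` and
\', and it does not cope with apostrophes inside the quoted text:

```python
import re

# ``text'' and `text' on a single line, ignoring backslash-escaped
# backquotes so the accent escape \` is not mistaken for a quote.
DOUBLE = re.compile(r"(?<!\\)``([^`'\n]+?)''")
SINGLE = re.compile(r"(?<!\\)`([^`'\n]+?)'")


def balance_quotes(source: str) -> str:
    """Replace ASCII quote pairs with groff quote glyph escapes."""
    # Handle the doubled form first so it is not consumed as two singles.
    source = DOUBLE.sub(lambda m: r"\(lq" + m.group(1) + r"\(rq", source)
    return SINGLE.sub(lambda m: r"\(oq" + m.group(1) + r"\(cq", source)


if __name__ == "__main__":
    print(balance_quotes("the `foo' option"))
    print(balance_quotes("``hello''"))
```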

> Accents: grave (U+0060) and acute (U+00B4) should be given as \` and \'
> respectively.  According to groff(7), a bare, unescaped ` should also
> render as "left quote, backquote (ASCII 0x60)".  The left quote (U+2018)
> is different from the backquote (ASCII 0x60), so I think that "left
> quote" should be deleted from the groff manpage, and groff should be
> changed to display ` as `(U+0060) and not as U+2018.

I'm not sure I agree. I think groff(7) is simply unduly ASCII-centric,
unlike groff_char(7).

> Most of these things don't make any difference in ASCII locales, but
> break in UTF-8 locales in which the special characters are actually
> rendered specially.  For example, searching for a particular
> command-line option is unnecessarily difficult if it is incorrectly
> specified with a hyphen instead of a minus sign.

The last time this came up in detail on the upstream groff mailing list,
it was pointed out that Unicode-capable pagers really ought to start
regarding different types of spaces and dashes as similar for searching
purposes. I've not seen much evidence of this yet, but I think this
would be a good time for such support to start happening.
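
To make the idea concrete, here is a rough sketch of how a search could
fold dashes and spaces together before comparing. It is not how any
existing pager works; fold and find_folded are hypothetical names, and
the choice of which characters count as "similar" (Unicode dash
punctuation plus U+2212, and space separators) is an assumption:

```python
import unicodedata


def fold_char(ch: str) -> str:
    """Map any dash-like character to '-' and any space-like one to ' '."""
    category = unicodedata.category(ch)
    # "Pd" is dash punctuation (includes U+002D and U+2010);
    # U+2212 MINUS SIGN is a math symbol, so it is added explicitly.
    if category == "Pd" or ch == "\u2212":
        return "-"
    # "Zs" is space separator (includes U+0020 and U+00A0).
    if category == "Zs":
        return " "
    return ch


def fold(text: str) -> str:
    return "".join(fold_char(ch) for ch in text)


def find_folded(haystack: str, needle: str) -> int:
    """Like str.find, but treating all dashes and all spaces as equal."""
    # Folding maps one character to one character, so the returned
    # index is also valid in the original, unfolded haystack.
    return fold(haystack).find(fold(needle))


if __name__ == "__main__":
    print(find_folded("use \u2010\u2010help", "--help"))
```

With this folding, a search for an option typed with ASCII minus signs
also finds a page that rendered it with U+2010 hyphens.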

> I'd like to be able to just point to this message/thread in the archives
> in the bug reports, rather than spelling it all out time and time again
> in each bug report.  (Just patching the pages is tedious enough...).

/usr/share/doc/man-db/examples/manpage.example already mentions the '-'
versus '\-' problem. I'm happy to extend it if need be.

-- 
Colin Watson                                  [cjwatson@flatline.org.uk]


