
Re: manpage character cleanup for UTF-8 compatibility



* Colin Watson <cjwatson@debian.org> [20030325 18:50 PST]:
> On Tue, Mar 25, 2003 at 04:01:51PM -0800, Vineet Kumar wrote:
> > First of all, '-' renders as a hyphen (U+2010) instead of as ASCII 0x2D.
> > The correct groff escape to use in things like command-line options is
> > '\-', which renders as the 0x2D minus sign in both UTF-8 and ASCII
> > locales.  Hyphenated words such as "read-only" or "command-line" should
> > properly be printed with actual hyphens instead of minus signs, and do
> > not need to be changed.  For clarity, though, I recommend that
> > intentional hyphens be specified with the escape \(hy, to emphasize that
> > they are actually intended to be hyphens and not mistakenly-unescaped
> > minus signs.
> 
> I find '-' much clearer to read myself, but I don't think it's too
> important either way; leave that one up to the author of the page.
> Replacing '-' with '\-' when a literal dash is desired is the important
> part.

Sure.  It would also make for smaller patches to leave hyphens as '-'
instead of changing them to \(hy, which is good.
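
As a rough illustration of that rule (escape dashes in command-line
options, leave hyphenated words alone), here is a minimal Python sketch.
It is not an existing tool: the function name and the regular expression
are assumptions made up for this example, and it deliberately ignores
harder cases such as options that directly follow a font escape like \fB.

```python
import re

# Hypothetical helper, not part of groff or any Debian tool.
# Matches an option-like token: one or two dashes at the start of a word,
# not already escaped with a backslash, followed by a letter or digit.
_OPTION = re.compile(r"(?<![\\\w-])-{1,2}[A-Za-z0-9][\w-]*")


def escape_option_dashes(line: str) -> str:
    """Turn '-' into '\\-' inside option-like tokens; leave word hyphens alone."""
    return _OPTION.sub(lambda m: m.group(0).replace("-", r"\-"), line)


print(escape_option_dashes("Use -v or --dry-run on a read-only file."))
# prints: Use \-v or \-\-dry\-run on a read-only file.
```

Because already-escaped dashes are skipped, running it twice over the
same line changes nothing, which keeps the resulting patches small.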

> > Accents: grave (U+0060) and acute (U+00B4) should be given as \` and \'
> > respectively.  According to groff(7), a bare, unescaped ` should also
> > render as "left quote, backquote (ASCII 0x60)".  The left quote (U+2018)
> > is different from the backquote (ASCII 0x60), so I think that "left
> > quote" should be deleted from the groff manpage, and groff should be
> > changed to display ` as `(U+0060) and not as U+2018.
> 
> I'm not sure I agree. I think groff(7) is simply unduly ASCII-centric,
> unlike groff_char(7).

Well, that's fine, too, as long as \` or \(ga is used when U+0060 is
intended, and not bare `.  This does make that line of groff(7) pretty
misleading, but I guess it's really a consequence of the long-standing
`-as-left-quote mess.  It would help if groff(7) could make this
clearer, but as you point out, groff_char(7) makes it pretty clear.

> > Most of these things don't make any difference in ASCII locales, but
> > break in UTF-8 locales in which the special characters are actually
> > rendered specially.  For example, searching for a particular
> > command-line option is unnecessarily difficult if it is incorrectly
> > specified with a hyphen instead of a minus sign.
> 
> The last time this came up in detail on the upstream groff mailing list,
> it was pointed out that Unicode-capable pagers really ought to start
> regarding different types of spaces and dashes as similar for searching
> purposes. I've not seen much evidence of this yet, but I think this
> would be a good time for such support to start happening.

Right.  That would help usability in the short term, but it feels more
like a workaround than a fix.  It still doesn't solve the copy-and-paste
problem, either.  Having the pagers change what they display when given
the proper multibyte characters would really be a step backwards.
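
For concreteness, the search-side folding a pager would need could look
something like the Python sketch below. This is only an assumption about
how such support might work, not how any existing pager behaves; the
table of dash-like code points and both function names are hypothetical.

```python
# Hypothetical folding table: dash-like code points mapped to ASCII 0x2D.
DASH_FOLD = str.maketrans({
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2012": "-",  # FIGURE DASH
    "\u2013": "-",  # EN DASH
    "\u2212": "-",  # MINUS SIGN
})


def fold_dashes(text: str) -> str:
    """Return text with every dash-like character replaced by ASCII '-'."""
    return text.translate(DASH_FOLD)


def line_matches(pattern: str, line: str) -> bool:
    """Substring search that treats all dash-like characters as equal."""
    return fold_dashes(pattern) in fold_dashes(line)
```

Note that only the comparison is folded; the displayed line is untouched,
so text copied from the page would still contain U+2010 rather than 0x2D.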

Thanks for the input.

good times,
Vineet
-- 
http://www.doorstop.net/
-- 
"Those who desire to give up freedom in order to gain security will not
have, nor do they deserve, either one."  --President Thomas Jefferson
