Re: What's the character encoding of manpages?
On Thu, Jul 24, 2003 at 03:55:43PM +0200, Aaron Isotton wrote:
> what are man pages, or more generally, groff documents, supposed to be
> encoded in? I didn't find any reference to that in groff(7). Is it
See groff_char(7). Technically it's Latin-1, but this is planned to
change to UTF-8 for groff 2.0 (no schedule yet); groff_char(7) advises
sticking to ASCII, and I agree. You can get everything in Latin-1 using
named characters anyway without having to worry about encoding.
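As a rough sketch of what "named characters" means in practice, a generator could map each Latin-1 character to its groff name before writing the page. The helper name to_ascii_roff and the small GROFF_NAMES table below are hypothetical and cover only a few sample characters; groff_char(7) has the full list.

```python
# Hypothetical helper: GROFF_NAMES is a tiny sample, not the full
# table from groff_char(7).
GROFF_NAMES = {
    "\u00e9": r"\['e]",  # e with acute accent
    "\u00fc": r"\[:u]",  # u with dieresis
    "\u00df": r"\[ss]",  # sharp s
    "\u00a9": r"\[co]",  # copyright sign
}


def to_ascii_roff(text: str) -> str:
    """Return text with known non-ASCII characters replaced by groff names."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        elif ch in GROFF_NAMES:
            out.append(GROFF_NAMES[ch])
        else:
            # Refuse to guess: the caller should extend the table.
            raise ValueError(f"no groff name known for {ch!r}")
    return "".join(out)
```

The result is pure ASCII, so it no longer matters which encoding groff assumes for its input.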
> The problem arises because I have to transform a Docbook XML document
> into a manpage; there, all spaces (ASCII 0x20) inside a <literallayout>
> are translated into 0xA0 in the output. I don't know what an A0 is
> supposed to be, but man ignores it when generating output, thus
> effectively removing the spaces from the output.
0xA0 is the Latin-1 non-breaking space. Bug #199422 notes that this
doesn't work in current groff. I'm not sure whether this is actually a
groff bug, and need to check with upstream. Either way, I suggest using
'\ ' (backslash followed by a space) instead.
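If the stylesheet can't easily be changed, a post-processing step along these lines might do as a workaround. This is only a sketch: the function name fix_nbsp is made up, and it assumes the generated page really is Latin-1 (in UTF-8 the byte 0xA0 also occurs inside other characters, so a blind byte replacement would corrupt them).

```python
def fix_nbsp(data: bytes) -> bytes:
    """Replace each Latin-1 non-breaking space (0xA0) with groff's '\\ '.

    Assumes `data` is Latin-1 encoded man page source.
    """
    # b"\\ " is two bytes: a backslash followed by an ordinary space.
    return data.replace(b"\xa0", b"\\ ")
```

For example, fix_nbsp(b"a\xa0b") yields the bytes for "a\ b", which groff treats as an unpaddable space between "a" and "b".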
> [Side note: there's also another problem unrelated to this; see
> for more information]
Using .nf and .fi would probably be more sensible than large numbers of
.br requests. (Feel free to pass on this comment.)
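To illustrate the .nf/.fi suggestion, here is a minimal sketch of how a converter might emit a literal block. The function name literal_block is hypothetical, and the escaping shown (backslashes, plus lines that start with a control character) is my assumption about what such a converter would need, not something taken from the stylesheet in question.

```python
def literal_block(lines):
    """Wrap lines in .nf/.fi so groff keeps the line breaks itself."""
    out = [".nf"]  # switch to no-fill mode: input lines are not joined
    for line in lines:
        # Make literal backslashes printable instead of starting an escape.
        line = line.replace("\\", "\\e")
        # A leading '.' or "'" would be read as a request; \& prevents that.
        if line.startswith((".", "'")):
            line = "\\&" + line
        out.append(line)
    out.append(".fi")  # back to normal fill mode
    return "\n".join(out) + "\n"
```

With this, each source line becomes exactly one output line, and no .br requests are needed between them.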
Colin Watson, groff maintainer [email@example.com]