
Re: What's the character encoding of manpages?

On Thu, Jul 24, 2003 at 03:55:43PM +0200, Aaron Isotton wrote:
> what are man pages, or more generally, groff documents, supposed to be
> encoded in?  I didn't find any reference to that in groff(7).  Is it

See groff_char(7). Technically it's Latin-1, but this is planned to
change to UTF-8 for groff 2.0 (no schedule yet); groff_char(7) advises
sticking to ASCII, and I agree. You can get everything in Latin-1 using
named characters anyway without having to worry about encoding.
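
As a rough sketch of what that looks like in practice, the fragment
below stays in pure ASCII and spells the Latin-1 characters with the
two-character names listed in groff_char(7). The name and sentence are
made up purely for illustration.

```roff
.\" ASCII-only source; groff turns each named character into the
.\" matching glyph for whatever output device is in use.
Copyright \(co 2003 Andr\('e M\(:uller
.br
The German word for street is Stra\(sse.
```

Here \(co is the copyright sign, \('e is e-acute, \(:u is u-umlaut and
\(ss is sharp s.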

> The problem arises because I have to transform a Docbook XML document
> into a manpage; there, all spaces (ASCII 0x20) inside a <literallayout>
> are translated into 0xA0 in the output.  I don't know what an A0 is
> supposed to be, but man ignores it when generating output, thus
> effectively removing the spaces from the output.

0xA0 is the Latin-1 non-breaking space. Bug #199422 notes that this
doesn't work in current groff. I'm not sure whether this is actually a
groff bug or not, and need to check with upstream. Either way, I suggest
using '\ ' (backslash-space, an unpaddable space) instead.
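
For example, assuming the stylesheet could be made to emit backslash-space
wherever it currently emits 0xA0, a line with a run of four preserved
spaces would look something like this (the text itself is only a
placeholder):

```roff
.\" Each "\ " is one unpaddable space: groff neither stretches it
.\" during adjustment nor breaks the line at it.
option\ \ \ \ description
```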

> [Side note: there's also another problem unrelated to this; see
> http://sourceforge.net/tracker/?func=detail&aid=763861&group_id=21935&atid=516914
> for more information]

Using .nf and .fi would probably be more sensible than large numbers of
.br requests. (Feel free to pass on this comment.)
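
Something along these lines is what I have in mind; this is only a
sketch with placeholder text, not what the stylesheet currently
generates:

```roff
.\" Switch to no-fill mode: input lines are output as they stand,
.\" so no .br is needed after each one.
.nf
first line of the literal layout
second line
third line
.\" Return to normal fill mode afterwards.
.fi
```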


Colin Watson, groff maintainer                [cjwatson@flatline.org.uk]
