
Re: Reasons to not use quote signs directly?

[ Colin CCed for some input on the groff vs minus situation. ]

On Thu, 2016-10-27 at 17:10:59 -0700, Russ Allbery wrote:
> Guillem Jover <guillem@debian.org> writes:
> > For the current conversion in dpkg, I've taken most of the common
> > symbols from groff_char(7) and created a very simple sed script, I'm not
> > sure if you were thinking about something along those lines (although in
> > proper perl)?
> >   <https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/tree/man/utf8toman.sed?h=next/master&id=c07b9b79447e200645ea423f959194fcbf8d4d32>
> Yeah, that would work, although aren't there quite a few more sequences
> than that?  Does groff have a way of representing an arbitrary Unicode
> code point?

Ah right, indeed it does. And it's explained in that same man page I
referred to. O:) The escape sequence would be something like \[u0021] or
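
To make the \[uXXXX] form concrete, here is a minimal sketch (in Python
rather than perl, purely for illustration) that rewrites every non-ASCII
character as a groff Unicode escape. The function names are hypothetical
and not part of dpkg or Pod::Man, and the sketch assumes upper-case hex
padded to at least four digits; it ignores groff's rules for composite
characters.

```python
def groff_unicode_escape(ch):
    """Return the groff \\[uXXXX] escape for a single character.

    Assumption: upper-case hex, zero-padded to at least four digits.
    """
    return "\\[u%04X]" % ord(ch)


def escape_non_ascii(text):
    """Replace every non-ASCII character in text with a \\[uXXXX] escape."""
    return "".join(
        ch if ord(ch) < 0x80 else groff_unicode_escape(ch) for ch in text
    )


if __name__ == "__main__":
    # U+2192 RIGHTWARDS ARROW has no canonical decomposition, so it
    # sidesteps the composite-character question entirely.
    print(escape_non_ascii("a \u2192 b"))
```

Code points above U+FFFF simply come out with more hex digits.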

> For Pod::Man usage, the output format I'd want would be a hash mapping
> Unicode code points to the correct groff escape.  Or, in an absolutely
> ideal world, to have an Encode encoding for groff escapes, similar to how
> the Encode::MIME::Header encoding works to generate RFC 2047 strings.
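
As a rough idea of what such a hash could look like, here is a small
sketch (Python for illustration only, not the perl Pod::Man would need).
The table holds just a handful of named escapes from groff_char(7) and is
nowhere near complete; GROFF_NAMED and groff_escape are hypothetical
names, and anything not in the table falls back to the generic \[uXXXX]
form.

```python
# Partial table: Unicode code point -> named groff escape.
# Only a few entries from groff_char(7); a real table would be much larger.
GROFF_NAMED = {
    0x00A9: r"\[co]",  # copyright sign
    0x2010: r"\[hy]",  # hyphen
    0x2013: r"\[en]",  # en dash
    0x2014: r"\[em]",  # em dash
    0x2018: r"\[oq]",  # left single quotation mark
    0x2019: r"\[cq]",  # right single quotation mark
    0x201C: r"\[lq]",  # left double quotation mark
    0x201D: r"\[rq]",  # right double quotation mark
    0x2022: r"\[bu]",  # bullet
    0x2212: r"\[mi]",  # minus sign
}


def groff_escape(text):
    """Map text to groff input: named escape if known, else \\[uXXXX]."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in GROFF_NAMED:
            out.append(GROFF_NAMED[cp])
        elif cp < 0x80:
            out.append(ch)  # plain ASCII passes through untouched
        else:
            out.append("\\[u%04X]" % cp)  # generic fallback
    return "".join(out)


if __name__ == "__main__":
    print(groff_escape("\u201cdpkg\u201d \u2014 \u2192"))
```

Preferring the named escape and only then falling back keeps the output
readable for the common symbols while still covering arbitrary code
points.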

I happened to stumble over an old patch by Brendan O'Dea that might be
helpful; I'm including a reference here so as not to lose track of it:


> > If you could specify exactly which symbols you'd like to see supported I
> > might take a stab at this, when I have some spare time. Say everything
> > in groff_char(7) or similar. :)
> As much as possible is of course ideal, but I'm happy to take partial
> work!  :)

Ok! :)

> > The other major issue are commands, which I'm not sure are so easy to
> > detect. Maybe they could get to use the \- minus if they are inside some
> > other markup. I see that C<some-command> escapes them, as does
> > L<some-command(1)>, but L<some-command> does not (any reason?), which
> > could be handy to use I guess. Filenames are also safe with
> > F</some-dir/file-name>. The only problem is using the proper markup that
> > also preserves the same output as the current man pages.
> B<> and I<> could just be surrounding normal words that should use normal
> hyphens.  L<some-command> is a link to a section in the same document
> entitled some-command, so the assumption there is also that it could be a
> regular English word.

Oh, at least perlpod(1) says that L<name> links to a Perl manual page,
so I'd expect it to be equivalent to the L<crontab(5)> style when
processing minus chars, and L</sec> does the inter-section linking?

> As you say, though, I'm not entirely sure the distinction is worth all the
> trouble we've put into it over the years.  nroff at least seems to have
> just given up and maps them all to "-" in the output anyway.  That used to
> be a Debian-specific change, but it looks like upstream has switched to
> treating - as \-, I think?  For HTML output, upstream maps \- to &minus;
> and Debian still overrides that to - instead.  (If upstream thinks \- is a
> minus sign and not ASCII 45, I'm really confused what's going on with
> this, though.)

We should probably ask Colin about this. :)

> > I've always found the AUTHORS, COPYRIGHT or LICENSE sections to be
> > distracting, and in dpkg we got rid of all of them, because in addition
> > they were getting usually out-of-sync with the actual copyright
> > statements, and required adding names and updating years in two places.
> Yeah, that part is irritating.  The alternative, which I use in my
> packages these days, is to have these reflect the authors, copyright, and
> license of the *manual page*, but that's also weird.

Right, that's what dpkg used to have. But even then I've still found this

> =for license, resulting in a comment in the generated man page, seems like
> a better general solution (and then it probably makes sense for this to
> always reflect the license of the documentation file itself, not the
> larger package).


