Re: Reasons to not use quote signs directly?
[ Colin CCed for some input on groff vs minus situation. ]
On Thu, 2016-10-27 at 17:10:59 -0700, Russ Allbery wrote:
> Guillem Jover <email@example.com> writes:
> > For the current conversion in dpkg, I've taken most of the common
> > symbols from groff_char(7) and created a very simple sed script, I'm not
> > sure if you were thinking about something along those lines (although in
> > proper perl)?
> > <https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/tree/man/utf8toman.sed?h=next/master&id=c07b9b79447e200645ea423f959194fcbf8d4d32>
> Yeah, that would work, although aren't there quite a few more sequences
> than that? Does groff have a way of representing an arbitrary Unicode
> code point?
Ah right, indeed it does. And it's explained in that same man page I
referred to. O:) The escape sequence would be something like \[u0021] or
> For Pod::Man usage, the output format I'd want would be a hash mapping
> Unicode code points to the correct groff escape. Or, in an absolutely
> ideal world, to have an Encode encoding for groff escapes, similar to how
> the Encode::MIME::Header encoding works to generate RFC 2047 strings.
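To make the "hash mapping code points to escapes" idea concrete, here is a rough sketch (in Python rather than Perl, purely for illustration). The names NAMED_ESCAPES, groff_escape and utf8_to_groff are made up for this example, the table holds only a handful of the named glyphs from groff_char(7), and anything else falls back to the generic \[uXXXX] form. It deliberately ignores backslashes and the hyphen/minus question:

```python
# Hypothetical, deliberately tiny table: a few named glyphs from groff_char(7).
# A real mapping would cover far more of that man page.
NAMED_ESCAPES = {
    0x2018: r"\[oq]",  # left single quotation mark
    0x2019: r"\[cq]",  # right single quotation mark
    0x201C: r"\[lq]",  # left double quotation mark
    0x201D: r"\[rq]",  # right double quotation mark
    0x2013: r"\[en]",  # en dash
    0x2014: r"\[em]",  # em dash
    0x00A9: r"\[co]",  # copyright sign
}


def groff_escape(ch):
    """Return a groff representation for a single character.

    ASCII is passed through untouched (so backslashes and hyphens would
    still need separate handling), named glyphs come from the table, and
    everything else uses the generic \\[uXXXX] form with at least four
    uppercase hex digits.
    """
    code = ord(ch)
    if code < 0x80:
        return ch
    if code in NAMED_ESCAPES:
        return NAMED_ESCAPES[code]
    return "\\[u%04X]" % code


def utf8_to_groff(text):
    """Apply groff_escape() to every character of a string."""
    return "".join(groff_escape(ch) for ch in text)
```

An Encode-style interface would presumably wrap the same table, with the fallback branch covering whatever the table does not list.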
I happened to stumble over an old patch by Brendan O'Dea that might be
helpful; I'm including a reference here so as not to lose track of it:
> > If you could specify exactly which symbols you'd like to see supported I
> > might take a stab at this, when I have some spare time. Say everything
> > in groff_char(7) or similar. :)
> As much as possible is of course ideal, but I'm happy to take partial
> work! :)
> > The other major issue is commands, which I'm not sure are so easy to
> > detect. Maybe they could get to use the \- minus if they are inside some
> > other markup. I see that C<some-command> escapes them, as does
> > L<some-command(1)>, but L<some-command> does not (any reason?), which
> > could be handy to use I guess. Filenames are also safe with
> > F</some-dir/file-name>. The only problem is using the proper markup that
> > also preserves the same output as the current man pages.
> B<> and I<> could just be surrounding normal words that should use normal
> hyphens. L<some-command> is a link to a section in the same document
> entitled some-command, so the assumption there is also that it could be a
> regular English word.
Oh, at least perlpod(1) says that L<name> links to a Perl manual page,
so I'd expect it to be equivalent to the L<crontab(5)> style when
processing minus chars, and L</sec> does the inter-section linking?
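For reference, my reading of the L<> forms in perlpod(1) can be sketched roughly as below (Python only for illustration; classify_pod_link and its return shape are invented for this example and are not what Pod::Man actually does). Under that reading, only the L</sec> form is a link into the current document, and a bare L<name> lands in the same bucket as L<name(5)>:

```python
import re


def classify_pod_link(target):
    """Classify the contents of an L<...> code.

    Returns a (kind, name, section) tuple where kind is "url",
    "manpage" or "section". This is a simplification of the forms
    listed in perlpod(1), not a full parser.
    """
    # L<text|target>: drop the optional display text.
    if "|" in target:
        _, target = target.split("|", 1)

    # L<scheme:...>: a scheme followed by a single colon, so that
    # module names such as Foo::Bar are not mistaken for URLs.
    if re.match(r"^[A-Za-z][A-Za-z0-9+.-]*:[^:]", target):
        return ("url", target, None)

    # L<name>, L<name/sec> and L</sec>, with optional quotes around sec.
    name, _, section = target.partition("/")
    section = section.strip('"')
    if not name:
        return ("section", None, section)
    return ("manpage", name, section or None)
```

With this, "some-command" and "crontab(5)" both come back as "manpage", so a formatter following the same logic could give their hyphens the same treatment, while "/sec" is the only case treated as ordinary section text.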
> As you say, though, I'm not entirely sure the distinction is worth all the
> trouble we've put into it over the years. nroff at least seems to have
> just given up and maps them all to "-" in the output anyway. That used to
> be a Debian-specific change, but it looks like upstream has switched to
> treating - as \-, I think? For HTML output, upstream maps \- to −
> and Debian still overrides that to - instead. (If upstream thinks \- is a
> minus sign and not ASCII 45, I'm really confused what's going on with
> this, though.)
We should probably ask Colin about this. :)
> > I've always found the AUTHORS, COPYRIGHT or LICENSE sections to be
> > distracting, and in dpkg we got rid of all of them, because in addition
> > they were usually getting out of sync with the actual copyright
> > statements, and required adding names and updating years in two places.
> Yeah, that part is irritating. The alternative, which I use in my
> packages these days, is to have these reflect the authors, copyright, and
> license of the *manual page*, but that's also weird.
Right, that's what dpkg used to have. But even then I've still found this
> =for license, resulting in a comment in the generated man page, seems like
> a better general solution (and then it probably makes sense for this to
> always reflect the license of the documentation file itself, not the
> larger package).