Re: Reasons to not use quote signs directly?
- To: Helge Kreutzmann <debian@helgefjell.de>
- Cc: debian-dpkg@lists.debian.org, Colin Watson <cjwatson@debian.org>
- Subject: Re: Reasons to not use quote signs directly?
- From: Russ Allbery <rra@debian.org>
- Date: Sat, 03 Dec 2016 12:51:18 -0800
- Message-id: <87inr0ohx5.fsf@hope.eyrie.org>
- In-reply-to: <20161130031045.yfj534hkell7d37x@gaara.hadrons.org> (Guillem Jover's message of "Wed, 30 Nov 2016 04:10:45 +0100")
- References: <20160919163049.GA27815@Debian-50-lenny-64-minimal> <20160920235910.ewk2ejmlx2sghe7w@gaara.hadrons.org> <878ttk5d3x.fsf@hope.eyrie.org> <20161019231407.irobeno7dlomodr5@gaara.hadrons.org> <87bmy5pc30.fsf@hope.eyrie.org> <20161130031045.yfj534hkell7d37x@gaara.hadrons.org>
Guillem Jover <guillem@debian.org> writes:
> Ah right, indeed it does. And it's explained in that same man page I
> referred. O:) The escape sequence would be something like \[u0021] or
> \[u0041_0300].
Oh! So, if I can just convert all Unicode characters to their numeric
codes, this becomes very easy to do. No tables and other machinery
required.
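
For illustration, a minimal sketch of that straight conversion. It is written in Python rather than Perl, so it is not Pod::Man code, and the function name to_groff_escapes is hypothetical. It assumes the digits in \[uNNNN] are the uppercase hexadecimal code point, padded to at least four digits, which the thread has not yet confirmed. It also leaves ASCII untouched, so roff-special characters such as the backslash would still need separate handling.

```python
def to_groff_escapes(text: str) -> str:
    """Replace each non-ASCII character with a \\[uXXXX] escape.

    Assumption: XXXX is the code point in uppercase hex, zero-padded
    to at least four digits.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        else:
            # %04X pads to four digits and uses more when needed.
            out.append("\\[u%04X]" % cp)
    return "".join(out)


if __name__ == "__main__":
    print(to_groff_escapes("na\u00efve caf\u00e9"))
```

No lookup table is involved: the escape is derived from the code point alone.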
I'm a little worried about the \[u0041_0300] form, though. Does that mean
that \[u0041]\[u0300] does not work, and Pod::Man has to know whether
characters are combining or not? I suppose that's possible with the Perl
Unicode support, if necessary.
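
If the combined form does turn out to be required, detecting combining characters is not much work. The sketch below (Python for illustration, hypothetical function name to_groff_composites) groups each base character with the combining marks that follow it and emits one \[uXXXX_YYYY] escape per group. It assumes that a nonzero canonical combining class is the right test for "combining", and that the escape takes uppercase hex code points joined by underscores. Whether groff also accepts \[u0041]\[u0300] remains an open question here.

```python
import unicodedata


def to_groff_composites(text: str) -> str:
    """Emit \\[uXXXX_YYYY] for a base character plus its combining marks."""
    out = []
    cluster = []  # code points of the current base character and its marks

    def flush():
        if not cluster:
            return
        if len(cluster) == 1 and cluster[0] < 0x80:
            # A lone ASCII character needs no escape.
            out.append(chr(cluster[0]))
        else:
            out.append("\\[u" + "_".join("%04X" % cp for cp in cluster) + "]")
        cluster.clear()

    for ch in text:
        # combining() returns the canonical combining class; 0 means
        # the character starts a new cluster.
        if unicodedata.combining(ch) and cluster:
            cluster.append(ord(ch))
        else:
            flush()
            cluster.append(ord(ch))
    flush()
    return "".join(out)


if __name__ == "__main__":
    print(to_groff_composites("A\u0300 la carte"))
```

In Perl the same test could presumably be done with a Unicode property match, which is the "Perl Unicode support" mentioned above.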
Are the numbers there the hex digits of a Unicode code point? The
groff_char man page is maddeningly light on details about this escape
form, mentioning it only in a REFERENCE section.
>> For Pod::Man usage, the output format I'd want would be a hash mapping
>> Unicode code points to the correct groff escape. Or, in an absolutely
>> ideal world, to have an Encode encoding for groff escapes, similar to how
>> the Encode::MIME::Header encoding works to generate RFC 2047 strings.
> I happened to stumble over an old patch by Brendan O'Dea that might be
> helpful, including a reference here to not lose track of that:
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=442066;filename=groff-utf8;msg=22>
Oh, aha, that's basically the table I was looking for, although that's
very limited compared to all Unicode characters, so it seems easier to
just do a straight conversion to the \[uNNNN] form.
>> B<> and I<> could just be surrounding normal words that should use
>> normal hyphens. L<some-command> is a link to a section in the same
>> document entitled some-command, so the assumption there is also that it
>> could be a regular English word.
> Oh, at least perlpod(1) says that L<name> links to a Perl manual page,
> so I'd expect it to be equivalent to the L<crontab(5)> style when
> processing minus chars, and L</sec> does the inter-section linking?
Oh, sorry, yes, I was thinking of L</some-command>. So the idea is that
L<some-command> should always use \- for all embedded hyphens?
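
As a concrete reading of that question, a sketch of the proposed rule (Python for illustration; escape_link_name is a hypothetical helper, not part of Pod::Man): every hyphen inside the name of an L<name> link becomes \-, while ordinary prose hyphens are left alone. This is only the behavior being asked about, not something the thread has settled.

```python
def escape_link_name(name: str) -> str:
    """Escape every hyphen in a manual page name as the roff \\- sequence."""
    return name.replace("-", "\\-")


if __name__ == "__main__":
    # Applied only to link targets such as L<some-command>.
    print(escape_link_name("some-command"))
```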
>> As you say, though, I'm not entirely sure the distinction is worth all
>> the trouble we've put into it over the years. nroff at least seems to
>> have just given up and maps them all to "-" in the output anyway. That
>> used to be a Debian-specific change, but it looks like upstream has
>> switched to treating - as \-, I think? For HTML output, upstream maps
>> \- to − and Debian still overrides that to - instead. (If
>> upstream thinks \- is a minus sign and not ASCII 45, I'm really
>> confused what's going on with this, though.)
> We should probably ask Colin about this. :)
Yes, please -- Colin, do you have any idea what the current best practice
is here? I'm trying to figure out what to have Pod::Man do.
--
Russ Allbery (rra@debian.org) <http://www.eyrie.org/~eagle/>