
Re: Reasons to not use quote signs directly?



Guillem Jover <guillem@debian.org> writes:

> Ah right, indeed it does. And it's explained in that same man page I
> referred to. O:) The escape sequence would be something like \[u0021] or
> \[u0041_0300].

Oh!  So, if I can just convert all Unicode characters to their numeric
codes, this becomes very easy to do.  No tables and other machinery
required.

I'm a little worried about the \[u0041_0300] form, though.  Does that mean
that \[u0041]\[u0300] does not work, and Pod::Man has to know whether
characters are combining or not?  I suppose that's possible with the Perl
Unicode support, if necessary.
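
If the composite form does turn out to be required, the grouping could be sketched roughly as follows. This is in Python rather than Perl purely for illustration. The function name groff_escape_clusters is hypothetical. It assumes the digits are uppercase hexadecimal code points, and it treats any character with a nonzero canonical combining class as combining, which is only an approximation of what groff may actually want.

```python
import unicodedata


def groff_escape_clusters(text: str) -> str:
    """Escape non-ASCII text, joining a base character and its combining
    marks into one \\[uXXXX_YYYY] escape (hypothetical helper)."""
    out = []
    cluster = []  # code points of the current base character plus its marks

    def flush() -> None:
        if not cluster:
            return
        if len(cluster) == 1 and cluster[0] < 0x80:
            # A lone ASCII character is passed through unchanged.
            out.append(chr(cluster[0]))
        else:
            # Assumption: components are uppercase hex joined by "_".
            out.append("\\[u" + "_".join("%04X" % cp for cp in cluster) + "]")
        cluster.clear()

    for ch in text:
        if unicodedata.combining(ch) and cluster:
            # Nonzero combining class: attach to the preceding base character.
            cluster.append(ord(ch))
        else:
            flush()
            cluster.append(ord(ch))
    flush()
    return "".join(out)
```

With this sketch, "A" followed by U+0300 comes out as the single escape \[u0041_0300], while plain ASCII text is left alone.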

Are the numbers there the hex digits of a Unicode code point?  The
groff_char man page is maddeningly light on details about this escape
form, mentioning it only in a REFERENCE section.
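
Assuming they are hexadecimal code points, the straight conversion is only a few lines. The sketch below is Python rather than Perl, the name groff_escape is made up for the example, and the choice of at least four uppercase hex digits is an assumption based on the \[u0021] example above rather than something confirmed from the man page.

```python
def groff_escape(text: str) -> str:
    """Replace every non-ASCII character with a \\[uXXXX] escape
    (hypothetical helper; ASCII is passed through unchanged)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        else:
            # Assumption: uppercase hex, zero-padded to at least four digits.
            out.append("\\[u%04X]" % cp)
    return "".join(out)
```

This ignores the question of combining characters entirely and simply emits one escape per code point.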

>> For Pod::Man usage, the output format I'd want would be a hash mapping
>> Unicode code points to the correct groff escape.  Or, in an absolutely
>> ideal world, to have an Encode encoding for groff escapes, similar to how
>> the Encode::MIME::Header encoding works to generate RFC 2047 strings.

> I happened to stumble over an old patch by Brendan O'Dea that might be
> helpful, including a reference here to not lose track of that:

>   <https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=442066;filename=groff-utf8;msg=22>

Oh, aha, that's basically the table I was looking for, although that's
very limited compared to all Unicode characters, so it seems easier to
just do a straight conversion to the \[uNNNN] form.

>> B<> and I<> could just be surrounding normal words that should use
>> normal hyphens.  L<some-command> is a link to a section in the same
>> document entitled some-command, so the assumption there is also that it
>> could be a regular English word.

> Oh, at least perlpod(1) says that L<name> links to a Perl manual page,
> so I'd expect it to be equivalent to the L<crontab(5)> style when
> processing minus chars, and L</sec> does the inter-section linking?

Oh, sorry, yes, I was thinking of L</some-command>.  So the idea is that
L<some-command> should always use \- for all embedded hyphens?
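
If that is the rule, it might look something like the sketch below. It is Python for illustration only, escape_link_hyphens is a hypothetical name, and it handles just the two forms discussed here (L<name> and L</section>), not the other L<> variants that perlpod allows. Leaving the section form untouched reflects the assumption above that a section title may be ordinary English text.

```python
def escape_link_hyphens(target: str) -> str:
    """Escape hyphens in the target of an L<> link (hypothetical helper).

    Assumption: a target starting with "/" names a section in the same
    document and keeps its regular hyphens; anything else is treated as
    a manual page name, so every "-" becomes "\\-".
    """
    if target.startswith("/"):
        return target
    return target.replace("-", "\\-")
```

So some-command would become some\-command, while /some-command would be left as it is.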

>> As you say, though, I'm not entirely sure the distinction is worth all
>> the trouble we've put into it over the years.  nroff at least seems to
>> have just given up and maps them all to "-" in the output anyway.  That
>> used to be a Debian-specific change, but it looks like upstream has
>> switched to treating - as \-, I think?  For HTML output, upstream maps
>> \- to &minus; and Debian still overrides that to - instead.  (If
>> upstream thinks \- is a minus sign and not ASCII 45, I'm really
>> confused what's going on with this, though.)

> We should probably ask Colin about this. :)

Yes, please -- Colin, do you have any idea what the current best practice
is here?  I'm trying to figure out what to have Pod::Man do.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>

