Re: Reasons to not use quote signs directly?

To: Helge Kreutzmann <debian@helgefjell.de>
Cc: debian-dpkg@lists.debian.org, Colin Watson <cjwatson@debian.org>
Subject: Re: Reasons to not use quote signs directly?
From: Russ Allbery <rra@debian.org>
Date: Sat, 03 Dec 2016 12:51:18 -0800
Message-id: <87inr0ohx5.fsf@hope.eyrie.org>
In-reply-to: <20161130031045.yfj534hkell7d37x@gaara.hadrons.org> (Guillem Jover's message of "Wed, 30 Nov 2016 04:10:45 +0100")
References: <20160919163049.GA27815@Debian-50-lenny-64-minimal> <20160920235910.ewk2ejmlx2sghe7w@gaara.hadrons.org> <878ttk5d3x.fsf@hope.eyrie.org> <20161019231407.irobeno7dlomodr5@gaara.hadrons.org> <87bmy5pc30.fsf@hope.eyrie.org> <20161130031045.yfj534hkell7d37x@gaara.hadrons.org>

Guillem Jover <guillem@debian.org> writes:

> Ah right, indeed it does. And it's explained in that same man page I
> referred to. O:) The escape sequence would be something like \[u0021] or
> \[u0041_0300].

Oh!  So, if I can just convert all Unicode characters to their numeric
codes, this becomes very easy to do.  No tables and other machinery
required.
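
A minimal sketch of that straight conversion, in Python rather than Perl
purely for illustration.  The name groff_escape is hypothetical, and the
sketch assumes the digits are the uppercase hex code point padded to at
least four digits, which is my reading of groff_char and not something
confirmed in this thread.

```python
def groff_escape(text: str) -> str:
    """Replace each non-ASCII character with a groff \\[uXXXX] escape.

    Assumption: XXXX is the code point in uppercase hex, zero-padded to
    at least four digits. ASCII characters pass through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                  # plain ASCII, leave alone
        else:
            out.append("\\[u%04X]" % cp)    # e.g. U+2018 -> \[u2018]
    return "".join(out)


# Curly quotes around a word become numeric escapes.
print(groff_escape("\u2018quoted\u2019"))   # \[u2018]quoted\[u2019]
```

This treats every character independently, so it says nothing about
how combining characters should be handled.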

I'm a little worried about the \[u0041_0300] form, though.  Does that mean
that \[u0041]\[u0300] does not work, and Pod::Man has to know whether
characters are combining or not?  I suppose that's possible with the Perl
Unicode support, if necessary.
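
If it does turn out that combining characters have to be folded into
the preceding escape, something along these lines might be enough.
This is a Python sketch rather than Perl, the function name
groff_escape_clusters is made up, and it assumes that a nonzero
canonical combining class is a good enough test for "combining", which
misses some marks.  Whether groff actually requires the joined form is
exactly the open question.

```python
import unicodedata


def groff_escape_clusters(text: str) -> str:
    """Emit \\[uXXXX_YYYY] for a base character plus its combining marks.

    Assumptions: hex digits are uppercase and padded to four places, and
    a character counts as combining when unicodedata.combining() returns
    a nonzero canonical combining class.
    """
    out = []
    i = 0
    while i < len(text):
        j = i + 1
        # Gather any combining marks that directly follow text[i].
        while j < len(text) and unicodedata.combining(text[j]) != 0:
            j += 1
        cluster = text[i:j]
        if len(cluster) == 1 and ord(cluster) < 0x80:
            out.append(cluster)             # lone ASCII character
        else:
            codes = "_".join("%04X" % ord(c) for c in cluster)
            out.append("\\[u" + codes + "]")
        i = j
    return "".join(out)


# "A" followed by U+0300 COMBINING GRAVE ACCENT becomes one escape.
print(groff_escape_clusters("A\u0300"))     # \[u0041_0300]
```

No normalization is done here, so precomposed and decomposed input
produce different escapes.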

Are the numbers there the hex digits of a Unicode code point?  The
groff_char man page is maddeningly light on details about this escape
form, mentioning it only in a REFERENCE section.

>> For Pod::Man usage, the output format I'd want would be a hash mapping
>> Unicode code points to the correct groff escape.  Or, in an absolutely
>> ideal world, to have an Encode encoding for groff escapes, similar to how
>> the Encode::MIME::Header encoding works to generate RFC 2047 strings.

> I happened to stumble over an old patch by Brendan O'Dea that might be
> helpful, including a reference here to not lose track of that:

>   <https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=442066;filename=groff-utf8;msg=22>

Oh, aha, that's basically the table I was looking for, although it covers
only a small subset of Unicode, so it seems easier to just do a straight
conversion to the \[uNNNN] form.

>> B<> and I<> could just be surrounding normal words that should use
>> normal hyphens.  L<some-command> is a link to a section in the same
>> document entitled some-command, so the assumption there is also that it
>> could be a regular English word.

> Oh, at least perlpod(1) says that L<name> links to a Perl manual page,
> so I'd expect it to be equivalent to the L<crontab(5)> style when
> processing minus chars, and L</sec> does the inter-section linking?

Oh, sorry, yes, I was thinking of L</some-command>.  So the idea is that
L<some-command> should always use \- for all embedded hyphens?

>> As you say, though, I'm not entirely sure the distinction is worth all
>> the trouble we've put into it over the years.  nroff at least seems to
>> have just given up and maps them all to "-" in the output anyway.  That
>> used to be a Debian-specific change, but it looks like upstream has
>> switched to treating - as \-, I think?  For HTML output, upstream maps
>> \- to &minus; and Debian still overrides that to - instead.  (If
>> upstream thinks \- is a minus sign and not ASCII 45, I'm really
>> confused what's going on with this, though.)

> We should probably ask Colin about this. :)

Yes, please -- Colin, do you have any idea what the current best practice
is here?  I'm trying to figure out what to have Pod::Man do.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>
