Re: Reasons to not use quote signs directly?

To: Russ Allbery <rra@debian.org>
Cc: Helge Kreutzmann <debian@helgefjell.de>, debian-dpkg@lists.debian.org, Colin Watson <cjwatson@debian.org>
Subject: Re: Reasons to not use quote signs directly?
From: Guillem Jover <guillem@debian.org>
Date: Wed, 30 Nov 2016 04:10:45 +0100
Message-id: <20161130031045.yfj534hkell7d37x@gaara.hadrons.org>
Mail-followup-to: Russ Allbery <rra@debian.org>, Helge Kreutzmann <debian@helgefjell.de>, debian-dpkg@lists.debian.org, Colin Watson <cjwatson@debian.org>
In-reply-to: <87bmy5pc30.fsf@hope.eyrie.org>
References: <20160919163049.GA27815@Debian-50-lenny-64-minimal> <20160920235910.ewk2ejmlx2sghe7w@gaara.hadrons.org> <878ttk5d3x.fsf@hope.eyrie.org> <20161019231407.irobeno7dlomodr5@gaara.hadrons.org> <87bmy5pc30.fsf@hope.eyrie.org>

[ Colin CCed for some input on the groff vs minus situation. ]

On Thu, 2016-10-27 at 17:10:59 -0700, Russ Allbery wrote:
> Guillem Jover <guillem@debian.org> writes:
> > For the current conversion in dpkg, I've taken most of the common
> > symbols from groff_char(7) and created a very simple sed script, I'm not
> > sure if you were thinking about something along those lines (although in
> > proper perl)?
> 
> >   <https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/tree/man/utf8toman.sed?h=next/master&id=c07b9b79447e200645ea423f959194fcbf8d4d32>
> 
> Yeah, that would work, although aren't there quite a few more sequences
> than that?  Does groff have a way of representing an arbitrary Unicode
> code point?

Ah right, indeed it does. And it's explained in that same man page I
referred to. O:) The escape sequence would be something like \[u0021] or
\[u0041_0300].
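
To make the shape of those escapes concrete, here is a minimal sketch in
Python (not the sed script or anything from dpkg). It assumes that
composite glyphs are written as the canonically decomposed code points
joined with "_", which matches the \[u0041_0300] example above. The
function name to_groff is hypothetical, and plain ASCII (including
backslashes) is passed through untouched.

```python
import unicodedata


def to_groff(text: str) -> str:
    """Replace non-ASCII characters with groff \\[uXXXX] escapes (sketch)."""
    out = []
    cluster = []  # one base character plus any combining marks that follow

    def flush():
        if not cluster:
            return
        if len(cluster) == 1 and ord(cluster[0]) < 128:
            # Plain ASCII is emitted as-is.
            out.append(cluster[0])
        else:
            # Assumption: components are uppercase hex, at least 4 digits,
            # joined with "_" as in \[u0041_0300].
            codes = "_".join(f"{ord(c):04X}" for c in cluster)
            out.append("\\[u" + codes + "]")
        cluster.clear()

    # NFD splits precomposed characters into base + combining marks.
    for char in unicodedata.normalize("NFD", text):
        if unicodedata.combining(char) == 0:
            flush()  # a new base character starts a new cluster
        cluster.append(char)
    flush()
    return "".join(out)


print(to_groff("À la carte"))
```

Grouping by base character keeps an already decomposed input (A followed
by a combining grave) and the precomposed À on the same output escape.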

> For Pod::Man usage, the output format I'd want would be a hash mapping
> Unicode code points to the correct groff escape.  Or, in an absolutely
> ideal world, to have an Encode encoding for groff escapes, similar to how
> the Encode::MIME::Header encoding works to generate RFC 2047 strings.

I happened to stumble over an old patch by Brendan O'Dea that might be
helpful; I'm including a reference here so that we don't lose track of it:

  <https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=442066;filename=groff-utf8;msg=22>
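
For illustration only, the kind of code point to escape mapping discussed
above could look roughly like the following Python sketch. This is not
taken from the patch referenced above nor from utf8toman.sed; the table
holds just a handful of named glyphs from groff_char(7), and the names
GROFF_NAMED and apply_named_escapes are hypothetical. Characters without
an entry are left unchanged.

```python
# Partial table: Unicode code point -> named groff escape.
# Only a few well-known glyphs from groff_char(7) are listed here.
GROFF_NAMED = {
    0x2014: r"\[em]",  # em dash
    0x2013: r"\[en]",  # en dash
    0x201C: r"\[lq]",  # left double quotation mark
    0x201D: r"\[rq]",  # right double quotation mark
    0x2018: r"\[oq]",  # left single quotation mark
    0x2019: r"\[cq]",  # right single quotation mark
    0x00A9: r"\[co]",  # copyright sign
    0x2022: r"\[bu]",  # bullet
}


def apply_named_escapes(text: str) -> str:
    """Substitute the characters listed in GROFF_NAMED (sketch)."""
    # str.translate accepts a dict keyed by code point, which is the
    # same shape as the hash described in the quoted text.
    return text.translate(GROFF_NAMED)


print(apply_named_escapes("\u201cdpkg\u201d \u2014 \u00a9 2016"))
```

Keeping the table keyed by code point means it could be extended one
entry at a time as more of groff_char(7) gets covered.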

> > If you could specify exactly which symbols you'd like to see supported I
> > might take a stab at this, when I have some spare time. Say everything
> > in groff_char(7) or similar. :)
> 
> As much as possible is of course ideal, but I'm happy to take partial
> work!  :)

Ok! :)

> > The other major issue are commands, which I'm not sure are so easy to
> > detect. Maybe they could get to use the \- minus if they are inside some
> > other markup. I see that C<some-command> escapes them, as does
> > L<some-command(1)>, but L<some-command> does not (any reason?), which
> > could be handy to use I guess. Filenames are also safe with
> > F</some-dir/file-name>. The only problem is using the proper markup that
> > also preserves the same output as the current man pages.
> 
> B<> and I<> could just be surrounding normal words that should use normal
> hyphens.  L<some-command> is a link to a section in the same document
> entitled some-command, so the assumption there is also that it could be a
> regular English word.

Oh, at least perlpod(1) says that L<name> links to a Perl manual page,
so I'd expect it to be equivalent to the L<crontab(5)> style when
processing minus chars, and L</sec> does the inter-section linking?

> As you say, though, I'm not entirely sure the distinction is worth all the
> trouble we've put into it over the years.  nroff at least seems to have
> just given up and maps them all to "-" in the output anyway.  That used to
> be a Debian-specific change, but it looks like upstream has switched to
> treating - as \-, I think?  For HTML output, upstream maps \- to &minus;
> and Debian still overrides that to - instead.  (If upstream thinks \- is a
> minus sign and not ASCII 45, I'm really confused what's going on with
> this, though.)

We should probably ask Colin about this. :)

> > I've always found the AUTHORS, COPYRIGHT or LICENSE sections to be
> > distracting, and in dpkg we got rid of all of them, because in addition
> > they were getting usually out-of-sync with the actual copyright
> > statements, and required adding names and updating years in two places.
> 
> Yeah, that part is irritating.  The alternative, which I use in my
> packages these days, is to have these reflect the authors, copyright, and
> license of the *manual page*, but that's also weird.

Right, that's what dpkg used to have. But even then I've still found this
distracting.

> =for license, resulting in a comment in the generated man page, seems like
> a better general solution (and then it probably makes sense for this to
> always reflect the license of the documentation file itself, not the
> larger package).

Yeah.

Thanks,
Guillem
