
Re: Bug#440420: [PROPOSAL] Manual page encoding



On Mon, Dec 31, 2007 at 02:37:48PM +0000, Colin Watson wrote:
> On Sun, Dec 30, 2007 at 10:28:12PM -0800, Russ Allbery wrote:
> > Colin Watson <cjwatson@debian.org> writes:
> > > I propose that policy should standardise that we move to using UTF-8 as
> > > the source encoding for all manual pages since it clearly makes sense to
> > > do so.
[...]
> Right. Here's an update; I think I've captured most of the discussion in
> the thread so far. The following patch could in principle be applied
> now, given seconds. Wordsmithing welcome, as I'm aware that this is a
> rather dense recommendation; I'm also looking for seconds for this
> proposal.

Christian Perrier seconded this here:

  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=440420#100

However, since later discussion indicated that we should drop the
/usr/share/man/<ll>.UTF-8/ directories as the recommended layout, I
think we can also drop them from the policy proposal. (Packages still
shouldn't lie about the encoding if they install files there, but since
this will not be the recommended default there is no reason to bloat
policy with it.)

Here's another updated version. Christian, are you still OK with this?
I'm also looking for at least one more second for this proposal.

--- orig/policy.sgml
+++ mod/policy.sgml
@@ -8521,6 +8521,37 @@
 	      be present in the future.
  	  </footnote>
  	</p>
+
+	<p>
+	  Manual pages in locale-specific subdirectories of
+	  <file>/usr/share/man</file> should use either UTF-8 or the usual
+	  legacy encoding for that language (normally the one corresponding
+	  to the shortest relevant locale name in
+	  <file>/usr/share/i18n/SUPPORTED</file>). For example, pages under
+	  <file>/usr/share/man/fr</file> should use either UTF-8 or
+	  ISO-8859-1.<footnote><prgn>man</prgn> will automatically detect
+	  whether UTF-8 is in use. In future, all manual pages will be
+	  required to use UTF-8.</footnote>
+	</p>
+
+	<p>
+	  A country name (e.g. <file>de_DE</file>) should not be included in
+	  the subdirectory name unless it indicates a significant difference
+	  in the language, as this excludes speakers of the language in
+	  other countries.<footnote>At the time of writing, Chinese and
+	  Portuguese are the main languages with such differences, so
+	  <file>pt_BR</file>, <file>zh_CN</file>, and <file>zh_TW</file> are
+	  all allowed.</footnote>
+	</p>
+
+	<p>
+	  Due to limitations in current implementations, all characters
+	  in the manual page source should be representable in the usual
+	  legacy encoding for that language, even if the file is
+	  actually encoded in UTF-8. Safe alternative ways to write many
+	  characters outside that range may be found in
+	  <manref name="groff_char" section="7">.
+	</p>
       </sect>
 
       <sect>
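
To make the last paragraph of the patch concrete, here is a rough Python
sketch of the check it implies: a page may be encoded in UTF-8, but every
character in it should still be representable in the legacy encoding for
its language. The function name and the use of Python's codec names are
assumptions made for illustration; this is not how man-db or any
packaging tool actually implements the check.

```python
from typing import List


def unrepresentable_chars(source: bytes, legacy: str) -> List[str]:
    """Return the characters of a manual page source that cannot be
    written in the legacy encoding (hypothetical helper)."""
    try:
        text = source.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the page is in the legacy encoding.
        # Decoding raises if the bytes are not valid in that encoding.
        source.decode(legacy)
        return []

    bad: List[str] = []
    for ch in text:
        try:
            ch.encode(legacy)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


if __name__ == "__main__":
    page = "caf\u00e9 co\u00fbte 2 \u20ac\n".encode("utf-8")
    # The euro sign is not in ISO-8859-1, so it is reported.
    print(unrepresentable_chars(page, "iso-8859-1"))
```

Characters reported this way are the ones that would need one of the
groff_char(7) alternatives mentioned in the patch.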

> Thus, an updated transition plan:
> 
>   1. Initial status: packages should use only /usr/share/man/<ll>/
>      (although some packages have anticipated an approximation of the
>      transition plan; we ignore these for the moment as there is little
>      point in changing them only to change them back later), and must
>      use the legacy encoding for pages installed there.
> 
>   2. man-db 2.5.0-1 uploaded, including support for installing pages in
>      /usr/share/man/<ll>.<codeset>/ (e.g. /usr/share/man/fr.UTF-8). The
>      basename of this directory is not typically a well-formed locale,
>      but it allows a clear specification of the hierarchy's encoding
>      while applying to all countries using that language. [DONE]
> 
>   3. man-db 2.5.1-1 uploaded, including 'man --recode'.

We are now here; I uploaded man-db 2.5.1-1 this morning.

>   4. dh_installman updated to recode manual pages to UTF-8 automatically
>      (and install them under /usr/share/man/<ll>.UTF-8/?), using 'man
>      --recode UTF-8' to guess the original encoding. debhelper Depends:
>      man-db (>= 2.5.1-1) for this. Pages for which the DWIM fails can
>      include an explicit coding: directive, which will be documented.

Bug filed:

  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=462937
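
As an illustration of the kind of guessing step 4 describes, here is a
minimal Python sketch: honour an explicit coding directive if there is
one, otherwise treat the page as UTF-8 if it decodes cleanly, otherwise
fall back to the legacy encoding for the language. The Emacs-style
"-*- coding: ... -*-" syntax on the first line, the function names, and
the fallback order are all assumptions for the sake of the example; the
real logic lives in man-db and may differ.

```python
import re

# Assumed directive syntax: an Emacs-style tag on the first line,
# e.g.  '\" -*- coding: ISO-8859-15 -*-
CODING_RE = re.compile(rb"-\*-.*?coding:\s*([A-Za-z0-9._-]+).*?-\*-")


def guess_encoding(source: bytes, legacy: str) -> str:
    """Guess the encoding of a manual page source (illustrative only)."""
    first_line = source.split(b"\n", 1)[0]
    match = CODING_RE.search(first_line)
    if match:
        # An explicit directive wins over any guessing.
        return match.group(1).decode("ascii")
    try:
        source.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return legacy


def recode_to_utf8(source: bytes, legacy: str) -> bytes:
    """Recode a page to UTF-8 using the guessed source encoding."""
    return source.decode(guess_encoding(source, legacy)).encode("utf-8")


if __name__ == "__main__":
    latin1_page = b".TH CAFE 1\ncaf\xe9\n"
    print(guess_encoding(latin1_page, "ISO-8859-1"))
    print(recode_to_utf8(latin1_page, "ISO-8859-1"))
```

Pure ASCII pages are valid UTF-8, so they pass through unchanged.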

>   8. Distant future: deprecate /usr/share/man/<ll>/. This will only be
>      for consistency, so there's no need to rush.

This step is deleted.

Thanks,

-- 
Colin Watson                                       [cjwatson@debian.org]

