
Bug#440420: [PROPOSAL] Manual page encoding



On Sun, Sep 02, 2007 at 10:24:43PM +0200, Jens Seidel wrote:
> On Sat, Sep 01, 2007 at 01:02:33PM +0100, Colin Watson wrote:
> > --- orig/policy.sgml
> > +++ mod/policy.sgml
> > @@ -8450,6 +8450,39 @@
> >  	      be present in the future.
> >   	  </footnote>
> >   	</p>
> > +
> > +	<p>
> > +	  Manual pages that are installed under
> > +	  <file>/usr/share/man/</file><var>ll</var>, where <var>ll</var>
> 
> Please use <file>/usr/share/man/<var>ll</var></file> as ll is part of
> the filename.

Thanks; consider it amended.

> > +	  is an ISO-639 language code, must be encoded with the usual
> > +	  legacy (non-UTF-8) character set for that language, as shown
> > +	  by:
> > +	  <example compact="compact">
> > +egrep -v '\.|@|UTF-8' /usr/share/i18n/SUPPORTED
> 
> You are aware that some languages, such as Vietnamese, have an
> 8-bit encoding but are excluded by this regular expression
> (vi_VN.TCVN TCVN5712-1)?

Hmm, yes. I'm not sure what to do about Vietnamese at the moment; I
doubt it works properly right now. I'll check it out.
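
To make the gap concrete, here is a minimal sketch in Python of what the
quoted egrep filter keeps. The sample lines are illustrative only and
assume the "locale charset" layout of the SUPPORTED file; only the
vi_VN.TCVN entry is taken from this thread. The function names are
hypothetical.

```python
import re

# Same pattern as the egrep command quoted above.
EXCLUDE = re.compile(r"\.|@|UTF-8")

# Illustrative sample in "locale charset" form; not a copy of the real file.
SAMPLE_SUPPORTED = [
    "fr_FR ISO-8859-1",
    "fr_FR.UTF-8 UTF-8",
    "fr_FR@euro ISO-8859-15",
    "vi_VN.TCVN TCVN5712-1",
    "vi_VN UTF-8",
]


def legacy_candidates(lines):
    """Keep the lines that `egrep -v` would print (those not matching)."""
    return [line for line in lines if not EXCLUDE.search(line)]


def languages_covered(lines):
    """Language codes (the part before '_') that survive the filter."""
    return {line.split("_", 1)[0] for line in legacy_candidates(lines)}


if __name__ == "__main__":
    for line in legacy_candidates(SAMPLE_SUPPORTED):
        print(line)
    # Vietnamese has an 8-bit charset, yet no line for it survives.
    print("vi covered:", "vi" in languages_covered(SAMPLE_SUPPORTED))
```

With this sample, the TCVN line is dropped because the locale name
contains a dot, and the plain vi_VN line is dropped because its charset
is UTF-8, so the filter yields no legacy encoding for Vietnamese.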

> > +	  At present, it is not generally possible to install a manual
> > +	  page encoded in UTF-8 such that it will be used in all locales
> > +	  for that language (for example, a page installed under
> > +	  <file>/usr/share/man/fr_FR.UTF-8</file> will not be used in
> > +	  the <tt>fr_BE.UTF-8</tt> locale). It is therefore not yet
> > +	  recommended to install pages encoded in UTF-8, but rather to
> > +	  continue using the legacy encoding.<footnote>This is expected
> > +	  to change as of man-db 2.5.0.</footnote>
> 
> Maybe it would be a good idea to explain what to do with unsupported
> encodings these days. What should be done with a Vietnamese page?
> Installing it now, UTF-8 encoded, into vi.UTF-8/ seems fine to me, but
> you write "not yet recommended"!

Well, that just plain won't work; man won't look there. There are some
locales that are unfortunately left out in the cold at the moment. I'm
working to improve the situation.

(While man will look in /usr/share/man/vi_VN.UTF-8, that won't work
properly either. groff doesn't accept UTF-8 input, and man doesn't know
how to recode the page to an 8-bit encoding that can be passed through
groff's ascii8 device and recoded back to UTF-8 on the other side.
Basically, if man doesn't know about the legacy encoding for your
language, you're currently screwed, and no amount of changes to policy
will help you. Yes, this is far from ideal.)
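
The round trip described above can be sketched roughly as follows. This
is not man-db's code: the table contents, the function names, and the
identity stand-in for the groff step are all assumptions made for
illustration.

```python
# Hypothetical language -> legacy encoding table (illustrative entries only;
# this is not man-db's real table).
LEGACY_ENCODINGS = {
    "fr": "iso-8859-1",
    "pl": "iso-8859-2",
    "ru": "koi8-r",
}


def recode_for_groff(page_utf8: bytes, lang: str) -> bytes:
    """Recode a UTF-8 page to the 8-bit encoding used as formatter input."""
    encoding = LEGACY_ENCODINGS[lang]  # KeyError: no known legacy encoding
    return page_utf8.decode("utf-8").encode(encoding)


def recode_from_groff(output_8bit: bytes, lang: str) -> bytes:
    """Recode 8-bit formatter output back to UTF-8 for the terminal."""
    return output_8bit.decode(LEGACY_ENCODINGS[lang]).encode("utf-8")


def render(page_utf8: bytes, lang: str, formatter=lambda data: data) -> bytes:
    """Full round trip; `formatter` is a stand-in for the groff step."""
    formatted = formatter(recode_for_groff(page_utf8, lang))
    return recode_from_groff(formatted, lang)


if __name__ == "__main__":
    print(render("Résumé".encode("utf-8"), "fr").decode("utf-8"))
    try:
        render("Tiếng Việt".encode("utf-8"), "vi")
    except KeyError:
        print("no legacy encoding known for 'vi'")
```

A language missing from the table fails at the first step, which is the
situation Vietnamese is in here.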

> >   5. Update dh_installman to recode manual pages to UTF-8 automatically
> >      and install them under /usr/share/man/<ll>.UTF-8/. Getting the
> 
> This requires an option to specify the encoding of the manual page. Or
> assume UTF-8 by default for all languages not having a matching regular
> expression.

I was thinking of having dh_installman recode to UTF-8, and yes, you
would need to know the encoding somehow (maybe a table of legacy
encodings as is currently in man-db would do the job). This is the least
well-thought-out part of my transition plan, though, so ideas are good.

> >      Conflicts:/Breaks: in here might be difficult, plus I'm not sure
> 
> Why not just ignore this? If updating man-db is sufficient, let's
> ignore dependencies. (If an HTML documentation file uses the new
> (fictitious) HTML version 9, there is no need to list all browsers
> supporting it in the dependencies.)

Yeah, I'm open to just ignoring this. It's probably the pragmatic
approach.

Thanks,

-- 
Colin Watson                                       [cjwatson@debian.org]


