
Bug#440420: [PROPOSAL] Manual page encoding



On Mon, Sep 03, 2007 at 08:30:39AM +0200, Jens Seidel wrote:
> On Sun, Sep 02, 2007 at 11:31:45PM +0100, Colin Watson wrote:
> > On Sun, Sep 02, 2007 at 10:24:43PM +0200, Jens Seidel wrote:
> > > On Sat, Sep 01, 2007 at 01:02:33PM +0100, Colin Watson wrote:
> > > > +	  is an ISO-639 language code, must be encoded with the usual
> > > > +	  legacy (non-UTF-8) character set for that language, as shown
> > > > +	  by:
> > > > +	  <example compact="compact">
> > > > +egrep -v '\.|@|UTF-8' /usr/share/i18n/SUPPORTED
> > > 
> > > You are aware of the fact that some languages such as Vietnamese have an
> > > 8-bit encoding but do not match this regular expression
> > > (vi_VN.TCVN TCVN5712-1)?
> > 
> > Hmm, yes. I'm not sure what to do about Vietnamese at the moment; I
> > doubt it works properly right now. I'll check it out.
> 
> I doubt it too...

Regardless, to make it work with current groff (which reserves a part of
the input character set for its own use and thereby conflicts with UTF-8
input), a legacy character set is needed; what I was trying to express
is that this should be the "most usual" legacy encoding for that
language.

Vietnamese is an odd case. In the long term, I think being explicit
(vi.UTF-8) is the right answer anyway.
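
To illustrate why Vietnamese falls through the egrep filter quoted above,
here is a minimal Python sketch of the same exclusion. The sample lines
are assumed to resemble entries in /usr/share/i18n/SUPPORTED; they are
illustrative rather than copied from any particular release, and
legacy_locales is a hypothetical helper name.

```python
import re

# Same pattern as the egrep -v in the proposal: drop any line that
# contains a dot, an "@", or the string "UTF-8".
EXCLUDE = re.compile(r"\.|@|UTF-8")


def legacy_locales(lines):
    """Return the lines that survive the exclusion filter."""
    return [line for line in lines if not EXCLUDE.search(line)]


# Illustrative sample only (assumed format: "<locale> <charset>").
sample = [
    "fr_BE ISO-8859-1",
    "fr_BE.UTF-8 UTF-8",
    "fr_BE@euro ISO-8859-15",
    "vi_VN.TCVN TCVN5712-1",
    "vi_VN UTF-8",
]

print(legacy_locales(sample))  # ['fr_BE ISO-8859-1']
```

Neither Vietnamese line survives: one contains a dot and the other names
UTF-8, so the filter yields no legacy character set for that language.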

> > > > +	  the <tt>fr_BE.UTF-8</tt> locale). It is therefore not yet
> > > > +	  recommended to install pages encoded in UTF-8, but rather to
> > > 
> > > Maybe it would be a good idea to explain what to do with non supported
> > > encodings these days. What to do with a Vietnamese page? Installing it
> > > now UTF-8 encoded into vi.UTF-8/ seems fine to me but you write "not yet
> > > recommended"!
> > 
> > Well, that just plain won't work; man won't look there. There are some
> 
> Yup, I'm aware of it. But once proper support to man-db is added it will
> work. There should be no need to upload a large amount of packages just
> to fix manual pages after the man-db transition if this can happen
> already now. (Or should currently not supported manual pages not
> be installed at all?)

This is sort of what my caveat about the "not yet recommended" language
was about. I agree with you that if it doesn't work with current man
anyway then there is no harm in installing it in the future location.
I'm not sure how to word this in policy though; do you have any
suggestions?

Maybe it would be better for me to just focus on getting man-db 2.5.0
done ASAP and not worry too much about policy in the meantime. :-)

> Isn't this the core idea of extending the policy? To guide the
> developer what should/will be used once the transition happened?
> 
> hex-a-hop installs already the Vietnamese and the Bulgarian manpages,
> both are currently not supported (at least in Etch and according to the
> changelog also in Sid -- and can be used as a test for you). (I will
> file a bug for Bulgarian on man-db soon.)

That Bulgarian page is a particularly unfortunate example because it
uses the ѝ character which is not in CP1251 (the encoding of the bg_BG
locale), so right now we have no reliable path to render this page. I've
added Bulgarian support anyway; it's just that this page will be a bit
broken. I think you would be best advised to move this page to
/usr/share/man/bg.UTF-8 given that it definitely won't work in
/usr/share/man/bg.
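
For anyone who wants to check a page for this kind of problem before
choosing a directory, a rough Python sketch follows. It relies on
Python's own cp1251 codec, which I am assuming matches the CP1251 that
the bg_BG locale uses; unencodable_chars is a hypothetical helper name.

```python
def unencodable_chars(text: str, encoding: str) -> list[str]:
    """Return the distinct characters in text that encoding cannot represent."""
    bad = set()
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.add(ch)
    return sorted(bad)


# U+0438 (и) is in CP1251; U+045D (ѝ) is not.
problem = unencodable_chars("\u0438 \u045d", "cp1251")
print([f"U+{ord(ch):04X}" for ch in problem])  # ['U+045D']
```

An empty result means the page can at least be recoded to the legacy
character set without loss; it says nothing about whether it renders
correctly.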

In the case of the Vietnamese page, please change the "—" character
(U+2014) to "\-" as is standard in NAME sections; otherwise this works
fine when recoded via TCVN5712-1 so I've added support for this too.
Again, I think you would be best advised to install this in
/usr/share/man/vi.UTF-8.
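
The NAME-section fix can be sketched in Python as below. This is only an
illustration: it assumes the section header is literally ".SH NAME"
(optionally quoted), which may not hold for a translated page, and
fix_name_section is a hypothetical helper name.

```python
def fix_name_section(page: str) -> str:
    """Replace U+2014 with the roff minus escape inside the NAME section only."""
    out = []
    in_name = False
    for line in page.splitlines(keepends=True):
        if line.startswith(".SH"):
            # A section header: track whether it opens the NAME section.
            words = [w.strip('"') for w in line.split()[1:]]
            in_name = words == ["NAME"]
        elif in_name:
            # Backslash followed by hyphen is the roff escape wanted here.
            line = line.replace("\u2014", "\\-")
        out.append(line)
    return "".join(out)


page = (
    ".SH NAME\n"
    "hex-a-hop \u2014 puzzle game\n"
    ".SH DESCRIPTION\n"
    "Left \u2014 untouched here.\n"
)
print(fix_name_section(page))
```

Text outside NAME is deliberately left alone, since the advice above
applies only to that section.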

Cheers,

-- 
Colin Watson                                       [cjwatson@debian.org]


