
Re: Bug#440420: [PROPOSAL] Manual page encoding



I support the proposal for UTF-8 manual page encoding.

I don't know which groff Debian is currently using, but...

On 04/09/2007, at 8:22 PM, Colin Watson wrote:

  * Because our current groff implementation imposes quite strict
    restrictions on what input and output encodings are possible, and
    usually needs to know detailed information about these encodings in
    order to achieve correct typography, it is if anything more
    important than usual for man to have an accurate idea of the
    document's character set.

... and from Colin's second post in the thread:

groff 1.19 supports full Unicode-style composite glyphs, but the version
we have doesn't (see the comment in my original bug report about groff
versioning). Both our version and newer versions support named
characters such as \[:a] or \(:a (variant spellings), again documented
in groff_char(7). There's also the \N escape, which gives you
font-dependent numbered glyphs; these are Unicode codepoints if you
happen to know that the utf8 device is in use.
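
To make those concrete, here are three ways of writing "ä" in groff
input (my illustration; the \N value is only U+00E4 because the utf8
device numbers its glyphs by Unicode codepoint):

    \[:a]     \" named character: a with umlaut
    \(:a      \" older two-character spelling of the same name
    \N'228'   \" glyph number 228 in the current font; on the utf8
              \" device this is codepoint 228, i.e. U+00E4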

As above, though, these have been available and translators generally
haven't used them; I can imagine that they're insanely cumbersome to use
in practice for e.g. Japanese. So I'd really rather just support plain
UTF-8 input for alphanumerics, which I think will actually get used.

Do you think we will need explicit language in policy for this? For the
time being, until we have a version of groff supporting direct UTF-8
input, the implementation will require that the page be convertible to
the legacy encoding for that language using iconv (it'll use 'iconv -c'
so that unknown characters are dropped rather than breaking the whole
page, but all the same): so, for example, German pages should avoid
characters without a direct equivalent in ISO-8859-1. This seems like a
reasonable thing to document after man-db 2.5.0, and would cover things
like UTF-8 hyphen characters.
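
As a rough sketch of that interim conversion (illustrative only: the
filename is made up, and man-db's real pipeline has more moving parts):

    # Convert a UTF-8 German page to the legacy encoding, dropping (-c)
    # any character with no ISO-8859-1 equivalent, then format it with
    # groff's latin1 device.
    iconv -f UTF-8 -t ISO-8859-1 -c manpage.de.1 | groff -mandoc -Tlatin1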

I'm not sure how groff will handle such characters once it does have
UTF-8 input support. I suspect it would convert U+2010 to its internal
"hy" glyph and render that in whatever way is appropriate for the output
device; that would really be ideal. However, I don't have enough
information to make a decision based on that guess.
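
As a sketch of how that could look, assuming a preconv-style
preprocessing step that rewrites UTF-8 bytes into \[uXXXX] escapes
before troff sees them (later groff releases ship exactly such a
preconv, run automatically by 'groff -k'):

    # U+2010 (HYPHEN) is the byte sequence E2 80 90 in UTF-8; preconv
    # turns it into the \[u2010] escape, and each output device then
    # renders that with whatever hyphen glyph is appropriate.
    printf 'well\xe2\x80\x90known\n' | preconv
    # -> well\[u2010]known (after a .lf bookkeeping line)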

In general, I think it's worthwhile for policy to make comments on
encoding for purposes of interoperability and standardisation, but I'd
be inclined to draw the line at filling it up with instructions on how
to use groff correctly. Does this sound reasonable?


Bruno Haible very kindly created groff-utf8 [1] some time back to help me test my pilot UTF-8 (Vietnamese) manpage translation.

I am ashamed to confess that I haven't had time to take manpage translation any further, either, but I hope to do so. <blush>

It doesn't look like he's had time to take it any further, but that implementation displays UTF-8 perfectly in all my terminal apps. Vietnamese requires UTF-8, so I'm a particularly keen UTF-8 supporter. ;)

Also, on precomposed and decomposed Unicode glyphs: there are a number of problems with displaying decomposed characters. You can get the base character and its diacritic displayed separately, sometimes to the point where the character stays in place but the accent follows the cursor around the page!

Precomposed characters are a much safer choice: they have more consistent support, and simply provide fewer opportunities for error.
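
At the byte level the difference looks like this (my example, using
printf to emit the raw UTF-8 sequences; both lines mean "é"):

    # Precomposed: a single codepoint, U+00E9 (bytes C3 A9).
    printf '\xc3\xa9\n'
    # Decomposed: base letter U+0065 plus combining acute U+0301
    # (bytes CC 81 after the 'e').
    printf 'e\xcc\x81\n'

Normalising a page to the precomposed (NFC) form before shipping it
sidesteps the rendering bugs above.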

from Clytie (vi-VN, Vietnamese free-software translation team / nhóm Việt hóa phần mềm tự do)
http://groups-beta.google.com/group/vi-VN

[1] http://www.haible.de/bruno/packages-groff-utf8.html



