On Thu, Feb 28, 2008 at 09:30:55PM +0000, Colin Watson wrote:
On Thu, Feb 28, 2008 at 09:21:41PM +0100, Adam Borowski wrote:
man-db really does have some special-casing here. Trust me. It was
necessary at the time. There are a finite number of known aliases for
the very small number of locales in question, and until it becomes
unnecessary I will simply support those.

(And I agree that it should go away, but can't easily just yet.)

Is there some way to query what character set a locale uses? If not, I think that man-db should default to UTF-8 (since that *is* the standard on Debian) and handle exceptions to that. Processing an ASCII manpage as UTF-8 is a no-op, because ASCII is a strict subset of UTF-8. And it's pretty easy to tell if something isn't valid UTF-8, in which case man-db can fall back to handling the page as it normally would.

Of course, I'm not contributing code, so my opinion is worth what you paid for it.
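
For what it's worth, here is a rough sketch of the idea in Python, not man-db's actual code. It assumes a POSIX system where nl_langinfo(CODESET) is available; the function names and the ISO-8859-1 fallback are purely illustrative choices of mine.

```python
import locale


def locale_codeset(default="UTF-8"):
    """Ask the C library which character set the current locale uses.

    Falls back to `default` if the locale cannot be set or the platform
    lacks nl_langinfo (hypothetical policy, chosen for this sketch).
    """
    try:
        locale.setlocale(locale.LC_CTYPE, "")
        return locale.nl_langinfo(locale.CODESET) or default
    except (locale.Error, AttributeError):
        return default


def decode_page(raw, fallback="ISO-8859-1"):
    """Try UTF-8 first; if the bytes are not valid UTF-8, use `fallback`.

    Pure ASCII input always decodes as UTF-8 unchanged.
    Returns the decoded text and the name of the encoding that worked.
    """
    try:
        return raw.decode("utf-8"), "UTF-8"
    except UnicodeDecodeError:
        return raw.decode(fallback), fallback
```

The fallback encoding would in practice come from whatever per-locale table man-db already has; the point is only that the UTF-8 check is cheap and reliable enough to try first.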

Too bad groff doesn't have real Unicode support, and supports only a few
special-cased locales (which may then be transcoded as UTF-8, but they still
get wrapped into their old-style charsets).

AIUI, PostScript doesn't have UTF-8 support either, yet it seems to work just fine. Anyway, newer versions of groff have a conversion tool that maps UTF-8 (or any arbitrary character set) input into glyph names. But Debian's groff has been very heavily patched with support for kinsoku shori (the Japanese line-breaking rules that prohibit certain characters at the start or end of a line), and so we cannot simply update to a newer version. Believe me, if it were that easy, I'm sure Colin would have done it.
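
To illustrate the kind of mapping I mean, here is a minimal sketch in Python. It is not the groff tool itself: it only rewrites each non-ASCII character as a groff \[uXXXX] Unicode escape and leaves ASCII alone, and the function name is hypothetical. It does nothing about line breaking, so it does not address the kinsoku shori problem at all.

```python
def to_groff_escapes(text):
    """Replace every non-ASCII character with a groff \\[uXXXX] escape.

    ASCII characters pass through untouched; code points are written in
    uppercase hexadecimal, padded to at least four digits.
    """
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.append("\\[u%04X]" % ord(ch))
    return "".join(out)
```

After such a pass the input is pure ASCII, so the rest of the pipeline never needs to know what encoding the page was written in.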

Are you working with Brian M. Carlson on this? He has been working on a
solution acceptable to groff upstream, which is, frankly, the only way I
want to go now. He has already made substantial progress with character
class support.

Please be aware that I have little time with school right now, so this may not be implemented soon. In fact, it may not be ready in time for lenny's release. I will sit down and work on it some more soon, but my time is limited. If people want more information on my plan of attack, please do let me know, and I'll be happy to share.

In fact, I'm off to hack some more on groff right now.

brian m. carlson / brian with sandals: Houston, Texas, US
+1 713 440 7475 | http://crustytoothpaste.ath.cx/~bmc | My opinion only
troff on top of XML: http://crustytoothpaste.ath.cx/~bmc/code/thwack
OpenPGP: RSA v4 4096b 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

