Re: FW by lidaobing@gmail.com : Bug#467249: man-db: over sensitive on the spell of locale



On Thu, Feb 28, 2008 at 10:42:30AM +0100, Michelle Konzack wrote:
> Hello Maintainers,
> 
> It seems there is a common problem while setting up the correct UNICODE
> locale in systems.  As the poster in the attached message has written,
> he has set up his locale as "zh_CN.utf8", which is wrong, but as he has
> also written, the output of "locale -a" shows it.

No matter which way the _locale_ is spelt (including "vi_VI", which doesn't
even contain the word "utf"), the _charset_ is UTF-8.  No program should ever
look at the locale's name, as it has many more quirks like this.  Checking the
charset will get you what you want.
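
A minimal sketch of that charset check, written in Python for brevity (in C
the same idea is setlocale() followed by nl_langinfo(CODESET)).  It assumes a
Unix-like libc where nl_langinfo is available; the helper names
locale_codeset and is_utf8_locale are made up for illustration.  The locale
name is only handed to the C library, never parsed:

```python
import codecs
import locale
from typing import Optional


def locale_codeset(name: str) -> Optional[str]:
    """Return the charset libc reports for locale `name`, or None if the
    locale is not installed.

    Hypothetical helper: it never inspects `name` itself, it only asks the
    C library (via nl_langinfo(CODESET)) what the locale actually uses.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return None  # locale not available; LC_CTYPE was left untouched
    try:
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore previous LC_CTYPE


def is_utf8_locale(name: str) -> bool:
    """True if locale `name` uses UTF-8, however the name is spelt."""
    codeset = locale_codeset(name)
    if codeset is None:
        return False
    try:
        # Normalise the charset name libc reported via Python's codec registry.
        return codecs.lookup(codeset).name == "utf-8"
    except LookupError:
        return False  # charset unknown to Python


if __name__ == "__main__":
    for spelling in ("zh_CN.UTF-8", "zh_CN.utf8"):
        print(spelling, locale_codeset(spelling))
```

On a glibc system with the Chinese UTF-8 locale generated, both spellings
from the report should come back as UTF-8; a locale that isn't installed
yields None.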

> I think, there should be a global solution for this, since patching
> man-db is worthless.

Actually, it's groff that's at fault here.  Mostly.

> $ LANG=zh_CN.UTF-8 man --warnings -l ls.zh_CN.1 > /dev/null
> $ LANG=zh_CN.utf8 man --warnings -l ls.zh_CN.1 > /dev/null
> <standard input>:9: warning: can't find special character `u013F'
> <standard input>:9: warning: can't find special character `u011A'
> <standard input>:9: warning: can't find special character `u021D'
> <standard input>:11: warning: can't find special character `u0321'
> <standard input>:11: warning: can't find special character `u04AA'
> <standard input>:12: warning: can't find special character `u0461'
> // snip

Unfortunately, groff doesn't have real Unicode support; it supports only a
few special-cased locales (which may then be transcoded to UTF-8, but are
still confined to their old-style charsets).

Rather than changing the special-case recognition, I would skip
special-casing completely and just treat all characters equally, including,
but not limited to, u013F and u0461.
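
To make "treat all characters equally" a bit more concrete, here is a rough
sketch (not groff's code, and not the patches mentioned below) of how a UTF-8
output device could resolve names like u013F straight from the code point
instead of looking them up in a per-charset glyph table.  The function names
are hypothetical, and the name pattern ("u" plus four to six uppercase hex
digits) is an assumption based on the warnings quoted above:

```python
import re
from typing import Iterable, Optional

# Assumed naming scheme, inferred from the warnings: "u" followed by four to
# six uppercase hex digits, e.g. u013F or u0461.
_UNICODE_NAME = re.compile(r"u([0-9A-F]{4,6})")


def resolve_special_char(name: str) -> Optional[str]:
    """Map a uXXXX special-character name directly to its character.

    No per-locale or per-script table is consulted, so u013F, u0461 and
    any other valid code point are all handled the same way.
    """
    match = _UNICODE_NAME.fullmatch(name)
    if match is None:
        return None  # not a Unicode name (e.g. a classic name like "em")
    code_point = int(match.group(1), 16)
    if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        return None  # beyond Unicode, or a lone surrogate
    return chr(code_point)


def render(names: Iterable[str], fallback: str = "?") -> str:
    """Render special-character names for a UTF-8 terminal, using
    `fallback` for names that are not Unicode code points."""
    return "".join(resolve_special_char(n) or fallback for n in names)


if __name__ == "__main__":
    # The characters groff complained about in the report.
    print(render(["u013F", "u011A", "u021D", "u0321", "u04AA", "u0461"]))
```

With this approach the six names from the report simply come out as six
characters; only non-Unicode names (such as "em") would still need a lookup
table.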


I've done some initial work on this, but unfortunately I'm dead busy right
now.  For "show me the code": pan-Unicode groff and man-db, working but not
yet good enough even to submit to Colin, are at
    deb-src http://angband.pl/debian sid main
(just don't look inside, they're too ugly to live).  On the upside, on the tty
everything except RTL scripts (Hebrew/Arabic) works just fine, including CJK,
Vietnamese, Devanagari and cuneiform, even all together in one manpage (try
"man utf8test").  What's still lacking is support for the html device (should
be trivial), ps (aargh...) and the other devices.
I'm afraid I can't do anything until late Friday at the earliest...  but it
looks like we may be able to help Colin squash at least this bastion of
locale dependency.

-- 
1KB		// Microsoft corollary to Hanlon's razor:
		//	Never attribute to stupidity what can be
		//	adequately explained by malice.