Re: Bug#467249: FW by email@example.com : Bug#467249: man-db: over sensitive on the spell of locale
On Thu, Feb 28, 2008 at 09:21:41PM +0100, Adam Borowski wrote:
> On Thu, Feb 28, 2008 at 10:42:30AM +0100, Michelle Konzack wrote:
> > It seems there is a common problem with setting up the correct Unicode
> > locale on systems. As the poster in the attached message has written,
> > he has set up his locale as "zh_CN.utf8", which is wrong, but as he has
> > also written, the output of "locale -a" shows it.
> No matter which way the _locale_ is spelt (including "vi_VI" without even the
> word "utf" inside),
Irrelevant to this bug, as you'll see if you look at the code.
> the _charset_ is UTF-8. No program should ever look at the locale's
> name, as it has more quirks like this. Checking the charset will get
> you what you want.
> > I think, there should be a global solution for this, since patching
> > man-db is worthless.
> Actually, it's groff that's at fault here. Mostly.
man-db really does have some special-casing here. Trust me. It was
necessary at the time. There are a finite number of known aliases for
the very small number of locales in question, and until it becomes
unnecessary I will simply support those.
(And I agree that it should go away, but can't easily just yet.)
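To illustrate what "supporting a finite number of known aliases" can look like, here is a rough sketch in Python. It is not man-db's actual code: the function name, the alias table and its entries are all made up for the example, and the real list of special-cased locales is whatever the man-db source says it is.

```python
# Hypothetical alias table: keys are codeset spellings reduced to
# lowercase alphanumerics, values are the canonical spelling.
# The entries are assumptions for illustration, not man-db's real list.
KNOWN_CODESET_ALIASES = {
    "utf8": "UTF-8",
    "eucjp": "EUC-JP",
    "euckr": "EUC-KR",
}


def normalise_locale_name(name):
    """Return `name` with its codeset part rewritten to a canonical spelling.

    A locale name is treated as base[.codeset][@modifier]. Names with no
    codeset, or with a codeset not in the table, are returned unchanged.
    """
    rest, at, modifier = name.partition("@")
    base, dot, codeset = rest.partition(".")
    if dot:
        # Ignore case and punctuation so "utf8", "UTF-8" and "utf_8" all match.
        key = "".join(ch for ch in codeset.lower() if ch.isalnum())
        codeset = KNOWN_CODESET_ALIASES.get(key, codeset)
    return base + dot + codeset + at + modifier
```

With a table like this, "zh_CN.utf8" and "zh_CN.UTF-8" both map to the same name, so any later special-casing only has to recognise one spelling.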
Please don't drag groff into this bug. I really hate it when bugs drift
wildly off their original (accurately-constrained) topic despite
attempts to haul them back. It makes them impossible to keep organised.
> > $ LANG=zh_CN.UTF-8 man --warnings -l ls.zh_CN.1 > /dev/null
> > $ LANG=zh_CN.utf8 man --warnings -l ls.zh_CN.1 > /dev/null
> > <standard input>:9: warning: can't find special character `u013F'
> > <standard input>:9: warning: can't find special character `u011A'
> > <standard input>:9: warning: can't find special character `u021D'
> > <standard input>:11: warning: can't find special character `u0321'
> > <standard input>:11: warning: can't find special character `u04AA'
> > <standard input>:12: warning: can't find special character `u0461'
> > // snip
> Too bad, groff doesn't have real Unicode support, and supports only several
> special-cased locales (which may then be transcoded as UTF-8, but they still
> get wrapped into their old-style charsets).
> Instead of changing the special-case recognition, I would instead completely
> skip special-casing and just treat all characters equally. Including, but
> not limited to, u013F and u0461.
Are you working with Brian M. Carlson on this? He has been working on a
solution acceptable to groff upstream, which is, frankly, the only way I
want to go now. He has already made substantial progress with character
Treating all characters equally will absolutely not be acceptable to
groff upstream. groff is a typesetter and needs to know about properties
Colin Watson [firstname.lastname@example.org]