Re: Bug#467249: man-db/groff and locales
I see that my (inept but working) patches are not welcome right now. So,
I'll leave groff alone; just let me answer the issues raised.
On Sat, Mar 01, 2008 at 11:56:28PM +0000, Colin Watson wrote:
> On Fri, Feb 29, 2008 at 12:32:29AM +0100, Adam Borowski wrote:
> > On Thu, Feb 28, 2008 at 10:10:32PM +0000, brian m. carlson wrote:
> > > On Thu, Feb 28, 2008 at 09:30:55PM +0000, Colin Watson wrote:
> > > >man-db really does have some special-casing here. Trust me. It was
> > > >necessary at the time. There are a finite number of known aliases for
> > > >the very small number of locales in question, and until it becomes
> > > >unnecessary I will simply support those.
> > Of course, the encodings of _source_ pages are something we can't avoid.
> > But for all intermediate steps, I don't see any reason to not go to a
> > well-known encoding, do everything there and finally convert to whatever
> > locale is set -- and you don't even need to name the charset there.
> > Special-casing _output_ locales seems quite strange to me.
> /* An ugly special case is needed here. The utf8 device normally
> * takes ISO-8859-1 input. However, with the multibyte patch, when
> * recoding from CJK character sets it takes UTF-8 input instead.
> * This is evil, but there's not much that can be done about it
> * apart from waiting for groff 2.0.
The idea is to make it take UTF-8 input _always_. Either hard-coded as in
Red Hat, or settable with -K<charset> as in upstream groff.
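To make the "one well-known encoding in the middle" idea concrete, here is a minimal Python sketch. It is not man-db's or groff's code. `render_page` and `typeset` are hypothetical names, and `typeset` only stands in for the formatter. The source charset is named once, on input. The output charset defaults to whatever the current locale reports, so no output locale needs special-casing.

```python
import locale
from typing import Optional


def typeset(text: str) -> str:
    """Hypothetical stand-in for the formatter: it only ever sees Unicode
    text, so it needs no knowledge of source or output charsets."""
    return text.strip() + "\n"


def render_page(raw: bytes, source_charset: str,
                output_charset: Optional[str] = None) -> bytes:
    # Input boundary: the only place the source page's charset matters.
    text = raw.decode(source_charset)
    # Every intermediate step works on Unicode text.
    formatted = typeset(text)
    # Output boundary: default to the current locale's charset, so there
    # is no need to name it or special-case particular locales.
    if output_charset is None:
        output_charset = locale.getpreferredencoding(False)
    # Characters the output charset cannot represent become "?".
    return formatted.encode(output_charset, errors="replace")
```

A Latin-2 page rendered for a UTF-8 terminal comes out as UTF-8. Rendered for plain ASCII, the Polish letters degrade to `?` without any locale-specific code.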
> > > >(And I agree that it should go away, but can't easily just yet.)
> > Could you tell us what keeps us stuck with all the old cruft?
> Sanity. I am not interested in making the groff package even more
> incredibly difficult to update to a new upstream in the future.
Having the outside API (i.e., -K and the expected charsets) be more in line
with current upstream sounds like something that would make upgrading
_easier_. If most of the groff-1.8 patches cannot be ported to 1.9, I would
consider at least aligning the outside interfaces a good thing.
> Official groff does not yet support proper CJK typography. Until that is
> in place it is not a viable replacement.
Yet it does support every other language save for Arabic and Hebrew. And
unless I'm missing something, it's just word-wrapping that's amiss. I'm not
sure what the full extent of kinsoku shori is, but if its description in
Wikipedia is accurate, it could be done by injecting a separator character
such as U+200B ZERO WIDTH SPACE between characters that allow a line break,
and then applying the normal rules for scripts with explicit spaces.
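Here is a rough Python sketch of that idea, under loudly stated assumptions. The Unicode ranges are simplified. The kinsoku sets are a tiny hypothetical sample, not a real kinsoku table. `insert_break_hints` is a made-up name. It inserts U+200B between two adjacent characters from unspaced CJK scripts. It skips the insertion when the second character may not start a line, or when the first may not end one.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# Simplified ranges for scripts written without inter-word spaces
# (an assumption; real CJK line breaking covers more ranges).
UNSPACED_RANGES = [
    (0x3000, 0x30FF),  # CJK symbols and punctuation, hiragana, katakana
    (0x4E00, 0x9FFF),  # CJK unified ideographs
    (0xFF00, 0xFFEF),  # halfwidth and fullwidth forms
]

# A tiny hypothetical sample of kinsoku characters; real tables are longer.
NO_LINE_START = set("、。」）ー")  # must not begin a line
NO_LINE_END = set("「（")          # must not end a line


def is_unspaced(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UNSPACED_RANGES)


def insert_break_hints(text: str) -> str:
    """Insert ZWSP wherever a line break is allowed between two
    characters of an unspaced script, honouring the sample kinsoku sets."""
    out = []
    for i, ch in enumerate(text):
        if i > 0:
            prev = text[i - 1]
            if (is_unspaced(prev) and is_unspaced(ch)
                    and prev not in NO_LINE_END
                    and ch not in NO_LINE_START):
                out.append(ZWSP)
        out.append(ch)
    return "".join(out)
```

For example, `insert_break_hints("日本語です。")` yields `"日\u200b本\u200b語\u200bで\u200bす。"`. There is no break hint before the full stop. After this pass, a formatter would only need to treat U+200B as a break opportunity, the same way it treats an explicit space.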
But again, if you have already done some research, I'd better leave you
> I think I'm fairly clearly active in man-db; could you please accept that
> I have my reasons beyond laziness,
Uhm... neither Brian Carlson nor I have accused you of laziness. Heck, I
think that you have done a bunch of great work in man-db recently --
allowing uniformly encoded sources in particular. I just offered some help
with following through -- full Unicode support would be a logical next step.
> and look up what has been said on this topic over and over again in the
Indeed, I've only looked at past debian-devel threads and the BTS; there's
probably lots of wisdom I missed on the groff lists. I was misled by an
impression I got in a previous discussion that groff-1.9 is a no-no
> I am honestly not willing to support a backport of -K/preconv to our
> groff package,
That's sad, but if indeed groff-1.9 will be deemed acceptable soon, you're
> I appreciate your research into this. But please, I beg you, focus your
> energies on upstream. There is really not much left to do; Brian's done
> the heavy lifting of character class support (or most of it, anyway),
> and now somebody just needs to take the specialised typographic rules
> and make them sufficiently general for inclusion.
> I hope you will take my advice born of nearly seven years of maintaining
> groff in Debian.
Ok. Since groff is a really tangled, complex beast that would take a lot of
time to understand well enough, I think I'll go pester someone else now.
There are a lot of other places with flaky non-ASCII support in Debian. For
example, if you use a JFS partition, d-i fails to add "iocharset=utf8" to
fstab, which makes non-ASCII filenames break badly. And so on, and so on...
Cheers and schtuff,
1KB // Microsoft corollary to Hanlon's razor:
// Never attribute to stupidity what can be
// adequately explained by malice.