
Re: support for multilingual Packages files?



On Mon, Jul 30, 2001 at 05:45:22PM +0100, David Starner wrote:
> From: Radovan Garabik <garabik@melkor.dnp.fmph.uniba.sk>
> > On Mon, Jul 30, 2001 at 01:04:50PM +0200, Michael Bramer wrote:
> > >
> > > IMHO the packages, the control file and the Packages files are too big
> > > with this. We need a better system!
> >
> > probably. But we have to keep with current one for a while
> 
> It'd be less complex and quicker to switch over to Bramer's plan than to
> yours. In fact, I wouldn't be surprised if we have fully localized Package
> files for unstable the day this freeze ends.

We'll see.
In fact, I do not have a plan. I just noticed that Packages contains
several different encodings, and suggested unifying them to UTF-8.

<brainstorm, comments welcome>
I can see one problem with ASCII-only Packages:

Debian is entering the phase of localized Packages.
As with any such project, the best way to handle many
different translations is to have ONE CANONICAL source; otherwise
you never make anything out of the mess.
Each group of translators monitors this source, noticing new
packages and changes in existing ones, and catching up with translations.

Now, you suggest the canonical source be in ASCII. That is my gripe.
You just cannot include the necessary information in ASCII (names!).

And translation teams have the additional burden of figuring out the correct
diacritics in names (or of just using the version without, which is dumb).

Of course, there could be one translation, let's call it Packages-ascii
(or Packages-en, the name is not important; or even call it Packages and keep
Packages-utf8 as the canonical source), and that is the 7-bit ASCII
version, suitable for dselect. </brainstorm>
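
To make the Packages-ascii idea concrete, here is a minimal sketch of how a
7-bit field could be derived mechanically from the UTF-8 canonical one. It
assumes Python and its standard unicodedata module; the function name
to_ascii and the sample name are hypothetical and not part of any Debian
tool.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Return a 7-bit ASCII approximation of a canonical UTF-8 field value."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Anything still outside ASCII (e.g. kanji) becomes '?'.
    return stripped.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    # Hypothetical maintainer name, for illustration only.
    print(to_ascii("Jos\u00e9 Nov\u00e1k"))  # prints "Jose Novak"
```

The conversion only goes one way: the diacritics cannot be recovered from the
ASCII output, which is why the UTF-8 file would have to be the canonical one.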

> >
> > Problems should be made visible and discussed, and solutions
> > should be found, instead of just saying "unicode is bad, we are never
> > going to accept it"
> 
> You don't understand. The problems Tomohiro wants fixed, aren't going to be

I do understand. I said earlier that I do not feel competent to comment
on CJK unification. I can imagine that the Japanese feel bad about it,
and I understand their reasoning. OTOH, I equally well understand the reasoning
of the opposite party :-)
All other problems could be fixed.

> 
> > > 2. Generally, the numbers of bytes, characters, and columns differ
> > > from one another.  The difference varies between locales.  Thus, a
> > > mismatch of locale and encoding will break the layout of the
> > > screen.  (This is not a problem for dumb-terminal-based software
> >
> > yes, this is bad. However, it does not make dselect unusable.
> 
> So mojibake isn't bad . . .

I did not say it is not bad. But if Tomohiro sees random garbage kana or
a question mark in my name, I do not think it will be the end of the world.
Likewise, if someone sees random ISO-8859-2 characters in place of his name,
it is not the end of the world.
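
For the record, the bytes/characters/columns mismatch and the kind of garbage
I mean can be shown in a few lines. This is only a sketch in Python: the
helper display_width is hypothetical, and it simplifies by counting East
Asian wide and fullwidth characters as two columns and everything else
(including ambiguous-width characters) as one.

```python
import unicodedata


def display_width(text: str) -> int:
    """Rough terminal column count; a simplification, not a full wcwidth()."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no column of their own
        # 'W' (wide) and 'F' (fullwidth) characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


if __name__ == "__main__":
    for sample in ("abc", "\u00e1\u010d", "\u65e5\u672c\u8a9e"):
        # bytes in UTF-8, characters, columns
        print(len(sample.encode("utf-8")), len(sample), display_width(sample))

    # Mojibake: an ISO-8859-2 byte read as if it were ISO-8859-1.
    raw = "\u010d".encode("iso-8859-2")   # c with caron, byte 0xE8
    print(raw.decode("iso-8859-1"))       # shows e with grave instead
```

The three counts agree only for plain ASCII, and the last two lines show a
wrong letter rather than a crash, which is the "random characters" case.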

> 
> > Yes, this is a big problem. I was caught by this when I installed
> > an intranet fulltext search machine (and I also made a multilingual
> > on-line dictionary).

Statistics: about 30% of searches use plain ASCII, with diacritics
stripped. The rest of the users do enter words with diacritics, and
expect the dictionary to work with that.
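
Serving both groups of users is not hard if the index is keyed on a folded
form of each word. A minimal sketch, assuming Python; the names fold,
build_index and search and the sample entries are hypothetical, and this is
not how my dictionary or the BTS is actually implemented.

```python
import unicodedata
from collections import defaultdict


def fold(word: str) -> str:
    """Case-fold a word and strip its diacritics to get a search key."""
    decomposed = unicodedata.normalize("NFKD", word.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def build_index(words):
    """Map each folded key to the original spellings that share it."""
    index = defaultdict(list)
    for word in words:
        index[fold(word)].append(word)
    return index


def search(index, query: str):
    """Look up a query typed with or without diacritics."""
    return index.get(fold(query), [])


if __name__ == "__main__":
    # Hypothetical dictionary entries, for illustration only.
    index = build_index(["\u010daj", "k\u00e1va"])
    print(search(index, "caj"))        # query typed without diacritics
    print(search(index, "\u010daj"))   # query typed with diacritics
```

Both queries return the entry in its correct spelling, so the stored data
keeps the diacritics and only the lookup key loses them.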

> > Fortunately, searching by maintainers' names is not a key point for
> > Debian tools.
> 
> . . . and neither is breaking current functionality? For your information, I
> search on maintainer's names all the time in places like the BTS.

FYI, when someone wants to search for my name, he is just as
likely to enter it with diacritics. I've seen it
(and he would then find nothing in the BTS).
You can always search via e-mail, anyway.


> 
> > or they could decide if they prefer not to include the ASCII version at all,
> > so that nobody is confused by incorrect variant of their name (I am talking
> > now about latin-script names with diacritics)
> 
> What about people being confused by the diacritic version of their name?
> People mangle people's names all the time - I get called "Dave" every so
> often, which annoys me to no end. It's life - is it more important to
> communicate and work well with others, or to have your name be pedantically
> correct everywhere you can?

That is my problem - I cannot communicate well with ASCII only.
Not even with the ISO-8859-2 console I am (sort of) forced to use.
It is not just my name - to hell with it. But the inability
to mix languages freely when I need to drives me up the wall.


-- 
 -----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!


