
Bug#99933: second attempt at more comprehensive unicode policy



Colin Walters <walters@debian.org> writes:

> I don't think so.  I have put forth many real-world scenarios in which
> using national charsets for filenames simply breaks, in ways that are
> basically impossible to fix.  You may be able to get away with using a
> national charset on a machine where everyone speaks the same language,
> and never interacts with speakers of another language, but that's about
> it.

Don't you think this is a common case? I'd even say more common than
your scenarios. At least common enough that it should be acknowledged.

> Again, my policy proposal does *not* (I am 95% sure) create any new RC
> bugs.  The only "must" is for filenames actually included in packages.  

I am not concerned about RC bugs in my or others' packages. My point
is that things which have worked up to now will stop working, and this
can be avoided.

> First of all, there is no need for 'if and only if'.  Programs can
> always try to decode filenames in UTF-8, and if that fails, then try the
> locale's charset.

This will invariably interpret some non-ASCII non-UTF8 filenames wrong.
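
To make the failure mode concrete, here is a minimal Python sketch of the
"try UTF-8, then the locale's charset" heuristic. The function name
decode_filename and the choice of ISO-8859-1 as the locale charset are
assumptions for illustration only, not taken from the proposal.

```python
def decode_filename(raw: bytes, locale_charset: str = "iso-8859-1") -> str:
    """Hypothetical helper: try UTF-8 first, fall back to the locale charset."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(locale_charset)


# A Latin-1 name containing e-acute is not valid UTF-8, so the fallback
# kicks in and the name is decoded correctly.
good = decode_filename("\u00e9.txt".encode("iso-8859-1"))

# A Latin-1 name consisting of A-tilde followed by the copyright sign is the
# byte pair C3 A9, which happens to be valid UTF-8 for e-acute. The heuristic
# accepts it as UTF-8 and silently returns the wrong name.
original = "\u00c3\u00a9.txt"
misread = decode_filename(original.encode("iso-8859-1"))

print(good == "\u00e9.txt")   # True
print(misread == original)    # False
```

Such byte sequences are rare in ordinary text, but nothing prevents them,
and the heuristic cannot tell the two cases apart.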

> Again, note this part of my proposal is still not a "must".  Your
> programs will not get RC bugs for a lack of UTF-8 support for filenames.

But it will condone or even suggest broken behaviour like Gnome2's.

> Well, you might have to set G_BROKEN_FILENAMES.

Considering old standards broken because a newer one exists is just
ridiculous.

I still think taking LC_CTYPE unconditionally as a hint is the best
solution. People who don't care (e.g. USians) are happy with any
solution. People that have it at an older encoding get some slack.
People like you should already have it at UTF8 and get all the fun
right away.
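
As a rough sketch of what "taking LC_CTYPE as a hint" could look like in
Python: ask the locale for its codeset and decode filenames with that,
nothing else. The helper names locale_codeset and decode_with_hint are
hypothetical, and replacing undecodable bytes is my assumption about a
reasonable fallback, not part of any proposal.

```python
import locale


def locale_codeset() -> str:
    """Return the codeset implied by the user's LC_CTYPE (Unix only)."""
    try:
        # Pick up LC_CTYPE (or LC_ALL/LANG) from the environment.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # unsupported locale: stay with the default "C" locale
    return locale.nl_langinfo(locale.CODESET)


def decode_with_hint(raw: bytes, codeset: str) -> str:
    """Decode a filename using only the given codeset; no guessing."""
    # Assumption: bytes invalid in the codeset become U+FFFD instead of
    # raising an error.
    return raw.decode(codeset, errors="replace")
```

A program would then call decode_with_hint(name, locale_codeset()) for
display. Users with a UTF-8 locale get UTF-8, users with an older locale
keep what they have.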

> But this is the whole reason we are switching to UTF-8; so programs
> will not have to deal with the nightmare of recoding filenames!

No argument there.

> I've noticed that UTF-8 sometimes makes zsh unhappy, [...]

That's quite an understatement. The command-line editor can't deal with
multibyte characters in any way. So for example entering an o umlaut
and then deleting it gets you in trouble, because zsh does not handle
the two-byte sequence as one character.
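
The effect can be reproduced outside zsh. The following Python sketch is
only a model of a byte-oriented editor versus a character-oriented one; it
is not how zsh is implemented, and both function names are made up for
illustration.

```python
def delete_last_byte(buf: bytes) -> bytes:
    """What a byte-oriented line editor does on backspace."""
    return buf[:-1]


def delete_last_char(buf: bytes) -> bytes:
    """What a UTF-8 aware line editor would do on backspace."""
    return buf.decode("utf-8")[:-1].encode("utf-8")


line = "fo\u00f6".encode("utf-8")   # o umlaut is the two bytes C3 B6

print(delete_last_char(line))       # b'fo'
print(delete_last_byte(line))       # b'fo\xc3'

# The byte-wise result ends in a dangling lead byte and is no longer
# valid UTF-8.
try:
    delete_last_byte(line).decode("utf-8")
except UnicodeDecodeError:
    print("invalid UTF-8 left in the buffer")
```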

FWIW, I am quite content with mandating the contents of some files as
UTF8. We may want a BOM at the start, though.
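
For reference, the UTF-8 encoding of the BOM (U+FEFF) is the three bytes
EF BB BF. A minimal Python sketch of detecting and stripping it follows;
the function names are hypothetical, while codecs.BOM_UTF8 and the
"utf-8-sig" codec are part of the standard library.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 encoded BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def read_text(data: bytes) -> str:
    """Decode UTF-8 text, dropping a leading BOM if one is present."""
    return data.decode("utf-8-sig")


with_bom = codecs.BOM_UTF8 + "gr\u00fc\u00df".encode("utf-8")
without_bom = "gr\u00fc\u00df".encode("utf-8")

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))   # True False
print(read_text(with_bom) == read_text(without_bom))       # True
```

A reader written this way accepts files with or without the marker, so
requiring a BOM would not have to break existing files.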

-- 
Robbe


