Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
On Wed, Mar 06, 2013 at 01:45:14PM +0900, Charles Plessy wrote:
> On Sat, Mar 02, 2013 at 04:38:49PM +0100, Guillem Jover wrote:
> > I'd second something like this, but I'd first like us to consider if
> > we really want any non-ASCII characters in filenames. Currently on sid
> > there do not appear to be many such filenames (64 from my check, if
> > that's not bogus):
> > $ LC_ALL=C zgrep '[^[:print:]]' \
> > ftp.debian.org_debian_dists_sid_*_Contents-amd64.gz | wc -l
> Hi Guillem and everybody,
> I had a closer look at these files.
> * There are dictionaries where the filename is the native name of the
> language, like català, español, bokmål, etc. In all these cases the
> characters are valid Unicode. I think that it would be fair to allow
> such cases.
This is not the current practice:
In /usr/share/dict/ and /usr/lib/ispell/, only bokmål is 8-bit (non-ASCII).
Most dictionary names are in English,
sometimes with an alias in the native language
(catala, dansk, foeroyskt, bokmål, svenska).
In /usr/lib/aspell/, most dictionaries are named using the ISO 639 two-letter
code or the English name. There are some non-English aliases like
francais.alias, which is missing the cedilla. Only català, español and
íslenska are non-ASCII.
So currently, there is no standard practice of naming dictionaries after the
UTF-8 encoding of the native spelling of the language, and it would be more
practical to standardize on ISO 639 language codes instead.
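
To make the ASCII / UTF-8 distinction concrete, here is a minimal Python
sketch that sorts raw file names into three groups: plain ASCII, valid
UTF-8, or neither. The function name classify_filename and the sample
byte strings are hypothetical, chosen only for illustration; this is not
a check that any Debian tool is known to perform.

```python
def classify_filename(raw: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'invalid' for a raw file name."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "invalid"


# Hypothetical sample names: the same word stored in different encodings.
names = [
    "usr/share/dict/catala".encode("utf-8"),    # ASCII-only alias
    "usr/share/dict/bokmål".encode("utf-8"),    # UTF-8 encoded
    "usr/share/dict/bokmål".encode("latin-1"),  # legacy 8-bit encoding
]
results = [classify_filename(n) for n in names]
print(results)
```

A name in the last group cannot be displayed reliably in a UTF-8 locale,
which is the situation a mandated encoding would rule out.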
> * There are names that look rather arbitrary and replaceable
> with ASCII alternatives if needed. For instance in python-pyramid,
These are probably test files that could be removed from the binary packages.
> * There are CA certificates with names like Certinomis_-_Autorité_Racine.crt.
> Since I do not know how these certificates work, I do not know if they
> can be renamed.
The main reason they have such names is to avoid name clashes with other .crt files.
> * There is a file that needs to be in non-ASCII Unicode to fit its purpose:
> usr/share/doc/console-tools/examples/♪♬ in console-tools. The package
> also distributes a file called README.strange-name in the same directory.
The value of such a file is pretty low.
> * There are some more dubious names like 6Sze¶æ_Jab³ek.png in lletters-media,
> or Miroir_Sphérique in optgeo. However, they do not cause much inconvenience
> with a Unicode locale.
Miroir_Sphe♦rique is a bug in itself: it should be
'6Sze¶æ_Jab³ek.png' is probably misencoded (it is intended to be 6 in Polish, i.e.
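
The misencoding guess for the lletters-media file can be tested with a
short Python sketch. It assumes (and this is only an assumption) that
the name was written as ISO-8859-2 bytes and later read as ISO-8859-1,
so it reverses that step.

```python
# The name as it shows up in a Unicode locale.
garbled = "6Sze¶æ_Jab³ek.png"

# Recover the raw bytes, assuming they were misread as ISO-8859-1.
raw = garbled.encode("latin-1")

# Reinterpret the same bytes as ISO-8859-2 (assumed original encoding).
repaired = raw.decode("iso-8859-2")
print(repaired)  # → 6Sześć_Jabłek.png
```

If the assumption holds, the bytes 0xB6, 0xE6 and 0xB3 are the Polish
letters ś, ć and ł rather than ¶, æ and ³.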
Imagine a large red swirl here.