
Bug#701081: debian-policy: mandate an encoding for filenames in binary packages



On Wed, Mar 06, 2013 at 01:45:14PM +0900, Charles Plessy wrote:
> On Sat, Mar 02, 2013 at 04:38:49PM +0100, Guillem Jover wrote:
> > 
> > I'd second something like this, but I'd first like us to consider if
> > we really want any non-ASCII characters in filenames. Currently on sid
> > there does not appear to be many such filenames (64 from my check, if
> > that's not bogus):
> > 
> >   $ LC_ALL=C zgrep '[^[:print:]]' \
> >     ftp.debian.org_debian_dists_sid_*_Contents-amd64.gz | wc -l
> 
> Hi Guillem and everybody,
> 
> I had a closer look at these files.
> 
>  * There are dictionaries where the filename is the native name of the
>    language, like català, español, bokmål, etc.  In all these cases the
>    characters are valid Unicode.  I think that it would be fair to allow
>    such cases.

This is not the current practice.
In /usr/share/dict/ and /usr/lib/ispell/, only bokmål is 8-bit (non-ASCII).
Most dictionary names are in English,
sometimes with an alias in the native language
(catala, dansk, foeroyskt, bokmål, svenska).

In /usr/lib/aspell/, most dictionaries are named using the ISO 639 two-letter code
or the English name. There are some non-English aliases like francais.alias,
which is missing the cedilla.  Only català, español and íslenska are 8-bit.

So currently there is no standard practice of naming dictionaries after the
UTF-8 encoding of the language's native spelling, and it would be more
practical to standardize on ISO 639 language codes instead.
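
The ASCII versus 8-bit distinction used above can be illustrated with a
minimal Python sketch. The helper name is_ascii_name is hypothetical, and the
list contains only the dictionary names quoted in this message, not the full
contents of the directories.

```python
def is_ascii_name(name: str) -> bool:
    """Return True if every character of the name is 7-bit ASCII."""
    return all(ord(ch) < 128 for ch in name)


# Dictionary and alias names mentioned in this message (illustrative only).
names = [
    "catala", "dansk", "foeroyskt", "bokmål", "svenska",
    "català", "español", "íslenska",
]

# Names that contain at least one non-ASCII (8-bit) character.
non_ascii = [n for n in names if not is_ascii_name(n)]

for n in non_ascii:
    # Show the UTF-8 bytes that would end up in the file name on disk.
    print(n, n.encode("utf-8"))
```

Names that pass this check are byte-identical in ASCII, ISO 8859-x and UTF-8,
which is why they cause no trouble regardless of the locale.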

>  * There are names that look rather arbitrary and replaceable
>    with ASCII alternatives if needed.  For instance in python-pyramid,
>    usr/lib/python2.6/dist-packages/pyramid/tests/fixtures/static/héhé.html

These are probably test files that could be removed from the binary packages.

>  * There are CA certificates with names like Certinomis_-_Autorité_Racine.crt.
>    Since I do not know how these certificates work, I do not know if they
>    can be renamed.

The main reason they have such names is to avoid name clashes with other .crt files.

>  * There is a file that needs to be in non-ASCII Unicode to fit its purpose:
>    usr/share/doc/console-tools/examples/♪♬ in console-tools.  The package
>    also distributes a file called README.strange-name in the same directory.

The value of such a file is pretty low.

>  * There are some more dubious names like 6Sze¶æ_Jab³ek.png in lletters-media,
>    or Miroir_Sphérique in optgeo.  However, they do not cause much inconvenience
>    with a Unicode locale.

Miroir_Sphe♦rique is a bug in itself: it should be
Miroir_Sphérique.
'6Sze¶æ_Jab³ek.png' is probably misencoded (the name is meant to contain the
Polish word for six, sześć).
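
One way to test the misencoding hypothesis is sketched below in Python. It
assumes that the file name was stored as ISO 8859-2 bytes and is merely being
displayed as ISO 8859-1; that assumption is not confirmed by the package
itself, and the function name undo_latin1_mojibake is hypothetical.

```python
def undo_latin1_mojibake(name: str, real_encoding: str = "iso-8859-2") -> str:
    """Reinterpret text that was wrongly decoded as ISO 8859-1 (Latin-1).

    Encoding back to Latin-1 recovers the original bytes, which are then
    decoded with the encoding they were (assumed to be) written in.
    """
    return name.encode("latin-1").decode(real_encoding)


garbled = "6Sze¶æ_Jab³ek.png"
repaired = undo_latin1_mojibake(garbled)
print(repaired)
```

If the assumption holds, the three odd characters map back to Polish letters
(bytes 0xB6, 0xE6 and 0xB3 in ISO 8859-2), which would support the reading
"sześć".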

Cheers,
-- 
Bill. <ballombe@debian.org>

Imagine a large red swirl here. 

