
Bug#701081: debian-policy: mandate an encoding for filenames in binary packages



On Sat, Mar 02, 2013 at 04:38:49PM +0100, Guillem Jover wrote:
> 
> I'd second something like this, but I'd first like us to consider if
> we really want any non-ASCII characters in filenames. Currently on sid
> there does not appear to be many such filenames (64 from my check, if
> that's not bogus):
> 
>   $ LC_ALL=C zgrep '[^[:print:]]' \
>     ftp.debian.org_debian_dists_sid_*_Contents-amd64.gz | wc -l

Hi Guillem and everybody,

I had a closer look at these files.

 * There are dictionaries where the filename is the native name of the
   language, like català, español, bokmål, etc.  In all cases the
   characters are valid Unicode.  I think that it would be fair to allow
   such cases.

 * There are names that look rather arbitrary and replaceable
   with ASCII alternatives if needed.  For instance in python-pyramid,
   usr/lib/python2.6/dist-packages/pyramid/tests/fixtures/static/héhé.html

 * There are CA certificates with names like Certinomis_-_Autorité_Racine.crt.
   Since I do not know how these certificates work, I do not know if they
   can be renamed.

 * There is a file that needs to be in non-ASCII Unicode to fit its purpose:
   usr/share/doc/console-tools/examples/♪♬ in console-tools.  The package
   also distributes a file called README.strange-name in the same directory.

 * There are some more dubious names like 6Sze¶æ_Jab³ek.png in lletters-media,
   or Miroir_Sphérique in optgeo.  However, they do not cause much inconvenience
   with a Unicode locale.

 * The pitivi package has entries with no visibly non-ASCII characters, like
   usr/share/gnome/help/pitivi/C/figures/codecscontainers.jpg.
   I think that we should at least strongly recommend that if a name looks like
   ASCII, then it should be ASCII.

 * Lastly, there seems to be only a single package that ships non-Unicode filenames:
   non-free/ooohg, with for instance 13_Afr d<U+0082>col.gif.
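
To make the categories above concrete, here is a minimal sketch of how raw
filenames could be sorted into plain ASCII, valid UTF-8, and neither.  It is
only an illustration, not the check I actually ran.  The function name
classify_filename is made up, and the raw byte 0x82 in the last sample is my
assumption about what hides behind <U+0082> in the ooohg entry.

```python
def classify_filename(raw: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'invalid' for a raw filename (hypothetical helper)."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        # Strict decoding rejects stray continuation bytes and truncated sequences.
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "invalid"


if __name__ == "__main__":
    samples = [
        b"README.strange-name",
        "h\u00e9h\u00e9.html".encode("utf-8"),
        # Assumption: the ooohg name contains a lone 0x82 byte.
        b"13_Afr d\x82col.gif",
    ]
    for raw in samples:
        print(classify_filename(raw), raw)
```

Only names in the 'invalid' class would fail a requirement that filenames be
valid Unicode.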

Requiring that all file and directory names be encoded in Unicode, and
preferably in ASCII, would therefore make only one package RC-buggy.  Requiring
all-ASCII would also be possible with a bit more work, but I am not sure that it
would be worth the effort, as most of the examples above do not require
specialised fonts.  Altogether, there seems to be good self-discipline.
However, if it can be tested automatically, maybe we should consider
requesting that what is displayed as ASCII be ASCII.
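
As a rough idea of what such an automatic test could look like, the sketch
below flags names that become pure ASCII once invisible format characters
(Unicode general category Cf, such as the soft hyphen or zero-width space) are
removed.  This is a heuristic under my own assumptions: I have not checked
which character the pitivi name actually contains, so the soft hyphen in the
example is hypothetical, and look-alike letters from other scripts are not
detected.

```python
import unicodedata


def looks_ascii_but_is_not(name: str) -> bool:
    """Heuristic (hypothetical helper): non-ASCII name whose visible part is ASCII."""
    if name.isascii():
        return False
    # Drop invisible "format" characters (category Cf) and re-test.
    visible = "".join(ch for ch in name if unicodedata.category(ch) != "Cf")
    return visible.isascii()


if __name__ == "__main__":
    # Assumption: a soft hyphen (U+00AD) hidden inside an ASCII-looking name.
    print(looks_ascii_but_is_not("codecs\u00adcontainers.jpg"))
    print(looks_ascii_but_is_not("h\u00e9h\u00e9.html"))
```

Names like héhé.html are left alone because their non-ASCII characters are
visible; only names that hide something are reported.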

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan

