
Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

Charles Plessy <plessy@debian.org> writes:

>  * There are names that look rather arbitrary and replaceable
>    with ASCII alternatives if needed.  For instance in python-pyramid,
>    usr/lib/python2.6/dist-packages/pyramid/tests/fixtures/static/héhé.html

At least some of these (for things located in a directory named tests) are
probably explicit tests of non-ASCII file names.

>  * There are CA certificates with names like Certinomis_-_Autorité_Racine.crt.
>    Since I do not know how these certificates work, I do not know if they
>    can be renamed.

This to me feels like a good use of Unicode.  One of the reasons why I'm in
favor of a general policy saying we should use UTF-8, rather than a policy
saying to use only ASCII names, is that names of things in the real world
(people and organizations) are often put into file names.  And it really
bothers me when we tell people they can't use their *actual* name or are
required to misspell it in some arbitrary way in order to shoehorn
themselves into ASCII.

In this case, I assume the name of the relevant certificate authority is
Certinomis - Autorité Racine.  I think it's quite reasonable to use the
actual name for the certificate authority in the file name.

>  * The pitivi package gives entries with no obvious Unicode characters,
>    like usr/share/gnome/help/pitivi/C/figures/codecscontainers.jpg.  I
>    think that we should at least strongly recommend that if a name looks
>    ASCII then it should be ASCII.

It's mildly difficult to be clear about this, since this can depend very
heavily on the font.  In general, the way this sort of requirement is
stated in the Unicode world is to require a normalized form, but I think
that's rather heavy-weight for what we're trying to accomplish.
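For illustration, here is why a normalization requirement comes up at all: the same visible name can be stored as different code point sequences, and therefore as different bytes on disk. This is a small sketch using Python's standard unicodedata module, with the héhé.html fixture name quoted above as the example. The use of NFC/NFD here is only for demonstration and is not a proposed policy.

```python
import unicodedata

# The same visible name, "héhé.html", in two Unicode normalization forms.
nfc = unicodedata.normalize("NFC", "h\u00e9h\u00e9.html")  # precomposed U+00E9
nfd = unicodedata.normalize("NFD", nfc)                   # "e" + U+0301 combining acute

print(nfc == nfd)                                # False: different code points
print(nfc.encode("utf-8"))                       # b'h\xc3\xa9h\xc3\xa9.html'
print(nfd.encode("utf-8"))                       # b'he\xcc\x81he\xcc\x81.html'
print(unicodedata.normalize("NFC", nfd) == nfc)  # True once both are normalized
```

Both strings render identically in most fonts, so a rule like "file names must be NFC" is what it would take to make "looks the same" imply "is the same bytes".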

But yes, we can just make a general (but not formally precise)
recommendation along those lines.

> Requiring that all file and directory names are encoded in Unicode and
> preferably in ASCII would therefore make only one package RC-buggy.
> Requiring all-ASCII would be also possible with a bit more work, but I
> am not sure that it would be worth the effort, as most of the current
> examples above do not require specialised fonts.  Altogether, there
> seems to be a good self-discipline.  However, if there are ways to test
> the following automatically, maybe we should consider requesting that
> what is displayed ASCII should be ASCII.

I think it's reasonable to say that file names that can be represented in
ASCII should be in ASCII.  But I do think that it's entirely reasonable to
use Unicode for names that truly aren't ASCII names, and it would bother
me to tell people to misspell those names to squeeze them into ASCII.
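On the question of whether this could be tested automatically, here is a minimal sketch. It assumes each path from a binary package is available as raw bytes, for example from the package's file list. The function name check_filename is hypothetical and is not an existing lintian or dpkg check. It treats "looks ASCII" as a rough heuristic: drop invisible format characters, fold compatibility forms with NFKC, and see whether only ASCII remains. It does not require any normalization form, and it deliberately lets genuinely non-ASCII names through.

```python
import unicodedata


def check_filename(raw: bytes) -> list:
    """Return the problems found in one path from a binary package.

    Hypothetical helper for illustration only; not part of lintian or dpkg.
    """
    try:
        # Linux paths are raw bytes; strict decoding rejects anything that is
        # not well-formed UTF-8 (e.g. a Latin-1 encoded name).
        name = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["not valid UTF-8"]

    if name.isascii():
        return []

    # Heuristic for "looks ASCII but isn't": strip invisible format characters
    # (category Cf, such as U+200B ZERO WIDTH SPACE), then fold compatibility
    # forms such as fullwidth letters or U+00A0 NO-BREAK SPACE with NFKC.
    visible = "".join(c for c in name if unicodedata.category(c) != "Cf")
    if unicodedata.normalize("NFKC", visible).isascii():
        return ["looks ASCII but contains non-ASCII code points"]

    # Genuinely non-ASCII names (real people's or organizations' names) pass.
    return []


samples = [
    "Certinomis_-_Autorit\u00e9_Racine.crt".encode("utf-8"),  # real accented name
    b"codecs\xe2\x80\x8bcontainers.jpg",  # hypothetical: hidden U+200B inside
    b"caf\xe9.txt",                       # hypothetical: Latin-1, not UTF-8
]
for raw in samples:
    print(raw, check_filename(raw))
```

The hidden-character variant of the pitivi name is made up for the example; the real reason that file was reported is not known here. The heuristic is not formally precise. For instance, a Cyrillic "а" (U+0430) standing in for a Latin "a" is not folded by NFKC and would pass. That limitation seems acceptable for a "should" rather than a "must".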

For the other half of what's been discussed, I don't think that Debian
should have a position about what's *inside* files other than files where
we're already standardizing the contents (such as the copyright file).
There may be reasons why files should be encoded in legacy encodings for
specific uses, and I don't feel like it's the proper role of Policy to
dictate to all package maintainers that they can't work with those use
cases.

Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>
