
Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

On Thu, Feb 21, 2013 at 12:43:28PM +0100, Helmut Grohne wrote:
> Apparently the debian-policy currently says nothing about the characters
> used in filenames contained in binary packages. Most packages use common
> sense and only use a small subset of US-ASCII. In Debian sid main most
> filenames can be represented using the following subset of US-ASCII
> characters (written as a regular expression):
> 	[][a-zA-Z0-9{}<>() ^/,=:&!*%#$~@+._-]
> The number of exceptions is about 200 contained in about 50 binary
> packages. In those packages some filenames are not representable as
> UTF-8 (for example aspell-is) and others don't make any sense in
> ISO-8859-15 (for example ca-certificates).
> It would be nice if some common ground concerning filename encoding
> could be reached. The options range from a rather restrictive definition
> of acceptable characters via requiring filenames to be representable in
> US-ASCII to mandating a particular encoding (such as UTF-8). This could
> be first introduced as a SHOULD and later turned into a MUST.
> Personally I do not really care about what the precise restriction is as
> long as it permits a mechanical transformation to unicode.

Dear all,

after more than one month of discussion, we have not reached a conclusion.

In the current situation there is no policy, which means that everything is
allowed.  Indeed, there is at least one package whose filenames use more than
one non-ASCII encoding, so no user can correctly see the names of every file
in this package at the same time.

However, I think it is clear from the discussion that nobody would be
satisfied if we modified the Policy to document the current practice, namely
that everything is permitted.

Given that this bug report asks for a policy about the encoding of filenames,
doing nothing is equivalent to rejecting it.  I therefore propose one more
round of consultation, and if it is not conclusive, I will tag this bug
wontfix and close it (we have 185 other bugs in the queue).

Of course, every developer is free to tackle the issue by working with all the
other package maintainers in order to change the current practice until it
matches something that we do not feel uncomfortable documenting in the Policy. 

On my side, I made a proposal with actionable items: fix the few packages that
are not using UTF-8, and modify the Policy to reflect the current practice
of using ASCII most of the time and other UTF-8 characters sparingly.
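
To illustrate what "fixing the few packages" would involve, here is a rough
sketch of how one could sort filenames into the three groups discussed in this
thread.  It is only an illustration, not a tool that was used for this report:
the names SAFE_ASCII and classify are made up, and the character class is the
one quoted above from the original report, with the two brackets escaped for
Python.

```python
import re

# Character class from the original report, applied to the whole path as
# raw bytes.  The brackets are backslash-escaped; the set is unchanged.
SAFE_ASCII = re.compile(rb"[\]\[a-zA-Z0-9{}<>() ^/,=:&!*%#$~@+._-]*")


def classify(name: bytes) -> str:
    """Return a (hypothetical) label describing how a filename is encoded."""
    if SAFE_ASCII.fullmatch(name):
        # Only characters from the common US-ASCII subset.
        return "ascii-subset"
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        # Cannot be read as UTF-8: a candidate for renaming.
        return "not-utf8"
    # Valid UTF-8 that goes beyond the common subset.
    return "utf8"
```

Only the names labelled "not-utf8" would need to be fixed under my proposal.
The sketch assumes that the paths are available as raw bytes; how to extract
them from the packages is left out.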

I understand very well the arguments against having any non-ASCII character at
all, but we currently have such packages in our archive, so if there is no plan
to modify these packages, then we cannot plan to solve this bug.

Can others comment on how they would like to see this bug solved?

Have a nice day,

