Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
On Sun, Mar 24, 2013 at 08:01:03PM +0900, Charles Plessy wrote:
> after more than one month of discussion, we have not reached a conclusion.
Thanks for the ping.
> In the current situation there is no policy, which means that everything is
> allowed. Indeed, there is at least one package with filenames using more than
> one set of non-ASCII characters, so no user can see correctly the names of
> every file in this package at the same time.
Some more data here. I checked sid main amd64 binary packages. The only
ones containing invalid UTF-8 sequences (and thus violating the current
proposal) would be aspell-is and jpilot. This suggests that UTF-8 is a
de facto standard already. Fixing two packages shouldn't be that hard. I
have filed a wishlist bug #704446 against lintian to check for this
regardless of the outcome of this bug.
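For illustration, a check of this kind could look roughly like the sketch below. It is not the script used for the numbers above, and it assumes the filenames have already been extracted from the packages as raw byte strings; the function name invalid_utf8_names is made up for this example.

```python
def invalid_utf8_names(names):
    """Return the filenames (given as bytes) that are not valid UTF-8.

    Assumption: `names` is an iterable of raw byte strings, e.g. the
    member names read from a package's data tarball.
    """
    bad = []
    for name in names:
        try:
            # Strict decoding rejects truncated, overlong and
            # otherwise malformed sequences.
            name.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


if __name__ == "__main__":
    sample = [
        b"usr/share/doc/README",        # plain ASCII, valid
        "caf\u00e9".encode("utf-8"),    # valid UTF-8
        b"caf\xe9",                     # Latin-1 byte, invalid as UTF-8
    ]
    for name in invalid_utf8_names(sample):
        print(repr(name))
```

Working on bytes rather than decoded strings matters here, since the whole point is to find names that cannot be decoded.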
> On my side, I made a proposal with actionable items: fix the few packages that
> are not using UTF-8, and modify the Policy to reflect the current practice
> of using ASCII most of the time and other UTF-8 characters parsimoniously.
I am in favour of this solution.
* Requiring any subset of UTF-8 has the direct benefit of being able to
interpret all filenames used without guesswork.
* This is in line with Fedora's policy.
* I saw very little disagreement about whether to permit non-UTF-8
sequences. Discussion seemed mostly to be around which subset to
require.
> I understand very well the arguments against having any UTF-8 character at all,
> but we currently have such packages in our archive, so if there is no plan to
> modify these packages, then we can not plan to solve this bug.
I see little benefit in restricting to ASCII compared to the benefit
of restricting to UTF-8. Remember that the goal of this bug was to
make filenames machine readable. I think that further restrictions
should happen in the context of #99933. I asked for not merging these
issues, because I would like to keep the scope of this issue limited and
thus implementable.
> Can others comment how they would like to see this bug solved ?
Any proposal that limits to a subset of UTF-8 and a superset of
printable ASCII is fine with me. My preferred choice would be just
UTF-8. I have no objections to recommending the use of a subset of
printable ASCII either.
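To make the nesting of these sets concrete, here is a small sketch that sorts a filename into one of three classes. The class labels and the function name classify_filename are invented for this example, and "printable ASCII" is assumed to mean the bytes 0x20 through 0x7E.

```python
def classify_filename(name: bytes) -> str:
    """Classify a raw filename as printable ASCII, other UTF-8, or invalid.

    Assumption: printable ASCII means bytes 0x20..0x7E inclusive.
    """
    if all(0x20 <= b <= 0x7E for b in name):
        return "printable-ascii"
    try:
        name.decode("utf-8")  # strict decoding
    except UnicodeDecodeError:
        return "invalid"
    return "utf-8"


if __name__ == "__main__":
    for raw in (b"usr/bin/ls", "na\u00efve.txt".encode("utf-8"), b"\xff"):
        print(repr(raw), classify_filename(raw))
```

Any policy wording between "printable ASCII" and "UTF-8" would accept everything in the first class and reject everything in the last; the open question is only how much of the middle class to allow.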
To me it appears to be a matter of wording right now. Consensus is
basically there. Implementing it would cause two policy violations
(aspell-is and jpilot), which imo is little impact.
Helmut