Bug#701081: debian-policy: mandate an encoding for filenames in binary packages

To: Charles Plessy <plessy@debian.org>
Cc: 701081@bugs.debian.org
Subject: Bug#701081: debian-policy: mandate an encoding for filenames in binary packages
From: Helmut Grohne <helmut@subdivi.de>
Date: Mon, 1 Apr 2013 11:37:55 +0200
Message-id: <20130401093755.GA16174@alf.mars>
Mail-followup-to: Helmut Grohne <helmut@subdivi.de>, Charles Plessy <plessy@debian.org>, 701081@bugs.debian.org
Reply-to: Helmut Grohne <helmut@subdivi.de>, 701081@bugs.debian.org
In-reply-to: <20130324110057.GA20598@plessy.org>
References: <20130221114327.GA19746@alf.mars> <20130324110057.GA20598@plessy.org>

On Sun, Mar 24, 2013 at 08:01:03PM +0900, Charles Plessy wrote:
> after more than one month of discussion, we have not reached a conclusion.

Thanks for the ping.

> In the current situation there is no policy, which means that everything is
> allowed.  Indeed, there is at least one package with filenames using more than
> one set of non-ASCII characters, so no user can see correctly the names of
> every file in this package at the same time.

Some more data: I checked the sid main amd64 binary packages. The only
ones containing filenames with invalid UTF-8 sequences (and thus
violating the current proposal) are aspell-is and jpilot. This suggests
that UTF-8 is already a de facto standard. Fixing two packages shouldn't
be that hard. I have filed a wishlist bug, #704446, against lintian to
check for this regardless of the outcome of this bug.

> On my side, I made a proposal with actionable items: fix the few packages that
> are not using UTF-8, and modify the Policy to reflect the current practice
> of using ASCII in most of the times and other UTF-8 characters parsimoniously.

I am in favour of this solution.

 * Requiring any subset of UTF-8 has the direct benefit of being able to
   interpret all filenames used without guesswork.
 * This is in line with Fedora's policy.
 * I saw very little disagreement about whether to permit non-UTF-8
   sequences. Discussion seemed mostly to be around which subset to
   require.

> I understand very well the arguments against having any UTF-8 character at all,
> but we currently have such packages in our archive, so if there is no plan to
> modify these packages, then we can not plan to solve this bug.

I see little benefit in restricting to ASCII compared to the benefit of
restricting to UTF-8. Remember that the goal of this bug was to make
filenames machine-readable. I think that further restrictions should be
handled in the context of #99933. I asked that these issues not be
merged, because I would like to keep the scope of this one limited and
thus implementable.

> Can others comment how they would like to see this bug solved ?

Any proposal that limits to a subset of UTF-8 and a superset of
printable ASCII is fine with me. My preferred choice would be just
UTF-8. I have no objections to recommending the use of a subset of
printable ASCII either.

To me it appears to be a matter of wording right now. Consensus is
basically there. Implementing it would cause two policy violations
(aspell-is and jpilot), which IMO is a small impact.

Helmut
