
Re: DEP 5 and directory/file names with spaces



Peter Samuelson wrote:
First, as I've said elsewhere, this thread is just about the most
impressive bikeshedding session I've ever seen.

In my defence (I started this sub-bikeshedding): it was a single sentence
in a postscript.
Technically, when handling external data, every rule will have
exceptions.
But no, I don't think Debian should support filename components containing
'/'. I think policy gives us enough freedom, in such extreme
cases, to sensibly repack the sources into a new orig.tar.*

So IMO the file description should be in a shell format,
so that:
- copy-pasting it into an "ls" command is a good test for maintainer and reader
- copy and paste also works for checking licenses and files

I really don't think we should support <slash> in filenames, nor control
characters (which are also difficult, and very ugly, to quote in shell
language). But IMHO spaces (common on Windows), commas (common for revisions)
and "extended" characters should be allowed.
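
As a rough illustration of the "shell format" idea, here is a minimal Python sketch that turns a list of file names into one shell-quoted line that could be pasted after "ls". The helper name shell_file_line and the sample file names are made up for this example; shlex.quote and shlex.split are the standard library calls.

```python
import shlex

def shell_file_line(names):
    """Return one line in which every file name is quoted for a POSIX shell."""
    return " ".join(shlex.quote(name) for name in names)

# Hypothetical names: one with a space, one with a comma, one "extended".
names = ["docs/read me.txt", "src/main.c,v", "po/café.po"]
line = shell_file_line(names)
print("ls -d --", line)

# Splitting the line with shell rules gives back the original names,
# so this notation loses no information.
assert shlex.split(line) == names
```

Names that contain only "safe" characters (the comma is one of them) are left unquoted, so the common case stays readable.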


Continuing with the slash thread:

 So I'll try and stick
to a single post, and I'm only posting because I don't think I've seen
mention of the following problem:

[Gunnar Wolf]
Yup - But the newline is also a valid (although, yes, very uncommon)
part of a filename.

So are non-UTF-8 byte sequences, and I suspect those are a great deal
more common in filenames than newlines.  If you want your copyright
file to be UTF-8, you have to escape those byte sequences somehow.

I propose something very simple: ? to escape any single byte that seems
problematic in any way.  Spaces, tabs, newlines, the ISO-8859-1
registered trademark symbol, etc., etc.  I mean, we don't need this
transform to be reversible, do we?

I think it should be reversible, otherwise we will find a case where two
different files are encoded to the same name.
E.g. this hypothetical case: a source (and tarball created in ISO 8859-15)

currency/
currency/$		(0x24 in ISO 8859-1, ISO 8859-15)
currency/<pound sign>   (0xA3 in ISO 8859-1, ISO 8859-15)
currency/<yen/yuan sign>   (0xA5 in ISO 8859-1, ISO 8859-15)
currency/<euro sign>	(0xA4 in ISO 8859-15)

Hey! But these cases cannot easily be written in shell code
(in a UTF-8 environment).
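
The collision in the currency/ example can be reproduced with a short Python sketch. The function names are invented for illustration, and the percent-encoding used as the reversible alternative is only one possible choice; it is an assumption of this sketch, not something proposed in the thread.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

def lossy_escape(raw: bytes) -> str:
    """Replace every byte outside printable, non-space ASCII with '?'."""
    return "".join(chr(b) if 0x21 <= b <= 0x7E else "?" for b in raw)

def reversible_escape(raw: bytes) -> str:
    """Percent-encode problematic bytes; '/' is kept as the path separator."""
    return quote_from_bytes(raw, safe="/")

# The four file names from the example, as raw ISO 8859-15 bytes.
names = [b"currency/$", b"currency/\xa3", b"currency/\xa5", b"currency/\xa4"]

lossy = [lossy_escape(n) for n in names]
# Pound, yen and euro all become 'currency/?'; only '$' stays distinct.
assert len(set(lossy)) == 2

escaped = [reversible_escape(n) for n in names]
# Every name keeps its own encoding and can be decoded back to the bytes.
assert len(set(escaped)) == 4
assert [unquote_to_bytes(e) for e in escaped] == names
```

Any reversible scheme has to spend one character as the escape marker (here '%'), and that character must itself be escaped when it appears in a real file name.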

Simple problem, not so simple solution.
But eventually:
- every source package should be encoded with UTF-8 filename components,
  without control characters, slashes, or backslashes.
This could be done automatically, except in a few (I hope very rare) cases.
(So the burden falls on the maintainer, but only very rarely, and not on DEP5/policy
and tools.)
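
A check along these lines could be sketched as follows. This is only an illustration of the rule stated above; the function name component_problems and the wording of the messages are hypothetical, and no existing Debian tool is implied.

```python
import unicodedata

def component_problems(raw: bytes) -> list:
    """List the reasons why one filename component breaks the rule above."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["not valid UTF-8"]
    problems = []
    # Unicode general category 'Cc' covers the C0 and C1 control characters.
    if any(unicodedata.category(ch) == "Cc" for ch in text):
        problems.append("contains a control character")
    if "/" in text or "\\" in text:
        problems.append("contains a slash or backslash")
    return problems

# A UTF-8 name passes; a lone ISO 8859-15 euro byte does not.
assert component_problems("café.po".encode("utf-8")) == []
assert component_problems(b"\xa4") == ["not valid UTF-8"]
```

A maintainer could run such a check over the unpacked source and repack only when it reports something.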

ciao
	cate

