
Bug#246016: Define allowed charsets properly



Package: debian-policy
Severity: wishlist

Hi,

this mail tries to summarize the recent discussion about charsets
(especially UTF-8) and which ones we accept where, starting with
http://lists.debian.org/debian-policy/2004/debian-policy-200404/msg00016.html
It replaces (at least) the following bugs:
#99324: Default charset should be UTF-8
#142164: Packages files should be in UTF-8
#208011: [PROPOSAL] UTF-8 encoding for debian/control
#241333: policy mentions that changelogs should be utf-8; this is a bug
If you feel that this summary contains errors, please don't hesitate
to correct me. If there are no corrections (or once we agree on the
changes), I'll send a policy proposal to this bug in the next few
weeks.

Before anything else, a technical remark: ASCII (the characters < 128)
is a subset of UTF-8, and also a subset of ISO-8859-1 (Latin-1) and
ISO-8859-15. So if we say "nothing except UTF-8", plain ASCII
characters are still allowed. We call any charset that defines the
characters < 128 the same way as ASCII "compatible with ASCII".
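To illustrate the compatibility remark, here is a minimal Python sketch. The helper name `encodes_identically` is hypothetical and not part of any Debian tool; it simply checks whether a string has one shared byte representation in ASCII, UTF-8, ISO-8859-1 and ISO-8859-15. That holds for all 128 ASCII characters, but not for something like "é".

```python
# Minimal sketch: ASCII text yields identical bytes in every ASCII-compatible
# charset mentioned above; non-ASCII text does not.
ENCODINGS = ("ascii", "utf-8", "iso-8859-1", "iso-8859-15")


def encodes_identically(text: str) -> bool:
    """Hypothetical helper: True if `text` encodes to the same bytes
    in every charset listed in ENCODINGS."""
    results = set()
    for enc in ENCODINGS:
        try:
            results.add(text.encode(enc))
        except UnicodeEncodeError:
            # Not representable in this charset at all (e.g. "é" in ASCII).
            return False
    return len(results) == 1


# The characters < 128, i.e. all of ASCII.
ALL_ASCII = "".join(chr(i) for i in range(128))

if __name__ == "__main__":
    print(encodes_identically(ALL_ASCII))  # True
    print(encodes_identically("é"))        # False
```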

First, what we all agreed on rather easily: non-UTF-8 is not allowed.
Also, for the release of sarge (at least if it happens rather soon),
non-ASCII is at least strongly deprecated. We all agreed that there is
no other way than either allowing UTF-8 or adding checks against it.
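To make the "add checks" option concrete, a minimal sketch of such a check, assuming the raw bytes of a control file or document are at hand. The function `classify_bytes` is hypothetical; it only separates plain ASCII (always fine), other valid UTF-8 (the case deprecated for sarge) and anything that is not UTF-8 at all (not allowed).

```python
def classify_bytes(data: bytes) -> str:
    """Hypothetical check: classify raw file contents.

    Returns "ascii" for pure ASCII, "utf-8" for valid UTF-8 that contains
    non-ASCII characters, and "invalid" for anything that is not UTF-8
    (e.g. Latin-1 encoded text).
    """
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return "invalid"
    return "utf-8"


if __name__ == "__main__":
    print(classify_bytes(b"Package: hello\n"))       # ascii
    print(classify_bytes("Jörg".encode("utf-8")))    # utf-8
    print(classify_bytes("Jörg".encode("latin-1")))  # invalid
```

A checking tool could then reject "invalid" outright and, for sarge, merely warn on "utf-8".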

We also agreed that field names (like "Package:") must consist of
ASCII characters; I would go even further and say that
"A-Za-z0-9\-\+" are the only characters allowed in a field name.
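The stricter rule maps directly onto a regular expression. A minimal sketch in Python, where `is_valid_field_name` is a hypothetical helper and the character class is the one proposed above (the trailing colon separates name and value and is not part of the name):

```python
import re

# Proposed rule (not yet policy): a field name may contain only
# A-Z, a-z, 0-9, "-" and "+".
FIELD_NAME_RE = re.compile(r"[A-Za-z0-9\-+]+")


def is_valid_field_name(name: str) -> bool:
    """Hypothetical check: True if `name` consists only of allowed characters."""
    return FIELD_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ("Package", "Build-Depends", "Pàckage", "Package:"):
        print(name, is_valid_field_name(name))
```

The explicit ranges are used instead of `\w` or `\d` because in Python those also match non-ASCII letters and digits.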

We also agreed that we won't localize "normal" control files. The only
files that may be localized are the *.po files, and we don't put any
restrictions on them in policy (because there is already a standard
for them).

We also agreed that we won't accept non-ASCII characters in
non-descriptive fields, e.g. package name, dependencies, version, ...
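A minimal sketch of that rule as a check over one parsed stanza. The mail names only a few examples and leaves the rest of the list open, so the field set below is an illustrative assumption, and `non_ascii_fields` is a hypothetical helper.

```python
# Illustrative subset only: the mail names package name, dependencies and
# version and leaves the rest of the list open, so this set is an assumption.
EXAMPLE_NON_DESCRIPTIVE = frozenset({"Package", "Version", "Depends"})


def non_ascii_fields(fields: dict, checked=EXAMPLE_NON_DESCRIPTIVE) -> list:
    """Hypothetical check: sorted names of checked fields whose value
    is not pure ASCII. Descriptive fields are ignored."""
    return sorted(
        name for name, value in fields.items()
        if name in checked and not value.isascii()
    )


if __name__ == "__main__":
    stanza = {
        "Package": "föo",
        "Version": "1.0-1",
        "Maintainer": "Jörg Doe <jd@example.org>",  # descriptive, not checked
    }
    print(non_ascii_fields(stanza))  # ['Package']
```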


We had some discussion about whether to allow some or all characters
in some descriptive fields, i.e. Maintainer, Uploaders, package
description, changelog and other required or standard documents
(copyright, README.Debian, ...). In the end, nobody really objected to
allowing all UTF-8 characters there, while requiring a transcription
for characters that are not in the Unicode blocks "Basic Latin",
"Latin-1 Supplement" and "Latin Extended-A". (Jeroen's proposal)
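The three Unicode blocks named here are contiguous (U+0000 to U+017F), so finding the characters that would need a transcription is a simple code point test. A minimal sketch, where `chars_needing_transcription` is a hypothetical helper; how the transcription itself should be written is not specified in this mail.

```python
# The three Unicode blocks named in Jeroen's proposal. Together they form
# the contiguous range U+0000..U+017F.
ALLOWED_BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Latin Extended-A": (0x0100, 0x017F),
}


def in_allowed_blocks(ch: str) -> bool:
    """True if the single character `ch` lies in one of the blocks above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ALLOWED_BLOCKS.values())


def chars_needing_transcription(text: str) -> list:
    """Hypothetical check: characters of `text` outside the three blocks,
    i.e. those that would need a transcription under the proposal."""
    return [ch for ch in text if not in_allowed_blocks(ch)]


if __name__ == "__main__":
    print(chars_needing_transcription("Jörg Łukasz"))  # []
    print(chars_needing_transcription("\u0416ora"))    # Cyrillic Zhe flagged
```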

However, allowing these characters doesn't change the requirement that
all documents be in English (except, of course, the localized *.po
files).


Is this OK with all of you?


Cheers,
Andi
-- 
   http://home.arcor.de/andreas-barth/
   PGP 1024/89FB5CE5  DC F1 85 6D A6 45 9C 0F  3B BE F1 D0 C5 D1 D9 0C


