
RFC: default encoding of documentation and debian control files



Given the recent discussion about UTF-8 support in Debian,
I would like to come forward with the following proposal.
Any comments, suggestions, and grammar corrections are welcome.


*Addition to section 3 Control files and their fields:

3.3 Default charset of control files

If, for whatever reason (such as upstream authors' or maintainers'
names, foreign-language package descriptions, and similar), you need to
use characters outside the 7-bit ASCII range in control files, these
characters must be encoded in UTF-8.

[Rationale: currently, there is no default charset for these fields.
As a result, everybody uses their own national encoding.
Glancing quickly through /var/lib/dpkg/available, I noticed
some ISO-8859-1 encoded characters, some words in KOI8-R,
and an unknown Japanese encoding.]
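To illustrate how such stray encodings could be spotted, here is a minimal Python sketch that reports lines of a control file that are not valid UTF-8. The function name and the sample stanza are made up for illustration; this is not an existing Debian tool.

```python
def find_non_utf8_lines(data: bytes) -> list[tuple[int, bytes]]:
    """Return (line_number, raw_line) for every line that is not valid UTF-8.

    Pure ASCII lines always pass, since ASCII is a subset of UTF-8.
    """
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


# Hypothetical control stanza: line 2 is valid UTF-8, line 3 is ISO-8859-2.
sample = (
    b"Package: example\n"
    b"Maintainer: Jos\xc3\xa9 Example <jose@example.org>\n"
    b"Description: p\xf8\xedklad\n"
)

for number, line in find_non_utf8_lines(sample):
    print(f"line {number} is not valid UTF-8: {line!r}")
```

Note that a check like this can only say that a line is not UTF-8; it cannot tell which national encoding was actually used.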


*Addition to 5.3 debian/changelog:

The character set of debian/changelog must be either pure ASCII or UTF-8.

[Explanation: it would be sufficient to mention UTF-8 alone, since it is
a superset of ASCII, but we want to stay logically consistent with the
next addition, in case the rest of the documentation is not in UTF-8 (and
let us just hope nobody will use EBCDIC :-))]
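The superset relationship can be shown with a short Python sketch. The helper name classify_charset is an assumption made for this example only: every ASCII byte string is also valid UTF-8, so the check tries ASCII first and UTF-8 second.

```python
def classify_charset(data: bytes) -> str:
    """Classify bytes as 'ascii', 'utf-8', or 'other' (hypothetical helper)."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


# ASCII text is byte-for-byte identical in both encodings.
assert "changelog".encode("ascii") == "changelog".encode("utf-8")

print(classify_charset(b"  * Initial release."))
print(classify_charset("Garabík".encode("utf-8")))
print(classify_charset("Garabík".encode("iso-8859-2")))
```

Under the proposed rule, a changelog would be acceptable if the result is "ascii" or "utf-8".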


*Addition to 13.3 Additional documentation:

Documentation of Debian packages in text format, if written in a language
requiring characters outside the 7-bit ASCII range, should use either
a well-established encoding for the given language (such as ISO-8859-2 for
some Central and Eastern European languages, KOI8-R for Russian, etc.)
or UTF-8. Maintainers are encouraged to use UTF-8,
keeping in mind the general Debian migration toward a unified character encoding.

Original upstream documentation, if in an encoding other than UTF-8 or
the well-established encoding for the particular language, should be
converted either to UTF-8 or to the well-established encoding.
The choice between UTF-8 and another encoding is left to the maintainer's
discretion; however, a package should have all its documentation
in one consistent encoding.
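As a rough sketch of the conversion step, the following Python function re-encodes a document from a known legacy encoding to UTF-8. It assumes the maintainer already knows the source encoding; the function name and sample strings are illustrative only.

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known legacy encoding to UTF-8.

    Raises UnicodeDecodeError if the data is not valid in source_encoding,
    which is a hint that the assumed encoding is wrong.
    """
    return data.decode(source_encoding).encode("utf-8")


# Sample texts standing in for upstream documentation files.
czech_legacy = "Příliš žluťoučký kůň".encode("iso-8859-2")
russian_legacy = "Привет".encode("koi8-r")

czech_utf8 = to_utf8(czech_legacy, "iso-8859-2")
russian_utf8 = to_utf8(russian_legacy, "koi8-r")

print(czech_utf8.decode("utf-8"))
print(russian_utf8.decode("utf-8"))
```

Single-byte encodings such as ISO-8859-2 and KOI8-R accept almost any byte sequence, so decoding with the wrong one may succeed silently and produce garbage; the result should be checked by someone who reads the language.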


*Addition to 13.5 Preferred documentation formats:

HTML documents, if in an encoding other than US-ASCII, must
have in their header an appropriate META tag declaring the encoding used.

[example: 
<META HTTP-Equiv="Content-Type" CONTENT="text/html; charset=iso-8859-2">
]
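A check for such a META tag could look roughly like the Python sketch below, built on the standard html.parser module. The class and function names are assumptions for this example, and only the HTTP-Equiv form shown above is recognized.

```python
import re
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Record the charset declared by the first Content-Type META tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "content-type":
            return
        match = re.search(r"charset=([\w.:-]+)",
                          attributes.get("content") or "", re.IGNORECASE)
        if match:
            self.charset = match.group(1).lower()


def declared_charset(html_text: str):
    """Return the declared charset in lowercase, or None if there is none."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset


page = ('<HTML><HEAD><META HTTP-Equiv="Content-Type" '
        'CONTENT="text/html; charset=iso-8859-2"></HEAD></HTML>')
print(declared_charset(page))
```

A document for which declared_charset returns None and which contains non-ASCII bytes would not satisfy the proposed requirement.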


*No good section to put this into, perhaps 13.9?

Names of maintainers, upstream authors, and other data in package
descriptions and related data files (such as debian/changelog,
debian/copyright, debian/control), as well as in English-language
documentation, should be either transliterated or transcribed to ASCII,
or given in UTF-8, at the discretion of the maintainer. However,
for names in scripts based on non-Latin alphabets, an ASCII (or suitable
Latin-script) version should be provided along with the original name.
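For Latin-script names with diacritics, a crude ASCII fallback can be produced by stripping the accents, as in the Python sketch below. This is only an approximation under stated assumptions: the function name is made up, letters without a Unicode decomposition are dropped, and names in non-Latin scripts (Cyrillic, Japanese, and so on) need a real, language-specific transliteration that this sketch does not attempt.

```python
import unicodedata


def ascii_fallback(name: str) -> str:
    """Strip diacritics from a Latin-script name (rough approximation).

    NFKD normalization splits accented letters into a base letter plus
    combining marks; the marks and any remaining non-ASCII characters
    are then discarded.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ord(ch) < 128
    )


print(ascii_fallback("Radovan Garabík"))
print(ascii_fallback("Dvořák"))
```

The output should always be reviewed by the person named, since the customary ASCII spelling of a name often differs from what accent stripping yields.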



-- 
 -----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!


