Bug#593909: Names of Fields in Control Files
On Mon, Oct 11, 2010 at 03:13:43PM -0700, Russ Allbery wrote:
> Charles Plessy <plessy@debian.org> writes:
>
> > how about simply paraphrasing RFC 822/5322, which are our source of
> > inspiration? In that case, the requirement for field names will be to
> > be printable ASCII characters, except colons.
>
> > I propose the following change in the context of the patch that I am
> > preparing for clarifying the Policy's chapter about control files, in
> > bug #593909.
>
> It occurred to me, on reviewing your other patch as well, that this change
> should probably also say explicitly that field names may not begin with #.
Here is an updated patch that contains the following:
Each paragraph consists of a series of data fields; each
field consists of the field name, followed by a colon and
- then the data/value associated with that field. It ends at
- the end of the (logical) line. Horizontal whitespace
+ then the data/value associated with that field. The field
+ name is composed of printable ASCII characters (i.e.,
+ characters that have values between 33 and 126, inclusive)
+ except colon, and must not begin with #. The
+ field ends at the end of the line or at the end of the
+ last continuation line (see below). Horizontal whitespace
(spaces and tabs) may occur immediately before or after the
value and is ignored there; it is conventional to put a
Apart from adding that field names may not begin with #, I also changed
‘US-ASCII’ to ‘ASCII’, since this is the vocabulary used by the Policy.
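
As an illustration only (this is not part of the patch), the field-name rule
above could be checked roughly as follows in Python. The function name
is_valid_field_name is hypothetical, and rejecting an empty name is my own
assumption, since the proposed wording does not mention it.

```python
def is_valid_field_name(name: str) -> bool:
    """Check a field name against the wording proposed for Policy.

    Assumption: an empty name is rejected, although the proposed text
    does not say so explicitly.
    """
    if not name or name.startswith("#"):
        return False
    # Printable ASCII here means values 33 to 126 inclusive, so space (32)
    # and non-ASCII characters are excluded; the colon is excluded as well.
    return all(33 <= ord(ch) <= 126 and ch != ":" for ch in name)
```

With this sketch, "Package" and "X-Foo" are accepted, while "#Package",
"Pack age" and "Pack:age" are rejected.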
Have a nice day,
--
Charles
From ae5afd407773a02863169dc71bdaacaeb644570c Mon Sep 17 00:00:00 2001
From: Charles Plessy <plessy@debian.org>
Date: Wed, 13 Oct 2010 00:14:42 +0900
Subject: [PATCH] Clarification of the format of control files, Closes: #501930, #593909.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
- Specifies field names similarly to RFC 822/5322;
- Distinguishes simple, folded and multiline fields;
- Clarifies paragraph separators (#501930);
- The order of paragraphs is significant;
- Fields can have different types or purposes in different control files;
- Moved the description of comments from §5.2 to §5.1;
- Documented that relationship fields can only be folded in debian/control.
---
policy.sgml | 116 +++++++++++++++++++++++++++++++++++++---------------------
1 files changed, 74 insertions(+), 42 deletions(-)
diff --git a/policy.sgml b/policy.sgml
index 642f672..02637f0 100644
--- a/policy.sgml
+++ b/policy.sgml
@@ -2479,19 +2479,26 @@ endif
fields<footnote>
The paragraphs are also sometimes referred to as stanzas.
</footnote>.
- The paragraphs are separated by blank lines. Some control
+ The paragraphs are separated by empty lines. Parsers may accept
+ lines consisting solely of spaces and tabs as paragraph
+ separators, but control files should use empty lines. Some control
files allow only one paragraph; others allow several, in
which case each paragraph usually refers to a different
package. (For example, in source packages, the first
paragraph refers to the source package, and later paragraphs
- refer to binary packages generated from the source.)
+ refer to binary packages generated from the source.) The
+ ordering of the paragraphs in control files is significant.
</p>
<p>
Each paragraph consists of a series of data fields; each
field consists of the field name, followed by a colon and
- then the data/value associated with that field. It ends at
- the end of the (logical) line. Horizontal whitespace
+ then the data/value associated with that field. The field
+ name is composed of printable ASCII characters (i.e.,
+ characters that have values between 33 and 126, inclusive)
+ except colon, and must not begin with #. The
+ field ends at the end of the line or at the end of the
+ last continuation line (see below). Horizontal whitespace
(spaces and tabs) may occur immediately before or after the
value and is ignored there; it is conventional to put a
single space after the colon. For example, a field might
@@ -2509,22 +2516,52 @@ Package: libc6
</p>
<p>
- Many fields' values may span several lines; in this case
- each continuation line must start with a space or a tab.
- Any trailing spaces or tabs at the end of individual
- lines of a field value are ignored.
+ There are three types of fields:
+ <taglist>
+ <tag>simple</tag>
+ <item>
+ The field, including its value, must be a single line. Folding
+ of the field is not permitted. This is the default field type
+ if the definition of the field does not specify a different
+ type.
+ </item>
+ <tag>folded</tag>
+ <item>
+ The value of a folded field is a logical line that may span
+ several lines. The lines after the first are called
+ continuation lines and must start with a space or a tab.
+ Whitespace, including any newlines, is not significant in the
+ field values of folded fields.<footnote>
+ This folding method is similar to RFC 5322, allowing control
+ files that contain only one paragraph and no multiline fields
+ to be read by parsers written for RFC 5322.
+ </footnote>
+ </item>
+ <tag>multiline</tag>
+ <item>
+ The value of a multiline field may comprise multiple continuation
+ lines. The first line of the value, the part on the same line as
+ the field name, often has special significance or may have to be
+ empty. Other lines are added following the same syntax as the
+ continuation lines of the folded fields. Whitespace, including newlines,
+ is significant in the values of multiline fields.
+ </item>
+ </taglist>
</p>
<p>
- In fields where it is specified that lines may not wrap,
- only a single line of data is allowed and whitespace is not
- significant in a field body. Whitespace must not appear
+ Whitespace must not appear
inside names (of packages, architectures, files or anything
else) or version numbers, or between the characters of
multi-character version relationships.
</p>
<p>
+ The presence and purpose of a field, and the syntax of its
+ value may differ between types of control files.
+ </p>
+
+ <p>
Field names are not case-sensitive, but it is usual to
capitalize the field names using mixed case as shown below.
Field values are case-sensitive unless the description of the
@@ -2532,9 +2569,17 @@ Package: libc6
</p>
<p>
- Blank lines, or lines consisting only of spaces and tabs,
- are not allowed within field values or between fields - that
- would mean a new paragraph.
+ Paragraph separators (empty lines) and lines consisting only of
+ spaces and tabs are not allowed within field values or between
+ fields. Empty lines in field values are usually escaped by
+ representing them by a space followed by a dot.
+ </p>
+
+ <p>
+ Lines starting with # without any preceding whitespace are comment
+ lines that are only permitted in source package control files
+ (<file>debian/control</file>). These comment lines are ignored, even
+ between two continuation lines. They do not end logical lines.
</p>
<p>
@@ -2600,8 +2645,8 @@ Package: libc6
<file>.changes</file> file to accompany the upload, and by
<prgn>dpkg-source</prgn> when it creates the
<file>.dsc</file> source control file as part of a source
- archive. Many fields are permitted to span multiple lines in
- <file>debian/control</file> but not in any other control
+ archive. Some fields are folded in <file>debian/control</file>,
+ but not in any other control
file. These tools are responsible for removing the line
breaks from such fields when using fields from
<file>debian/control</file> to generate other control files.
@@ -2614,16 +2659,6 @@ Package: libc6
when they generate output control files.
See <ref id="substvars"> for details.
</p>
-
- <p>
- In addition to the control file syntax described <qref
- id="controlsyntax">above</qref>, this file may also contain
- comment lines starting with <tt>#</tt> without any preceding
- whitespace. All such lines are ignored, even in the middle of
- continuation lines for a multiline field, and do not end a
- multiline field.
- </p>
-
</sect>
<sect id="binarycontrolfiles">
@@ -2822,11 +2857,7 @@ Package: libc6
</p>
<p>
- Any parser that interprets the Uploaders field in
- <file>debian/control</file> must permit it to span multiple
- lines. Line breaks in an Uploaders field that spans multiple
- lines are not significant and the semantics of the field are
- the same as if the line breaks had not been present.
+ The Uploaders field in <file>debian/control</file> can be folded.
</p>
</sect1>
@@ -3006,7 +3037,7 @@ Package: libc6
<p>
This is a boolean field which may occur only in the
control file of a binary package or in a per-package fields
- paragraph of a main source control data file.
+ paragraph of a source package control file.
</p>
<p>
@@ -3242,7 +3273,8 @@ Package: libc6
In a source or binary control file, the <tt>Description</tt>
field contains a description of the binary package, consisting
of two parts, the synopsis or the short description, and the
- long description. The field's format is as follows:
+ long description. It is a multiline field with the following
+ format:
</p>
<p>
@@ -3306,8 +3338,8 @@ Package: libc6
field contains a summary of the descriptions for the packages
being uploaded. For this case, the first line of the field
value (the part on the same line as <tt>Description:</tt>) is
- always empty. The content of the field is expressed as
- continuation lines, one line per package. Each line is
+ always empty. It is a multiline field, with one
+ line per package. Each line is
indented by one space and contains the name of a binary
package, a space, a hyphen (<tt>-</tt>), a space, and the
short description line from that package.
@@ -3443,7 +3475,7 @@ Package: libc6
<heading><tt>Changes</tt></heading>
<p>
- This field contains the human-readable changes data, describing
+ This multiline field contains the human-readable changes data, describing
the differences between the last version and the current one.
</p>
@@ -3481,7 +3513,7 @@ Package: libc6
<heading><tt>Binary</tt></heading>
<p>
- This field is a list of binary packages. Its syntax and
+ This folded field is a list of binary packages. Its syntax and
meaning varies depending on the control file in which it
appears.
</p>
@@ -3491,7 +3523,7 @@ Package: libc6
packages which a source package can produce, separated by
commas<footnote>
A space after each comma is conventional.
- </footnote>. It may span multiple lines. The source package
+ </footnote>. The source package
does not necessarily produce all of these binary packages for
every architecture. The source control file doesn't contain
details of which architectures are appropriate for which of
@@ -3501,7 +3533,7 @@ Package: libc6
<p>
When it appears in a <file>.changes</file> file, it lists the
names of the binary packages being uploaded, separated by
- whitespace (not commas). It may span multiple lines.
+ whitespace (not commas).
</p>
</sect1>
@@ -3624,7 +3656,7 @@ Files:
and <tt>Checksums-Sha256</tt></heading>
<p>
- These fields contain a list of files with a checksum and size
+ These multiline fields contain a list of files with a checksum and size
for each one. Both <tt>Checksums-Sha1</tt>
and <tt>Checksums-Sha256</tt> have the same syntax and differ
only in the checksum algorithm used: SHA-1
@@ -4473,13 +4505,13 @@ Checksums-Sha256:
specification subject to the rules in <ref
id="controlsyntax">, and must appear where it's necessary to
disambiguate; it is not otherwise significant. All of the
- relationship fields may span multiple lines. For
+ relationship fields can only be folded in source package control files. For
consistency and in case of future changes to
<prgn>dpkg</prgn> it is recommended that a single space be
used after a version relationship and before a version
number; it is also conventional to put a single space after
each comma, on either side of each vertical bar, and before
- each open parenthesis. When wrapping a relationship field, it
+ each open parenthesis. When opening a continuation line in a relationship field, it
is conventional to do so after a comma and before the space
following that comma.
</p>
--
1.7.1
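
Not part of the patch: a minimal Python sketch of how a parser might read one
paragraph of debian/control under the wording above. Comment lines are
skipped without ending the current field, and continuation lines are
attached to the preceding field. The names parse_paragraph and unfold are
hypothetical, and collapsing whitespace to single spaces when unfolding is
only one possible reading of "whitespace is not significant".

```python
def parse_paragraph(lines):
    """Return a list of (name, value_lines) pairs for one paragraph.

    `lines` is a list of strings without trailing newlines and without
    the empty lines that separate paragraphs.
    """
    fields = []
    for line in lines:
        if line.startswith("#"):
            # Comment line: ignored, and it does not end the logical line.
            continue
        if line[:1] in (" ", "\t"):
            # Continuation line: belongs to the field opened before it.
            if not fields:
                raise ValueError("continuation line before any field")
            fields[-1][1].append(line)
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a field: {line!r}")
        fields.append((name, [value]))
    return fields


def unfold(value_lines):
    """Join the lines of a folded field into one logical line.

    Assumption: runs of whitespace are collapsed to a single space.
    """
    return " ".join(" ".join(value_lines).split())
```

For example, a Build-Depends field split over two lines with a comment line
in between unfolds to a single line such as "debhelper, libfoo-dev". A
multiline field such as Description would keep its value_lines as they are,
since whitespace and newlines are significant there.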