
Bug#593909: Names of Fields in Control Files



On Mon, Oct 11, 2010 at 03:13:43PM -0700, Russ Allbery wrote:
> Charles Plessy <plessy@debian.org> writes:
> 
> > how about simply paraphrasing RFC 822/5322, which are our source of
> > inspiration? In that case, the requirement for field names would be
> > that they consist of printable ASCII characters, except colons.
> 
> > I propose the following change in the context of the patch that I am
> > preparing for clarifying the Policy's chapter about control files, in
> > bug #593909.
> 
> It occurred to me, on reviewing your other patch as well, that this change
> should probably also say explicitly that field names may not begin with #.

Here is an updated patch that contains the following:

          Each paragraph consists of a series of data fields; each
          field consists of the field name, followed by a colon and
-         then the data/value associated with that field.  It ends at
-         the end of the (logical) line.  Horizontal whitespace
+         then the data/value associated with that field.  The field
+         name is composed of printable ASCII characters (i.e.,
+         characters that have values between 33 and 126, inclusive)
+         except colon and must not begin with #.  The
+         field ends at the end of the line or at the end of the
+         last continuation line (see below).  Horizontal whitespace
          (spaces and tabs) may occur immediately before or after the
          value and is ignored there; it is conventional to put a

Apart from adding that field names may not begin with #, I also replaced
‘US-ASCII’ with ‘ASCII’, since this is the vocabulary used by the Policy.
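
To make the proposed wording concrete, here is a minimal Python sketch of the
check it implies. The function name is_valid_field_name is only illustrative,
and rejecting an empty name is my own assumption, since the text above does not
cover that case.

```python
def is_valid_field_name(name: str) -> bool:
    """Check a field name against the wording proposed in the patch."""
    if not name:
        # Assumption: an empty name is rejected (not stated in the patch).
        return False
    if name.startswith("#"):
        # Field names must not begin with '#'.
        return False
    # Printable ASCII means values 33 to 126 inclusive; colon is excluded.
    return all(33 <= ord(ch) <= 126 and ch != ":" for ch in name)
```

Note that # is only forbidden in the first position, so a name such as
"X-Foo#bar" would still pass this check, while "#Package" would not.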

Have a nice day,

-- 
Charles
From ae5afd407773a02863169dc71bdaacaeb644570c Mon Sep 17 00:00:00 2001
From: Charles Plessy <plessy@debian.org>
Date: Wed, 13 Oct 2010 00:14:42 +0900
Subject: [PATCH] Clarification of the format of control files, Closes: #501930, #593909.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

 - Specifies field names similarly to RFC 822/5322;
 - Distinguishes simple, folded and multiline fields;
 - Clarifies paragraph separators (#501930);
 - The order of paragraphs is significant;
 - Fields can have different types or purposes in different control files;
 - Moved the description of comments from §5.2 to §5.1;
 - Documented that relationship fields can only be folded in debian/control.
---
 policy.sgml |  116 +++++++++++++++++++++++++++++++++++++---------------------
 1 files changed, 74 insertions(+), 42 deletions(-)
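
As a rough illustration of the three field types introduced in the patch below
(simple, folded, multiline) and of the comment rule, here is a minimal Python
sketch of a one-paragraph parser. It is not how dpkg or any existing tool works.
The function name parse_paragraph and its folded/multiline parameters are
hypothetical: the caller has to say which fields are folded or multiline, and
every other field is treated as simple. Collapsing the whitespace of folded
values to single spaces is one possible reading of "not significant".

```python
def parse_paragraph(lines, folded=frozenset(), multiline=frozenset()):
    """Parse one control-file paragraph given as lines without newlines.

    `folded` and `multiline` hold lower-cased field names (assumed inputs).
    """
    raw = {}        # lower-cased field name -> list of physical lines
    current = None  # name of the field that continuation lines extend
    for line in lines:
        if line.startswith("#"):
            # Comment lines are ignored and do not end a logical line.
            continue
        if line.strip(" \t") == "":
            raise ValueError("empty or whitespace-only line inside a paragraph")
        if line[0] in " \t":
            if current is None:
                raise ValueError("continuation line before any field")
            raw[current].append(line)
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"missing colon: {line!r}")
        current = name.lower()  # field names are not case-sensitive
        raw[current] = [value]

    parsed = {}
    for name, parts in raw.items():
        if name in multiline:
            # Whitespace and newlines are significant: keep lines as they are.
            parsed[name] = "\n".join([parts[0].strip(" \t")] + parts[1:])
        elif name in folded:
            # Whitespace, including newlines, is not significant.
            parsed[name] = " ".join(" ".join(parts).split())
        elif len(parts) == 1:
            parsed[name] = parts[0].strip(" \t")
        else:
            raise ValueError(f"simple field {name!r} must not be folded")
    return parsed
```

For instance, with folded={"build-depends"} and multiline={"description"}, a
Build-Depends value split over two lines with a comment line in between comes
back as one logical line, while a Description keeps its continuation lines,
including the " ." line that stands for an empty line.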

diff --git a/policy.sgml b/policy.sgml
index 642f672..02637f0 100644
--- a/policy.sgml
+++ b/policy.sgml
@@ -2479,19 +2479,26 @@ endif
 	  fields<footnote>
 		The paragraphs are also sometimes referred to as stanzas.
 	  </footnote>.
-	  The paragraphs are separated by blank lines.  Some control
+	  The paragraphs are separated by empty lines.  Parsers may accept
+	  lines consisting solely of spaces and tabs as paragraph
+	  separators, but control files should use empty lines.  Some control
 	  files allow only one paragraph; others allow several, in
 	  which case each paragraph usually refers to a different
 	  package.  (For example, in source packages, the first
 	  paragraph refers to the source package, and later paragraphs
-	  refer to binary packages generated from the source.)
+	  refer to binary packages generated from the source.)  The
+	  ordering of the paragraphs in control files is significant.
 	</p>
 
 	<p>
 	  Each paragraph consists of a series of data fields; each
 	  field consists of the field name, followed by a colon and
-	  then the data/value associated with that field.  It ends at
-	  the end of the (logical) line.  Horizontal whitespace
+	  then the data/value associated with that field.  The field
+	  name is composed of printable ASCII characters (i.e.,
+	  characters that have values between 33 and 126, inclusive)
+	  except colon and must not begin with #.  The
+	  field ends at the end of the line or at the end of the
+	  last continuation line (see below).  Horizontal whitespace
 	  (spaces and tabs) may occur immediately before or after the
 	  value and is ignored there; it is conventional to put a
 	  single space after the colon.  For example, a field might
@@ -2509,22 +2516,52 @@ Package: libc6
 	</p>
 
 	<p>
-	  Many fields' values may span several lines; in this case
-	  each continuation line must start with a space or a tab.
-	  Any trailing spaces or tabs at the end of individual
-	  lines of a field value are ignored. 
+	  There are three types of fields:
+	  <taglist>
+	    <tag>simple</tag>
+	    <item>
+	      The field, including its value, must be a single line.  Folding
+	      of the field is not permitted.  This is the default field type
+	      if the definition of the field does not specify a different
+	      type.
+	    </item>
+	    <tag>folded</tag>
+	    <item>
+	      The value of a folded field is a logical line that may span
+	      several lines.  The lines after the first are called
+	      continuation lines and must start with a space or a tab.
+	      Whitespace, including any newlines, is not significant in the
+	      field values of folded fields.<footnote>
+	        This folding method is similar to RFC 5322, allowing control
+	        files that contain only one paragraph and no multiline fields
+	        to be read by parsers written for RFC 5322.
+	      </footnote>
+	    </item>
+	    <tag>multiline</tag>
+	    <item>
+	      The value of a multiline field may comprise multiple continuation
+	      lines.  The first line of the value, the part on the same line as
+	      the field name, often has special significance or may have to be
+	      empty.  Other lines are added following the same syntax as the
+	      continuation lines of the folded fields.  Whitespace, including newlines,
+	      is significant in the values of multiline fields.
+	    </item>
+	  </taglist>
 	</p>
 
 	<p>
-	  In fields where it is specified that lines may not wrap,
-          only a single line of data is allowed and whitespace is not
-          significant in a field body. Whitespace must not appear
+	  Whitespace must not appear
           inside names (of packages, architectures, files or anything
           else) or version numbers, or between the characters of
           multi-character version relationships.
 	</p>
 
 	<p>
+	  The presence and purpose of a field, and the syntax of its
+	  value may differ between types of control files.
+	</p>
+
+	<p>
 	  Field names are not case-sensitive, but it is usual to
 	  capitalize the field names using mixed case as shown below.
 	  Field values are case-sensitive unless the description of the
@@ -2532,9 +2569,17 @@ Package: libc6
 	</p>
 
 	<p>
-	  Blank lines, or lines consisting only of spaces and tabs,
-	  are not allowed within field values or between fields - that
-	  would mean a new paragraph.
+	  Paragraph separators (empty lines) and lines consisting only of
+	  spaces and tabs are not allowed within field values or between
+	  fields.  Empty lines in field values are usually escaped by
+	  representing them by a space followed by a dot.
+	</p>
+
+	<p>
+	  Lines starting with # without any preceding whitespace are comment
+	  lines that are only permitted in source package control files
+	  (<file>debian/control</file>).  These comment lines are ignored, even
+	  between two continuation lines.  They do not end logical lines.
 	</p>
 
 	<p>
@@ -2600,8 +2645,8 @@ Package: libc6
 	  <file>.changes</file> file to accompany the upload, and by
 	  <prgn>dpkg-source</prgn> when it creates the
 	  <file>.dsc</file> source control file as part of a source
-	  archive. Many fields are permitted to span multiple lines in
-	  <file>debian/control</file> but not in any other control
+	  archive.  Some fields are folded in <file>debian/control</file>,
+	  but not in any other control
 	  file. These tools are responsible for removing the line
 	  breaks from such fields when using fields from
 	  <file>debian/control</file> to generate other control files.
@@ -2614,16 +2659,6 @@ Package: libc6
 	  when they generate output control files.
 	  See <ref id="substvars"> for details.
 	</p>
-
-	<p>
-	  In addition to the control file syntax described <qref
-	  id="controlsyntax">above</qref>, this file may also contain
-	  comment lines starting with <tt>#</tt> without any preceding
-	  whitespace.  All such lines are ignored, even in the middle of
-	  continuation lines for a multiline field, and do not end a
-	  multiline field.
-	</p>
-
       </sect>
 
       <sect id="binarycontrolfiles">
@@ -2822,11 +2857,7 @@ Package: libc6
 	  </p>
 
 	  <p>
-	    Any parser that interprets the Uploaders field in
-	    <file>debian/control</file> must permit it to span multiple
-	    lines.  Line breaks in an Uploaders field that spans multiple
-	    lines are not significant and the semantics of the field are
-	    the same as if the line breaks had not been present.
+	    The Uploaders field in <file>debian/control</file> can be folded.
 	  </p>
  	</sect1>
 
@@ -3006,7 +3037,7 @@ Package: libc6
 	  <p>
 	    This is a boolean field which may occur only in the
 	    control file of a binary package or in a per-package fields
-	    paragraph of a main source control data file.
+	    paragraph of a source package control file.
 	  </p>
 
 	  <p>
@@ -3242,7 +3273,8 @@ Package: libc6
 	    In a source or binary control file, the <tt>Description</tt>
 	    field contains a description of the binary package, consisting
 	    of two parts, the synopsis or the short description, and the
-	    long description. The field's format is as follows:
+	    long description.  It is a multiline field with the following
+	    format:
 	  </p>
 
 	  <p>
@@ -3306,8 +3338,8 @@ Package: libc6
 	    field contains a summary of the descriptions for the packages
 	    being uploaded.  For this case, the first line of the field
 	    value (the part on the same line as <tt>Description:</tt>) is
-	    always empty.  The content of the field is expressed as
-	    continuation lines, one line per package.  Each line is
+	    always empty.  It is a multiline field, with one
+	    line per package.  Each line is
 	    indented by one space and contains the name of a binary
 	    package, a space, a hyphen (<tt>-</tt>), a space, and the
 	    short description line from that package.
@@ -3443,7 +3475,7 @@ Package: libc6
 	  <heading><tt>Changes</tt></heading>
 
 	  <p>
-	    This field contains the human-readable changes data, describing
+	    This multiline field contains the human-readable changes data, describing
 	    the differences between the last version and the current one.
 	  </p>
 
@@ -3481,7 +3513,7 @@ Package: libc6
 	  <heading><tt>Binary</tt></heading>
 
 	  <p>
-	    This field is a list of binary packages.  Its syntax and
+	    This folded field is a list of binary packages.  Its syntax and
 	    meaning varies depending on the control file in which it
 	    appears.
 	  </p>
@@ -3491,7 +3523,7 @@ Package: libc6
 	    packages which a source package can produce, separated by
 	    commas<footnote>
 		A space after each comma is conventional.
-	    </footnote>.  It may span multiple lines.  The source package
+	    </footnote>.  The source package
 	    does not necessarily produce all of these binary packages for
 	    every architecture.  The source control file doesn't contain
 	    details of which architectures are appropriate for which of
@@ -3501,7 +3533,7 @@ Package: libc6
 	  <p>
 	    When it appears in a <file>.changes</file> file, it lists the
 	    names of the binary packages being uploaded, separated by
-	    whitespace (not commas).  It may span multiple lines.
+	    whitespace (not commas).
 	  </p>
 	</sect1>
 
@@ -3624,7 +3656,7 @@ Files:
 	    and <tt>Checksums-Sha256</tt></heading>
 
 	  <p>
-	    These fields contain a list of files with a checksum and size
+	    These multiline fields contain a list of files with a checksum and size
 	    for each one.  Both <tt>Checksums-Sha1</tt>
 	    and <tt>Checksums-Sha256</tt> have the same syntax and differ
 	    only in the checksum algorithm used: SHA-1
@@ -4473,13 +4505,13 @@ Checksums-Sha256:
 	  specification subject to the rules in <ref
 	  id="controlsyntax">, and must appear where it's necessary to
 	  disambiguate; it is not otherwise significant.  All of the
-	  relationship fields may span multiple lines.	For
+	  relationship fields can only be folded in source package control files.  For
 	  consistency and in case of future changes to
 	  <prgn>dpkg</prgn> it is recommended that a single space be
 	  used after a version relationship and before a version
 	  number; it is also conventional to put a single space after
 	  each comma, on either side of each vertical bar, and before
-	  each open parenthesis.  When wrapping a relationship field, it
+	  each open parenthesis.  When opening a continuation line in a relationship field, it
 	  is conventional to do so after a comma and before the space
 	  following that comma.
 	</p>
-- 
1.7.1

