[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#593909: debian-policy: Clarifications about the syntax of Debian control files.



Le Sat, Aug 28, 2010 at 01:34:35PM -0700, Russ Allbery a écrit :
> Charles Plessy <plessy@debian.org> writes:
> > Le Fri, Aug 27, 2010 at 10:24:57AM -0700, Russ Allbery a écrit :
> 
> >>     In fields where the value may not span multiple lines, the amount
> >>     of whitespace in the field body is not significant.  Any amount of
> >>     whitespace is equivalent to a single space.  Whitespace must not
> >>     appear inside names (of packages, architectures, files, or anything
> >>     else) or version numbers, or between the characters of
> >>     multi-character versoin relationships.
> 
> > I still have difficulties to understand the meaning of this paragraph,
> > and to what fields it applies. Is it just specifiying that the parser,
> > in the case of fields that allow continuation lines, can be either
> > intructed to replace all white spaces and newlines by single spaces, or
> > to leave the value as it is, including the new lines?
> 
> No, it's really trying to say that the amount of whitespace isn't
> significant.  I'm not sure how else to explain it.  That's one of those
> precise terms of art for which there isn't really an acceptable synonym,
> at least not that I can think of.  Replacing all whitespace with a single
> space is one of the things that you can do *because* the amount of
> whitespace is not significant, but it's not an equivalent statement.

Dear Russ and everybody

since I was still confused, I comprehensively inspected the wordings in
chapter 5, to better understand what is meant by ’span multiple lines’.

First, as a sidenote, no field specifies that it may not span multiple lines. I
therefore agree with you that it is an implicit default case, and propose to
make it explicit in § 5.1 (see below).

I then looked at which field description specifies that they ‘may span multiple
lines’. These are the relationship fields (Depends etc., §7.1), the Binary
field, and the Uploaders field, but only in source package control files. The
Files and Checksums-* fields, on the other hand are described as ‘multiline
fields’. Lastly, nothing is specified for the Description and Changes fields,
perhaps because it is so obvious.

It looks like ‘may span multiple lines’ means that continuation lines are
allowed but newlines are not significant, and ‘multiline fields’ means that
continuation lines are allowed and newlines are significant. At this point, I
am not sure what is expected from the parsers: deliver the value of the fields
that ‘may span multiple lines’ with or without newlines? In any casee, I find
this similarity of terminology very confusing. I therefore propose to replace
‘may span multiple lines’ by ‘continuation lines are allowed’ and add it where
it was implicit, and add ‘continuation lines are allowed and newlines are
significant’ where needed as well.

I noted another confusing sentence on the subject, in §5.2: ‘Many fields are
permitted to span multiple lines in <file>debian/control</file> but not in any
other control file’. Actually, I found this to be true only for the Uploaders
field. I propose to replace it by ‘Continuation lines can be permitted for some
fields in <file>debian/control</file> but not in any other control file.’

Together with the separation that you suggested in your answer, the rewording
that I propose allows to dramaticly slim down—and in my opinion, clarify—the
paragraph that discusses mulitple line spanning and whitespace:

        <p>
	  Continuation lines need to be allowed for each field separately.
	  When continuation lines are allowed, whitespace including newlines
	  is not significant in the fields values, unless specified otherwise.
        </p>

        <p>
          Whitespace must not appear
          inside names (of packages, architectures, files or anything
          else) or version numbers, or between the characters of
          multi-character version relationships.
        </p>

This would need to be changed if the parser is expected to deliver the field's
value with the newlines already stripped, except for fields where whitespace is
speficied to be significant. Note that in all cases the trailing whitespace of
a continuation line is not included in field values, since they are removed as
part of the parsing of the continuation lines.

I attached a patch that tries to summarise the above and the points discussed
in other messages of this thread. It also includes a minor correction for the
Essential field, to bring it in line with the nomenclature used in this
chapter.  I also tried to replace occurences of ‘wrap’ and ‘span’ by less
ambiguous words.


Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan
>From 8e2afc25d07bee246e546801a9a0b81bb8526454 Mon Sep 17 00:00:00 2001
From: Charles Plessy <plessy@debian.org>
Date: Fri, 3 Sep 2010 10:30:55 +0900
Subject: [PATCH] Clarifications on the syntax of control fields.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

See ‘http://bugs.debian.org/593909’;.
---
 policy.sgml |   96 +++++++++++++++++++++++++++++++++--------------------------
 1 files changed, 54 insertions(+), 42 deletions(-)

diff --git a/policy.sgml b/policy.sgml
index 9037de8..666590e 100644
--- a/policy.sgml
+++ b/policy.sgml
@@ -2450,19 +2450,22 @@ endif
 	  fields<footnote>
 		The paragraphs are also sometimes referred to as stanzas.
 	  </footnote>.
-	  The paragraphs are separated by blank lines.  Some control
+	  The paragraphs are separated by empty lines.  As a special exception
+	  for backwards compatibility, parsers may accept lines consisting
+	  solely of spaces and tabs as paragraph separators. Some control
 	  files allow only one paragraph; others allow several, in
 	  which case each paragraph usually refers to a different
 	  package.  (For example, in source packages, the first
 	  paragraph refers to the source package, and later paragraphs
-	  refer to binary packages generated from the source.)
+	  refer to binary packages generated from the source.).  The
+	  ordering of the paragraphs in control files is significant.
 	</p>
 
 	<p>
 	  Each paragraph consists of a series of data fields; each
 	  field consists of the field name, followed by a colon and
 	  then the data/value associated with that field.  It ends at
-	  the end of the (logical) line.  Horizontal whitespace
+	  the end of a logical line (see below).  Horizontal whitespace
 	  (spaces and tabs) may occur immediately before or after the
 	  value and is ignored there; it is conventional to put a
 	  single space after the colon.  For example, a field might
@@ -2480,22 +2483,31 @@ Package: libc6
 	</p>
 
 	<p>
-	  Many fields' values may span several lines; in this case
-	  each continuation line must start with a space or a tab.
-	  Any trailing spaces or tabs at the end of individual
-	  lines of a field value are ignored. 
+	  Fields values may be contained in a logical line that spans
+	  several lines; these lines are called continuation lines and
+	  must start with a space or a tab. Any trailing spaces or tabs
+	  at the end of individual lines of a field value are ignored.
 	</p>
 
 	<p>
-	  In fields where it is specified that lines may not wrap,
-          only a single line of data is allowed and whitespace is not
-          significant in a field body. Whitespace must not appear
+	  Continuation lines need to be allowed for each field separately.
+	  When continuation lines are allowed, whitespace including newlines
+	  is not significant in the field values, unless specified otherwise.
+	</p>
+
+	<p>
+	  Whitespace must not appear
           inside names (of packages, architectures, files or anything
           else) or version numbers, or between the characters of
           multi-character version relationships.
 	</p>
 
 	<p>
+	  The presence and purpose of a field, and the syntax of its
+	  value may differ between types of control files.
+	</p>
+
+	<p>
 	  Field names are not case-sensitive, but it is usual to
 	  capitalize the field names using mixed case as shown below.
 	  Field values are case-sensitive unless the description of the
@@ -2503,9 +2515,17 @@ Package: libc6
 	</p>
 
 	<p>
-	  Blank lines, or lines consisting only of spaces and tabs,
-	  are not allowed within field values or between fields - that
-	  would mean a new paragraph.
+	  Paragraph separators (empty lines) and lines consisting only of
+	  spaces and tabs are not allowed within field values or between
+	  fields.  Empty lines in field values are usually escaped by
+	  representing them by a space followed by a dot.
+	</p>
+
+	<p>
+	  Lines starting with # without any preceding whitespace are comments
+	  lines that are only permitted in source package control files
+	  (<file>debian/control</file>). These comment lines are ignored, even
+	  in the middle of continuation lines. They do not end logical lines.
 	</p>
 
 	<p>
@@ -2570,7 +2590,7 @@ Package: libc6
 	  <file>.changes</file> file to accompany the upload, and by
 	  <prgn>dpkg-source</prgn> when it creates the
 	  <file>.dsc</file> source control file as part of a source
-	  archive. Many fields are permitted to span multiple lines in
+	  archive. Continuation lines can be permitted for some fields in
 	  <file>debian/control</file> but not in any other control
 	  file. These tools are responsible for removing the line
 	  breaks from such fields when using fields from
@@ -2584,16 +2604,6 @@ Package: libc6
 	  when they generate output control files.
 	  See <ref id="substvars"> for details.
 	</p>
-
-	<p>
-	  In addition to the control file syntax described <qref
-	  id="controlsyntax">above</qref>, this file may also contain
-	  comment lines starting with <tt>#</tt> without any preceding
-	  whitespace.  All such lines are ignored, even in the middle of
-	  continuation lines for a multiline field, and do not end a
-	  multiline field.
-	</p>
-
       </sect>
 
       <sect id="binarycontrolfiles">
@@ -2791,11 +2801,10 @@ Package: libc6
 	  </p>
 
 	  <p>
-	    Any parser that interprets the Uploaders field in
-	    <file>debian/control</file> must permit it to span multiple
-	    lines.  Line breaks in an Uploaders field that spans multiple
-	    lines are not significant and the semantics of the field are
-	    the same as if the line breaks had not been present.
+	    Continuation lines are allowed in the Uploaders field in
+	    <file>debian/control</file>.  Line breaks are not significant and
+	    the semantics of the field are the same as if the line breaks had
+	    not been present.
 	  </p>
  	</sect1>
 
@@ -2975,7 +2984,7 @@ Package: libc6
 	  <p>
 	    This is a boolean field which may occur only in the
 	    control file of a binary package or in a per-package fields
-	    paragraph of a main source control data file.
+	    paragraph of a source package control file.
 	  </p>
 
 	  <p>
@@ -3211,7 +3220,8 @@ Package: libc6
 	    In a source or binary control file, the <tt>Description</tt>
 	    field contains a description of the binary package, consisting
 	    of two parts, the synopsis or the short description, and the
-	    long description. The field's format is as follows:
+	    long description. Continuation lines are allowed and whitespace
+	    is significant.  The field's format is as follows:
 	  </p>
 
 	  <p>
@@ -3417,6 +3427,7 @@ Package: libc6
 	  </p>
 
 	  <p>
+	    Continuation lines are allowed and whitespace is significant.
 	    The first line of the field value (the part on the same line
 	    as <tt>Changes:</tt>) is always empty.  The content of the
 	    field is expressed as continuation lines, with each line
@@ -3460,7 +3471,7 @@ Package: libc6
 	    packages which a source package can produce, separated by
 	    commas<footnote>
 		A space after each comma is conventional.
-	    </footnote>.  It may span multiple lines.  The source package
+	    </footnote>.  Continuation lines are allowed.  The source package
 	    does not necessarily produce all of these binary packages for
 	    every architecture.  The source control file doesn't contain
 	    details of which architectures are appropriate for which of
@@ -3470,7 +3481,7 @@ Package: libc6
 	  <p>
 	    When it appears in a <file>.changes</file> file, it lists the
 	    names of the binary packages being uploaded, separated by
-	    whitespace (not commas).  It may span multiple lines.
+	    whitespace (not commas).  Continuation lines are allowed.
 	  </p>
 	</sect1>
 
@@ -3502,7 +3513,8 @@ Package: libc6
 	  </p>
 
 	  <p>
-	    In all cases, Files is a multiline field.  The first line of
+	    In all cases, continuation lines are allowed and
+	    whitespace is significant.  The first line of
 	    the field value (the part on the same line as <tt>Files:</tt>)
 	    is always empty.  The content of the field is expressed as
 	    continuation lines, one line per file.  Each line must be
@@ -3602,8 +3614,8 @@ Files:
 	  </p>
 
 	  <p>
-	    <tt>Checksums-Sha1</tt> and <tt>Checksums-Sha256</tt> are
-	    multiline fields.  The first line of the field value (the part
+	    Continuation lines are allowed and whitespace is significant.
+	    The first line of the field value (the part
 	    on the same line as <tt>Checksums-Sha1:</tt>
 	    or <tt>Checksums-Sha256:</tt>) is always empty.  The content
 	    of the field is expressed as continuation lines, one line per
@@ -4426,16 +4438,16 @@ Checksums-Sha256:
 	  Whitespace may appear at any point in the version
 	  specification subject to the rules in <ref
 	  id="controlsyntax">, and must appear where it's necessary to
-	  disambiguate; it is not otherwise significant.  All of the
-	  relationship fields may span multiple lines.	For
-	  consistency and in case of future changes to
+	  disambiguate; it is not otherwise significant.
+	  Continuation lines are allowed in all of the relationship fields.
+	  For consistency and in case of future changes to
 	  <prgn>dpkg</prgn> it is recommended that a single space be
 	  used after a version relationship and before a version
 	  number; it is also conventional to put a single space after
 	  each comma, on either side of each vertical bar, and before
-	  each open parenthesis.  When wrapping a relationship field, it
-	  is conventional to do so after a comma and before the space
-	  following that comma.
+	  each open parenthesis.  When opening a continuation line in a
+	  relationship field, it is conventional to do so after a comma
+	  and before the space following that comma.
 	</p>
 
 	<p>
-- 
1.7.1


Reply to: