
Re: DEP-5: general file syntax

On Sat, Aug 21 2010, Russ Allbery wrote:

> Ben Finney <ben+debian@benfinney.id.au> writes:
>> Lars Wirzenius <liw@liw.fi> writes:
>>> * Have one copyright statement per Copyright field, and have multiple
>>> instances of the field.
>> This is my preference, and what I've been doing in my packages.
> Unfortunately, this creates real challenges for parsers.  I've written a
> few RFC 5322 parsers, particularly for Usenet, and allowing repetition of
> headers always causes headaches in representation.  You end up having to
> add another layer of data structure, with corresponding changes to
> everything that consumes information from the parser, if you don't want to
> throw away information.  It's also a divergence from the Debian control
> file format, which allows only one instance of a field per stanza,
> probably for much the same reason.

        If I recall correctly, RFC 2822 allows for header field folding:
--8<---------------cut here---------------start------------->8---
2.2. Header Fields

   Header fields are lines composed of a field name, followed by a colon
   (":"), followed by a field body, and terminated by CRLF.  A field
   name MUST be composed of printable US-ASCII characters (i.e.,
   characters that have values between 33 and 126, inclusive), except
   colon.  A field body may be composed of any US-ASCII characters,
   except for CR and LF.  However, a field body may contain CRLF when
   used in header "folding" and  "unfolding" as described in section
   2.2.3.  All field bodies MUST conform to the syntax described in
   sections 3 and 4 of this standard.

2.2.3. Long Header Fields

   Each header field is logically a single line of characters comprising
   the field name, the colon, and the field body.  For convenience
   however, and to deal with the 998/78 character limitations per line,
   the field body portion of a header field can be split into a multiple
   line representation; this is called "folding".  The general rule is
   that wherever this standard allows for folding white space (not
   simply WSP characters), a CRLF may be inserted before any WSP.  For
   example, the header field:

           Subject: This is a test

   can be represented as:

           Subject: This
            is a test

   Note: Though structured field bodies are defined in such a way that
   folding can take place between many of the lexical tokens (and even
   within some of the lexical tokens), folding SHOULD be limited to
   placing the CRLF at higher-level syntactic breaks.  For instance, if
   a field body is defined as comma-separated values, it is recommended
   that folding occur after the comma separating the structured items in
   preference to other places where the field could be folded, even if
   it is allowed elsewhere.

   The process of moving from this folded multiple-line representation
   of a header field to its single line representation is called
   "unfolding". Unfolding is accomplished by simply removing any CRLF
   that is immediately followed by WSP.  Each header field should be
   treated in its unfolded form for further syntactic and semantic
   evaluation.
--8<---------------cut here---------------end--------------->8---

        Can't we just "fold" long copyright header fields similarly?
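
        As a rough sketch of what that would mean for a parser, here is
the unfolding rule quoted above written out in Python.  The function
names and the sample copyright holders are invented for illustration and
are not taken from any existing DEP-5 tool; the sketch also assumes that
each field name appears only once per stanza.

```python
import re


def unfold(raw):
    """Remove any line break that is immediately followed by WSP.

    Accepts both CRLF and bare LF line endings, since files on disk
    often use LF even though RFC 2822 specifies CRLF.
    """
    return re.sub(r"\r?\n(?=[ \t])", "", raw)


def parse_fields(raw):
    """Parse one stanza into a plain name -> body mapping.

    Assumes a field name occurs at most once, so no extra layer of
    data structure (such as a list per field) is needed.
    """
    fields = {}
    for line in unfold(raw).splitlines():
        if not line.strip():
            continue
        name, _, body = line.partition(":")
        fields[name.strip()] = body.strip()
    return fields


# Hypothetical folded Copyright field with two statements.
stanza = (
    "Copyright: 2009, Jane Example,\n"
    " 2010, John Example\n"
)
fields = parse_fields(stanza)
```

With folding, both statements end up in the single Copyright value, and
the consumer still sees one string per field name.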


Houston, Tranquillity Base here.  The Eagle has landed. Neil Armstrong
Manoj Srivastava <srivasta@acm.org> <http://www.golden-gryphon.com/>  
4096R/C5779A1C E37E 5EC5 2A01 DA25 AD20  05B6 CF48 9438 C577 9A1C
