
Why YAML is not a good choice for Debian control files.



On Fri, Jul 31, 2009 at 10:01:36AM -0400, Adrian Perez wrote:

> There's any plan of supporting another format - without breaking
> compatibility, I mean supporting - besides the RFC one?
> I think YAML would be a good one.

Hello Adrian,

I thought about YAML for machine-readable license summaries and came to the
conclusion that it is not suitable. I think the same is true for Debian
control files, for the following reasons:

The “pseudo-RFC” format that Debian uses is organised in paragraphs, also
called ‘stanzas’, and often the first of them has a special role. YAML on the
other hand has concepts of scalars, sequences and mappings (in Perl, they would
be called scalars, arrays and hashes). First of all, if we want the first
paragraph of a Debian control file to have a special role, then the YAML
document must be organised as a sequence of mappings. Here is the example from
the YAML specification:

 Example 2.4.  Sequence of Mappings
 (players’ statistics)
 
 -
   name: Mark McGwire
   hr:   65
   avg:  0.278
 -
   name: Sammy Sosa
   hr:   63
   avg:  0.288

(http://www.yaml.org/spec/1.2/spec.html#id2559116)

In a Debian control file, the extra “-” lines and indentation would reduce
readability with no benefit.
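
To make the "sequence of mappings" point concrete, here is a minimal Python
sketch (my own illustration, not the code of dpkg or any other Debian tool) that
reads the paragraph format into a list of dictionaries. The function name
parse_stanzas and the shortened sample text are assumptions made for the
example, and comment lines are not handled. The blank line alone plays the role
that “-” plays in YAML.

```python
def parse_stanzas(text):
    """Split control-file text into a list of {field: value} dicts."""
    stanzas, current, last_field = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current paragraph.
            if current:
                stanzas.append(current)
            current, last_field = {}, None
        elif line[0] in " \t":
            # Continuation line: it belongs to the previous field.
            if last_field is None:
                raise ValueError("continuation line before any field")
            current[last_field] += "\n" + line[1:]
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"not a field: {line!r}")
            last_field = name
            current[name] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


# Shortened sample in the style of a debian/control file.
CONTROL = """\
Source: seaview
Priority: optional

Package: seaview
Architecture: any
"""

stanzas = parse_stanzas(CONTROL)
# The first paragraph gets its special role simply from its position.
source, binaries = stanzas[0], stanzas[1:]
print(source["Source"], [b["Package"] for b in binaries])
```

The result is the same data structure a YAML parser would return for a sequence
of mappings, without any extra markers in the file.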

Second, the “pseudo-RFC” format delegates the management of folding to the
Debian Policy, while in YAML it has to be part of the markup: “|” and “>” are
used to denote when line breaks are significant or not:

 name: Mark McGwire
 accomplishment: >
   Mark set a major league
   home run record in 1998.
 stats: |
   65 Home Runs
   0.278 Batting Average

(http://www.yaml.org/spec/1.2/spec.html#id2559996)

So basically, switching Debian control files to YAML would mean adding “-”, “|”
and “>” signs in precise locations, each of them being one more opportunity for
a parsing error.
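
As a rough illustration of what "delegating folding to the Policy" means in
practice, the Python sketch below lets the consumer decide, per field, whether
line breaks matter. The helper names unfold and split_description are
hypothetical, the sample values are abbreviated, and the inputs are assumed to
be field values whose continuation lines have already lost their leading space.

```python
def unfold(value):
    """Folded field such as Build-Depends: line breaks carry no meaning."""
    return " ".join(value.split())


def split_description(value):
    """Description: first line is the synopsis, a lone '.' is a blank line."""
    synopsis, _, rest = value.partition("\n")
    long_lines = ["" if line == "." else line for line in rest.split("\n")]
    return synopsis, "\n".join(long_lines)


# Abbreviated sample values (leading space of continuation lines removed).
build_depends = "debhelper (>= 7), libfltk1.1-dev,\nlibxext-dev, zlib1g-dev"
description = (
    "Multiplatform interface for sequence alignment and phylogeny\n"
    "Alignments can be manually edited.\n"
    ".\n"
    "It computes phylogenetic trees."
)

print(unfold(build_depends))
synopsis, long_text = split_description(description)
print(synopsis)
print(long_text)
```

The file itself carries no “>” or “|”: the knowledge that Build-Depends is
folded and Description is multi-line lives in the reader, not in the markup.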

Here is a simple example based on a debian/control file for the seaview package:

Source: seaview
Section: non-free/science
Priority: optional
Maintainer: Debian-Med Packaging Team <debian-med-packaging@lists.alioth.debian.org>
DM-Upload-Allowed: yes
Uploaders: Charles Plessy <plessy@debian.org>
Build-Depends: debhelper ( >= 7  ), libfltk1.1-dev, libjpeg62-dev, libpng12-dev, libxft-dev,
 libxext-dev,  zlib1g-dev
Standards-Version: 3.8.1
Vcs-Browser: http://svn.debian.org/wsvn/debian-med/trunk/packages/seaview/trunk/?rev=0&sc=0
Vcs-Svn: svn://svn.debian.org/svn/debian-med/trunk/packages/seaview/trunk/
Homepage: http://pbil.univ-lyon1.fr/software/seaview.html

Package: seaview
Architecture: any
Depends: ${shlibs:Depends}, ${misc:Depends}
Recommends: clustalw, muscle, phyml
Description: Multiplatform interface for sequence alignment and phylogeny
 SeaView reads and writes various file formats (NEXUS, MSF, CLUSTAL, FASTA,
 PHYLIP, MASE, Newick) of DNA and protein sequences and of phylogenetic trees.
 Alignments can be manually edited. It drives the programs Muscle or Clustal W
 for multiple sequence alignment, and also allows to use any external alignment
 algorithm able to read and write FASTA-formatted files.
 .
 It computes phylogenetic trees by parsimony using PHYLIP's dnapars/protpars
 algorithm, by distance with NJ or BioNJ algorithms on a variety of evolutionary
 distances, or by maximum likelihood using the program PhyML 3.0. SeaView draws
 phylogenetic trees on screen or PostScript files, and allows to download
 sequences from EMBL/GenBank/UniProt using the Internet.

Translated into YAML, it would be:

-
 Source: seaview
 Section: non-free/science
 Priority: optional
 Maintainer: Debian-Med Packaging Team <debian-med-packaging@lists.alioth.debian.org>
 DM-Upload-Allowed: yes
 Uploaders: Charles Plessy <plessy@debian.org>
 Build-Depends: >
  debhelper ( >= 7  ), libfltk1.1-dev, libjpeg62-dev, libpng12-dev, libxft-dev, libxext-dev,
  zlib1g-dev
 Standards-Version: 3.8.1
 Vcs-Browser: http://svn.debian.org/wsvn/debian-med/trunk/packages/seaview/trunk/?rev=0&sc=0
 Vcs-Svn: svn://svn.debian.org/svn/debian-med/trunk/packages/seaview/trunk/
 Homepage: http://pbil.univ-lyon1.fr/software/seaview.html

-
 Package: seaview
 Architecture: any
 Depends: ${shlibs:Depends}, ${misc:Depends}
 Recommends: clustalw, muscle, phyml
 Description: |
  Multiplatform interface for sequence alignment and phylogeny
  SeaView reads and writes various file formats (NEXUS, MSF, CLUSTAL, FASTA,
  PHYLIP, MASE, Newick) of DNA and protein sequences and of phylogenetic trees.
  Alignments can be manually edited. It drives the programs Muscle or Clustal W
  for multiple sequence alignment, and also allows to use any external alignment
  algorithm able to read and write FASTA-formatted files.

  It computes phylogenetic trees by parsimony using PHYLIP's dnapars/protpars
  algorithm, by distance with NJ or BioNJ algorithms on a variety of evolutionary
  distances, or by maximum likelihood using the program PhyML 3.0. SeaView draws
  phylogenetic trees on screen or PostScript files, and allows to download
  sequences from EMBL/GenBank/UniProt using the Internet. 
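
The hand translation above can be mimicked mechanically. The Python sketch
below is only an illustration under simplifying assumptions: to_yaml and the
LITERAL set are names I made up, the input is a list of dictionaries with
abbreviated sample values, and no quoting or escaping is attempted, so values
containing YAML-special syntax would need more care. It makes visible every
place where a “-”, “>” or “|” has to be emitted.

```python
# Assumption for this sketch: fields whose line breaks are significant.
LITERAL = {"Description"}


def to_yaml(stanzas):
    """Render a list of {field: value} dicts as a YAML sequence of mappings."""
    out = []
    for stanza in stanzas:
        out.append("-")  # one marker per paragraph
        for name, value in stanza.items():
            lines = value.split("\n")
            if len(lines) == 1:
                out.append(f" {name}: {value}")
            else:
                # Multi-line values need an explicit block scalar marker.
                marker = "|" if name in LITERAL else ">"
                out.append(f" {name}: {marker}")
                # A lone '.' (Debian's blank-line convention) becomes empty.
                out.extend("" if line == "." else f"  {line}" for line in lines)
    return "\n".join(out) + "\n"


# Abbreviated sample data.
stanzas = [
    {"Source": "seaview",
     "Build-Depends": "debhelper (>= 7),\nzlib1g-dev"},
    {"Package": "seaview",
     "Description": "Multiplatform interface\nFirst paragraph.\n.\nSecond paragraph."},
]

print(to_yaml(stanzas), end="")
```
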

Alternatively, the Description field could use markup to indicate that the
first line is the short description. Either:

 Description:
  - Multiplatform interface for sequence alignment and phylogeny
  - |
   SeaView reads and writes various file formats (NEXUS, MSF, CLUSTAL, FASTA,
   PHYLIP, MASE, Newick) of DNA and protein sequences and of phylogenetic trees.
   Alignments can be manually edited. It drives the programs Muscle or Clustal W
   for multiple sequence alignment, and also allows to use any external alignment
   algorithm able to read and write FASTA-formatted files.
 
   It computes phylogenetic trees by parsimony using PHYLIP's dnapars/protpars
   algorithm, by distance with NJ or BioNJ algorithms on a variety of evolutionary
   distances, or by maximum likelihood using the program PhyML 3.0. SeaView draws
   phylogenetic trees on screen or PostScript files, and allows to download
   sequences from EMBL/GenBank/UniProt using the Internet.

or:

 Description:
  Short: Multiplatform interface for sequence alignment and phylogeny
  Long: |
   SeaView reads and writes various file formats (NEXUS, MSF, CLUSTAL, FASTA,
   PHYLIP, MASE, Newick) of DNA and protein sequences and of phylogenetic trees.
   Alignments can be manually edited. It drives the programs Muscle or Clustal W
   for multiple sequence alignment, and also allows to use any external alignment
   algorithm able to read and write FASTA-formatted files.

   It computes phylogenetic trees by parsimony using PHYLIP's dnapars/protpars
   algorithm, by distance with NJ or BioNJ algorithms on a variety of evolutionary
   distances, or by maximum likelihood using the program PhyML 3.0. SeaView draws
   phylogenetic trees on screen or PostScript files, and allows to download
   sequences from EMBL/GenBank/UniProt using the Internet.

As you can see, in terms of human readability and writability, YAML brings no
advantage over the current format.

I like YAML a lot, so if I overlooked something that would make it more
suitable, please let me/us know!

Have a nice weekend,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan

