Debian Metadata Proposal -- draft rev.1.4
Ok one and all. This is my proposal for Debian Metadata, which is
part of the doc-base (and probably doc-base-dev) package(s). A
version on the web can be found at
http://va.debian.org/~aph/debian-metadata.html/ .
Marco, would you be willing to help me work on the API and thinking
about how to attach a database backend a la dhelp ?
Major changes:
* flattened the entity modeling; we only model documents, not
documents and formats
* adopt Dublin Core element set, with changes for our RFC822 format
and extensions for internationalization
* explain the elements, provide an example
* document IDs are URLs, with an implied BASE (in HTML parlance) of
file://localhost/usr/doc
* remove capability for multiple docreg files to update a single
document metadata
* stub out sections talking about autoconversion, local
configurables, and the API for dhelp etc.
Todo:
* more examples, i.e., for most interesting www.debian.org pages;
volunteers wanted
* Marcus is working on the DDH, it's looking very impressive!
* work out the install-docs hooks mechanism for backwards compat
support of dhelp and dwww
* work out the API for a better (no shadowing of data required)
dwww/dhelp etc working right out of the document store (I need
lots of help here)
* work out our storage system, which will lend itself well to a flat
file-based db like Berkeley DB (I need lots of help here)
* reimplement install-docs, which is easy compared to all this
analysis/spec work
--
.....A. P. Harris...apharris@onShore.com...<URL:http://www.onShore.com/>
Debian Metadata Project
-----------------------
Adam P. Harris <aph@debian.org>
The Debian-Doc List <debian-doc@lists.debian.org>
$Revision: 1.4 $
0.1 Abstract
------------
This manual contains a guide and a reference to the Debian Metadata
Project. The Project's purpose, and the purpose of this document, is
to outline a set of metadata elements, to specify an interface for
package maintainers to use in order to provide metadata about resources
in their packages, to specify a unified subject catalog for
categorizing metadata, and to specify an API for developers who wish
to make use of a system's metadata. This manual is intended to serve
as sub-policy for the deployment and utilization of metadata in
Debian. Currently, it carries no actual force and is for informational
purposes only. The manual is intended for package maintainers,
Debian document writers, and those implementing document display
systems such as dwww and dhelp.
0.2 Contents
------------
1. Introduction
1.1. Organization of this Document
1.2. Contributing to the Project
2. Local Configuration Options
2.1. Automatic Document Conversion
3. Debian Metadata Elements
3.1. Metadata Element Structure
3.2. Metadata Elements
4. docreg File Format
4.1. Design Rationale and Goals
4.2. How To Use the docreg File
4.3. docreg File Format
5. Debian Metadata API
0.3 Copyright Notice
--------------------
Copyright ©1998 Adam P. Harris; some parts ©1998 Christian Swartz.
This documentation is free software; you may redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2, or (at
your option) any later version.
However, even though you are empowered to modify this specification,
please do not do so; as a standard, it loses power if there are
alternate versions of it available. Methods for centralized management
and modification of this specification are outlined below.
This is distributed in the hope that it will be useful, but *without
any warranty*; without even the implied warranty of merchantability or
fitness for a particular purpose. See the GNU General Public License
for more details.
A copy of the GNU General Public License is available as
`/usr/doc/copyright/GPL' in the Debian GNU/Linux distribution or on
the World Wide Web at `http://www.gnu.org/copyleft/gpl.html'. You can
also obtain it by writing to the Free Software Foundation, Inc., 675
Mass Ave, Cambridge, MA 02139, USA.
-------------------------------------------------------------------------------
1. Introduction
----------------
What is metadata? Metadata is information about information. The
Debian Metadata Project is an attempt to provide a robust,
standards-based metadata set, and the facilities to collect and
display information about resources (usually, documents on a user's
machine). Collected information includes the document's title, author,
format, placement in a subject catalog, description, the language it
is in, etc.
Why should anyone care about metadata? Well, primarily, it is useful
in *resource discovery*. This is the process of finding out where to
find information. You do this every time you run man -k or apropos;
Altavista and HotBot are typical of the current technologies in
resource discovery. But *metadata* allows you to find resources in
different and better ways. You can search by title, by language, by
author; you can traverse a subject hierarchy, like a book's index.
Metadata allows a more intelligent way to organize and present the
vast amount of documentation that Debian already provides.
Debian uses as its metadata entity definition a specialized
application of the Dublin Core
(http://purl.oclc.org/metadata/dublin_core/). The Dublin Core is an
informal standard formulated by an international group of
professionals in the fields of library science, and the networking and
digital library research communities.
1.1. Organization of this Document
-----------------------------------
The document is split into three main sections. The first section
contains information of interest to any Debian user, curious about the
features and capabilities of our metadata system. The second section
is of interest to package maintainers. The final section is mainly of
interest to documentation system providers or metadata display system
developers.
System administration controls provided by the Debian Metadata system
are documented in chapter 2, `Local Configuration Options'. Chapter 3,
`Debian Metadata Elements', defines the metadata elements, which are
the data fields which can be populated for a given resource.
The second part of this manual is primarily of interest to Debian
package maintainers. It begins with chapter 4, `docreg File Format',
which describes the "docreg" file, the file that the package
maintainer uses to *register* document metadata into the local
document store. Finally, in *section not yet written*, the use of
install-docs is covered. Developer tools to convert metadata to and
from HTML are discussed in *section not yet written*.
In the final part, of interest to those who are working with Debian's
metadata collection, we start in chapter 5, `Debian Metadata API',
with a discussion of the API used to access the document store. In
*section not yet written*, we discuss how developers can hook into
install-docs in case they need to capture certain metadata events or
shadow data (a deprecated practice).
1.2. Contributing to the Project
---------------------------------
Discussions about the Debian Metadata Project generally take place on
the <debian-doc@lists.debian.org> mailing list. This is an open
project; all are invited. To subscribe to this list, go to
http://www.debian.org/MailingLists/subscribe .
-------------------------------------------------------------------------------
2. Local Configuration Options
-------------------------------
Providing knobs and dials for system administrators to control their
documentation is possible once we have the data provided by the Debian
Metadata scheme. None of this functionality is present yet; however,
preliminary ideas of desirable configuration capabilities are
discussed here.
Such options relate to a few major issues. The first issue is making
decisions, based on local policy, whether or not to install the
documentation. Here's a feature list:
* don't install particular formats ever, e.g., "I don't want any
PostScript on my machine, this is a firewall"
* don't install particular languages, e.g., "I don't want any
Spanish documentation installed"
* conditionally don't install a particular language when another
language is available, e.g., "if a Spanish version of a document is
available already, we don't need the English version; otherwise, we
do."
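Since none of this functionality exists yet, the following Python is only a sketch of how the three rules above might be evaluated for a single document. The `InstallPolicy` class, its field names, and the `should_install` function are all hypothetical; nothing here is part of doc-base.

```python
from dataclasses import dataclass, field


@dataclass
class InstallPolicy:
    """Hypothetical local policy; none of these knobs exist yet."""
    banned_formats: set = field(default_factory=set)    # e.g. {"application/postscript"}
    banned_languages: set = field(default_factory=set)  # e.g. {"es"}
    preferred_language: str = ""                        # e.g. "es"; empty means no preference


def should_install(fmt, lang, available_langs, policy):
    """Decide whether one document should be installed.

    fmt is a MIME type, lang a two-letter language code, and
    available_langs the set of languages the same document is offered in.
    """
    if fmt in policy.banned_formats:
        return False
    if lang in policy.banned_languages:
        return False
    # Conditional rule: skip other languages when the preferred one exists.
    if (policy.preferred_language
            and policy.preferred_language in available_langs
            and lang != policy.preferred_language):
        return False
    return True
```

With a preferred language of `es`, an English document is skipped only when a Spanish version of the same document is available.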
2.1. Automatic Document Conversion
-----------------------------------
Another major functional group has to do with the possibility of
auto-conversion of documentation, either on demand or at install time.
Here's a possible feature list:
* autoconvert on install based on format, e.g., "I want all SGML
files to be converted into PDF, A4 sized paper. Please retain the
SGML."
* autoconvert on demand based on formats, e.g., provide a facility
such that we could write a CGI to convert documents on demand,
say, using content negotiation or user selection.
* override compression policy locally, e.g., "Even though policy says
don't gzip HTML files, I've set up my browsers to handle it, so go
ahead and gzip them."
Autoconversion will eventually require its own separate document. It
is a very complex issue. Packages being installed should be capable of
registering their conversion capabilities with the system. For
example, sdc can translate a particular set of DTDs into HTML, ASCII,
nroff, or PostScript. gs can translate PostScript to PDF. The
`docbook-stylesheets' package can translate documents written in the
Docbook DTD to HTML, PostScript, or RTF. When conversions are done,
the system should make new metadata for them and register this new
metadata, probably with special fields to allow an audit-trail of the
conversion actions.
But document formatting is a very complex issue. It can have
dependencies on many different things in the system, such as fonts,
obscure configuration settings, etc. For instance, if I change my
papersize in `/etc/papersize/', do I need to recreate any documents
which depended on that setting? Additionally, we might need to allow a
facility for the document manager to associate processing instructions
with files.
Finally, the logistics of package maintenance make autoconversion
complex. Do we remove converted documents when the package from which
their source came is removed? Purged?
-------------------------------------------------------------------------------
3. Debian Metadata Elements
----------------------------
This chapter contains a description of the Debian metadata, which is
used to describe human-legible texts in a consistent and coherent way.
The Debian Metadata Project uses the Dublin Core standard metadata
fields. Below, we reiterate these fields and describe their meaning
and use.
3.1. Metadata Element Structure
--------------------------------
The Dublin Core Element semantics can be found at
http://purl.oclc.org/metadata/dublin_core_elements. In some cases, we
have restricted the syntax for the benefit of simplicity of
implementation. These restrictions are noted.
Metadata elements consist of two required parts and an optional part.
The required parts of a complete element are its *label* and its
*content*. The optional part is its *qualifiers*. The label is the name
of an element, and is selected from the domain of possible labels
listed below. The content is the value of the element. Qualifiers
either describe or restrict the content of an element; for instance, a
qualifier is used to stipulate the language in which the content is
written.
In standard Dublin Core, each element is repeatable. However, we have
restricted the repeatability of certain fields for simplicity of
implementation; these restrictions may be lifted at a later date.
Generally, if an element's contents are not free-text (i.e., if it
doesn't make sense to talk of the *language* of the contents), we do
not allow it to iterate.
Elements may occur in any order. Order is never significant. Case is
never significant in labels or qualifiers; case is preserved in the
content.
For the precise syntax of how the elements are encoded in docreg
files, see chapter 4, `docreg File Format'.
The Debian flavor of Dublin Core also places restrictions on qualifier
use. For instance, as the subject scheme, we have no use for Dewey
Decimal schemes; instead, we require our own scheme. Unknown or
unacceptable schemes are ignored as if they never appeared.
3.1.1. The LANG Qualifier
--------------------------
The `LANG' qualifier indicates the language of the content of the
element itself. For instance, if a `Description' element has a LANG
qualifier value of `de', the description itself is in German.
Language identifiers for the `LANG' qualifier are in RFC 1766, *Tags
for the Identification of Languages*,
http://ds.internic.net/rfc/rfc1766.txt. Examples include `en', `de',
`es', `fi', `fr', `ja', `th', and `zh'. Currently, only the two-letter
schemes are recognized, i.e., not `en-us'.
The default is `en'. This default does not indicate that English is
better; it simply is this value for the benefit of maintainers, since
the great majority of documentation is in English.
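As a rough illustration of the rules above (two-letter tags only, defaulting to `en`), a validator could look like the Python sketch below. The function name `normalize_lang` is made up, and folding the tag to lowercase is an assumption; the text only says that longer tags such as `en-us` are not currently recognized.

```python
import re

DEFAULT_LANG = "en"
_TWO_LETTER = re.compile(r"[A-Za-z]{2}")


def normalize_lang(value=None):
    """Return a lowercase two-letter tag, the default, or None if invalid."""
    if value is None or value.strip() == "":
        return DEFAULT_LANG
    value = value.strip()
    if _TWO_LETTER.fullmatch(value):
        return value.lower()  # lowercasing is an assumption of this sketch
    return None  # e.g. "en-us" is not recognized at present
```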
3.1.2. The SCHEME Qualifier
----------------------------
The `SCHEME' qualifier indicates what notational scheme the content is
encoded in. This qualifier is usually not available to the maintainer
for manipulation, since there is only one reasonable scheme for an
element in the Debian environment.
The default scheme is generally `freetext'. Other elements have a
scheme of `URL' or `ISO.636'.
3.1.2.1. Concerning the URL Scheme
-----------------------------------
In some cases, metadata refers to actual documents. The document
referred to may be the *resource* corresponding to the element set, or
it may be a different resource, which might have its own metadata
elements associated with it.
The means by which metadata indicates a resource is by URL. Most
everybody is already familiar with what a URL is; complete information
can be found in RFC 1738.
There is one additional wrinkle. Often, an element needs to indicate a
relationship with a local file (which may or may not be installed). In
such cases, for the ease of the maintainer, a relative URL may be
used. The implied basepath in such cases is
`file://localhost/usr/doc/'.
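The relative-URL rule can be illustrated with Python's standard `urllib.parse.urljoin`. This is only a sketch: the `resolve` helper is hypothetical, and it assumes the base is written with its trailing slash, since without the slash `urljoin` would replace the final `doc` path segment instead of appending to it.

```python
from urllib.parse import urljoin

# Implied base path for relative URLs, with the trailing slash.
BASE = "file://localhost/usr/doc/"


def resolve(url):
    """Resolve a possibly relative URL against the implied base path."""
    return urljoin(BASE, url)
```

Here `resolve("FAQ/Linux-FAQ")` yields `file://localhost/usr/doc/FAQ/Linux-FAQ`, while an absolute URL such as an `http:` one is returned unchanged.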
3.1.2.2. Concerning the Debian Type Scheme
-------------------------------------------
The Debian Type scheme is the set of allowable values (that is, the
*domain*) for the content of the `Type' element. This scheme is very
volatile and experimental, but we encourage you to use it and to offer
comments on its use. Of course, the `Type' element is completely
optional.
Resource types, it must be remembered, are orthogonal to both the
`Subject' and `Format' elements. So, the `Type' element can be thought
of as the generic *class* of resources, which is not related to its
particular media or subject matter.
Currently, the domain of allowable values is
* howto
* faq
* manual *(i.e., manual for software)*
* reference
* specification
* tutorial
* figure
* mailbox
* homepage
* glossary
* collection
* package *(for future use, representing Debian package metadata)*
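A consumer of the `Type` element might check content against this domain roughly as follows. This is a Python sketch; `is_valid_type` is a made-up name, and exact, case-sensitive matching is an assumption, since the text does not say how case is treated for controlled vocabularies.

```python
# Current domain of the Debian Type scheme, as listed above.
DEBIAN_TYPES = frozenset({
    "howto", "faq", "manual", "reference", "specification", "tutorial",
    "figure", "mailbox", "homepage", "glossary", "collection", "package",
})


def is_valid_type(value):
    """True if value (ignoring surrounding whitespace) is an allowed type."""
    return value.strip() in DEBIAN_TYPES
```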
3.1.2.3. Concerning the DDH Scheme
-----------------------------------
3.2. Metadata Elements
-----------------------
In Debian Dublin Core, certain elements are required, some are
optional, and some are ignored as insignificant. As a rule, the adage,
"be liberal in what you accept and conservative in what you emit"
applies to the system.
The following is a summary of the elements, which are described in
detail below:
* *Required elements*
* Identifier
* Title
* Subject
* Format
* *Optional elements*
* Description
* Language
* Creator
* Contributor
* Publisher
* Date
* Source
* Relation.IsFormatOf
* Relation.IsBasedOn
* Type
* Rights
* Ignored Elements
* Coverage
3.2.1. Required Elements
-------------------------
These elements are required. Lacking these elements constitutes an
error which will cause install-docs to reject the entire entry.
Identifier
A URL used to uniquely identify the resource. If the URL is a
relative URL, it is relative to the location
`file://localhost/usr/doc/', as described in subsubsection
3.1.2.1, `Concerning the URL Scheme'.
SCHEME
URL
LANG
ignored
repeatable?
no
Title
The title for the document, usually only a single line. If the
document does not have a title, formulate the title as if it is
the short selectable string of an HREF.
SCHEME
freetext
LANG
can set
repeatable?
no
Subject
Where this document is situated in the Subject Catalog. For
Debian, this Subject Catalog is the *Debian Document Hierarchy*,
which see.
SCHEME
Debian, indicating the Debian Document Hierarchy
LANG
ignored
repeatable?
yes
Format
The format of the document, indicated as a MIME type, for
example, `text/html'.
SCHEME
MIME
LANG
cannot set
repeatable?
no
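The required-element rule could be checked as in the Python sketch below. The helper name `missing_required` is hypothetical and this is not the install-docs implementation; labels are compared case-insensitively because case is never significant in labels.

```python
REQUIRED = ("Identifier", "Title", "Subject", "Format")


def missing_required(labels):
    """Return the required labels absent from labels (case-insensitive)."""
    present = {label.lower() for label in labels}
    return [name for name in REQUIRED if name.lower() not in present]
```

An entry for which this returns a non-empty list would be rejected in its entirety.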
3.2.2. Optional Elements
-------------------------
These elements are optional. The content of these elements is
captured by the system and should be displayed to the user by some
means.
Description
A description or abstract for the document. This gives the user
more information about the document, so that they are able to
decide whether it contains the information they are looking for.
SCHEME
freetext
LANG
can set
repeatable?
no
Language
The language of the intellectual content of the resource. If this
element is not present, it defaults to `en', for English.
SCHEME
RFC 1766
LANG
cannot set
repeatable?
no
example
`de'
Creator
The person or organization primarily responsible for creating the
intellectual content of the resource. For example, authors in the
case of written documents, artists, photographers, or
illustrators in the case of visual resources.
SCHEME
freetext, or email
LANG
cannot set
repeatable?
yes
example
`A. P. Harris <aph@debian.org>'
Contributor
Contributor to a document. For our purposes, this should only be
used to indicate the translator of a document. Multiple authors
for a document should simply use multiple Creator elements.
SCHEME
freetext, or email
LANG
can set
repeatable?
yes
example
`A. P. Harris <aph@debian.org>'
Publisher
The entity responsible for making the resource available in its
present form, such as a publishing house, a university
department, or a corporate entity.
SCHEME
freetext, or email
LANG
ignored
repeatable?
yes
Date
A date associated with the resource. For our purposes, this
should indicate the last modification date of a resource.
SCHEME
restricted ISO 8601
LANG
ignored
repeatable?
no
example
`1997-11-05' or `1998'
Source
Upstream location where a document originated. Generally this is
a web site maintained by the document author, or the URL for a
canonical upstream archive such as Sunsite.
SCHEME
URL
LANG
ignored
repeatable?
yes
example
`http://sunsite.unc.edu/mdw/HOWTOs/FOOBAR.html'
Relation.IsFormatOf, Relation.IsBasedOn
Indicates a relationship to another resource. The content of this
field is the URL to the resource related to, as in the Identifier
element. (If the URL is relative, it is relative to the location
`file://localhost/usr/doc/'.) Relation.IsFormatOf indicates that
the described resource is another format of the resource named in
the content of this element, e.g., an HTML or ASCII version of an
SGML file. Relation.IsBasedOn is used to indicate translations
based on another document. Note that it is *not* an error for the
content's URL to not exist on the user's filesystem.
SCHEME
URL
LANG
ignored
repeatable?
no
example
`FAQ/Linux-FAQ'
Type
The category of the resource, describing what sort of resource it
is. This is orthogonal to both subject and format. The values of
this field are constrained to the set of allowable values.
SCHEME
Debian type, see subsubsection 3.1.2.2, `Concerning the
Debian Type Scheme'.
LANG
cannot set
repeatable?
yes
example
`howto'
Rights
An identifier that links to a rights management statement, such
as acceptable terms of use, the GPL, etc.
SCHEME
URL
LANG
cannot set
repeatable?
yes
example
`copyright/GPL'
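For the `Date` element above, the exact restrictions on ISO 8601 are not spelled out; the only forms shown are `1997-11-05` and `1998`. The Python sketch below therefore assumes that just those two shapes (a year, or a full year-month-day date) are acceptable. The function name `is_valid_date` is made up.

```python
import re
from datetime import date

# Assumed shapes: "YYYY" or "YYYY-MM-DD" (the two forms used as examples).
_DATE = re.compile(r"(\d{4})(?:-(\d{2})-(\d{2}))?")


def is_valid_date(value):
    """True if value looks like one of the assumed restricted date forms."""
    m = _DATE.fullmatch(value.strip())
    if not m:
        return False
    if m.group(2) is None:
        return True  # year only
    try:
        date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
    except ValueError:
        return False  # e.g. month 13 or day 32
    return True
```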
3.2.3. Ignored Elements
------------------------
The following elements are ignored. They are mentioned here because
these fields are part of standard Dublin Core; they may some day
become supported.
Coverage
The spatial or temporal characteristics of the intellectual
content of the resource.
-------------------------------------------------------------------------------
4. docreg File Format
----------------------
The docreg file is the medium for the transmission of document
metadata information to the local Document Store. As such, it is the
package maintainer's way of attaching metadata to documents included
in their package, and ensuring that metadata is available to the user
who installed the package.
The docreg file is used in combination with install-docs as the
complete interface that a document-providing package needs to worry
about. End users need not be aware of docreg files at all; they are
not end-user-editable.
4.1. Design Rationale and Goals
--------------------------------
The docreg file is meant to be an easy, familiar mechanism for busy
package maintainers. It uses a syntax similar to `control' files
already used by package maintainers, namely an RFC-822 compliant
syntax.
The docreg file format has the following design goals:
* Adherence to recognized metadata standards, namely, Dublin Core
(see http://purl.oclc.org/metadata/dublin_core/).
* Easy to use for package maintainers; uses a very simple data
model.
* Language-independent syntax, allowing for indication of the
language of the document, as well as indication of the language
of the metadata.
* Allow for flexibility and inter-relationships between documents
without imposing any additional dependency complexity.
4.2. How To Use the docreg File
--------------------------------
The docreg file itself is the file used by package maintainers to
register documents into the Debian Document Registry. The doc-base
packaging system (specifically the install-docs program) is
responsible for processing the docreg file and adding the document's
meta-information contained in the docreg file to the system's local
Document Store.
Document metadata is all the information contained in the Debian
Document Registry for a file. The composition of this metadata is
directly related to the docreg file, since the docreg file is the sole
transmitter of document metadata into the registry (via install-docs).
While it is easy to confuse the difference between the document
metadata and the docreg file, there is a distinction.
A docreg file may contain *metadata* for any number of distinct
*documents*. A document is defined by a URL (generally a file in the
`/usr/doc' area on the local machine). The metadata attached to this
document describes this and only this document. Therefore, there is a
one-to-one relationship between documents and metadata. To use a
common paradigm, documents are the books in a library, metadata are
the card catalog cards, and docreg files are simply bundles of one or
more card catalog cards.
The URL of a document is its unique identifier. It is an error for one
URL to have multiple metadata. In so far as a file is a URL, it
follows that each document can have only one metadata attached to it.
In many cases, a document is actually composed of a number of files
(or URLs), where the main file is simply the top-level file. This
nuance of the actual file-system level instantiation of a document is
not modeled by the system, nor does it need to be.
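The uniqueness rule (one metadata set per URL) could be enforced with a check along these lines. This Python sketch assumes that identifiers are compared after resolving relative URLs against the implied base; the helper name `duplicate_identifiers` is made up.

```python
from collections import Counter
from urllib.parse import urljoin

BASE = "file://localhost/usr/doc/"  # implied base for relative identifiers


def duplicate_identifiers(identifiers):
    """Return resolved URLs that appear more than once, in first-seen order."""
    resolved = [urljoin(BASE, ident.strip()) for ident in identifiers]
    counts = Counter(resolved)
    return [url for url in dict.fromkeys(resolved) if counts[url] > 1]
```

A relative identifier and its fully spelled-out `file://localhost/usr/doc/...` form count as the same document under this assumption.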
Documents relate to one another in various ways. For instance, a
document might be a specially formatted version of another source
document (the "IsFormatOf" relation). A document might be a
translation of another document into a new language ("IsBasedOn"), or,
more obscurely, a version of the work, perhaps interesting for
historical purposes ("IsVersionOf"). Relationships between documents
do not require actual package dependencies, however.
4.2.1. Where To Put the docreg File
------------------------------------
docreg files are under package maintainer control; they are never
altered by the Debian documentation system as a whole. The files
should be installed and removed by the package itself using the
standard means. The file may be autogenerated at the package
maintainer's discretion; however, it may not be altered after
install-docs has run.
docreg files must be placed in the `/usr/share/doc-base/docreg/'
subdirectory. By convention, this file should be named the same as the
package, e.g., `/usr/share/doc-base/docreg/debian-policy'. This is not
enforced; however, these file names must be globally unique across all
packages.
At the convenience of the package maintainer, it is certainly
allowable to use more than one docreg file per package. In this case,
convention states that the files should be prefixed with the name of
the package, e.g., `/usr/share/doc-base/docreg/debian-policy-ascii'.
4.2.2. Brief Comment on the Document Store
-------------------------------------------
The Document Store, in `/var/state/doc-base/docstore', is a file
containing the collected information about all documents currently on
the system. This file is in the same format as the docreg files.
The Document Store file may also be processed by the doc-base system
into a more optimized form, such as a Berkeley database file. This is
yet to be determined.
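Because the Document Store uses the same format as docreg files, a tool that writes it needs to emit elements with qualifiers and folded continuation lines. The Python sketch below shows one way to do that. The function name `format_field`, the 72-column width, and the single-space continuation indent are all assumptions; the format only requires that continuation lines be indented.

```python
import textwrap


def format_field(label, content, qualifiers=None, width=72):
    """Render one element as a (possibly folded) docreg line."""
    quals = "".join(f"({k}={v}) " for k, v in (qualifiers or {}).items())
    text = f"{label}: {quals}{content}"
    # Continuation lines start with one space, which marks them as folded.
    return textwrap.fill(text, width=width, subsequent_indent=" ",
                         break_long_words=False, break_on_hyphens=False)
```

For example, `format_field("Title", "Debian Metadaten Handbuch", {"LANG": "de"})` gives `Title: (LANG=de) Debian Metadaten Handbuch`.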
4.3. docreg File Format
------------------------
The format of the docreg file borrows from the Debian control file
format, which borrows from RFC 822.
First, some terminology. docreg files are composed of one or more
metadata sets, where each set describes a single document (a URL,
actually a file on disk). Metadata sets are composed of metadata
elements, or fields, which include required elements, optional
elements, and ignored elements. These elements are treated in depth in
chapter 3, `Debian Metadata Elements'.
Elements are lines composed of a label (that is, the name of the
element), a colon (`:'), zero or more qualifiers in parentheses, and
finally the contents of the element. Sets are composed of one or more
elements; sets are separated from one another by an empty line, and
the top or bottom of the file also delimits a set.
Any element's contents may continue onto multiple lines, but
continuation lines must be indented from the left margin; this is
called "folding". In some cases the contents are restricted to a
controlled vocabulary, such as a URL, or a single value from a domain
of possible values. These controlled vocabularies are specified by the
built-in implied `SCHEME', which is described in subsection 3.1.2,
`The SCHEME Qualifier'.
An augmented BNF description of the file format, probably only of
interest to implementors, can be found below in subsection 4.3.4,
`Augmented BNF Description'.
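To make the prose description of sets, elements, and folding concrete, here is a minimal reader sketched in Python. It is not the install-docs parser; the name `parse_docreg` is hypothetical, qualifiers are left inside the raw body, and joining folded lines with a single space is an assumption.

```python
def parse_docreg(text):
    """Split docreg text into metadata sets of (label, raw_body) pairs."""
    sets, current = [], []
    for line in text.split("\n"):
        if line.strip() == "":
            # An empty line (or end of file) closes the current set.
            if current:
                sets.append(current)
                current = []
        elif line[0] in " \t":
            # Folded continuation line: append to the previous element.
            if not current:
                raise ValueError("continuation line before any element")
            label, body = current[-1]
            current[-1] = (label, (body + " " + line.strip()).strip())
        else:
            # The label ends at the first colon; the rest is the body.
            label, sep, body = line.partition(":")
            if not sep:
                raise ValueError(f"missing colon: {line!r}")
            current.append((label.strip(), body.strip()))
    if current:
        sets.append(current)
    return sets
```

Each returned set is a list rather than a dictionary so that repeatable elements such as `Creator` keep all their occurrences.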
4.3.1. Example Files
---------------------
Identifier: debian-metadata/debian-metadata.sgml
Title: Debian Metadata Manual
Title: (LANG=de) Debian Metadaten Handbuch
Subject: debian/policy
Format: text/sgml
Description: This manual contains a guide and a reference to the
 Debian Metadata Project. The Project's purpose, and the purpose of
 this document, is to outline a set of metadata elements, to specify
 an interface for package maintainers use in order to provide
 metadata about resources in their packages, to specify a unified
 subject catalog for categorizing metadata, and to specify an API for
 developers who wish to make use of a system's metadata.
Language: en
Creator: A. P. Harris <aph@debian.org>
Creator: Christian Swartz <schwarz@monet.m.isar.de>
Creator: Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>
Date: 1998-06-29
Rights: copyright/GPL
Type: specification

Identifier: debian-metadata/debian-metadata.html/index.html
Title: Debian Metadata Manual
Title: (LANG=de) Debian Metadaten Handbuch
Subject: debian/policy
Format: text/html
Description: This manual contains a guide and a reference to the
 Debian Metadata Project. The Project's purpose, and the purpose of
 this document, is to outline a set of metadata elements, to specify
 an interface for package maintainers use in order to provide
 metadata about resources in their packages, to specify a unified
 subject catalog for categorizing metadata, and to specify an API for
 developers who wish to make use of a system's metadata.
Language: en
Creator: A. P. Harris <aph@debian.org>
Creator: Christian Swartz <schwarz@monet.m.isar.de>
Creator: Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>
Date: 1998-06-29
Rights: copyright/GPL
Type: specification

Identifier: debian-metadata/debian-metadata.text
Title: Debian Metadata Manual
Title: (LANG=de) Debian Metadaten Handbuch
Subject: debian/policy
Format: text/plain
Description: This manual contains a guide and a reference to the
 Debian Metadata Project. The Project's purpose, and the purpose of
 this document, is to outline a set of metadata elements, to specify
 an interface for package maintainers use in order to provide
 metadata about resources in their packages, to specify a unified
 subject catalog for categorizing metadata, and to specify an API for
 developers who wish to make use of a system's metadata.
Language: en
Creator: A. P. Harris <aph@debian.org>
Creator: Christian Swartz <schwarz@monet.m.isar.de>
Creator: Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>
Date: 1998-06-29
Rights: copyright/GPL
Type: specification
4.3.2. Field Sizes
-------------------
Field size limits are imposed on fields in order to facilitate a
straightforward, database-driven interface and hopefully help
security. These size limits are checked at install time:
Identifier             80
Title                  80
Subject                80    (multiple elements combined)
Format                 40
Description          1024
Language                2
Creator               200    (multiple elements combined)
Contributor           200    (multiple elements combined)
Publisher             200    (multiple elements combined)
Date                   80
Source                200    (multiple elements combined)
Relation.IsFormatOf    80
Relation.IsBasedOn     80
Type                   80    (multiple elements combined)
Rights                 80    (multiple elements combined)
Coverage               80
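An install-time check of these limits might be sketched as follows in Python. Several points are assumptions of this sketch rather than statements of the specification: sizes are counted in characters (the unit is not stated), occurrences of the same element are simply summed with no separator, and unknown labels are skipped. The name `oversized_fields` is made up.

```python
# Limits from the table above, keyed by lowercase label.
FIELD_LIMITS = {
    "identifier": 80, "title": 80, "subject": 80, "format": 40,
    "description": 1024, "language": 2, "creator": 200,
    "contributor": 200, "publisher": 200, "date": 80, "source": 200,
    "relation.isformatof": 80, "relation.isbasedon": 80,
    "type": 80, "rights": 80, "coverage": 80,
}


def oversized_fields(elements):
    """Return sorted lowercase labels whose combined content is too long.

    elements is a list of (label, content) pairs for one metadata set.
    """
    totals = {}
    for label, content in elements:
        key = label.lower()
        totals[key] = totals.get(key, 0) + len(content)
    return sorted(k for k, n in totals.items()
                  if k in FIELD_LIMITS and n > FIELD_LIMITS[k])
```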
4.3.3. Weaknesses of the File Format
-------------------------------------
One weakness of the format is that there is a lot of repetitive
encoding of identical information. The `Description' field
4.3.4. Augmented BNF Description
---------------------------------
The following description uses augmented BNF as defined in RFC 822.
This standard meta-format lets us define the docreg format without
ambiguity. See also RFC 2068 for a description and example of
augmented BNF.
4.3.4.1. Basic Rules
---------------------
The following rules define fundamental building blocks used in the
rest of this specification.
CHAR        =  <any ASCII character>         ; (  0-177,  0.-127.)
ISOCHAR     =  <any ISO-8859-1 character>
CTL         =  <any ASCII control            ; (  0- 37,  0.- 31.)
                character and DEL>           ; (    177,     127.)
LF          =  <ASCII LF, linefeed>          ; (     12,      10.)
SPACE       =  <ASCII SP, space>             ; (     40,      32.)
HTAB        =  <ASCII HT, horizontal-tab>    ; (     11,       9.)
LWSP-char   =  SPACE / HTAB                  ; semantics = SPACE
linear-white-space =  1*([LF] LWSP-char)     ; semantics = SPACE
                                             ; LF => folding
specials    =  "(" / ")" / "<" / ">" / "@"
            /  "," / ";" / ":" / "\" / <">
            /  "." / "[" / "]" / "="
atom        =  1*<any CHAR except specials, SPACE and CTLs>
                                             ; control fields
ctext       =  *<any ISOCHAR excluding "(",  ; field contents
                ")", "\" & CR, & including
                linear-white-space>
end-of-rec  =  < 2*LF or end of file >
4.3.4.2. Field Definitions
---------------------------
Field semantics are the same as defined as "Header Field Definitions"
in RFC 822 Section 3.1, with the exception that rather than CRLF we
use the standard Unix line separator, LF. Long header fields are
likewise supported, as specified in RFC 822 Section 3.1.1.
The following is the BNF composition of docreg fields syntax.
field               =  field-name ":" [*field-qualifier]
                       field-body LF LF
field-name          =  *atom
field-body          =  field-body-contents
                       [LF LWSP-char field-body]   ; folding
field-body-contents =  *ctext
field-qualifier     =  "(" *atom "=" *atom ")"
`field-name' values are not case-sensitive. Both `field-name' and
`field-qualifier' are further constrained to the set of allowable
values. Furthermore, in some cases, `field-body-contents' are
constrained based on their qualifiers. For instance, a qualifier of
`SCHEME=URL' would indicate that the contents should be a valid URL.
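Separating qualifiers from contents, as the grammar above describes, could be sketched in Python as follows. The helper name `split_qualifiers` is hypothetical, and the regular expression is deliberately looser than `atom` (it only excludes parentheses, `=`, and whitespace). Qualifier names are uppercased because case is never significant in qualifiers; uppercase is simply this sketch's choice of canonical form.

```python
import re

# "(NAME=value)" followed by optional whitespace.
_QUALIFIER = re.compile(r"\(\s*([^()=\s]+)\s*=\s*([^()=\s]+)\s*\)\s*")


def split_qualifiers(body):
    """Split a field body into ({QUALIFIER: value}, contents)."""
    body = body.lstrip()
    qualifiers = {}
    pos = 0
    while True:
        m = _QUALIFIER.match(body, pos)
        if not m:
            break
        qualifiers[m.group(1).upper()] = m.group(2)
        pos = m.end()
    return qualifiers, body[pos:].strip()
```

Since `ctext` excludes parentheses, a leading `(` in a body can only start a qualifier, which is what makes this left-to-right scan unambiguous.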
For clarifications on the way that fields are composed, refer to RFC
822.[1]
[1] Please email me with any corrections or clarifications.
4.3.4.3. docreg Specification
------------------------------
docreg files contain any number of metadata sets.
docreg-file   =  *metadata-set
metadata-set  =  *field end-of-rec
-------------------------------------------------------------------------------
5. Debian Metadata API
-----------------------
A simple C API, probably with Perl and Python wrappers, will be
provided for the benefit of programmers wishing to make use of the
local document store.
-------------------------------------------------------------------------------