
Debian Metadata Proposal -- draft rev.1.4



Ok one and all.  This is my proposal for Debian Metadata, which is
part of the doc-base (and probably doc-base-dev) package(s).  A
version on the web can be found at
http://va.debian.org/~aph/debian-metadata.html/ .

Marco, would you be willing to help me work on the API and thinking
about how to attach a database backend a la dhelp ?

Major changes:

  * flattened the entity modeling; we only model documents, not
    documents and formats
  * adopt Dublin Core element set, with changes for our RFC822 format
    and extensions for internationalization
  * explain the elements, provide an example
  * document IDs are URLs, with an implied BASE (in HTML parlance) of
    file://localhost/usr/doc
  * remove capability for multiple docreg files to update a single
    document metadata
  * stub out sections talking about autoconversion, local
    configurables, and the API for dhelp etc.

Todo:

  * more examples, i.e., for most interesting www.debian.org pages;
    volunteers wanted
  * Marcus is working on the DDH, it's looking very impressive!
  * work out the install-docs hooks mechanism for backwards compat
    support of dhelp and dwww
  * work out the API for a better (no shadowing of data required)
    dwww/dhelp etc working right out of the document store (I need
    lots of help here)
  * work out our storage system, which will lend itself well to a flat
    file-based db like Berkeley DB (I need lots of help here)
  * reimplement install-docs, which is easy compared to all this
    analysis/spec work

-- 
.....A. P. Harris...apharris@onShore.com...<URL:http://www.onShore.com/>


                          Debian Metadata Project
                          -----------------------
                      Adam P. Harris <aph@debian.org>
             The Debian-Doc List <debian-doc@lists.debian.org>
                             $Revision: 1.4 $

0.1 Abstract
------------

     This manual contains a guide and a reference to the Debian Metadata
     Project. The Project's purpose, and the purpose of this document, is
     to outline a set of metadata elements, to specify an interface that
     package maintainers use to provide metadata about resources in their
     packages, to specify a unified subject catalog for categorizing
     metadata, and to specify an API for developers who wish to make use
     of a system's metadata. This manual is intended to serve as
     sub-policy for the deployment and utilization of metadata in Debian.
     Currently, it carries no actual force and is for informational
     purposes only. The manual is intended for package maintainers, Debian
     document writers, and those implementing document display systems
     such as dwww and dhelp. 

0.2 Contents
------------

     1.        Introduction 
     1.1.      Organization of this Document 
     1.2.      Contributing to the Project 

     2.        Local Configuration Options 
     2.1.      Automatic Document Conversion 

     3.        Debian Metadata Elements 
     3.1.      Metadata Element Structure 
     3.2.      Metadata Elements 

     4.        docreg File Format 
     4.1.      Design Rationale and Goals 
     4.2.      How To Use the docreg File 
     4.3.      docreg File Format 

     5.        Debian Metadata API 

0.3 Copyright Notice
--------------------

     Copyright ©1998 Adam P. Harris; some parts ©1998 Christian Swartz. 

     This documentation is free software; you may redistribute it and/or
     modify it under the terms of the GNU General Public License as
     published by the Free Software Foundation; either version 2, or (at
     your option) any later version. 

     However, even though you are empowered to modify this specification,
     please do not do so; as a standard, it loses power if there are
     alternate versions of it available. Methods for centralized management
     and modification of this specification are outlined below. 

     This manual is free software; you may redistribute it and/or modify it
     under the terms of the GNU General Public License as published by the
     Free Software Foundation; either version 2, or (at your option) any
     later version. 

     This is distributed in the hope that it will be useful, but *without
     any warranty*; without even the implied warranty of merchantability or
     fitness for a particular purpose. See the GNU General Public License
     for more details. 

     A copy of the GNU General Public License is available as
     `/usr/doc/copyright/GPL' in the Debian GNU/Linux distribution or on
     the World Wide Web at `http://www.gnu.org/copyleft/gpl.html'. You can
     also obtain it by writing to the Free Software Foundation, Inc., 675
     Mass Ave, Cambridge, MA 02139, USA. 


-------------------------------------------------------------------------------


1. Introduction 
----------------

     What is metadata? Metadata is information about information. The
     Debian Metadata Project is an attempt to provide a robust,
     standards-based metadata set, and the facilities to collect and
     display information about resources (usually, documents on a user's
     machine). Collected information includes the document's title, author,
     format, placement in a subject catalog, description, the language it
     is in, etc. 

     Why should anyone care about metadata? Well, primarily, it is useful
     in *resource discovery*. This is the process of finding out where to
     find information. You do this every time you run man -k or apropos;
     Altavista and HotBot are typical of the current technologies in
     resource discovery. But *metadata* allows you to find resources in
     different and better ways. You can search by title, by language, by
     author; you can traverse a subject hierarchy, like a book's index.
     Metadata allows a more intelligent way to organize and present the
     vast amount of documentation that Debian already provides. 

     As its metadata entity definition, Debian uses a specialized
     application of the Dublin Core
     (http://purl.oclc.org/metadata/dublin_core/). The Dublin Core is an
     informal standard formulated by an international group of
     professionals in library science and in the networking and digital
     library research communities. 


1.1. Organization of this Document 
-----------------------------------

     The document is split into three main sections. The first section
     contains information of interest to any Debian user, curious about the
     features and capabilities of our metadata system. The second section
     is of interest to package maintainers. The final section is mainly of
     interest to documentation system providers or metadata display system
     developers. 

     System administration controls provided by the Debian Metadata system
     are documented in chapter 2, `Local Configuration Options '. Chapter 3,
     `Debian Metadata Elements ', defines the metadata elements, which are
     the data fields that can be populated for a given resource. 

     The second part of this manual is primarily of interest to Debian
     package maintainers. It begins with chapter 4, `docreg File Format ',
     which describes the "docreg" file, the file that the package
     maintainer uses to *register* document metadata into the local
     document store. Finally, in *section not yet written*, the use of
     install-docs is covered. Developer tools to convert metadata to and
     from HTML are discussed in *section not yet written*. 

     In the final part, of interest to those who are working with Debian's
     metadata collection, we start in chapter 5, `Debian Metadata API '
     with a discussion of the API used to access the document store. In
     *section not yet written*, we discuss how developers can hook into
     install-docs in case they need to capture certain metadata events or
     shadow data (a deprecated practice). 


1.2. Contributing to the Project 
---------------------------------

     Discussions about the Debian Metadata Project generally take place on
     the <debian-doc@lists.debian.org> mailing list. This is an open
     project; all are invited. To subscribe to this list, go to
     http://www.debian.org/MailingLists/subscribe . 


-------------------------------------------------------------------------------


2. Local Configuration Options 
-------------------------------

     Providing knobs and dials for system administrators to control their
     documentation is possible once we have the data provided by the Debian
     Metadata scheme. None of this functionality is present yet; however,
     preliminary ideas of desirable configuration capabilities are
     discussed here. 

     Such options relate to a few major issues. The first issue is making
     decisions, based on local policy, whether or not to install the
     documentation. Here's a feature list: 

        * never install particular formats, e.g., "I don't want any
          PostScript on my machine, this is a firewall" 

        * don't install particular languages, e.g., "I don't want any
          Spanish documentation installed". 

        * conditionally skip a language when another language is
          available, e.g., "if a Spanish version of a document is
          available already, we don't need the English version;
          otherwise, we do." 
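
     The three policy ideas above could be sketched roughly as follows.
     This is only an illustration in Python: nothing like it is
     implemented, and the names `Policy' and `should_install', as well as
     the idea of passing in the set of languages a document is available
     in, are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical local policy; none of these settings exist yet."""
    banned_formats: set = field(default_factory=set)    # MIME types never installed
    banned_languages: set = field(default_factory=set)  # languages never installed
    preferred_language: str = ""                        # e.g. "es"; empty: no preference


def should_install(doc, available_languages, policy):
    """Decide whether one document should be installed.

    doc is a dict with "Format" and "Language" keys; available_languages
    is the set of languages in which the same document is offered.
    """
    if doc["Format"] in policy.banned_formats:
        return False
    if doc["Language"] in policy.banned_languages:
        return False
    preferred = policy.preferred_language
    if preferred and preferred in available_languages:
        # A preferred-language version exists, so other versions are redundant.
        return doc["Language"] == preferred
    return True


policy = Policy(banned_formats={"application/postscript"}, preferred_language="es")
english = {"Format": "text/html", "Language": "en"}
print(should_install(english, {"en", "es"}, policy))  # Spanish exists: skip English
print(should_install(english, {"en"}, policy))        # no Spanish: keep English
```

     The conditional rule is the only one that needs knowledge of other
     documents, which is why the sketch takes the available languages as a
     separate argument.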



2.1. Automatic Document Conversion 
-----------------------------------

     Another major functional group has to do with the possibility of
     auto-conversion of documentation, either on demand or at install time.
     Here's a possible feature list: 

        * autoconvert on install based on format, e.g., "I want all SGML
          files to be converted into PDF, A4 sized paper. Please retain the
          SGML." 

        * autoconvert on demand based on formats, i.e., provide a facility
          such that we could write a CGI to convert documents on demand,
          say, using content negotiation or user selection. 

        * override policy locally, e.g., "Even though policy says don't
          gzip HTML files, I've set up my browsers to handle it, so go
          ahead and gzip them." 

     Autoconversion will eventually require its own separate document. It
     is a very complex issue. Packages being installed should be capable of
     registering their conversion capabilities with the system. For
     example, sdc can translate a particular set of DTDs into HTML, ASCII,
     nroff, or PostScript. gs can translate PostScript to PDF. The
     `docbook-stylesheets' package can translate documents written in the
     Docbook DTD to HTML, PostScript, or RTF. When conversions are done,
     the system should make new metadata for them and register this new
     metadata, probably with special fields to allow an audit-trail of the
     conversion actions. 
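
     To make the idea of registered conversion capabilities concrete, here
     is a rough Python sketch. It is not a proposed design: the
     `CONVERTERS' table, the informal format labels, and the
     `conversion_chain' function are all invented for illustration, and
     "sgml" stands in loosely for the particular DTDs that sdc handles. The
     sketch finds the shortest chain of registered tools between two
     formats.

```python
from collections import deque

# Hypothetical registry: (source format, target format) -> converting tool.
# The entries mirror the sdc and gs examples in the text; the format
# names are informal labels, not a defined vocabulary.
CONVERTERS = {
    ("sgml", "html"): "sdc",
    ("sgml", "ascii"): "sdc",
    ("sgml", "nroff"): "sdc",
    ("sgml", "postscript"): "sdc",
    ("postscript", "pdf"): "gs",
}


def conversion_chain(source, target):
    """Return the shortest list of (tool, from, to) steps, or None."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, steps = queue.popleft()
        if fmt == target:
            return steps
        for (src, dst), tool in CONVERTERS.items():
            if src == fmt and dst not in seen:
                seen.add(dst)
                queue.append((dst, steps + [(tool, src, dst)]))
    return None


# SGML to PDF needs two steps: sdc to PostScript, then gs to PDF.
print(conversion_chain("sgml", "pdf"))
```

     Each step in the returned chain is also a natural place to record the
     audit-trail metadata mentioned above.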

     But document formatting is a very complex issue. It can have
     dependencies on many different things in the system, such as fonts,
     obscure configuration settings, etc. For instance, if I change my
     papersize in `/etc/papersize/', do I need to recreate any documents
     which depended on that setting? Additionally, we might need to allow a
     facility for the document manager to associate processing instructions
     for files. 

     Finally, the logistics of package maintenance make autoconversion
     complex. Do we remove converted documents when the package that
     provided their source is removed? Purged? 


-------------------------------------------------------------------------------


3. Debian Metadata Elements 
----------------------------

     This chapter contains a description of the Debian metadata, which is
     used to describe human-legible texts in a consistent and coherent way.
     The Debian Metadata Project uses the Dublin Core standard metadata
     fields. Below, we reiterate these fields and describe their meaning
     and use. 


3.1. Metadata Element Structure 
--------------------------------

     The Dublin Core Element semantics can be found at
     http://purl.oclc.org/metadata/dublin_core_elements. In some cases, we
     have restricted the syntax for the benefit of simplicity of
     implementation. These restrictions are noted. 

     Metadata elements consist of two required parts and an optional part.
     The required parts of a complete element are its *label* and its
     *content*. The optional part is its *qualifiers*. A label is the name
     of an element, and is selected from the domain of possible labels
     listed below. The content is the value of the element.
     Qualifiers either describe or restrict the content of an element, for
     instance, a qualifier is used to stipulate 

     In standard Dublin Core, each element is repeatable. However, we have
     restricted the repeatability of certain fields for simplicity of
     implementation; these restrictions may be lifted at a later date.
     Generally, if an element's contents are not free-text (i.e., if it
     doesn't make sense to talk of the *language* of the contents), we do
     not allow it to iterate. 

     Elements may occur in any order. Order is never significant. Case is
     never significant in labels or qualifiers; case is preserved in the
     content. 

     For the precise syntax of how the elements are encoded in docreg
     files, see chapter 4, `docreg File Format '. 

     The Debian flavor of Dublin Core also places restrictions on qualifier
     use. For instance, as the subject scheme, we have no use for Dewey
     Decimal schemes; instead, we require our own scheme. Unknown or
     unacceptable schemes are ignored as if they never appeared. 

3.1.1. The LANG Qualifier 
--------------------------

     The `LANG' qualifier indicates the language of the content of the
     element itself. For instance, if a `Description' element has a LANG
     qualifier value of `de', the description itself is in German. 

     Language identifiers for the `LANG' qualifier are in RFC 1766, *Tags
     for the Identification of Languages*,
     http://ds.internic.net/rfc/rfc1766.txt. Examples include `en', `de',
     `es', `fi', `fr', `ja', `th', and `zh'. Currently, only the two-letter
     schemes are recognized, i.e., not `en-us'. 

     The default is `en'. This default does not indicate that English is
     better; it is chosen for the benefit of maintainers, since the great
     majority of documentation is in English. 
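
     The rules for the `LANG' qualifier can be sketched in a few lines of
     Python. The function name `normalize_lang' is hypothetical, and
     raising an error for an unsupported tag is an assumption; the text
     does not say how such tags should be handled.

```python
import re

DEFAULT_LANG = "en"  # the default stated in the text


def normalize_lang(value=None):
    """Return a lowercase two-letter language tag, or raise ValueError.

    A missing qualifier yields the default. Only two-letter tags are
    accepted, so a tag such as "en-us" is rejected.
    """
    if value is None:
        return DEFAULT_LANG
    tag = value.strip().lower()
    if not re.fullmatch(r"[a-z]{2}", tag):
        raise ValueError("unsupported language tag: %r" % value)
    return tag


print(normalize_lang())      # no qualifier given, so the default applies
print(normalize_lang("DE"))  # case is not significant in qualifiers
```

     Note that this only checks the shape of the tag; it does not verify
     that the two letters name a real language.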

3.1.2. The SCHEME Qualifier 
----------------------------

     The `SCHEME' qualifier indicates what notational scheme the content is
     encoded in. This qualifier is usually not available to the maintainer
     for manipulation, since there is only one reasonable scheme for an
     element in the Debian environment. 

     The default scheme is generally `freetext'. Other elements have a
     scheme of `URL' or `ISO.636'. 

3.1.2.1. Concerning the URL Scheme 
-----------------------------------

     In some cases, the content of an element refers to an actual
     document. This may be the *resource* that the element set itself
     describes, or it may be a different resource, which might have its
     own metadata elements associated with it. 

     The means by which metadata indicates a resource is by URL. Most
     everybody is already familiar with what a URL is; complete information
     can be found in RFC 1738. 

     There is one additional wrinkle. Often, an element needs to indicate a
     relationship with a local file (which may or may not be installed). In
     such cases, for the ease of the maintainer, a relative URL may be
     used. The implied basepath in such cases is
     `file://localhost/usr/doc/'. 
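
     Resolving a relative URL against the implied basepath can be done
     with standard URL-joining rules. The following Python sketch uses the
     standard library's `urljoin'; the helper name `resolve' is made up for
     the example.

```python
from urllib.parse import urljoin

# Implied basepath from the text. The trailing slash matters: without
# it, urljoin would replace the final "doc" path segment.
BASE = "file://localhost/usr/doc/"


def resolve(url):
    """Resolve a possibly relative URL against the implied basepath."""
    return urljoin(BASE, url)


print(resolve("FAQ/Linux-FAQ"))
print(resolve("http://www.gnu.org/copyleft/gpl.html"))
```

     A relative URL is placed under `/usr/doc/', while an absolute URL is
     returned unchanged.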

3.1.2.2. Concerning the Debian Type Scheme 
-------------------------------------------

     The Debian Type scheme is the set of allowable values (that is, the
     *domain*) for the content of the `Type' element. This scheme is very
     volatile and experimental, but we encourage you to use it and to offer
     comments on its use. Of course, the `Type' element is completely
     optional. 

     Resource types, it must be remembered, are orthogonal to both the
     `Subject' and `Format' elements. So, the `Type' element can be thought
     of as the generic *class* of resources, which is not related to its
     particular media or subject matter. 

     Currently, the domain of allowable values is 

        * howto 

        * faq 

        * manual *(i.e., manual for software)* 

        * reference 

        * specification 

        * tutorial 

        * figure 

        * mailbox 

        * homepage 

        * glossary 

        * collection 

        * package *(for future use, representing Debian package metadata)* 
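
     Checking `Type' content against this domain is straightforward. In
     the Python sketch below, the names `DEBIAN_TYPES' and `invalid_types'
     are hypothetical, and matching values exactly (lowercase only) is an
     assumption, since the text does not say whether case matters here.

```python
# Allowable values for the Type element, copied from the list above.
DEBIAN_TYPES = frozenset([
    "howto", "faq", "manual", "reference", "specification", "tutorial",
    "figure", "mailbox", "homepage", "glossary", "collection", "package",
])


def invalid_types(values):
    """Return the values that are not in the Debian Type scheme."""
    return [v for v in values if v.strip() not in DEBIAN_TYPES]


print(invalid_types(["howto", "faq", "novel"]))  # only "novel" is unknown
```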


3.1.2.3. Concerning the DDH Scheme 
-----------------------------------



3.2. Metadata Elements 
-----------------------

     In Debian Dublin Core, certain elements are required, some are
     optional, and some are ignored as insignificant. As a rule, the adage,
     "be liberal in what you accept and conservative in what you emit"
     applies to the system. 

     The following is a summary of the elements, which are described in
     detail below: 

        * *Required elements*

             * Identifier 

             * Title 

             * Subject 

             * Format 

        * *Optional elements*

             * Description 

             * Language 

             * Creator 

             * Contributor 

             * Publisher 

             * Date 

             * Source 

             * Relation.IsFormatOf 

             * Relation.IsBasedOn 

             * Type 

             * Rights 

        * Ignored Elements

             * Coverage 


3.2.1. Required Elements 
-------------------------

     These elements are required. Lacking these elements constitutes an
     error which will cause install-docs to reject the entire entry. 

     Identifier
          A URL used to uniquely identify the resource. If the URL is a
          relative URL, it is relative to the location
          `file://localhost/usr/doc/', as described in subsubsection
          3.1.2.1, `Concerning the URL Scheme '. 

          SCHEME
               URL 

          LANG
               ignored 

          repeatable?
               no 

     Title
          The title for the document, usually only a single line. If the
          document does not have a title, formulate the title as if it is
          the short selectable string of an HREF. 

          SCHEME
               freetext 

          LANG
               can set 

          repeatable?
               no 

     Subject
          Where this document is situated in the Subject Catalog. For
          Debian, this Subject Catalog is the *Debian Document Hierarchy*,
          which see. 

          SCHEME
               Debian, indicating the Debian Document Hierarchy 

          LANG
               ignored 

          repeatable?
               yes 

     Format
          The format of the document, indicated as a MIME type, for
          example, `text/html'. 

          SCHEME
               MIME 

          LANG
               cannot set 

          repeatable?
               no 
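
     A check for the four required elements might look like the following
     Python sketch. The function name `missing_required' is an assumption,
     and how install-docs would actually report the error is not specified
     here.

```python
# Labels of the required elements, lowercased for comparison.
REQUIRED = ("identifier", "title", "subject", "format")


def missing_required(labels):
    """Return the required labels absent from one metadata entry.

    Case is not significant in labels, so comparison is done on
    lowercased names.
    """
    present = {label.lower() for label in labels}
    return [name for name in REQUIRED if name not in present]


print(missing_required(["Identifier", "TITLE", "Creator"]))
```

     An empty result means the entry is acceptable; anything else would
     cause the whole entry to be rejected.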


3.2.2. Optional Elements 
-------------------------

     These elements are optional. The content of these elements is
     captured by the system and should be displayed to the user by some
     means. 

     Description
          A description or abstract for the document. This gives the user
          more information about the document, so that they are able to
          decide whether it contains the information they are looking for. 

          SCHEME
               freetext 

          LANG
               can set 

          repeatable?
               no 

     Language
          The language of the intellectual content of the resource. If this
          element is not present, it defaults to `en', for English. 

          SCHEME
               RFC 1766 

          LANG
               cannot set 

          repeatable?
               no 

          example
               `de' 

     Creator
          The person or organization primarily responsible for creating the
          intellectual content of the resource. For example, authors in the
          case of written documents, artists, photographers, or
          illustrators in the case of visual resources. 

          SCHEME
               freetext, or email 

          LANG
               cannot set 

          repeatable?
               yes 

          example
               `A. P. Harris <aph@debian.org>' 

     Contributor
          Contributor to a document. For our purposes, this should only be
          used to indicate the translator of a document. Multiple authors
          for a document should simply use multiple Creator elements. 

          SCHEME
               freetext, or email 

          LANG
               can set 

          repeatable?
               yes 

          example
               `A. P. Harris <aph@debian.org>' 

     Publisher
          The entity responsible for making the resource available in its
          present form, such as a publishing house, a university
          department, or a corporate entity.   

          SCHEME
               freetext, or email 

          LANG
               ignored 

          repeatable?
               yes 

     Date
          A date associated with the resource. For our purposes, this
          should indicate the last modification date of a resource. 

          SCHEME
               restricted ISO 8601 

          LANG
               ignored 

          repeatable?
               no 

          example
               `1997-11-05' or `1998' 

     Source
          Upstream location where a document originated. Generally this is
          a web site maintained by the document author, or the URL for a
          canonical upstream archive such as Sunsite. 

          SCHEME
               URL 

          LANG
               ignored 

          repeatable?
               yes 

          example
               `http://sunsite.unc.edu/mdw/HOWTOs/FOOBAR.html' 

     Relation.IsFormatOf, Relation.IsBasedOn
          Indicates a relationship to another resource. The content of this
          field is the URL to the resource related to, as in the Identifier
          element. (If the URL is relative, it is relative to the location
          `file://localhost/usr/doc'.) Relation.IsFormatOf indicates a
          format of the resource indicated in the content of this element,
          i.e., an HTML or ASCII version of an SGML file.
          Relation.IsBasedOn is used to indicate translations based on
          another document. Note that it is *not* an error for the
          content's URL to not exist on the user's filesystem. 

          SCHEME
               URL 

          LANG
               ignored 

          repeatable?
               no 

          example
               `FAQ/Linux-FAQ' 

     Type
          The category of the resource, describing what sort of resource it
          is. This is orthogonal to both subject and format. The values of
          this field are constrained to the set of allowable values. 

          SCHEME
               Debian type, see subsubsection 3.1.2.2, `Concerning the
               Debian Type Scheme '. 

          LANG
               cannot set 

          repeatable?
               yes 

          example
               `howto' 

     Rights
          An identifier that links to a rights management statement, such
          as acceptable terms of use, the GPL, etc. 

          SCHEME
               URL 

          LANG
               cannot set 

          repeatable?
               yes 

          example
               `copyright/GPL' 


3.2.3. Ignored Elements 
------------------------

     The following elements are ignored. They are mentioned here because
     these fields are part of standard Dublin Core; they may some day
     become supported. 

     Coverage
          The spatial or temporal characteristics of the intellectual
          content of the resource. 



-------------------------------------------------------------------------------


4. docreg File Format 
----------------------

     The docreg file is the medium for the transmission of document
     metadata information to the local Document Store. As such, it is the
     package maintainer's way of attaching metadata to documents included
     in their package, and ensuring that metadata is available to the user
     who installed the package. 

     The docreg file is used in combination with install-docs as the
     complete interface that a document-providing package needs to worry
     about. End users need not be aware of docreg files at all; they are
     not end-user-editable. 


4.1. Design Rationale and Goals 
--------------------------------

     The docreg file is meant to be an easy, familiar mechanism for busy
     package maintainers. It uses a syntax similar to `control' files
     already used by package maintainers, namely an RFC-822 compliant
     syntax. 

     The docreg file format has the following design goals: 

        * Adherence to recognized metadata standards, namely, Dublin Core
          (see http://purl.oclc.org/metadata/dublin_core/). 

        * Easy to use for package maintainers; uses a very simple data
          model. 

        * Language-independent syntax, allowing for indication of the
          language of the document, as well as indication of the language
          of the metadata. 

        * Allow for flexibility and inter-relationships between documents
          without imposing any additional dependency complexity. 



4.2. How To Use the docreg File 
--------------------------------

     The docreg file itself is the file used by package maintainers to
     register documents into the Debian Document Registry. The doc-base
     packaging system (specifically the install-docs program) is
     responsible for processing the docreg file and adding the document's
     meta-information contained in the docreg file to the system's local
     Document Store. 

     Document metadata is all the information contained in the Debian
     Document Registry for a file. The composition of this metadata is
     directly related to the docreg file, since the docreg file is the sole
     transmitter of document metadata into the registry (via install-docs).
     While it is easy to confuse the document metadata with the docreg
     file, the two are distinct. 

     A docreg file may contain *metadata* for any number of distinct
     *documents*. A document is defined by a URL (generally a file in the
     `/usr/doc' area on the local machine). The metadata attached to this
     document describes this and only this document. Therefore, there is a
     one-to-one relationship between documents and metadata. To use a
     common paradigm, documents are the books in a library, metadata are
     the card catalog cards, and docreg files are simply bundles of one or
     more card catalog cards. 

     The URL of a document is its unique identifier. It is an error for one
     URL to have more than one metadata set. In so far as a file is a URL,
     it follows that each document can have only one metadata set attached
     to it. In many cases, a document actually comprises a number of files
     (or URLs), where the main file is simply the top-level file. This
     nuance of the actual file-system level instantiation of a document is
     not modeled by the system, nor does it need to be. 
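
     The uniqueness rule can be checked once identifiers are resolved to
     full URLs, since a relative and an absolute identifier may name the
     same document. The Python sketch below is illustrative only; the
     function name `duplicate_identifiers' is an assumption.

```python
from collections import Counter
from urllib.parse import urljoin

BASE = "file://localhost/usr/doc/"  # implied base for relative URLs


def duplicate_identifiers(identifiers):
    """Return the resolved URLs that occur more than once, sorted."""
    counts = Counter(urljoin(BASE, ident) for ident in identifiers)
    return sorted(url for url, n in counts.items() if n > 1)


# The first two identifiers name the same document.
print(duplicate_identifiers([
    "FAQ/Linux-FAQ",
    "file://localhost/usr/doc/FAQ/Linux-FAQ",
    "copyright/GPL",
]))
```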

     Documents relate to one another in various ways. For instance, a
     document might be a specially formatted version of another source
     document (the "IsFormatOf" relation). A document might be a
     translation of another document into a new language ("IsBasedOn"), or,
     more obscurely, a version of the work, perhaps interesting for
     historical purposes ("IsVersionOf"). Relationships between documents
     do not require actual package dependencies, however. 

4.2.1. Where To Put the docreg File 
------------------------------------

     docreg files are under package maintainer control; they are never
     altered by the Debian documentation system as a whole. The files
     should be installed and removed by the package itself using the
     standard means. The file may be autogenerated at the package
     maintainer's discretion; however, it may not be altered after
     install-docs has run. 

     docreg files must be placed in the `/usr/share/doc-base/docreg/'
     subdirectory. By convention, this file should be named the same as the
     package, e.g., `/usr/share/doc-base/docreg/debian-policy'. This is not
     enforced; however, these file names must be globally unique across all
     packages. 

     At the convenience of the package maintainer, it is certainly
     allowable to use more than one docreg file per package. In this case,
     convention states that the files should be prefixed with the name of
     the package, e.g., `/usr/share/doc-base/docreg/debian-policy-ascii'. 

4.2.2. Brief Comment on the Document Store 
-------------------------------------------

     The Document Store, in `/var/state/doc-base/docstore', is a file
     containing the collected information about all documents currently on
     the system. This file is in the same format as the docreg files. 

     The Document Store file may be processed by the doc-base system into a
     more optimized form as well, such as a Berkeley database file. This is
     yet to be determined. 


4.3. docreg File Format 
------------------------

     The format of the docreg file borrows from the Debian control file
     format, which borrows from RFC 822. 

     First, some terminology. docreg files are composed of one or more
     metadata sets, where each set describes a single document (a URL,
     actually a file on disk). Metadata sets are composed of metadata
     elements, or fields, which include required elements, optional
     elements, and ignored elements. These elements are treated in depth in
     chapter 3, `Debian Metadata Elements '. 

     Elements are lines composed of a label (that is, the name of the
     element), a colon (`:'), one or more optional qualifiers in
     parentheses, and finally the contents of the element. Sets are
     separated from one another by an empty line, and are bounded by the
     top or bottom of the file. 

     Any element's contents may continue into multiple lines, but
     continuation lines must be indented from the left margin; this is
     called "folding". In some cases the contents are restricted to a
     controlled vocabulary, such as a URL, or a single value from a domain
     of possible values. These controlled vocabularies are specified by the
     built-in implied `SCHEME', which is described in subsection 3.1.2,
     `The SCHEME Qualifier '. 
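
     The line structure described above can be read with a short parser.
     The Python sketch below is not the install-docs implementation. The
     qualifier syntax `(NAME=value)' is inferred from the `(LANG=de)' form
     used in the example files, and the names `parse_docreg' and
     `ELEMENT_RE' are invented for the example. One known limitation is
     that content which itself begins with a parenthesis would be mistaken
     for a qualifier group.

```python
import re

# Label, colon, optional "(QUALIFIER=value ...)" group, then content.
ELEMENT_RE = re.compile(r"^([A-Za-z][A-Za-z0-9.]*):\s*(?:\(([^)]*)\)\s*)?(.*)$")


def parse_docreg(text):
    """Parse docreg text into a list of metadata sets.

    Each set is a list of (label, qualifiers, content) tuples, with
    labels lowercased and qualifier names uppercased.
    """
    sets, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # An empty line ends the current metadata set.
            if current:
                sets.append(current)
                current = []
        elif line[0] in " \t":
            # Folded continuation line: append to the previous element.
            if not current:
                raise ValueError("continuation line without an element")
            label, quals, content = current[-1]
            current[-1] = (label, quals, (content + " " + line.strip()).strip())
        else:
            match = ELEMENT_RE.match(line)
            if match is None:
                raise ValueError("malformed element line: %r" % line)
            label, qual_text, content = match.groups()
            quals = {}
            for item in (qual_text or "").split():
                name, _, value = item.partition("=")
                quals[name.upper()] = value
            current.append((label.lower(), quals, content.strip()))
    if current:
        sets.append(current)
    return sets


SAMPLE = """\
Identifier: debian-metadata/debian-metadata.text
Title: Debian Metadata Manual
Title: (LANG=de) Debian Metadaten Handbuch
Description: This manual contains a guide and a reference to the
  Debian Metadata Project.
Format: text/plain
"""

for element in parse_docreg(SAMPLE)[0]:
    print(element)
```

     Keeping elements as an ordered list rather than a dictionary allows
     the same label to appear more than once, as the two `Title' lines in
     the sample do.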

     An augmented BNF description of the file format, probably only of
     interest to implementors, can be found below in subsection 4.3.4,
     `Augmented BNF Description '. 

4.3.1. Example Files 
---------------------

Identifier: debian-metadata/debian-metadata.sgml
Title: Debian Metadata Manual
Title: (LANG=de) Debian Metadaten Handbuch
Subject: debian/policy
Format: text/sgml
Description: This manual contains a guide and a reference to the
  Debian Metadata Project. The Project's purpose, and the purpose of
  this document, is to outline a set of metadata elements, to specify
  an interface for package maintainers use in order to provide
  metadata about resources in their packages, to specify a unified
  subject catalog for categorizing metadata, and to specify an API for
  developers who wish to make use of a system's metadata.
Language: en
Creator: A. P. Harris <aph@debian.org>
Creator: Christian Schwarz <schwarz@monet.m.isar.de>
Creator: Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>
Date: 1998-06-29
Rights: copyright/GPL
Type: specification

Identifier: debian-metadata/debian-metadata.html/index.html
Title: Debian Metadata Manual
Title: (LANG=de) Debian Metadaten Handbuch
Subject: debian/policy
Format: text/html
Description: This manual contains a guide and a reference to the
  Debian Metadata Project. The Project's purpose, and the purpose of
  this document, is to outline a set of metadata elements, to specify
  an interface for package maintainers to use in order to provide
  metadata about resources in their packages, to specify a unified
  subject catalog for categorizing metadata, and to specify an API for
  developers who wish to make use of a system's metadata.
Language: en
Creator: A. P. Harris <aph@debian.org>
Creator: Christian Schwarz <schwarz@monet.m.isar.de>
Creator: Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>
Date: 1998-06-29
Rights: copyright/GPL
Type: specification

Identifier: debian-metadata/debian-metadata.text
Title: Debian Metadata Manual
Title: (LANG=de) Debian Metadaten Handbuch
Subject: debian/policy
Format: text/plain
Description: This manual contains a guide and a reference to the
  Debian Metadata Project. The Project's purpose, and the purpose of
  this document, is to outline a set of metadata elements, to specify
  an interface for package maintainers to use in order to provide
  metadata about resources in their packages, to specify a unified
  subject catalog for categorizing metadata, and to specify an API for
  developers who wish to make use of a system's metadata.
Language: en
Creator: A. P. Harris <aph@debian.org>
Creator: Christian Schwarz <schwarz@monet.m.isar.de>
Creator: Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>
Date: 1998-06-29
Rights: copyright/GPL
Type: specification

4.3.2. Field Sizes 
-------------------

     Field size limits are imposed on fields in order to facilitate a
     straightforward database-driven interface and, hopefully, to help
     security. These size limits are checked at install time.

          Identifier            80
          Title                 80
          Subject               80    (multiple elements combined)
          Format                40
          Description           1024
          Language              2
          Creator               200   (multiple elements combined)
          Contributor           200   (multiple elements combined)
          Publisher             200   (multiple elements combined)
          Date                  80
          Source                200   (multiple elements combined)
          Relation.IsFormatOf   80
          Relation.IsBasedOn    80
          Type                  80    (multiple elements combined)
          Rights                80    (multiple elements combined)
          Coverage              80
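
     An install-time check against the table above might look roughly
     like the following Python sketch. Several things here are
     assumptions rather than part of the proposal: the name `oversized'
     is hypothetical, the limits are counted in characters, "combined" is
     read as the sum of the lengths of all occurrences in one metadata
     set, and labels are expected in the capitalization used in the
     table.

```python
# Limits copied from the table above; the unit (characters) is an assumption.
LIMITS = {
    "Identifier": 80, "Title": 80, "Subject": 80, "Format": 40,
    "Description": 1024, "Language": 2, "Creator": 200,
    "Contributor": 200, "Publisher": 200, "Date": 80, "Source": 200,
    "Relation.IsFormatOf": 80, "Relation.IsBasedOn": 80, "Type": 80,
    "Rights": 80, "Coverage": 80,
}
# Elements whose limit applies to all occurrences in a set taken together.
COMBINED = {"Subject", "Creator", "Contributor", "Publisher",
            "Source", "Type", "Rights"}


def oversized(fields):
    """Return the sorted labels whose contents exceed LIMITS.

    `fields` is a list of (label, contents) pairs for one metadata set.
    """
    totals = {}
    bad = set()
    for label, contents in fields:
        limit = LIMITS.get(label)
        if limit is None:
            continue  # unknown labels are skipped in this sketch
        if label in COMBINED:
            # Assumption: "combined" means the summed length of all values.
            totals[label] = totals.get(label, 0) + len(contents)
            if totals[label] > limit:
                bad.add(label)
        elif len(contents) > limit:
            bad.add(label)
    return sorted(bad)


example = [
    ("Title", "Debian Metadata Manual"),
    ("Language", "eng"),
    ("Creator", "A" * 120),
    ("Creator", "B" * 120),
]
print(oversized(example))  # ['Creator', 'Language']
```

     Here neither `Creator' value is too long on its own, but together
     they exceed the combined limit of 200.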

4.3.3. Weaknesses of the File Format 
-------------------------------------

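     One weakness of the format is that there is a lot of repetitive
     encoding of identical information. The `Description' field, for
     instance, is repeated verbatim in each of the three metadata sets
     shown in subsection 4.3.1, `Example Files'.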

4.3.4. Augmented BNF Description 
---------------------------------

     The following description uses augmented BNF as defined in RFC 822.
     This standard meta-format lets us define the docreg format without
     ambiguity. See also RFC 2068 for a description and example of
     augmented BNF. 

4.3.4.1. Basic Rules 
---------------------

     The following rules define fundamental building blocks used in the
     rest of this specification. 
     CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)
     ISOCHAR     =  <any ISO-8859-1 character>
     CTL         =  <any ASCII control           ; (  0- 37,  0.- 31.)
                     character and DEL>          ; (    177,     127.)
     LF          =  <ASCII LF, linefeed>         ; (     12,      10.)
     SPACE       =  <ASCII SP, space>            ; (     40,      32.)
     HTAB        =  <ASCII HT, horizontal-tab>   ; (     11,       9.)
     LWSP-char   =  SPACE / HTAB                 ; semantics = SPACE
     linear-white-space =  1*([LF] LWSP-char)    ; semantics = SPACE
                                                 ; LF => folding
     specials    =  "(" / ")" / "<" / ">" / "@"
                 /  "," / ";" / ":" / "\" / <">
                 /  "." / "[" / "]" / "="
     atom        =  1*<any CHAR except specials, ; control fields
                     SPACE and CTLs>
     ctext       = *<any ISOCHAR excluding "(",  ; field contents
                     ")", "\" & LF, & including
                     linear-white-space>
     end-of-rec  =  < 2*LF or end of file >

4.3.4.2. Field Definitions 
---------------------------

     Field semantics are the same as those defined under "Header Field
     Definitions" in RFC 822 Section 3.1, with the exception that rather
     than CRLF we use the standard Unix line separator, LF. Long header
     fields are likewise supported, as specified in RFC 822 Section 3.1.1.

     The following is the BNF composition of docreg field syntax.
     field               =  field-name ":" *field-qualifier
                            field-body LF
     field-name          = *atom
     field-body          =  field-body-contents
                            [LF LWSP-char field-body]         ; folding
     field-body-contents = *ctext
     field-qualifier     =  "(" *atom "=" *atom ")"

     `field-name' values are not case-sensitive. Both `field-name' and
     `field-qualifier' are further constrained to sets of allowable
     values. Furthermore, in some cases `field-body-contents' are
     constrained by their qualifiers. For instance, a qualifier of
     `SCHEME=URL' would indicate that the contents should be a valid URL.
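
     As a rough illustration of the field syntax, the following Python
     sketch parses one unfolded field into its name, qualifiers, and
     contents. The names `parse_field' and `contents_ok' are
     hypothetical, treating qualifier names as case-insensitive is an
     assumption, and the URL test is a deliberately loose stand-in for
     real validation (it only rejects empty contents and whitespace).

```python
import re

# Label, ":", zero or more "(NAME=VALUE)" qualifiers, then the contents.
FIELD_RE = re.compile(
    r"^([^\s:()]+):\s*((?:\([^()=\s]+=[^()=\s]+\)\s*)*)(.*)$", re.S
)
QUALIFIER_RE = re.compile(r"\(([^()=\s]+)=([^()=\s]+)\)")


def parse_field(line):
    """Parse one unfolded field into (name, qualifiers, contents)."""
    match = FIELD_RE.match(line)
    if match is None:
        raise ValueError("not a docreg field: %r" % line)
    name, raw_qualifiers, contents = match.groups()
    # Assumption: qualifier names, like field names, ignore case.
    qualifiers = {k.upper(): v for k, v in QUALIFIER_RE.findall(raw_qualifiers)}
    # Field names are not case-sensitive, so normalize them to one case.
    return name.lower(), qualifiers, contents.strip()


def contents_ok(qualifiers, contents):
    """Apply the one constraint named in the text: SCHEME=URL."""
    if qualifiers.get("SCHEME", "").upper() == "URL":
        # Loose placeholder check: non-empty and free of whitespace.
        return bool(contents) and not any(c.isspace() for c in contents)
    return True


print(parse_field("Title: (LANG=de) Debian Metadaten Handbuch"))
print(parse_field("TYPE: specification"))
```

     Normalizing the name once, at parse time, means later lookups do
     not have to repeat the case-insensitive comparison.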

     For clarifications on the way that fields are composed, refer to RFC
     822.[1] 

     [1]  Please email me with any corrections or clarifications.

4.3.4.3. docreg Specification 
------------------------------

     docreg files contain any number of metadata sets. 

     docreg-file         =  *metadata-set
     metadata-set        =  *field end-of-rec


-------------------------------------------------------------------------------


5. Debian Metadata API 
-----------------------

     A simple C API, probably with Perl and Python wrappers, will be
     provided for the benefit of programmers wishing to make use of the
     local document store. 



