
announcing 0.8.0 debian metadata standard



Sorry it's been an extra week; things got a little crazy around here.

An ASCII version of the new spec is included in this message.  For an
HTML version of the spec, see
<URL:http://va.debian.org/~aph/debian-metadata.html/ch-docreg-format.html>

Major changes:
 * incorporate both Marco's (starting with ./) and my relative URL
   system (maybe I should have a special token for that too).  Let's
   get on to implementing and see what makes more sense.
 * discuss metadata entity, element, resource relationship more clearly
 * loosen up docreg file placement; actually I'm suggesting it be
   named foo.docreg and put in the same dir with the document.
 * incorporate other tweaks etc from the list
 * stub out a discussion of the tools and install hooks
 * now using package number for my version number
 * chapters hoped to be complete: 1, 3, 4
   chapters still requiring deep work: 2 (quite a bit later), 5, 6

Basically I'm ready to start implementing.  I'm sending this to the
international lists as well for comment, since we really haven't dealt
with the charset issues at all yet, and I'm hoping they have useful
things to say.

-- 
.....A. P. Harris...apharris@onShore.com...<URL:http://www.onShore.com/>


                          Debian Metadata Project
                          -----------------------
                      Adam P. Harris <aph@debian.org>
             The Debian-Doc List <debian-doc@lists.debian.org>
              version 0.8.0, Sat, 18 Jul 1998 19:23:00 -0400 

0.1 Abstract
------------

     This manual contains a guide and a reference to the Debian Metadata
     Project. The Project's purpose, and the purpose of this document, is
     to outline a set of metadata elements, to specify an interface for
     package maintainers to use in order to provide metadata about resources
     in their packages, to specify a unified subject catalog for
     categorizing metadata, and to specify an API for developers who wish
     to make use of a system's metadata. This manual is intended to serve
     as sub-policy for the deployment and utilization of metadata in
     Debian. Currently, it carries no actual force and is for informational
     purposes only. The manual is intended for package maintainers,
     Debian document writers, and those implementing document display
     systems such as dwww and dhelp. 

0.2 Contents
------------

     1.        Introduction
     1.1.      Scope of this Document
     1.2.      Organization of this Document
     1.3.      Contributing to the Project

     2.        Local Configuration Options
     2.1.      Automatic Document Conversion

     3.        Debian Metadata Elements
     3.1.      Metadata Entities
     3.2.      Metadata Element Structure
     3.3.      Metadata Elements

     4.        docreg File Format
     4.1.      Design Rationale and Goals
     4.2.      How To Use the docreg File
     4.3.      docreg File Format

     5.        Tools for Maintainers
     5.1.      install-docs -- metadata installation and removal
     5.2.      validate-docreg -- metadata validation for maintainers
     5.3.      html2docreg -- convert HTML files to docreg
     5.4.      docreg2html -- convert docreg files to HTML

     6.        Debian Metadata for Implementors
     6.1.      Tracking Registered docreg Files
     6.2.      Augmented BNF Description for docreg Files
     6.3.      Hooking Into install-docs 

0.3 Copyright Notice
--------------------

     Copyright ©1998 Adam P. Harris, ©1997 Christian Schwarz. 

     This documentation is free software; you may redistribute it and/or
     modify it under the terms of the GNU General Public License as
     published by the Free Software Foundation; either version 2, or (at
     your option) any later version. 

     However, even though you are empowered to modify this specification,
     please do not do so; as a standard, it loses power if there are
     alternate versions of it available. Methods for centralized management
     and modification of this specification are outlined below. 

     This manual is free software; you may redistribute it and/or modify it
     under the terms of the GNU General Public License as published by the
     Free Software Foundation; either version 2, or (at your option) any
     later version. 

     This is distributed in the hope that it will be useful, but *without
     any warranty*; without even the implied warranty of merchantability or
     fitness for a particular purpose. See the GNU General Public License
     for more details. 

     A copy of the GNU General Public License is available as
     `/usr/doc/copyright/GPL' in the Debian GNU/Linux distribution or on
     the World Wide Web at http://www.gnu.org/copyleft/gpl.html. You can
     also obtain it by writing to the Free Software Foundation, Inc., 675
     Mass Ave, Cambridge, MA 02139, USA. 


-------------------------------------------------------------------------------


1. Introduction
---------------

     What is metadata? Metadata is information about information. The
     Debian Metadata Project is an attempt to provide a robust,
     standards-based metadata set, and the facilities to collect and
     display information about resources (usually, documents on a user's
     machine). Collected information includes the document's title, author,
     format, placement in a subject catalog, description, language, etc. 

     Why should anyone care about metadata? Primarily, metadata is useful
     in *resource discovery*. This is the process of finding out where to
     find information. You do this every time you run man -k or apropos;
     Altavista (http://www.altavista.digital.com) and HotBot
     (http://www.hotbot.com/) are typical of the current technologies in
     resource discovery. But *metadata* allows you to find resources in
     different and better ways. You can search by title, by language, by
     author; you can traverse a subject hierarchy, like a book's index.
     Metadata allows a more intelligent way to organize and present the
     vast amount of documentation that Debian already provides. 

     There are other benefits of having consistent metadata available. For
     instance, at document installation time, conversions may occur based
     on metadata, or fine-grained policies may control which formats of
     documentation are installed. Machines running Debian
     would be able to say things like, "if German and English versions of a
     document are available, remove the English version". 

     Debian uses as their metadata entity definition a specialized
     application of the Dublin Core
     (http://purl.oclc.org/metadata/dublin_core/). The Dublin Core is an
     informal standard formulated by an international group of
     professionals in library science and in the networking and digital
     library research communities.


1.1. Scope of this Document
---------------------------

     The purpose and scope of this document is to define a common baseline
     of metadata in Debian. Furthermore, this document is a manual meant to
     explain how to use metadata, for the benefit of curious users, package
     maintainers, or metadata integrators. As such, this manual covers the
     following issues: 

        * what the recognized metadata elements are 

        * how metadata is delivered 

        * what tools are available to help work with Debian's metadata
          standard 

        * how the system works, for the benefit of integrators 

     A related document is the Debian Documentation Hierarchy manual, which
     defines the standardized documentation subject tree. That document,
     not included here, describes the headings and subheadings under which
     documents may appear. 


1.2. Organization of this Document
----------------------------------

     The document is split into three main sections. The first section
     contains information of interest to any Debian user, curious about the
     features and capabilities of our metadata system. The second section
     is of interest to package maintainers. The final section is mainly of
     interest to documentation system providers or metadata display system
     developers. 

     The system administration controls provided by the Debian Metadata
     system are documented in chapter 2, `Local Configuration Options'.
     Chapter 3, `Debian Metadata Elements', defines the metadata elements,
     which are the data fields which can be populated for a given resource.

     The next part of this manual is primarily of interest to Debian
     package maintainers. It begins with chapter 4, `docreg File Format',
     which describes the "docreg" file, the file that the package
     maintainer uses to *register* document metadata into the local
     document store. Finally, chapter 5, `Tools for Maintainers', describes
     the use of install-docs and other tools to assist package maintainers.

     The final part, chapter 6, `Debian Metadata for Implementors', is of
     interest to those who are working with Debian's metadata collection
     (implementors or integrators). This chapter contains a full BNF
     specification of docreg files, information on how developers can hook
     into install-docs for capturing certain metadata events, and
     information on the data provided for integrators by the doc-base
     system. 


1.3. Contributing to the Project
--------------------------------

     Discussions about the Debian Metadata Project generally take place on
     the Debian-Doc mailing list <debian-doc@lists.debian.org>. This is an
     open project; all are invited. To subscribe to this list, see
     http://www.debian.org/MailingLists/subscribe. 

     The newest version of the specification can be found, currently, at
     http://va.debian.org/~aph/debian-metadata.html/. This will be moving
     to a more standard location soon. 

     If you are interested in contributing code or text to the
     specification, please do! Read-only CVS access to the specification is
     publicly available at `cvs.debian.org'. CVS access ensures that you
     have the most up-to-date versions of the documentation and
     implementation source. 

     If you have a client/server capable `cvs' installed, do the following
     steps (note: the `>' represents your shell prompt, where you enter
     commands): 
> cvs -d :pserver:anonymous@cvs.debian.org:/cvs/doc-base login
(Logging in to anonymous@cvs.debian.org)
CVS password:  <hit return, i.e., a blank password>
> cvs -z9 -d :pserver:anonymous@cvs.debian.org:/cvs/doc-base co doc-base
cvs server: Updating doc-base
U doc-base/.cvsignore
U doc-base/Makefile
U doc-base/copyright.ent
[...]
     If you are a developer or for some other reason have an account on
     `cvs.debian.org', you can also use a `CVSROOT' (the part after the
     `-d') of `:ext:<username>@cvs.debian.org:/cvs/doc-base'.

     For more information on how to use CVS, see cvs(1). 


-------------------------------------------------------------------------------


2. Local Configuration Options
------------------------------

     Providing knobs and dials for system administrators to control local
     documentation is possible once we have the data provided by the Debian
     Metadata scheme. None of this functionality is present yet; however,
     preliminary ideas of desirable configuration capabilities are
     discussed here. 

     Such configuration possibilities can be categorized into a few major
     topics. The first topic is the ability to make decisions, based on
     local policy, whether or not to install the documentation. Here is a
     feature list: 

        * don't install particular formats ever, e.g., "I don't want any
          PostScript on my machine, this is a firewall"

        * don't install particular languages, e.g., "I don't want any
          Spanish documentation installed".

        * conditionally, don't install a particular language if another
          language is available, e.g., "if a Spanish version of a
          document is available already, we don't need the English version,
          otherwise, we do."
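
     As an illustration only, the policies listed above could be expressed
     as a small filter over document metadata. The sketch below is in
     Python; the `Doc' record, its `work' field (used here to group
     translations of one document), and the `should_install' function and
     its parameters are all hypothetical, since none of this functionality
     exists yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical record holding the metadata a policy needs."""
    identifier: str
    format: str      # MIME type, e.g. "text/html"
    language: str    # language tag, e.g. "en"
    work: str        # hypothetical key grouping translations of one work


def should_install(doc, all_docs, banned_formats=(), banned_languages=(),
                   preferred_language=None):
    """Return True if the (assumed) local policy allows installing `doc`."""
    if doc.format in banned_formats:
        return False
    if doc.language in banned_languages:
        return False
    if preferred_language and doc.language != preferred_language:
        # Skip this document only if a preferred-language version of the
        # same work is itself installable.
        for other in all_docs:
            if (other.work == doc.work
                    and other.language == preferred_language
                    and other.format not in banned_formats):
                return False
    return True


docs = [
    Doc("howto/en/index.html", "text/html", "en", "howto"),
    Doc("howto/es/index.html", "text/html", "es", "howto"),
    Doc("faq/index.html", "text/html", "en", "faq"),
    Doc("manual.ps", "application/postscript", "en", "manual"),
]
kept = [d.identifier for d in docs
        if should_install(d, docs,
                          banned_formats={"application/postscript"},
                          preferred_language="es")]
print(kept)
```

     Here the English HOWTO is skipped because a Spanish version exists,
     the English FAQ is kept because no Spanish version exists, and the
     PostScript manual is rejected by format.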



2.1. Automatic Document Conversion
----------------------------------

     Another major topic is the possibility of auto-conversion of
     documentation, either on demand or at install time. Here is a possible
     feature list: 

        * autoconvert on install based on format, e.g., "I want all SGML
          files to be converted into PDF, A4 sized paper. Please retain the
          SGML."

        * autoconvert on demand based on formats, i.e., provide a facility
          such that we could write a CGI to convert documents on demand,
          say, using content negotiation or user selection

        * "Even though policy says don't gzip HTML files, I've set up my
          browsers to handle it, so go ahead and gzip them."

     Autoconversion is a very complex issue. Packages being installed
     should be capable of registering their conversion capabilities with the
     system. For example, sdc can translate a particular set of DTDs into
     HTML, ASCII, nroff, or PostScript. gs can translate PostScript to PDF.
     The `docbook-stylesheets' package can translate documents written in
     the Docbook DTD to HTML, PostScript, or RTF. When conversions are
     done, the system should make new metadata for them and register this
     new metadata, probably with special fields to allow an audit-trail of
     the conversion actions. 
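
     One way to picture such a registry is as a graph of formats, where a
     conversion request becomes a search for a chain of tools. The Python
     sketch below is only an illustration under that assumption: the
     `CONVERTERS' table and `conversion_chain' function are hypothetical,
     and the format labels are informal placeholders rather than MIME
     types. The tool capabilities are the ones named above.

```python
from collections import deque

# Hypothetical registry: each entry is (tool, input format, output format).
CONVERTERS = [
    ("sdc", "sgml", "html"),
    ("sdc", "sgml", "ascii"),
    ("sdc", "sgml", "nroff"),
    ("sdc", "sgml", "postscript"),
    ("gs", "postscript", "pdf"),
    ("docbook-stylesheets", "docbook", "html"),
    ("docbook-stylesheets", "docbook", "postscript"),
    ("docbook-stylesheets", "docbook", "rtf"),
]


def conversion_chain(source, target, converters=CONVERTERS):
    """Breadth-first search for a shortest chain of conversions.

    Returns a list of (tool, input, output) steps, or None if the
    registered converters cannot reach `target` from `source`.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, steps = queue.popleft()
        if fmt == target:
            return steps
        for tool, src, dst in converters:
            if src == fmt and dst not in seen:
                seen.add(dst)
                queue.append((dst, steps + [(tool, src, dst)]))
    return None


print(conversion_chain("sgml", "pdf"))
print(conversion_chain("pdf", "sgml"))
```

     The list of steps returned could also serve as the audit trail
     recorded in the metadata of the converted document.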

     Document formatting is a very complex issue. It can have dependencies
     on many different things in the system, such as fonts, obscure
     configuration settings, etc. For instance, if I change my paper-size
     in `/etc/papersize/', do I need to recreate any documents which
     depended on that setting? Additionally, we might need to allow a
     facility for the document manager to associate processing instructions
     for files. 

     Finally, the logistics of package maintenance make autoconversion
     complex. Do we remove converted documents when the package from which
     their source came is removed? When it is purged?


-------------------------------------------------------------------------------


3. Debian Metadata Elements
---------------------------

     This chapter contains a description of Debian metadata, which is used
     to describe human-legible texts in a consistent and coherent way. The
     Debian Metadata Project uses the Dublin Core
     (http://purl.oclc.org/metadata/dublin_core/) set of metadata elements.
     Below we define the logical structure of entities and elements, define
     how metadata relates to data, and describe the meaning and use of the
     elements individually.


3.1. Metadata Entities
----------------------

     A metadata *entity* is composed of a set of *elements*, which are the
     individual bits of metadata. Every metadata entity describes one and
     only one *resource*, or document. However, a single resource may be
     described by more than one metadata entity. A *resource* is defined
     by a URL (generally a file in the documentation area of the package,
     on the local machine). 

     One can conceptualize this system using a library card catalog
     paradigm. Resources are the actual books in the library (or
     periodicals, or microfiche, etc.). Metadata entities are the cards in
     the card catalog. Metadata elements are the actual bits of information
     appearing on these cards. A single book may have more than one card;
     furthermore, it may appear in different parts of the card catalog. 
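
     The relationship between resources, entities, and elements can be
     sketched as a small data model. The Python below is an illustration,
     not part of the specification: the class and function names are
     hypothetical, the sample content is invented, and the resource URL is
     taken from the Identifier element defined later in this chapter.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Element:
    label: str      # e.g. "Title"
    content: str    # the element's value


@dataclass
class Entity:
    """One 'catalog card': a set of elements describing one resource."""
    elements: list = field(default_factory=list)

    def get(self, label):
        """Return the contents of every element with this label."""
        return [e.content for e in self.elements
                if e.label.lower() == label.lower()]

    @property
    def resource(self):
        # The Identifier element holds the URL of the described resource.
        return self.get("Identifier")[0]


def catalog(entities):
    """Group entities by the resource they describe (one book, many cards)."""
    cards = defaultdict(list)
    for entity in entities:
        cards[entity.resource].append(entity)
    return dict(cards)


# Two cards for the same book, filed under two placeholder subjects.
a = Entity([Element("Identifier", "FAQ/index.html"),
            Element("Title", "Example FAQ"),
            Element("Subject", "hypothetical/subject-one")])
b = Entity([Element("Identifier", "FAQ/index.html"),
            Element("Title", "Example FAQ"),
            Element("Subject", "hypothetical/subject-two")])
cards = catalog([a, b])
print(len(cards["FAQ/index.html"]))
```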


3.2. Metadata Element Structure
-------------------------------

     The Dublin Core Element semantics can be found at
     http://purl.oclc.org/metadata/dublin_core_elements. In some cases, we
     have restricted the syntax for the benefit of simplicity of
     implementation. These restrictions are always noted. 

     Metadata elements consist of two required parts: a *label*, and its
     *content*. The label is the name of an element, and is selected from
     the domain of possible labels listed below. The content is the value
     of the element.

     In standard Dublin Core, each element is repeatable. However, we have
     restricted the repeatability of certain fields for simplicity of
     implementation; these restrictions may be lifted at a later date.
     Generally, if an element's contents are not free text (i.e., if it
     doesn't make sense to talk of the *language* of the contents), we do
     not allow it to iterate. 

     Elements may occur in any order. Order is never significant. Case is
     never significant in labels or qualifiers; case is preserved in the
     content. 

     For the precise syntax of how elements are encoded in docreg files,
     see chapter 4, `docreg File Format'. 

     The Debian flavor of Dublin Core also places restrictions on
     qualifiers. *Qualifiers* are attributes which attach to elements in
     order to additionally define, or *qualify*, what the element is or
     what it refers to. For instance, the LANG qualifier defines the
     language that the actual metadata is written in (not the resource). In
     the Debian Metadata scheme, we have eliminated the necessity (or even
     possibility) for metadata maintainers to use qualifiers. For instance,
     as the subject scheme, we have no use for Dewey Decimal schemes;
     instead, we require our own scheme. As such, the Debian scheme uses
     *required implied qualifiers*. Unknown or unacceptable schemes are
     ignored as if they never appeared. As such, we only deal with
     qualifiers when converting in and out of docreg formats into foreign
     formats, which have different meanings and purposes. 

3.2.1. The LANG Qualifier
-------------------------

     The `LANG' qualifier indicates the language of the content of the
     element itself. For instance, if a `Description' element has a LANG
     qualifier value of <de>, the description itself is in German. 

     Language qualifiers are not settable. For many elements, content is
     described in formal structure such as a date field or a URL. For other
     elements which use natural language (that is, "Title" and
     "Description"), there is an implied LANG qualifier which is the same
     as the setting of the Language element.[1] 

     [1]  This restriction may be lifted at some point; for more details
          see the "Language" element description.
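
     The implied LANG rule can be stated in a few lines of Python. This is
     a sketch only: the `implied_lang' function is hypothetical, and it
     assumes an entity is held as a plain mapping from labels to content.
     The `en' default is the one given for the Language element in section
     3.3.

```python
NATURAL_LANGUAGE_LABELS = {"title", "description"}
DEFAULT_LANGUAGE = "en"  # default stated for the Language element


def implied_lang(entity, label):
    """Return the implied LANG qualifier for `label`, or None.

    `entity` is a plain dict mapping labels to content (an assumption
    made for this sketch).
    """
    if label.lower() not in NATURAL_LANGUAGE_LABELS:
        return None  # formally structured content carries no language
    lowered = {key.lower(): value for key, value in entity.items()}
    return lowered.get("language", DEFAULT_LANGUAGE)


print(implied_lang({"Title": "Beispiel", "Language": "de"}, "Title"))
```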

3.2.2. The SCHEME Qualifier
---------------------------

     The `SCHEME' qualifier indicates what notational scheme the content of
     a given element is encoded in. Like all qualifiers, this qualifier is
     not available to the maintainer for manipulation. There is only one
     reasonable scheme for a given element in the Debian environment.
     However, knowing the scheme for an element is important so you know
     how the content of the element should be encoded. 

     The default scheme is generally `free text'. Other elements have a
     scheme of `URL' or others, as described in section 3.3, `Metadata
     Elements'. 


3.3. Metadata Elements
----------------------

     In Debian Dublin Core, certain elements are required, some are
     optional, and some are ignored as insignificant. As a rule, the adage,
     "be liberal in what you accept and conservative in what you emit"
     applies to the system. 

     The following is a summary of the elements, which are described in
     detail below: 

        * Required elements

             * Identifier 

             * Title 

             * Subject 

             * Format 

        * Optional elements

             * Description 

             * Language 

             * Creator 

             * Contributor 

             * Publisher 

             * Date 

             * Source 

             * Relation.IsFormatOf 

             * Relation.IsBasedOn 

             * Type 

             * Rights 

        * Ignored Elements

             * Coverage 


3.3.1. Required Elements
------------------------

     These elements are required. Lacking these elements constitutes an
     error which will cause install-docs to reject the entire entry. 

     Identifier
          A URL used to uniquely identify the resource. Usually, the
          resource is a local file on the user's file system (which may or
          may not be installed). In such cases, it would be beneficial for
          maintainers to be able to refer to the resource using a URL
          relative to a certain path. However, the actual path to be used
          is under debate. There are two proposed solutions: 

          1.   If the URL is a relative URL, it is relative to the location
               of the package's documentation area. Namely, it is relative
               to either `file://localhost/usr/share/doc/<package>' or
               `file://localhost/usr/doc/<package>'. 

          2.   If the URL is a relative URL, it is relative to the location
               of the docreg file itself. 

          To settle the question for now, the following scheme is
          temporarily adopted. If the URL starts with `./' it is considered
          to be relative to the position of the docreg file which contains
          this entity. If the URL is a normal relative URL, it is
          considered to be relative to the package documentation area as
          described above. This scheme is a temporary compromise in order to
          accommodate both sides of the debate; perhaps when we have actual
          implementations in place, one or the other shall win out.

          *Future directions.* We have perceived that it would be a good
          thing for certain documents to be identifiable by tokens which
          are less volatile than file names. Given this facility, our
          internal documentation could have persistent inter-document
          cross-references. 

          The IETF-blessed facility to accommodate this purpose is URNs.
          URNs are unique tokens defined by a central authority (such as
          the Debian Documentation Project) to which the organization has
          made a long-term commitment. For instance, the DDP might
          decide to create a URN `debian-doc:policy' to represent the
          Debian Policy document. To implement this system, we would need
          to set up a central naming authority to coordinate and maintain
          the Debian URN list. Associated with this list could be a set of
          URLs and/or URCs, such as
          "http://www.debian.org/debian-policy/index.html", mirrored
          locations, and even "the file index.html in the documentation
          area of the debian-policy package". Central, and centrally
          distributed (i.e., packaged) CGI scripts could be provided to
          dynamically interpret and support these URNs (i.e., convert URNs
          to URLs on the fly).

          When and if this facility is in place, the Debian Metadata system
          can be used to implement it and to support it. However, it has
          been decided that the project should not at this time wait for
          that facility. 

          SCHEME
               URL 

          repeatable?
               no 
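
          The temporary scheme for relative Identifier URLs described
          above can be sketched in Python as follows. The function name
          and its parameters are hypothetical; `doc_root' defaults to one
          of the two documentation areas named above, and choosing between
          them is left to the caller.

```python
from urllib.parse import urljoin


def resolve_identifier(identifier, package, docreg_url,
                       doc_root="file://localhost/usr/doc/"):
    """Resolve an Identifier under the temporary compromise scheme.

    `docreg_url` is the URL of the docreg file containing the entity.
    """
    if identifier.startswith("./"):
        # Relative to the position of the docreg file itself.
        return urljoin(docreg_url, identifier)
    # Ordinary relative URLs are relative to the package documentation
    # area; urljoin returns absolute URLs unchanged.
    return urljoin(doc_root + package + "/", identifier)


docreg = "file://localhost/usr/doc/foo/html/foo.docreg"
print(resolve_identifier("./index.html", "foo", docreg))
print(resolve_identifier("FAQ/index.html", "foo", docreg))
```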

     Title
          The title for the document, usually only a single line. If the
          document does not have a title, formulate the title as if it is
          the short selectable string of an HREF. The language that this
          field is expressed in must be the same as the language indicated
          in the "Language" element. 

          SCHEME
               free text 

          repeatable?
               no 

     Subject
          Where this document is situated in the subject catalog. A subject
          catalog is a way of hierarchically organizing documents based on
          the subject matter covered by the document. For Debian, this
          Subject Catalog is the *Debian Document Hierarchy*, or DDH for
          short. See the Debian Documentation Hierarchy manual for
          specifics. 

          SCHEME
               Debian, indicating the Debian Document Hierarchy 

          repeatable?
               yes 

     Format
          The format of the document, indicated as a MIME type, for
          example, `text/html'. 

          SCHEME
               RFC 1521 etc (MIME)

          repeatable?
               no 
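
     A check for the required elements and their repeatability might look
     like the Python sketch below. It is an illustration, not the behavior
     of install-docs: the `validate' function is hypothetical and assumes
     an entity is given as a list of (label, content) pairs.

```python
REQUIRED = {"identifier", "title", "subject", "format"}
# Of the required elements, only Subject is marked repeatable.
NOT_REPEATABLE = {"identifier", "title", "format"}


def validate(elements):
    """Return a list of error strings for one metadata entity.

    An empty result means the entity passes these two checks.
    """
    counts = {}
    for label, _content in elements:
        key = label.lower()  # case is never significant in labels
        counts[key] = counts.get(key, 0) + 1
    errors = [f"missing required element: {name}"
              for name in sorted(REQUIRED - set(counts))]
    errors += [f"element may not repeat: {name}"
               for name in sorted(NOT_REPEATABLE)
               if counts.get(name, 0) > 1]
    return errors


print(validate([("Title", "Example"), ("Title", "Another")]))
```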


3.3.2. Optional Elements
------------------------

     These elements are optional. The content of these elements is
     captured by the system and should be displayed to the user by some
     means. 

     Description
          A description, or abstract, for the resource. This gives the user
          more information about the resource, so that they are able to
          decide whether it contains the information they are looking for.
          The language that this field is expressed in must be the same as
          the language indicated in the "Language" element. 

          *Future directions.* We may wish to define a subset of HTML
          elements to allow in the content of this element. For instance:
          `<bf>', `<em>', `<tt>', `<a href=...>', `<code>', `<p>', `<var>',

          SCHEME
               free text 

          repeatable?
               no 

     Language
          The language of the intellectual content of the resource. If this
          element is not present, it defaults to `en', for English. 

          SCHEME
               RFC 1766 

          repeatable?
               no 

          example
               `de' 

     Creator
          The person or organization primarily responsible for creating the
          intellectual content of the resource. For example, authors in the
          case of written documents, artists, photographers, or
          illustrators in the case of visual resources. 

          SCHEME
               free text, or RFC 822 Address specification 

          repeatable?
               yes 

          example
               `A. P. Harris <aph@debian.org>' 

     Contributor
          Contributor to a document. For our purposes, this should only be
          used to indicate the translator of a document. Multiple authors
          for a document should simply use multiple Creator elements. 

          SCHEME
               free text, or email 

          repeatable?
               yes 

          example
               `A. P. Harris <aph@debian.org>' 

     Publisher
          The entity responsible for making the resource available in its
          present form, such as a publishing house, a university
          department, or a corporate entity. 

          SCHEME
               free text, or email 

          repeatable?
               yes 

     Date
          A date associated with the resource. For our purposes, this
          should indicate the last modification date of a resource.

          SCHEME
               ISO 8601 Profile, found at
               http://www.w3.org/TR/NOTE-datetime-970915 

          repeatable?
               no 

          example
               recommended to use only year-month-day granularity such as
               `1997-11-05' or `1997-11'; more granular formats such as
               `1997-07-16T19:20:30+01:00' are also available. 
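
          A syntactic check against the W3C date and time profile could
          be sketched in Python as below. The `is_valid_date' name is
          hypothetical, and the check covers form only: it does not
          reject impossible calendar dates such as a 31st of February.

```python
import re

# Granularities allowed by the profile: YYYY, YYYY-MM, YYYY-MM-DD, and
# full date plus time, where a time always carries a zone designator.
W3C_DATETIME = re.compile(
    r"\d{4}"                                # year
    r"(-(0[1-9]|1[0-2])"                    # month
    r"(-(0[1-9]|[12]\d|3[01])"              # day
    r"(T([01]\d|2[0-3]):[0-5]\d"            # hours and minutes
    r"(:[0-5]\d(\.\d+)?)?"                  # optional seconds, fraction
    r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d)"      # time zone designator
    r")?)?)?"
)


def is_valid_date(content):
    """Return True if `content` has the form of a profile date."""
    return W3C_DATETIME.fullmatch(content) is not None


for sample in ("1997-11-05", "1997-11", "1997-07-16T19:20:30+01:00"):
    print(sample, is_valid_date(sample))
```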

     Source
          Upstream location where a document originated. Generally this is
          a web site maintained by the document author, or the URL for a
          canonical upstream archive such as Sunsite. 

          SCHEME
               URL 

          repeatable?
               yes 

          example
               `http://sunsite.unc.edu/mdw/HOWTOs/FOOBAR.html' 

     Relation.IsFormatOf, Relation.IsBasedOn
          Indicates a relationship to another resource. The content of this
          field is the URL to the resource related to, as in the Identifier
          element. Relation.IsFormatOf indicates a format of the resource
          indicated in the content of this element, i.e., an HTML or ASCII
          version of an SGML file. Relation.IsBasedOn is used to indicate
          translations based on another document. Note that it is *not* an
          error for the content's URL to not exist on the user's filesystem.
          *Future directions.* We ought to define a nice standard way to
          refer to files from other packages, i.e., file `index.html' from
          the documentation area of the package `foobar'. 

          SCHEME
               URL 

          repeatable?
               no 

          example
               `FAQ/Linux-FAQ' 

     Type
          The category of the resource, describing what sort of resource it
          is. Resource types are orthogonal to both the `Subject' and
          `Format' elements. So, the `Type' element can be thought of as
          the generic *class* of resources, which is not related to its
          particular media or subject matter. The value you select for this
          element should be selected from the following list if
          appropriate: 

             * howto 

             * faq 

             * manual *(i.e., manual for software)* 

             * reference 

             * specification *(i.e., a Policy document)* 

             * tutorial 

             * figure 

             * homepage 

             * glossary 

             * collection 

             * discussion group *(i.e., a mailing list or Usenet group, or
               archives of the same)* 

             * package *(for future use, representing Debian package
               metadata)* 

          If the Type you are looking for is not on the list, but you feel
          it should be, please email <doc-base@packages.debian.org>. 

          SCHEME
               Debian type 

          repeatable?
               yes 

          example
               `howto' 

     Rights
          An identifier that links to a rights management statement, such
          as acceptable terms of use, the GPL, etc. 

          SCHEME
               URL 

          repeatable?
               yes 

          example
               `copyright/GPL' 


3.3.3. Ignored Elements
-----------------------

     The following elements are ignored. They are mentioned here because
     these fields are part of standard Dublin Core; they may some day
     become supported. 

     Coverage
          The spatial or temporal characteristics of the intellectual
          content of the resource. 



-------------------------------------------------------------------------------


4. docreg File Format
---------------------

     The docreg file is the medium for the transmission of document
     metadata information to the local Document Store. As such, it is the
     package maintainer's way of attaching metadata to documents included
     in their package, and ensuring that metadata is available to the user
     who installed the package. 

     The docreg file is used in combination with install-docs as the
     complete interface that a document-providing package needs to worry
     about. End users need not be aware of docreg files at all; they are
     not end-user editable. 


4.1. Design Rationale and Goals
-------------------------------

     The docreg file is meant to be an easy, familiar mechanism for busy
     package maintainers. It uses a syntax similar to `control' files
     already used by package maintainers, namely an RFC-822 compliant
     syntax. 

     The docreg file format has the following design goals: 

        * Adherence to recognized metadata standards, namely, the Dublin
          Core (http://purl.oclc.org/metadata/dublin_core/) element set. 

        * Easy to use for package maintainers; uses a very simple data
          model. 

        * Language-independent syntax, allowing for indication of the
          language of the document, as well as indication of the language
          of the metadata. 

        * Allow for flexibility and inter-relationships between documents
          without imposing any dependency or entity modeling complexity. 



4.2. How To Use the docreg File
-------------------------------

     The docreg file itself is the file used by package maintainers to
     register documents into the Debian Document Registry. The doc-base
     packaging system (specifically the install-docs program) is
     responsible for processing the docreg file and adding the document's
     meta-information contained in the docreg file to the system's local
     Document Store. 

     Document metadata is all the information contained in the Debian
     Document Registry for a file. The composition of this metadata is
     directly related to the docreg file, since the docreg file is the sole
     transmitter of document metadata into the registry (via install-docs).
     Although the document metadata and the docreg file are easy to
     confuse, they are distinct: the metadata is what the registry holds,
     and the docreg file is merely the vehicle that delivers it. 

     A docreg file may contain one or more *metadata entities*, as
     described in section 3.1, `Metadata Entities'. To extend the paradigm
     from that section, documents are the books in a library, metadata
     entities are the cards in the card catalog, and docreg files are
     simply bundles of one or more card catalog cards which are delivered
     to the library. 

     While each metadata entity refers to one and only one resource (local
     or otherwise), it does not follow that each resource has one and only
     one metadata entity.[1] It is possible, although unusual, for a
     resource to have more than one metadata entity referring to it. 

     [1]  It follows that the Identifier for a metadata entity is not
          necessarily a unique identifier for that entity.

     Documents can relate to one another in various ways. For instance, a
     document might be a specially formatted version of another source
     document (the "IsFormatOf" relation). A document might be a
     translation of another document into a new language ("IsBasedOn"), or,
     more obscurely, a version of the work, perhaps interesting for
     historical purposes ("IsVersionOf"). Relationships between documents
     do not require actual package dependencies, however. 

4.2.1. Where To Put the docreg File
-----------------------------------

     docreg files are under package maintainer control; they are never
     altered by the Debian documentation system as a whole. The files
     should be installed and removed by the package itself using the
     standard means. The file may be automatically generated at the package
     maintainer's discretion; however, it may not be altered after
     install-docs has run. 

     At the convenience of the package maintainer, it is allowable to use
     more than one docreg file per package. 

     docreg files may be placed in any location. It is suggested that
     docreg files be placed in the directory containing the resource they
     describe, and that they have `.docreg' as their suffix. However,
     maintainers may name or place the files wherever they wish. 

     Alternatively, some have suggested a `/usr/share/doc-base/docreg/'
     subdirectory, in which case the docreg file should be named the same
     as the package, or prefixed with the same, e.g.,
     `/usr/share/doc-base/docreg/debian-policy'. Whatever the file name,
     the names must be globally unique across all packages. Prefixing them
     with the package name helps guard against collisions. 


4.3. docreg File Format
-----------------------

     The format of the docreg file borrows from the Debian control file
     format, which borrows from RFC 822. 

     First, some terminology. docreg files are composed of one or more
     metadata entities, where each entity describes a single document
     (identified by a URL; in practice, a file on disk). Metadata entities
     are composed of elements, or fields, which include required elements,
     optional elements, and ignored elements. These elements are treated
     in depth in chapter 3, `Debian Metadata Elements'. 

     Elements are lines composed of a label (that is, the name of the
     element), a colon (`:'), zero or more qualifiers in parentheses, and
     finally the contents of the element. Metadata entities are composed
     of elements, and are separated from one another by an empty line, or
     by the top or bottom of the file. 

     Any element's contents may continue onto multiple lines, but
     continuation lines must be indented from the left margin; this is
     called "folding". In some cases the contents are restricted to a
     controlled vocabulary, such as a URL, or a single value from a domain
     of possible values. These controlled vocabularies are specified by
     the built-in implied `SCHEME', which is described in subsection 3.2.2,
     `The SCHEME Qualifier'. 

     An augmented BNF description of the file format, probably only of
     interest to implementors, can be found in section 6.2, `Augmented BNF
     Description for docreg Files'. 
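
     As an illustration of the rules above (entities separated by empty
     lines, one element per line, indented continuation lines), here is a
     minimal reader sketched in Python. It is not part of doc-base; the
     function name `parse_docreg' is hypothetical, and the choice to join
     folded lines with a single space is an assumption. Qualifiers are
     left inside the contents untouched.

```python
def parse_docreg(text):
    """Split docreg text into metadata entities.

    Returns a list of entities; each entity is a list of
    (label, contents) pairs, in file order, so repeated elements
    such as Creator are preserved.
    """
    entities, current = [], []
    for line in text.split("\n"):
        if not line.strip():
            # An empty line closes the current metadata entity.
            if current:
                entities.append(current)
                current = []
        elif line[0] in " \t":
            # Indented line: folded continuation of the previous element.
            if not current:
                raise ValueError("continuation with no element: %r" % line)
            label, contents = current[-1]
            # Assumption: folded lines are joined with one space.
            current[-1] = (label, contents + " " + line.strip())
        else:
            label, colon, contents = line.partition(":")
            if not colon:
                raise ValueError("element without a colon: %r" % line)
            current.append((label.strip(), contents.strip()))
    if current:
        # The bottom of the file also closes an entity.
        entities.append(current)
    return entities


sample = """Identifier: debian-metadata/debian-metadata.text
Format: text/plain
Description: This manual contains a guide
  and a reference.
Creator: A. P. Harris <aph@debian.org>
Creator: Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>

Identifier: debian-metadata/debian-metadata.sgml
Format: text/sgml
"""
entities = parse_docreg(sample)
```

     Each entity is kept as an ordered list rather than a dictionary so
     that repeatable elements are not lost.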

4.3.1. Example Files
--------------------

     The following is an example for the current document. There are three
     formats provided: SGML, ASCII, and HTML. 
Identifier: debian-metadata/debian-metadata.sgml
Format: text/sgml
Title: Debian Metadata Manual
Subject: debian/policy
Description: This manual contains a guide and a reference to the
  Debian Metadata Project. The Project's purpose, and the purpose of
  this document, is to outline a set of metadata elements, to specify
  an interface for package maintainers use in order to provide
  metadata about resources in their packages, to specify a unified
  subject catalog for categorizing metadata, and to specify an API for
  developers who wish to make use of a system's metadata.
Language: en
Creator: A. P. Harris <aph@debian.org>
Creator: Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>
Date: 1998-06-29
Rights: copyright/GPL
Type: specification

Identifier: debian-metadata/debian-metadata.html/index.html
Format: text/html
Title: Debian Metadata Manual
Subject: debian/policy
Description: This manual contains a guide and a reference to the
  Debian Metadata Project. The Project's purpose, and the purpose of
  this document, is to outline a set of metadata elements, to specify
  an interface for package maintainers use in order to provide
  metadata about resources in their packages, to specify a unified
  subject catalog for categorizing metadata, and to specify an API for
  developers who wish to make use of a system's metadata.
Language: en
Creator: A. P. Harris <aph@debian.org>
Creator: Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>
Date: 1998-06-29
Rights: copyright/GPL
Type: specification

Identifier: debian-metadata/debian-metadata.text
Format: text/plain
Title: Debian Metadata Manual
Subject: debian/policy
Description: This manual contains a guide and a reference to the
  Debian Metadata Project. The Project's purpose, and the purpose of
  this document, is to outline a set of metadata elements, to specify
  an interface for package maintainers use in order to provide
  metadata about resources in their packages, to specify a unified
  subject catalog for categorizing metadata, and to specify an API for
  developers who wish to make use of a system's metadata.
Language: en
Creator: A. P. Harris <aph@debian.org>
Creator: Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>
Date: 1998-06-29
Rights: copyright/GPL
Type: specification

     As the reader can see, there is a lot of repetition between the
     different entities. Therefore, it is suggested that docreg files take
     advantage of a preprocessor, such as m4. Here is a much shorter
     version of the docreg file, which is processed by m4 to make the above
     entries: 
changequote([, ])dnl
define([common_elements], [Title: Debian Metadata Manual
Subject: debian/policy
Description: This manual contains a guide and a reference to the
  Debian Metadata Project. The Project's purpose, and the purpose of
  this document, is to outline a set of metadata elements, to specify
  an interface for package maintainers use in order to provide
  metadata about resources in their packages, to specify a unified
  subject catalog for categorizing metadata, and to specify an API for
  developers who wish to make use of a system's metadata.
Language: en
Creator: A. P. Harris <aph@debian.org>
Creator: Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>
Date: 1998-06-29
Rights: copyright/GPL
Type: specification])dnl

Identifier: debian-metadata/debian-metadata.sgml
Format: text/sgml
common_elements

Identifier: debian-metadata/debian-metadata.html/index.html
Format: text/html
common_elements

Identifier: debian-metadata/debian-metadata.text
Format: text/plain
common_elements

4.3.2. Field Sizes
------------------

     Field size limits are imposed on fields in order to facilitate a
     straightforward database-driven storage system (not yet in place). 

          Element               Limit   Notes
          Identifier            256
          Title                 80
          Subject               160     (multiple elements combined)
          Format                40
          Description           512
          Language              2
          Creator               200     (multiple elements combined)
          Contributor           200     (multiple elements combined)
          Publisher             200     (multiple elements combined)
          Date                  10
          Source                256     (multiple elements combined)
          Relation.IsFormatOf   256
          Relation.IsBasedOn    256
          Type                  80      (multiple elements combined)
          Rights                256
          Coverage              80


-------------------------------------------------------------------------------


5. Tools for Maintainers
------------------------

     Maintainer tools are applications which are available to maintainers
     to assist them in managing metadata which they are responsible for.
     This chapter contains an overview of the available tools; for full
     information about these applications, please see the manual pages
     provided for the programs. 


5.1. install-docs -- metadata installation and removal
------------------------------------------------------

     install-docs is used from the package maintainer scripts to install or
     remove a docreg file from the local store of registered metadata.
     *Examples of how to invoke from maintainer scripts.* 

     Metadata integrators can extend install-docs functionality by using
     the techniques described in section 6.3, `Hooking Into install-docs '.

     install-docs is part of the standard doc-base package. 


5.2. validate-docreg -- metadata validation for maintainers
-----------------------------------------------------------

     More extensive validation can be used by package maintainers to ensure
     that their metadata is well formed. Examples of the validation done: 

        * Relations actually exist (would depend whether the package
          related to is installed). 

        * Translations of a document use the same subject as the metadata
          of the document they were translated from. 

        * Validate fields, such as date, language, etc. 
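
     To make the last item concrete, here is a small Python sketch of
     field validation. It does not describe validate-docreg itself; the
     function names are hypothetical. It assumes that Date uses the
     `YYYY-MM-DD' form seen in the example files (which fits the
     10-character limit), and that Language is a two-letter code such as
     `en' (which fits the 2-character limit).

```python
import datetime
import re


def check_date(value):
    """True if value is a real calendar date written as YYYY-MM-DD."""
    try:
        datetime.datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime also accepts unpadded months and days; insist on the
    # fixed 10-character form.
    return len(value) == 10


def check_language(value):
    """True if value looks like a two-letter language code."""
    return re.fullmatch(r"[A-Za-z]{2}", value) is not None
```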

     *Examples of how to invoke from debian/rules.* 

     This utility is part of the doc-base-dev package. 


5.3. html2docreg -- convert HTML files to docreg
------------------------------------------------

     Converts standard Dublin Core HTML `META' tags into docreg syntax.
     Supports both HTML v3 and v4 META syntax. 

     This utility is part of the doc-base-dev package. 
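
     The conversion might look roughly like the following Python sketch,
     which is not the html2docreg program itself. It assumes the common
     Dublin Core convention of `META' tags whose `NAME' begins with
     `DC.'; the class and function names are hypothetical, and any
     `SCHEME' attribute on the tag is ignored.

```python
from html.parser import HTMLParser


class DublinCoreMeta(HTMLParser):
    """Collect (element, content) pairs from <META NAME="DC.x" ...>."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        content = attrs.get("content")
        if name.lower().startswith("dc.") and content is not None:
            self.elements.append((name[3:], content))


def html_to_docreg(html):
    """Return docreg-style element lines for the DC META tags in html."""
    parser = DublinCoreMeta()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so each element stays on a single line.
    return "".join("%s: %s\n" % (label, " ".join(content.split()))
                   for label, content in parser.elements)


page = ('<html><head><title>x</title>'
        '<META NAME="DC.Title" CONTENT="Debian Metadata Manual">'
        '<meta name="keywords" content="unrelated">'
        '<meta name="DC.Language" content="en">'
        '</head><body></body></html>')
output = html_to_docreg(page)
```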


5.4. docreg2html -- convert docreg files to HTML
------------------------------------------------

     Converts standard docreg files to Dublin Core standard HTML `META'
     elements. Can switch between HTML v3 and v4 META syntax. Qualifiers
     are added automatically. 

     This utility is part of the doc-base-dev package. 
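
     In the other direction, a rough Python sketch (again, not the actual
     docreg2html program) could emit one `META' tag per element. The
     `DC.' name prefix and the function name are assumptions, and
     qualifiers are not handled.

```python
from html import escape


def docreg_to_meta(entity):
    """Return HTML META tags for a list of (label, contents) pairs."""
    return "\n".join(
        '<META NAME="DC.%s" CONTENT="%s">'
        % (escape(label, quote=True), escape(contents, quote=True))
        for label, contents in entity)


tags = docreg_to_meta([
    ("Title", "Debian Metadata Manual"),
    ("Creator", "A. P. Harris <aph@debian.org>"),
])
```

     Escaping matters here because element contents such as a Creator
     with an e-mail address contain `<' and `>'.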


-------------------------------------------------------------------------------


6. Debian Metadata for Implementors
-----------------------------------

     This chapter is for those who are implementing interfaces or
     extensions to the Debian Metadata infrastructure. 


6.1. Tracking Registered docreg Files
-------------------------------------

     Currently, the only means by which an implementor can access metadata
     is through the docreg files themselves. 

     The Debian Metadata project feels that direct access to docreg files
     is a temporary state of affairs, since direct access to docreg files
     will ultimately place too many constraints on the file format and
     contents. For instance, moving to XML, or offering multiple docreg
     file formats, would be doubly difficult to implement. Abstraction is
     needed between the "container" file (transmitting the information) and
     the "store" of locally known metadata. However, delaying
     implementation until we have a local storage system is not acceptable,
     since we want to get the system to be actually used in the world
     before investing too deeply in an infrastructure. 

     Therefore, for now, implementors should use the file
     `/var/state/doc-base/registered-docreg-files' to discover the list of
     installed docreg files. 
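
     A sketch of how an implementor might read that list, in Python. The
     layout of the file is not specified here, so the sketch assumes one
     docreg path per line; the function name is hypothetical.

```python
import os

REGISTRY = "/var/state/doc-base/registered-docreg-files"


def registered_docreg_files(registry=REGISTRY):
    """Return the registered docreg paths.

    Assumption: the registry lists one path per line. A missing
    registry is treated as "nothing registered".
    """
    if not os.path.exists(registry):
        return []
    with open(registry) as handle:
        return [line.strip() for line in handle if line.strip()]
```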


6.2. Augmented BNF Description for docreg Files
-----------------------------------------------

     The following description uses augmented BNF as defined in RFC 822.
     This standard meta-format lets us define the docreg format without
     ambiguity. See also RFC 2068 for a description and example of
     augmented BNF. 

6.2.1. Basic Rules 
-------------------

     The following rules define fundamental building blocks used in the
     rest of this specification. 
     CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)
     ISOCHAR     =  <any ISO-8859-1 character>
     CTL         =  <any ASCII control           ; (  0- 37,  0.- 31.)
                     character and DEL>          ; (    177,     127.)
     LF          =  <ASCII LF, linefeed>         ; (     12,      10.)
     SPACE       =  <ASCII SP, space>            ; (     40,      32.)
     HTAB        =  <ASCII HT, horizontal-tab>   ; (     11,       9.)
     LWSP-char   =  SPACE / HTAB                 ; semantics = SPACE
     linear-white-space =  1*([LF] LWSP-char)    ; semantics = SPACE
                                                 ; LF => folding
     specials    =  "(" / ")" / "<" / ">" / "@"
                 /  "," / ";" / ":" / "\" / <">
                 /  "." / "[" / "]" / "="
     atom        =  1*<any CHAR except specials, SPACE and CTLs>
                                                     ; control fields
     ctext       = *<any ISOCHAR excluding "(",  ; field contents
                     ")", "\" & CR, & including
                     linear-white-space>
     end-of-rec  =  < 2*LF or end of file >

6.2.2. Field Definitions 
-------------------------

     Field semantics are the same as those defined under "Header Field
     Definitions" in RFC 822 Section 3.1, with the exception that rather
     than CRLF we use the standard Unix line separator, LF. Long header
     fields are likewise supported, as specified in RFC 822 Section 3.1.1. 

     The following is the BNF composition of docreg field syntax. 
     field               =  field-name ":" [*field-qualifier]
                            field-body LF LF
     field-name          = *atom
     field-body          =  field-body-contents
                            [LF LWSP-char field-body]         ; folding
     field-body-contents = *ctext
     field-qualifier     =  "(" *atom "=" *atom ")"

     `field-name' values are not case-sensitive. Both `field-name' and
     `field-qualifier' are further constrained to the set of allowable
     values. Furthermore, in some cases, `field-body-contents' are
     constrained based on their qualifiers. For instance, a qualifier of
     `SCHEME=URL' would indicate that the contents should be a valid URL. 
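
     A single field following this shape can be taken apart with a pair
     of regular expressions. This Python sketch is an approximation of
     the grammar, not a full implementation of it: it works on one
     already-unfolded field, and the function name `parse_field' is
     hypothetical. Field names are lower-cased because they are not
     case-sensitive.

```python
import re

# name ":" zero or more "(key=value)" qualifiers, then the body.
FIELD_RE = re.compile(
    r"^(?P<name>[^\s:()]+)\s*:\s*"
    r"(?P<quals>(?:\([^()=]*=[^()]*\)\s*)*)"
    r"(?P<body>.*)$",
    re.S)
QUAL_RE = re.compile(r"\(\s*([^()=\s]+)\s*=\s*([^()]*?)\s*\)")


def parse_field(line):
    """Return (name, qualifiers, body) for one unfolded field."""
    match = FIELD_RE.match(line)
    if match is None:
        raise ValueError("not a docreg field: %r" % line)
    qualifiers = dict(QUAL_RE.findall(match.group("quals")))
    return (match.group("name").lower(), qualifiers,
            match.group("body").strip())


rights = parse_field("Rights: (SCHEME=URL) copyright/GPL")
title = parse_field("title: Debian Metadata Manual")
```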

     For clarifications on the way that fields are composed, refer to RFC
     822.[1] 

     [1]  Please email me with any corrections or clarifications.

6.2.3. docreg Definition 
-------------------------

     docreg files contain any number of metadata sets (the metadata
     entities of section 3.1, `Metadata Entities'). 

     docreg-file         =  *metadata-set
     metadata-set        =  *field end-of-rec


6.3. Hooking Into install-docs 
-------------------------------

     This section specifies a proposed method of allowing packages to hook
     into install-docs invocations. It is not yet decided whether this
     functionality is necessary. If you are a metadata implementor, and you
     find that you do need this functionality, or find that this
     functionality is not sufficient for your needs, then please email
     <doc-base@packages.debian.org>. 

     Metadata implementors can *hook* into state changes in metadata by
     providing scripts in `/usr/share/doc-base/methods/' which are
     executable. These hooks must be platform-independent (or else maybe we
     should move this to `/usr/lib/doc-base/methods'). Most likely they
     will be wrappers around the actual programs. 

     The following states have hooks; the hooks are indicated by the first
     argument passed to the scripts. 

     install <docreg_file>
          call to register or re-register the docreg file located at
          <docreg_file> 

     remove <docreg_file>
          call to unregister the docreg file located at <docreg_file> 

     rebuild
          rebuild local caches of metadata; essentially this implies
          clearing out all data and reinstalling each of the registered
          docreg files
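
     A hook script following the convention above might dispatch on its
     first argument as in this Python sketch. Since the hook mechanism is
     only proposed, everything here is illustrative: the function name,
     the exit statuses (0 for success, 2 for a usage error), and the
     dictionary standing in for a real local cache are all assumptions.

```python
def handle_hook(args, cache):
    """Handle one hook invocation.

    `args` are the script arguments after the program name; `cache`
    is a dict standing in for the implementor's own metadata cache.
    Returns a process exit status.
    """
    if not args:
        return 2
    state, rest = args[0], args[1:]
    if state == "install" and len(rest) == 1:
        # Register or re-register the docreg file.
        cache[rest[0]] = "registered"
    elif state == "remove" and len(rest) == 1:
        # Unregister; removing an unknown file is not an error here.
        cache.pop(rest[0], None)
    elif state == "rebuild" and not rest:
        # Clear everything; a real hook would then reinstall each
        # registered docreg file.
        cache.clear()
    else:
        return 2
    return 0

# A real script would finish with something like:
#   sys.exit(handle_hook(sys.argv[1:], cache))
```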



-------------------------------------------------------------------------------


     Debian Metadata Project
     Adam P. Harris <aph@debian.org>, The Debian-Doc List
     <debian-doc@lists.debian.org> - version 0.8.0, Sat, 18 Jul 1998
     19:23:00 -0400 

