announcing 0.8.0 debian metadata standard
Sorry it's been an extra week; things got a little crazy around here.
An ASCII version of the new spec is included in this message. For an
HTML version of the spec, see
<URL:http://va.debian.org/~aph/debian-metadata.html/ch-docreg-format.html>
Major changes:
* incorporate both Marco's (starting with ./) and my relative URL
system (maybe I should have a special token for that too). Let's
get on to implementing and see what makes more sense.
* discuss metadata entity, element, resource relationship more clearly
* loosen up docreg file placement; actually I'm suggesting it be
named foo.docreg and put in the same dir with the document.
* incorporate other tweaks etc from the list
* stub out a discussion of the tools and install hooks
* now using package number for my version number
* chapters hoped to be complete: 1, 3, 4
chapters still requiring deep work: 2 (quite a bit later), 5, 6
Basically I'm ready to start implementing. I'm sending this to the
international lists as well for comment, since we really haven't dealt
with the charset issues at all yet, and I'm hoping they have useful
things to say.
--
.....A. P. Harris...apharris@onShore.com...<URL:http://www.onShore.com/>
Debian Metadata Project
-----------------------
Adam P. Harris <aph@debian.org>
The Debian-Doc List <debian-doc@lists.debian.org>
version 0.8.0, Sat, 18 Jul 1998 19:23:00 -0400
0.1 Abstract
------------
This manual contains a guide and a reference to the Debian Metadata
Project. The Project's purpose, and the purpose of this document, is
to outline a set of metadata elements, to specify an interface for
package maintainers to use in order to provide metadata about
resources in their packages, to specify a unified subject catalog for
categorizing metadata, and to specify an API for developers who wish
to make use of a system's metadata. This manual is intended to serve
as sub-policy for the deployment and utilization of metadata in
Debian. Currently, it carries no actual force and is for informational
purposes only. The manual is intended for package maintainers,
Debian document writers, and those implementing document display
systems such as dwww and dhelp.
0.2 Contents
------------
1. Introduction
1.1. Scope of this Document
1.2. Organization of this Document
1.3. Contributing to the Project
2. Local Configuration Options
2.1. Automatic Document Conversion
3. Debian Metadata Elements
3.1. Metadata Entities
3.2. Metadata Element Structure
3.3. Metadata Elements
4. docreg File Format
4.1. Design Rationale and Goals
4.2. How To Use the docreg File
4.3. docreg File Format
5. Tools for Maintainers
5.1. install-docs -- metadata installation and removal
5.2. validate-docreg -- metadata validation for maintainers
5.3. html2docreg -- convert HTML files to docreg
5.4. docreg2html -- convert docreg files to HTML
6. Debian Metadata for Implementors
6.1. Tracking Registered docreg Files
6.2. Augmented BNF Description for docreg Files
6.3. Hooking Into install-docs
0.3 Copyright Notice
--------------------
Copyright ©1998 Adam P. Harris, ©1997 Christian Schwarz.
This documentation is free software; you may redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2, or (at
your option) any later version.
However, even though you are empowered to modify this specification,
please do not do so; as a standard, it loses power if there are
alternate versions of it available. Methods for centralized management
and modification of this specification are outlined below.
This is distributed in the hope that it will be useful, but *without
any warranty*; without even the implied warranty of merchantability or
fitness for a particular purpose. See the GNU General Public License
for more details.
A copy of the GNU General Public License is available as
`/usr/doc/copyright/GPL' in the Debian GNU/Linux distribution or on
the World Wide Web at http://www.gnu.org/copyleft/gpl.html. You can
also obtain it by writing to the Free Software Foundation, Inc., 675
Mass Ave, Cambridge, MA 02139, USA.
-------------------------------------------------------------------------------
1. Introduction
---------------
What is metadata? Metadata is information about information. The
Debian Metadata Project is an attempt to provide a robust,
standards-based metadata set, and the facilities to collect and
display information about resources (usually, documents on a user's
machine). Collected information includes the document's title, author,
format, placement in a subject catalog, description, language, etc.
Why should anyone care about metadata? Primarily, metadata is useful
in *resource discovery*. This is the process of finding out where to
find information. You do this every time you run man -k or apropos;
Altavista (http://www.altavista.digital.com) and HotBot
(http://www.hotbot.com/) are typical of the current technologies in
resource discovery. But *metadata* allows you to find resources in
different and better ways. You can search by title, by language, by
author; you can traverse a subject hierarchy, like a book's index.
Metadata allows a more intelligent way to organize and present the
vast amount of documentation that Debian already provides.
There are other benefits of having consistent metadata available. For
instance, at document installation time, metadata could drive format
conversions or fine-grained policies about which formats of
documentation may be installed. Machines running Debian would be able
to say things like, "if German and English versions of a document are
available, remove the English version".
Debian uses as its metadata entity definition a specialized
application of the Dublin Core
(http://purl.oclc.org/metadata/dublin_core/). The Dublin Core is an
informal standard formulated by an international group of
professionals in library science and in the networking and digital
library research communities.
1.1. Scope of this Document
---------------------------
The purpose and scope of this document is to define a common baseline
of metadata in Debian. Furthermore, this document is a manual meant to
explain how to use metadata, for the benefit of curious users, package
maintainers, or metadata integrators. As such, this manual covers the
following issues:
* what the recognized metadata elements are
* how metadata is delivered
* what tools are available to help work with Debian's metadata
standard
* how the system works, for the benefit of integrators
A related document is the Debian Documentation Hierarchy manual, which
defines the standardized documentation subject tree. That document,
not included here, describes the headings and subheadings under which
documents may appear.
1.2. Organization of this Document
----------------------------------
The document is split into three main sections. The first section
contains information of interest to any Debian user, curious about the
features and capabilities of our metadata system. The second section
is of interest to package maintainers. The final section is mainly of
interest to documentation system providers or metadata display system
developers.
System administration controls provided by the Debian Metadata system
are documented in chapter 2, `Local Configuration Options'. Chapter 3,
`Debian Metadata Elements', defines the metadata elements, which are
the data fields which can be populated for a given resource.
The next part of this manual is primarily of interest to Debian
package maintainers. It begins with chapter 4, `docreg File Format',
which describes the "docreg" file, the file that the package
maintainer uses to *register* document metadata into the local
document store. Finally, chapter 5, `Tools for Maintainers', covers
the use of install-docs and other tools to assist package maintainers.
The final part, chapter 6, `Debian Metadata for Implementors', is of
interest to those who are working with Debian's metadata collection
(implementors or integrators). This chapter contains a full BNF
specification of docreg files, information on how developers can hook
into install-docs for capturing certain metadata events, and
information on the data provided for integrators by the doc-base
system.
1.3. Contributing to the Project
--------------------------------
Discussions about the Debian Metadata Project generally take place on
the Debian-Doc mailing list <debian-doc@lists.debian.org>. This is an
open project; all are invited. To subscribe to this list, see
http://www.debian.org/MailingLists/subscribe.
The newest version of the specification can be found, currently, at
http://va.debian.org/~aph/debian-metadata.html/. This will be moving
to a more standard location soon.
If you are interested in contributing code or text to the
specification, please do! Read-only CVS access to the specification is
publicly available at `cvs.debian.org'. CVS access ensures that you
have the most up-to-date versions of the documentation and
implementation source.
If you have a client/server capable `cvs' installed, do the following
steps (note: the `>' represents your shell prompt, where you enter
commands):
> cvs -d :pserver:anonymous@cvs.debian.org:/cvs/doc-base login
(Logging in to anonymous@cvs.debian.org)
CVS password: <hit return, i.e., a blank password>
> cvs -z9 -d :pserver:anonymous@cvs.debian.org:/cvs/doc-base co doc-base
cvs server: Updating doc-base
U doc-base/.cvsignore
U doc-base/Makefile
U doc-base/copyright.ent
[...]
If you are a developer or for some other reason have an account on
`cvs.debian.org', you can also use a `CVSROOT' (the part after the
`-d') of `:ext:<username>@cvs.debian.org:/cvs/doc-base'.
For more information on how to use CVS, see cvs(1).
-------------------------------------------------------------------------------
2. Local Configuration Options
------------------------------
Providing knobs and dials for system administrators to control local
documentation is possible once we have the data provided by the Debian
Metadata scheme. None of this functionality is present yet; however,
preliminary ideas of desirable configuration capabilities are
discussed here.
Such configuration possibilities can be categorized into a few major
topics. The first topic is the ability to make decisions, based on
local policy, whether or not to install the documentation. Here is a
feature list:
* don't install particular formats ever, e.g., "I don't want any
PostScript on my machine, this is a firewall"
* don't install particular languages, e.g., "I don't want any
Spanish documentation installed"
* conditionally skip a particular language when another language is
available, e.g., "if a Spanish version of a document is available
already, we don't need the English version; otherwise, we do"
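The three kinds of rule listed above could be combined into a single
install-time decision. The following Python sketch is only an
illustration of that idea; none of it exists in doc-base, and the
names `LocalPolicy', `should_install', and all three policy fields are
hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LocalPolicy:
    # Hypothetical knobs; no existing tool reads settings like these.
    banned_formats: set = field(default_factory=set)    # MIME types
    banned_languages: set = field(default_factory=set)  # language tags
    # Maps a language to the languages that make it redundant,
    # e.g. {"en": {"es"}} means "skip English if Spanish exists".
    redundant_if: dict = field(default_factory=dict)


def should_install(fmt, language, other_languages, policy):
    """Decide whether one document should be installed.

    other_languages lists the languages in which the same document
    is already available on the system.
    """
    if fmt in policy.banned_formats:
        return False
    if language in policy.banned_languages:
        return False
    preferred = policy.redundant_if.get(language, set())
    if preferred & set(other_languages):
        return False
    return True
```

The format and language of a document would come from its Format and
Language elements; how the system learns which other languages are
available is left open here.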
2.1. Automatic Document Conversion
----------------------------------
Another major topic is the possibility of auto-conversion of
documentation, either on demand or at install time. Here is a possible
feature list:
* autoconvert on install based on format, e.g., "I want all SGML
files to be converted into PDF, A4 sized paper. Please retain the
SGML."
* autoconvert on demand based on formats, e.g., provide a facility
such that we could write a CGI to convert documents on demand,
say, using content negotiation or user selection
* "Even though policy says don't gzip HTML files, I've set up my
browsers to handle it, so go ahead and gzip them."
Autoconversion is a very complex issue. Packages being installed
should be able to register their conversion capabilities with the
system. For example, sdc can translate a particular set of DTDs into
HTML, ASCII, nroff, or PostScript. gs can translate PostScript to PDF.
The `docbook-stylesheets' package can translate documents written in
the Docbook DTD to HTML, PostScript, or RTF. When conversions are
done, the system should make new metadata for them and register this
new metadata, probably with special fields to allow an audit-trail of
the conversion actions.
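If packages registered their conversion capabilities as described
above, the system could chain them, for instance SGML to PostScript
via sdc and then PostScript to PDF via gs. The Python sketch below
shows one way to find such a chain with a breadth-first search. The
registry layout (a list of tool, source format, target format tuples)
and the function name are assumptions made for illustration only; no
such registry exists yet.

```python
from collections import deque


def find_conversion_path(converters, source, target):
    """Return the shortest chain of conversions from source to target.

    converters is a list of (tool, from_format, to_format) tuples.
    The result is a list of such tuples, an empty list if source and
    target are the same, or None if no chain exists.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, path = queue.popleft()
        if fmt == target:
            return path
        for tool, src, dst in converters:
            if src == fmt and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [(tool, src, dst)]))
    return None
```

Each step of the returned chain is also a natural place to record the
audit trail mentioned above, since it names the tool and both formats.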
Document formatting is a very complex issue. It can have dependencies
on many different things in the system, such as fonts, obscure
configuration settings, etc. For instance, if I change my paper-size
in `/etc/papersize/', do I need to recreate any documents which
depended on that setting? Additionally, we might need to allow a
facility for the document manager to associate processing instructions
for files.
Finally, the logistics of package maintenance make autoconversion
complex. Do we remove converted documents when the package their
source came from is removed? When it is purged?
-------------------------------------------------------------------------------
3. Debian Metadata Elements
---------------------------
This chapter contains a description of Debian metadata, which is used
to describe human-legible texts in a consistent and coherent way. The
Debian Metadata Project uses the Dublin Core
(http://purl.oclc.org/metadata/dublin_core/) set of metadata elements.
Below we define the logical structure of entities and elements, define
how metadata relates to data, and describe the meaning and use of the
elements individually.
3.1. Metadata Entities
----------------------
A metadata *entity* is composed of a set of *elements*, which are the
individual bits of metadata. Every metadata entity describes one and
only one *resource*, or document. However, a single resource may be
described by more than one metadata entity. A *resource* is defined
by a URL (generally a file in the documentation area of the package,
on the local machine).
One can conceptualize this system using a library card catalog
paradigm. Resources are the actual books in the library (or
periodicals, or microfiche, etc.). Metadata entities are the cards in
the card catalog. Metadata elements are the actual bits of information
appearing on these cards. A single book may have more than one card;
furthermore, it may appear in different parts of the card catalog.
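The relationship between resources, entities, and elements can be
written down as a small data model. The Python sketch below is an
illustration only; the class names are hypothetical and do not
describe any existing implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Element:
    label: str    # e.g. "Title"
    content: str  # the value of the element


@dataclass
class Entity:
    """One catalog card: elements describing exactly one resource."""
    elements: list = field(default_factory=list)

    def identifier(self):
        # The resource is named by the Identifier element.
        for element in self.elements:
            if element.label.lower() == "identifier":
                return element.content
        raise ValueError("entity has no Identifier element")


def group_by_resource(entities):
    """Collect the cards that describe the same book."""
    cards = defaultdict(list)
    for entity in entities:
        cards[entity.identifier()].append(entity)
    return dict(cards)
```

An entity maps to exactly one resource through its Identifier, while
`group_by_resource' may return several entities for one resource,
which mirrors a book having more than one card.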
3.2. Metadata Element Structure
-------------------------------
The Dublin Core Element semantics can be found at
http://purl.oclc.org/metadata/dublin_core_elements. In some cases, we
have restricted the syntax for the benefit of simplicity of
implementation. These restrictions are always noted.
Metadata elements consist of two required parts: a *label* and its
*content*. The label is the name of the element, selected from the
domain of possible labels listed below. The content is the value of
the element.
In standard Dublin Core, each element is repeatable. However, we have
restricted the repeatability of certain fields for simplicity of
implementation; these restrictions may be lifted at a later date.
Generally, if an element's contents are not free text (i.e., if it
doesn't make sense to talk of the *language* of the contents), we do
not allow it to repeat.
Elements may occur in any order. Order is never significant. Case is
never significant in labels or qualifiers; case is preserved in the
content.
For the precise syntax of how elements are encoded in docreg files,
see chapter 4, `docreg File Format'.
The Debian flavor of Dublin Core also places restrictions on
qualifiers. *Qualifiers* are attributes which attach to elements in
order to additionally define, or *qualify*, what the element is or
what it refers to. For instance, the LANG qualifier defines the
language that the actual metadata is written in (not the resource). In
the Debian Metadata scheme, we have eliminated the necessity (or even
possibility) for metadata maintainers to use qualifiers. For instance,
as the subject scheme, we have no use for Dewey Decimal schemes;
instead, we require our own scheme. As such, the Debian scheme uses
*required implied qualifiers*. Unknown or unacceptable schemes are
ignored as if they never appeared. As such, we only deal with
qualifiers when converting in and out of docreg formats into foreign
formats, which have different meanings and purposes.
3.2.1. The LANG Qualifier
-------------------------
The `LANG' qualifier indicates the language of the content of the
element itself. For instance, if a `Description' element has a LANG
qualifier value of <de>, the description itself is in German.
Language qualifiers are not settable. For many elements, content is
described in formal structure such as a date field or a URL. For other
elements which use natural language (that is, "Title" and
"Description"), there is an implied LANG qualifier which is the same
as the setting of the Language element.[1]
[1] This restriction may be lifted at some point; for more details
see the "Language" element description.
3.2.2. The SCHEME Qualifier
---------------------------
The `SCHEME' qualifier indicates what notational scheme the content of
a given element is encoded in. Like all qualifiers, this qualifier is
not available to the maintainer for manipulation. There is only one
reasonable scheme for a given element in the Debian environment.
However, knowing the scheme for an element is important so you know
how the content of the element should be encoded.
The default scheme is generally `free text'. Other elements have a
scheme of `URL' or others, as described in section 3.3, `Metadata
Elements'.
3.3. Metadata Elements
----------------------
In Debian Dublin Core, certain elements are required, some are
optional, and some are ignored as insignificant. As a rule, the adage,
"be liberal in what you accept and conservative in what you emit"
applies to the system.
The following is a summary of the elements, which are described in
detail below:
* Required elements
* Identifier
* Title
* Subject
* Format
* Optional elements
* Description
* Language
* Creator
* Contributor
* Publisher
* Date
* Source
* Relation.IsFormatOf
* Relation.IsBasedOn
* Type
* Rights
* Ignored Elements
* Coverage
3.3.1. Required Elements
------------------------
These elements are required. Lacking these elements constitutes an
error which will cause install-docs to reject the entire entry.
Identifier
A URL used to uniquely identify the resource. Usually, the
resource is a local file on the user's file system (which may or may
not be installed). In such cases, it would be beneficial for
maintainers to be able to refer to the resource using a URL
relative to a certain path. However, the actual path to be used
is under debate. There are two proposed solutions:
1. If the URL is a relative URL, it is relative to the location
of the package's documentation area. Namely, it is relative
to either `file://localhost/usr/share/doc/<package>' or
`file://localhost/usr/doc/<package>'.
2. If the URL is a relative URL, it is relative to the location
of the docreg file itself.
To settle the question for now, the following scheme is
temporarily adopted. If the URL starts with `./' it is considered
to be relative to the position of the docreg file which contains
this entity. If the URL is any other relative URL, it is
considered to be relative to the package documentation area as
described above. This scheme is a temporary compromise in order to
accommodate both sides of the debate; perhaps when we have actual
implementations in place, one or the other shall win out.
*Future directions.* We have perceived that it would be a good
thing for certain documents to be identifiable by tokens which
are less volatile than file names. Given this facility, our
internal documentation could have persistent inter-document
cross-references.
The IETF-blessed facility to accommodate this purpose is URNs.
URNs are unique tokens defined by a central authority (such as
the Debian Documentation Project) to which the organization has
made a long-term commitment. For instance, the DDP might
decide to create a URN `debian-doc:policy' to represent the
Debian Policy document. To implement this system, we would need
to set up a central naming authority to coordinate and maintain
the Debian URN list. Associated with this list could be a set of
URLs and/or URCs, such as
"http://www.debian.org/debian-policy/index.html", mirrored
locations, and even "the file index.html in the documentation
area of the debian-policy package". Central, and centrally
distributed (i.e., packaged) CGI scripts could be provided to
dynamically interpret and support these URNs (i.e., convert URNs
to URLs on the fly).
When and if this facility is in place, the Debian Metadata system
can be used to implement it and to support it. However, it has
been decided that the project should not at this time wait for
that facility.
SCHEME
URL
repeatable?
no
Title
The title for the document, usually only a single line. If the
document does not have a title, formulate the title as if it is
the short selectable string of an HREF. The language that this
field is expressed in must be the same as the language indicated
in the "Language" element.
SCHEME
free text
repeatable?
no
Subject
Where this document is situated in the subject catalog. A subject
catalog is a way of hierarchically organizing documents based on
the subject matter covered by the document. For Debian, this
Subject Catalog is the *Debian Document Hierarchy*, or DDH for
short. See the Debian Documentation Hierarchy manual for
specifics.
SCHEME
Debian, indicating the Debian Document Hierarchy
repeatable?
yes
Format
The format of the document, indicated as a MIME type, for
example, `text/html'.
SCHEME
RFC 2046 etc. (MIME media types)
repeatable?
no
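The temporary scheme for relative Identifier URLs described above can
be sketched in Python as follows. This is an illustration under
stated assumptions, not the behavior of install-docs: the function
name and its parameters are hypothetical, and the documentation area
defaults to `/usr/doc' although `/usr/share/doc' is equally possible.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def resolve_identifier(identifier, docreg_path, package,
                       doc_root="/usr/doc"):
    """Turn an Identifier into an absolute URL.

    docreg_path is the absolute path of the docreg file containing
    the entity; package is the name of the owning package.
    """
    if urlsplit(identifier).scheme:
        # Already an absolute URL, e.g. http://...; leave it alone.
        return identifier
    if identifier.startswith("./"):
        # Relative to the directory holding the docreg file.
        base_dir = posixpath.dirname(docreg_path)
    else:
        # Relative to the package documentation area.
        base_dir = posixpath.join(doc_root, package)
    return urljoin("file://localhost" + base_dir + "/", identifier)
```

Under these assumptions, `FAQ/Linux-FAQ' in a package named foobar
resolves against `file://localhost/usr/doc/foobar/', while
`./index.html' resolves against the directory of the docreg file
itself.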
3.3.2. Optional Elements
------------------------
These elements are optional. The content of these elements is
captured by the system and should be displayed to the user by some
means.
Description
A description, or abstract, for the resource. This gives the user
more information about the resource, so that they are able to
decide whether it contains the information they are looking for.
The language that this field is expressed in must be the same as
the language indicated in the "Language" element.
*Future directions.* We may wish to define a subset of HTML
elements to allow in the content of this element. For instance:
`<bf>', `<em>', `<tt>', `<a href=...>', `<code>', `<p>', `<var>'.
SCHEME
free text
repeatable?
no
Language
The language of the intellectual content of the resource. If this
element is not present, it defaults to `en', for English.
SCHEME
RFC 1766
repeatable?
no
example
`de'
Creator
The person or organization primarily responsible for creating the
intellectual content of the resource. For example, authors in the
case of written documents, artists, photographers, or
illustrators in the case of visual resources.
SCHEME
free text, or RFC 822 Address specification
repeatable?
yes
example
`A. P. Harris <aph@debian.org>'
Contributor
Contributor to a document. For our purposes, this should only be
used to indicate the translator of a document. Multiple authors
for a document should simply use multiple Creator elements.
SCHEME
free text, or email
repeatable?
yes
example
`A. P. Harris <aph@debian.org>'
Publisher
The entity responsible for making the resource available in its
present form, such as a publishing house, a university
department, or a corporate entity.
SCHEME
free text, or email
repeatable?
yes
Date
A date associated with the resource. For our purposes, this
should indicate the last modification date of a resource.
SCHEME
ISO 8601 Profile, found at
http://www.w3.org/TR/NOTE-datetime-970915
repeatable?
no
example
recommended to use only year-month-day granularity such as
`1997-11-05' or `1997-11'; more granular formats such as
`1997-07-16T19:20:30+01:00' are also available.
Source
Upstream location where a document originated. Generally this is
a web site maintained by the document author, or the URL for a
canonical upstream archive such as Sunsite.
SCHEME
URL
repeatable?
yes
example
`http://sunsite.unc.edu/mdw/HOWTOs/FOOBAR.html'
Relation.IsFormatOf, Relation.IsBasedOn
Indicates a relationship to another resource. The content of this
field is the URL to the resource related to, as in the Identifier
element. Relation.IsFormatOf indicates that this resource is
another format of the resource named in the content of this
element, e.g., an HTML or ASCII version of an SGML file.
Relation.IsBasedOn is used to indicate translations based on
another document. Note that it is *not* an error for the
content's URL to not exist on the user's filesystem.
*Future directions.* We ought to define a nice standard way to
refer to files from other packages, i.e., file `index.html' from
the documentation area of the package `foobar'.
SCHEME
URL
repeatable?
no
example
`FAQ/Linux-FAQ'
Type
The category of the resource, describing what sort of resource it
is. Resource types are orthogonal to both the `Subject' and
`Format' elements. So, the `Type' element can be thought of as
the generic *class* of resources, which is not related to its
particular media or subject matter. The value you select for this
element should be selected from the following list if
appropriate:
* howto
* faq
* manual *(i.e., manual for software)*
* reference
* specification *(i.e., a Policy document)*
* tutorial
* figure
* homepage
* glossary
* collection
* discussion group *(i.e., a mailing list or Usenet group, or
archives of the same)*
* package *(for future use, representing Debian package
metadata)*
If the Type you are looking for is not on the list, but you feel
it should be, please email <doc-base@packages.debian.org>.
SCHEME
Debian type
repeatable?
yes
example
`howto'
Rights
An identifier that links to a rights management statement, such
as acceptable terms of use, the GPL, etc.
SCHEME
URL
repeatable?
yes
example
`copyright/GPL'
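The Date element above uses the ISO 8601 profile, which admits a few
fixed shapes ranging from a bare year to a full timestamp with a time
zone. A validator could check the shape with a regular expression, as
in the Python sketch below. The function name is hypothetical, and
the check covers syntax only: it does not reject impossible dates
such as February 31.

```python
import re

# Shapes accepted: YYYY, YYYY-MM, YYYY-MM-DD, and YYYY-MM-DD followed
# by Thh:mm, optional :ss and fraction, and a required zone (Z or
# +hh:mm / -hh:mm).
W3C_DATETIME = re.compile(
    r"\d{4}"
    r"(-(0[1-9]|1[0-2])"
    r"(-(0[1-9]|[12]\d|3[01])"
    r"(T([01]\d|2[0-3]):[0-5]\d(:[0-5]\d(\.\d+)?)?"
    r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d))?"
    r")?)?"
)


def is_valid_date(content):
    """Return True if content has one of the profile's date shapes."""
    return W3C_DATETIME.fullmatch(content.strip()) is not None
```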
3.3.3. Ignored Elements
-----------------------
The following elements are ignored. They are mentioned here because
these fields are part of standard Dublin Core; they may some day
become supported.
Coverage
The spatial or temporal characteristics of the intellectual
content of the resource.
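The rules of this chapter (required elements, non-repeatable
elements, ignored elements, and the default Language) can be checked
mechanically. The Python sketch below illustrates one way to do so.
It is not the validate-docreg tool; the function name is
hypothetical, and treating a repeated non-repeatable element as an
error is an assumption, since this chapter only states that missing
required elements are errors.

```python
from collections import Counter

REQUIRED = {"identifier", "title", "subject", "format"}
NOT_REPEATABLE = {
    "identifier", "title", "format", "description", "language",
    "date", "relation.isformatof", "relation.isbasedon",
}
IGNORED = {"coverage"}


def check_entity(elements):
    """Check one entity given as a list of (label, content) pairs.

    Returns (kept, errors): the elements with lowercased labels and
    ignored elements dropped, plus a list of problem descriptions.
    """
    kept = [(label.lower(), content) for label, content in elements
            if label.lower() not in IGNORED]
    counts = Counter(label for label, _ in kept)
    errors = ["missing required element: " + name
              for name in sorted(REQUIRED - set(counts))]
    errors += ["element may not repeat: " + name
               for name in sorted(NOT_REPEATABLE) if counts[name] > 1]
    if "language" not in counts:
        # Language defaults to English when absent.
        kept.append(("language", "en"))
    return kept, errors
```

Labels are lowercased before comparison because case is never
significant in labels.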
-------------------------------------------------------------------------------
4. docreg File Format
---------------------
The docreg file is the medium for the transmission of document
metadata information to the local Document Store. As such, it is the
package maintainer's way of attaching metadata to documents included
in their package, and ensuring that metadata is available to the user
who installed the package.
The docreg file is used in combination with install-docs as the
complete interface that a document-providing package needs to worry
about. End users need not be aware of docreg files at all; they are
not end-user editable.
4.1. Design Rationale and Goals
-------------------------------
The docreg file is meant to be an easy, familiar mechanism for busy
package maintainers. It uses a syntax similar to `control' files
already used by package maintainers, namely an RFC-822 compliant
syntax.
The docreg file format has the following design goals:
* Adherence to recognized metadata standards, namely, the Dublin
Core (http://purl.oclc.org/metadata/dublin_core/) element set.
* Easy to use for package maintainers; uses a very simple data
model.
* Language-independent syntax, allowing for indication of the
language of the document, as well as indication of the language
of the metadata.
* Allow for flexibility and inter-relationships between documents
without imposing any dependency or entity modeling complexity.
4.2. How To Use the docreg File
-------------------------------
The docreg file itself is the file used by package maintainers to
register documents into the Debian Document Registry. The doc-base
packaging system (specifically the install-docs program) is
responsible for processing the docreg file and adding the document's
meta-information contained in the docreg file to the system's local
Document Store.
Document metadata is all the information contained in the Debian
Document Registry for a file. The composition of this metadata is
directly related to the docreg file, since the docreg file is the sole
transmitter of document metadata into the registry (via install-docs).
While it is easy to confuse the document metadata with the docreg
file, the two are distinct.
A docreg file may contain one or more *metadata entities*, as
described in section 3.1, `Metadata Entities'. To extend the paradigm
from that section, documents are the books in a library, metadata
entities are the cards in the card catalog, and docreg files are
simply bundles of one or more card catalog cards which are delivered
to the library.
While each metadata entity refers to one and only one resource (local
or otherwise), it does not follow that each resource has one and only
one bit of metadata.[1] It is possible, although unusual, that a
resource may have more than one metadata entity referring to it.
[1] It follows that the Identifier for a metadata entity is not
necessarily a unique identifier for that entity.
Documents can relate to one another in various ways. For instance, a
document might be a specially formatted version of another source
document (the "IsFormatOf" relation). A document might be a
translation of another document into a new language ("IsBasedOn"), or,
more obscurely, a version of the work, perhaps interesting for
historical purposes ("IsVersionOf"). Relationships between documents
do not require actual package dependencies, however.
4.2.1. Where To Put the docreg File
-----------------------------------
docreg files are under package maintainer control; they are never
altered by the Debian documentation system as a whole. The files
should be installed and removed by the package itself using the
standard means. The file may be automatically generated at the package
maintainer's discretion; however, it may not be altered after
install-docs has run.
At the convenience of the package maintainer, it is allowable to use
more than one docreg file per package.
docreg files may be placed in any location. It is suggested that
docreg files be placed in the directory containing the resource they
describe. Moreover, the files should have `.docreg' as their suffix.
However, maintainers may name or place these files wherever they wish.
Alternatively, some have suggested a `/usr/share/doc-base/docreg/'
subdirectory, in which case the docreg file should be named the same
as the package, or prefixed with the same, e.g.,
`/usr/share/doc-base/docreg/debian-policy'. Whatever the file name,
the names must be globally unique across all packages. Prefixing them
with the package name helps guard against collisions.
4.3. docreg File Format
-----------------------
The format of the docreg file borrows from the Debian control file
format, which borrows from RFC 822.
First, some terminology. docreg files are composed of one or more
metadata entities, where each entity describes a single document (a
URL, usually a file on disk). Metadata entities are composed of
elements, or fields, which include required elements, optional
elements, and ignored elements. These elements are treated in depth
in chapter 3, `Debian Metadata Elements'.
Elements are lines composed of a label (that is, the name of the
element), a colon (`:'), one or more optional qualifiers in
parentheses, and finally the contents of the element. Entities are
composed of elements and are delimited by an empty line, or by the
top or bottom of the file.
Any element's contents may continue onto multiple lines, but
continuation lines must be indented from the left margin; this is
called "folding". In some cases the contents are restricted to a
controlled vocabulary, such as a URL, or a single value from a domain
of possible values. These controlled vocabularies are specified by the
built-in implied `SCHEME', which is described in subsection 3.2.2,
`The SCHEME Qualifier'.
An augmented BNF description of the file format, probably only of
interest to implementors, can be found in section 6.2, `Augmented BNF
Description for docreg Files'.
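The structure described above can be sketched as a small parser. This
is only an illustration in Python, not the project's implementation:
the names `parse_docreg', `parse_element' and `unfold' are
hypothetical, and the sketch assumes qualifiers are written as
`(NAME=value)' directly after the colon, following the grammar in
section 6.2.

```python
import re

# Hypothetical helper names. The "(NAME=value)" qualifier syntax follows
# the grammar in section 6.2; everything else here is an assumption.
ELEMENT_RE = re.compile(
    r"^([^\s:()]+):((?:\s*\([^()=]*=[^()]*\))*)\s*(.*)$", re.S)
QUALIFIER_RE = re.compile(r"\(([^()=]*)=([^()]*)\)")


def unfold(lines):
    """Join continuation lines (starting with space or tab) onto the
    element that precedes them."""
    logical = []
    for line in lines:
        if line[:1] in (" ", "\t") and logical:
            logical[-1] += " " + line.strip()
        else:
            logical.append(line)
    return logical


def parse_element(line):
    """Split one unfolded element into (label, qualifiers, contents)."""
    match = ELEMENT_RE.match(line)
    if match is None:
        raise ValueError("not an element: %r" % line)
    label, raw_qualifiers, contents = match.groups()
    qualifiers = dict(QUALIFIER_RE.findall(raw_qualifiers))
    return label, qualifiers, contents.strip()


def parse_docreg(text):
    """Return a list of entities; each entity is a list of elements."""
    entities = []
    # Entities are separated by one or more empty lines.
    for block in re.split(r"\n(?:[ \t]*\n)+", text.strip("\n")):
        lines = [l for l in block.split("\n") if l.strip()]
        if lines:
            entities.append([parse_element(l) for l in unfold(lines)])
    return entities
```

Labels are returned as written; since element names are not
case-sensitive, a caller would normally compare them after lowercasing.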
4.3.1. Example Files
--------------------
The following is an example for the current document. There are three
formats provided: SGML, ASCII, and HTML.
Identifier: debian-metadata/debian-metadata.sgml
Format: text/sgml
Title: Debian Metadata Manual
Subject: debian/policy
Description: This manual contains a guide and a reference to the
 Debian Metadata Project. The Project's purpose, and the purpose of
 this document, is to outline a set of metadata elements, to specify
 an interface for package maintainers use in order to provide
 metadata about resources in their packages, to specify a unified
 subject catalog for categorizing metadata, and to specify an API for
 developers who wish to make use of a system's metadata.
Language: en
Creator: A. P. Harris <aph@debian.org>
Creator: Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>
Date: 1998-06-29
Rights: copyright/GPL
Type: specification

Identifier: debian-metadata/debian-metadata.html/index.html
Format: text/html
Title: Debian Metadata Manual
Subject: debian/policy
Description: This manual contains a guide and a reference to the
 Debian Metadata Project. The Project's purpose, and the purpose of
 this document, is to outline a set of metadata elements, to specify
 an interface for package maintainers use in order to provide
 metadata about resources in their packages, to specify a unified
 subject catalog for categorizing metadata, and to specify an API for
 developers who wish to make use of a system's metadata.
Language: en
Creator: A. P. Harris <aph@debian.org>
Creator: Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>
Date: 1998-06-29
Rights: copyright/GPL
Type: specification

Identifier: debian-metadata/debian-metadata.text
Format: text/plain
Title: Debian Metadata Manual
Subject: debian/policy
Description: This manual contains a guide and a reference to the
 Debian Metadata Project. The Project's purpose, and the purpose of
 this document, is to outline a set of metadata elements, to specify
 an interface for package maintainers use in order to provide
 metadata about resources in their packages, to specify a unified
 subject catalog for categorizing metadata, and to specify an API for
 developers who wish to make use of a system's metadata.
Language: en
Creator: A. P. Harris <aph@debian.org>
Creator: Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>
Date: 1998-06-29
Rights: copyright/GPL
Type: specification
As the reader can see, there is a lot of repetition between the
different entities. Therefore, it is suggested that docreg files take
advantage of a preprocessor, such as m4. Here is a much shorter
version of the docreg file, which m4 expands into the above
entries:
changequote([, ])dnl
define([common_elements], [Title: Debian Metadata Manual
Subject: debian/policy
Description: This manual contains a guide and a reference to the
 Debian Metadata Project. The Project's purpose, and the purpose of
 this document, is to outline a set of metadata elements, to specify
 an interface for package maintainers use in order to provide
 metadata about resources in their packages, to specify a unified
 subject catalog for categorizing metadata, and to specify an API for
 developers who wish to make use of a system's metadata.
Language: en
Creator: A. P. Harris <aph@debian.org>
Creator: Marcus Brinkmann <Marcus.Brinkmann@ruhr-uni-bochum.de>
Date: 1998-06-29
Rights: copyright/GPL
Type: specification])dnl
Identifier: debian-metadata/debian-metadata.sgml
Format: text/sgml
common_elements

Identifier: debian-metadata/debian-metadata.html/index.html
Format: text/html
common_elements

Identifier: debian-metadata/debian-metadata.text
Format: text/plain
common_elements
4.3.2. Field Sizes
------------------
Field size limits are imposed on fields in order to facilitate a
straightforward database-driven storage system (not yet in place).
Element               Limit
Identifier            256
Title                 80
Subject               160  (multiple elements combined)
Format                40
Description           512
Language              2
Creator               200  (multiple elements combined)
Contributor           200  (multiple elements combined)
Publisher             200  (multiple elements combined)
Date                  10
Source                256  (multiple elements combined)
Relation.IsFormatOf   256
Relation.IsBasedOn    256
Type                  80   (multiple elements combined)
Rights                256
Coverage              80
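As an illustration, the limits above could be checked as follows. This
is a sketch under stated assumptions: the limits are taken to be
counted in characters, "multiple elements combined" is taken to mean
the summed length of every occurrence of that element within one
entity, and the name `oversized_fields' is hypothetical.

```python
# Limits copied from the table above; the unit (characters) is assumed.
FIELD_LIMITS = {
    "Identifier": 256, "Title": 80, "Subject": 160, "Format": 40,
    "Description": 512, "Language": 2, "Creator": 200,
    "Contributor": 200, "Publisher": 200, "Date": 10, "Source": 256,
    "Relation.IsFormatOf": 256, "Relation.IsBasedOn": 256,
    "Type": 80, "Rights": 256, "Coverage": 80,
}
# Elements whose limit applies to all occurrences combined.
COMBINED = {"Subject", "Creator", "Contributor", "Publisher",
            "Source", "Type"}


def oversized_fields(elements):
    """elements: (label, contents) pairs belonging to one entity.

    Returns the sorted labels whose contents exceed their limit.
    """
    totals = {}
    for label, contents in elements:
        if label not in FIELD_LIMITS:
            continue  # unknown elements are not size-checked here
        if label in COMBINED:
            # Assumption: combined size is the plain sum of lengths.
            totals[label] = totals.get(label, 0) + len(contents)
        else:
            totals[label] = max(totals.get(label, 0), len(contents))
    return sorted(l for l, n in totals.items() if n > FIELD_LIMITS[l])
```

For example, two `Creator' elements of 120 characters each would be
reported, since together they exceed the combined limit of 200.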
-------------------------------------------------------------------------------
5. Tools for Maintainers
------------------------
Maintainer tools are applications that help maintainers manage the
metadata for which they are responsible.
This chapter contains an overview of the available tools; for full
information about these applications, please see the manual pages
provided for the programs.
5.1. install-docs -- metadata installation and removal
------------------------------------------------------
install-docs is used from the package maintainer scripts to install or
remove a docreg file from the local store of registered metadata.
*Examples of how to invoke from maintainer scripts.*
Metadata integrators can extend install-docs functionality by using
the techniques described in section 6.3, `Hooking Into install-docs'.
install-docs is part of the standard doc-base package.
5.2. validate-docreg -- metadata validation for maintainers
-----------------------------------------------------------
validate-docreg performs more extensive validation, which package
maintainers can use to ensure that their metadata is well formed.
Examples of the checks performed:
* Relations actually exist (this depends on whether the related
  package is installed).
* Translations of a document use the same subject as the metadata of
  the document they were translated from.
* Fields such as date and language hold valid values.
*Examples of how to invoke from debian/rules.*
This utility is part of the doc-base-dev package.
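The kinds of checks listed above might look like the following sketch.
It does not describe the validate-docreg interface, which is not yet
specified; the function names are hypothetical. It assumes dates are
written as YYYY-MM-DD, as in the examples of section 4.3.1, and that a
language is a two-letter lower-case code such as `en'.

```python
import datetime
import re


def valid_date(value):
    """True if value is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False  # e.g. 1998-02-30
    return True


def valid_language(value):
    """True if value looks like a two-letter language code."""
    return re.fullmatch(r"[a-z]{2}", value) is not None


def subjects(elements):
    """Collect the Subject contents of one entity.

    elements: (label, contents) pairs; labels compare case-insensitively.
    """
    return {contents for label, contents in elements
            if label.lower() == "subject"}


def same_subjects(original, translation):
    """True if a translation carries the same subjects as its original."""
    return subjects(original) == subjects(translation)
```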
5.3. html2docreg -- convert HTML files to docreg
------------------------------------------------
Converts standard Dublin Core HTML `META' tags into docreg syntax.
Supports both HTML v3 and v4 META syntax.
This utility is part of the doc-base-dev package.
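A rough sketch of such a conversion follows, using Python's standard
`html.parser' module. It is not the html2docreg implementation. It
assumes the common Dublin Core convention of `META' tags named
`DC.<element>', and it handles only the HTML 4 `scheme' attribute; the
class and function names are hypothetical.

```python
from html.parser import HTMLParser


class DublinCoreMeta(HTMLParser):
    """Collect <META NAME="DC.xxx" CONTENT="..."> tags as docreg lines."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        if tag != "meta" or not name.lower().startswith("dc."):
            return
        label = name[3:]
        label = label[:1].upper() + label[1:]
        scheme = attrs.get("scheme")
        qualifier = " (SCHEME=%s)" % scheme if scheme else ""
        content = attrs.get("content") or ""
        self.lines.append("%s:%s %s" % (label, qualifier, content))


def html_to_docreg(html):
    """Return one docreg entity built from the DC META tags in html."""
    parser = DublinCoreMeta()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines) + "\n"
```

Long contents are emitted on a single line here; a fuller tool would
fold them and enforce the field size limits of section 4.3.2.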
5.4. docreg2html -- convert docreg files to HTML
------------------------------------------------
Converts standard docreg files to Dublin Core standard HTML `META'
elements. Can switch between HTML v3 and v4 META syntax. Qualifiers
are added automatically.
This utility is part of the doc-base-dev package.
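The reverse direction can be sketched in the same spirit. Again this is
an illustration rather than the docreg2html implementation: it emits
HTML 4 style tags only, assumes the `DC.<element>' naming convention,
and the function name `docreg_to_meta' is hypothetical.

```python
from html import escape


def docreg_to_meta(elements):
    """Render one entity as Dublin Core META tags.

    elements: (label, qualifiers, contents) tuples, where qualifiers
    is a dict such as {"SCHEME": "URL"}.
    """
    tags = []
    for label, qualifiers, contents in elements:
        attrs = 'name="DC.%s"' % escape(label, quote=True)
        scheme = qualifiers.get("SCHEME")
        if scheme:
            attrs += ' scheme="%s"' % escape(scheme, quote=True)
        attrs += ' content="%s"' % escape(contents, quote=True)
        tags.append("<meta %s>" % attrs)
    return "\n".join(tags)
```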
-------------------------------------------------------------------------------
6. Debian Metadata for Implementors
-----------------------------------
This chapter is for those who are implementing interfaces or
extensions to the Debian Metadata infrastructure.
6.1. Tracking Registered docreg Files
-------------------------------------
Currently, the only means by which an implementor can access metadata
is through the docreg files themselves.
The Debian Metadata project feels that direct access to docreg files
is a temporary state of affairs, since direct access to docreg files
will ultimately place too many constraints on the file format and
contents. For instance, moving to XML, or offering multiple docreg
file formats, would be doubly difficult to implement. Abstraction is
needed between the "container" file (transmitting the information) and
the "store" of locally known metadata. However, delaying
implementation until we have a local storage system is not acceptable,
since we want to get the system to be actually used in the world
before investing too deeply in an infrastructure.
Therefore, for now, implementors should use the file
`/var/state/doc-base/registered-docreg-files' to discover the list of
installed docreg files.
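Reading that list might look like the sketch below. The layout of the
file is not specified in this document; the sketch assumes one docreg
path per line, and the function name is hypothetical.

```python
import os

REGISTRY = "/var/state/doc-base/registered-docreg-files"


def registered_docreg_files(registry=REGISTRY):
    """Return the docreg paths listed in the registry file.

    Assumption: one path per line; blank lines are skipped. An empty
    list is returned when the registry file does not exist.
    """
    if not os.path.exists(registry):
        return []
    with open(registry, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```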
6.2. Augmented BNF Description for docreg Files
-----------------------------------------------
The following description uses augmented BNF as defined in RFC 822.
This standard meta-format lets us define the docreg format without
ambiguity. See also RFC 2068 for a description and example of
augmented BNF.
6.2.1. Basic Rules
-------------------
The following rules define fundamental building blocks used in the
rest of this specification.
CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)
ISOCHAR     =  <any ISO-8859-1 character>
CTL         =  <any ASCII control           ; (  0- 37,  0.- 31.)
                character and DEL>          ; (    177,     127.)
LF          =  <ASCII LF, linefeed>         ; (     12,      10.)
SPACE       =  <ASCII SP, space>            ; (     40,      32.)
HTAB        =  <ASCII HT, horizontal-tab>   ; (     11,       9.)
LWSP-char   =  SPACE / HTAB                 ; semantics = SPACE
linear-white-space =  1*([LF] LWSP-char)    ; semantics = SPACE
                                            ; LF => folding
specials    =  "(" / ")" / "<" / ">" / "@"
            /  "," / ";" / ":" / "\" / <">
            /  "." / "[" / "]" / "="
atom        =  1*<any CHAR except specials, SPACE and CTLs>
                                            ; control fields
ctext       =  *<any ISOCHAR excluding "(", ; field contents
                ")", "\" & CR, & including
                linear-white-space>
end-of-rec  =  < 2*LF or end of file >
6.2.2. Field Definitions
-------------------------
Field semantics are the same as those given under "Header Field
Definitions" in RFC 822 Section 3.2, with the exception that rather
than CRLF we use the standard Unix line separator, LF. Long header
fields are likewise supported, as specified in RFC 822 Section 3.1.1.
The following is the BNF composition of docreg fields syntax.
field               =  field-name ":" [*field-qualifier]
                       field-body LF LF
field-name          =  *atom
field-body          =  field-body-contents
                       [LF LWSP-char field-body]   ; folding
field-body-contents =  *ctext
field-qualifier     =  "(" *atom "=" *atom ")"
`field-name' values are not case-sensitive. Both `field-name' and
`field-qualifier' are further constrained to the set of allowable
values. Furthermore, in some cases, `field-body-contents' is
constrained based on the qualifiers. For instance, a qualifier of
`SCHEME=URL' would indicate that the contents should be a valid URL.
For clarifications on the way that fields are composed, refer to RFC
822.[1]
[1] Please email me with any corrections or clarifications.
6.2.3. docreg Definition
-------------------------
docreg files contain any number of metadata sets (the metadata
entities of section 4.3).
docreg-file   =  *metadata-set
metadata-set  =  *field end-of-rec
6.3. Hooking Into install-docs
-------------------------------
This section specifies a proposed method of allowing packages to hook
into install-docs invocations. It is not yet decided whether this
functionality is necessary. If you are a metadata implementor, and you
find that you do need this functionality, or find that this
functionality is not sufficient for your needs, then please email
<doc-base@packages.debian.org>.
Metadata implementors can *hook* into state changes in metadata by
providing executable scripts in `/usr/share/doc-base/methods/'. These
hooks must be platform-independent (or else maybe we should move this
to `/usr/lib/doc-base/methods'). Most likely they will be wrappers
around the actual programs.
The following states have hooks; the hooks are indicated by the first
argument passed to the scripts.
install <docreg_file>
     call to register or re-register the docreg file located at
     <docreg_file>
remove <docreg_file>
     call to unregister the docreg file located at <docreg_file>
rebuild
     rebuild local caches of metadata; essentially this implies
     clearing out all data and reinstalling each of the registered
     docreg files
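A hook script following the proposed calling convention could be
structured like the sketch below. Only the three state names and their
arguments come from the list above; the handler names and their return
values are hypothetical placeholders for whatever the integrating
package actually needs to do.

```python
def on_install(docreg_file):
    """Placeholder: register or re-register one docreg file."""
    return "register %s" % docreg_file


def on_remove(docreg_file):
    """Placeholder: unregister one docreg file."""
    return "unregister %s" % docreg_file


def on_rebuild():
    """Placeholder: clear local caches and reinstall everything."""
    return "rebuild caches"


def dispatch(argv):
    """Route a hook invocation to its handler.

    argv: the hook's arguments with the state first, that is, the
    script's command line without the program name.
    """
    if not argv:
        raise SystemExit(
            "usage: hook install|remove <docreg_file> | rebuild")
    state, args = argv[0], argv[1:]
    if state in ("install", "remove") and len(args) == 1:
        handler = on_install if state == "install" else on_remove
        return handler(args[0])
    if state == "rebuild" and not args:
        return on_rebuild()
    raise SystemExit("unsupported hook invocation: %r" % (argv,))
```

An installed hook would call `dispatch(sys.argv[1:])' from its entry
point. Rejecting unknown states with a non-zero exit is a design choice
of this sketch, not something the proposal requires.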