
Advocating the use of RDF for Debian's published metadata - Was: Re: Proposal for additional metadata in Debian archives (DEP-11)



Hi.

I'm not subscribed to ftpmasters, so please CC me on responses. I
hope d-d@l.d.o is the proper place for this debate (even though
-project already holds some bits of it too).

On Friday 14 October 2011 at 19:08 +0200, Matthias Klumpp wrote:

> AppStream features XML to store metadata. Because we don't use XML
> somewhere in Debian, DEP-11 features a well-known RFC822-style format.

May I suggest implementing some (standardized) variant of RDF [0] to
represent this metadata?


I think adopting standards here would make Debian's metadata more
interoperable with others'.
The "package metadata" could even be published on the Web of Data
(Linked Open Data), right from the Debian servers, so that anyone could
create applications that consume it.

If RDF/XML (as seems to be proposed by SPDX, to be verified once the
Linux Foundation site is back) is not suitable, then another format
would be great, as long as it relies on some explicit prefix+suffix
combination to allow for extensibility, for instance a terse plain-text
syntax for RDF like Turtle [1].

If a package can be described both with some general-purpose
"ontology"/standard/schema (for instance the one you envisioned
initially in DEP 11) and, depending on context (embedded or
science, for instance), with another set of metadata (SPDX or whatever
else), you'd be able to mix metadata relating to different contexts in
the same file.

Still, I'm not sure the RFC822-style is compatible with RDF's habit
of separating prefix and suffix with a colon character ':', which
RFC822 already uses to end the field name. Maybe '_' could act as such
a separator (I must say I haven't checked the RFC for the tokens
allowed in the grammar)?
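
As a quick check of the separator question above: RFC 822 defines a field name as one or more ASCII characters excluding control characters, SPACE and ':'. The sketch below encodes just that rule; the function name is made up for illustration, and Debian's own control-file rules may add further restrictions not covered here.

```python
def is_rfc822_field_name(name):
    """Return True if `name` fits the RFC 822 field-name rule.

    RFC 822: field-name = 1*<any CHAR, excluding CTLs, SPACE, and ":">
    i.e. printable ASCII (codes 33..126) except the colon.
    """
    return bool(name) and all(33 <= ord(c) <= 126 and c != ":" for c in name)


# '_' is allowed in a field name, ':' is not.
print(is_rfc822_field_name("spdx_license"))
print(is_rfc822_field_name("spdx:license"))
```

So under that rule '_' is usable as a prefix/suffix separator, while ':' is not.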

Let's try with an example (btw, the DEP
http://wiki.debian.org/AppStreamDebianProposal *lacks* examples IMHO) :

In the Turtle representation format for RDF, one would have a document
that looks like this:
        @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
        @prefix dep11: <http://www.debian.org/whatever/dep11#>.
        @prefix debbugs: <http://www.debian.org/whatever/depxx#>.
        @prefix spdx: <http://spdx.org/ontology#>.
        
        <http://packages.qa.debian.org/iceweasel> 
          a dep11:DebianPackage;
          dep11:application "Iceweasel";
          dep11:package "iceweasel";
          spdx:license "MPL-1.1";
          debbugs:bugs <http://bugs.debian.org/iceweasel>.

(Maybe I didn't understand very well the Application and Package
meanings in your DEP11 proposal, btw.)

Anyway, as you can see, here we could have several "domains" of metadata
sources (ontologies / prefixes) to describe the same package combined in
a single document.
        
In RFC822-style, this could be something like :

DEP11_Application: Iceweasel
DEP11_Package: iceweasel
spdx_license: MPL-1.1
debbugs_bugs: http://bugs.debian.org/iceweasel

etc.
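
To illustrate how a consumer could recover the "domain" of each field from such a stanza, here is a minimal sketch. It assumes the '_' separator suggested above and folds the prefix to lower case; `split_field` is a hypothetical helper, not part of any existing Debian tool.

```python
def split_field(line):
    """Split 'prefix_suffix: value' into (prefix, suffix, value)."""
    # The first ':' ends the field name; later colons (as in URLs)
    # stay in the value.
    name, _, value = line.partition(":")
    # The first '_' separates the prefix from the suffix (assumption).
    prefix, _, suffix = name.strip().partition("_")
    return prefix.lower(), suffix, value.strip()


stanza = """\
DEP11_Application: Iceweasel
DEP11_Package: iceweasel
spdx_license: MPL-1.1
debbugs_bugs: http://bugs.debian.org/iceweasel
"""

fields = [split_field(line) for line in stanza.splitlines() if line.strip()]
for prefix, suffix, value in fields:
    print(prefix, suffix, value)
```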

But clearly, not reinventing the wheel should be a goal, and adopting
existing standards for meta-data representation would be my choice, i.e.
Semantic Web standards (namely RDF).


Of course, translators between the different syntaxes would be trivial
to develop, but if a proper standard is used at the source, the metadata
can be delivered to the Web as-is, without any transformation. Such an
approach (often called Linked Data) clearly favors interoperability
(more at http://linkeddata.org/guides-and-tutorials if I failed to make
my point).
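
For what it's worth, such a translator could be sketched as follows, going from the RFC822-style stanza to Turtle. Everything here is an assumption made for illustration: the '_' separator, the placeholder namespace URLs taken from the example above, lower-casing the first letter of the suffix to match the Turtle sample, and treating values that start with http:// or https:// as resources. It also does not emit the "a dep11:DebianPackage" type statement.

```python
# Placeholder namespaces copied from the example; not real ontologies.
PREFIXES = {
    "dep11": "http://www.debian.org/whatever/dep11#",
    "debbugs": "http://www.debian.org/whatever/depxx#",
    "spdx": "http://spdx.org/ontology#",
}


def to_object(value):
    """Render a field value as a Turtle IRI or a quoted string literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_turtle(subject, stanza):
    """Translate 'prefix_suffix: value' lines into one Turtle description."""
    pairs = []
    for raw in stanza.splitlines():
        if not raw.strip():
            continue
        name, _, value = raw.partition(":")
        prefix, _, suffix = name.strip().partition("_")
        prefix = prefix.lower()
        if prefix not in PREFIXES:
            raise ValueError(f"unknown prefix in field {name!r}")
        local = suffix[:1].lower() + suffix[1:]
        pairs.append(f"  {prefix}:{local} {to_object(value.strip())}")
    header = [f"@prefix {p}: <{iri}>." for p, iri in PREFIXES.items()]
    body = f"<{subject}>\n" + ";\n".join(pairs) + "."
    return "\n".join(header) + "\n\n" + body + "\n"


stanza = """\
DEP11_Application: Iceweasel
DEP11_Package: iceweasel
spdx_license: MPL-1.1
debbugs_bugs: http://bugs.debian.org/iceweasel
"""
print(to_turtle("http://packages.qa.debian.org/iceweasel", stanza))
```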


Again, in case you doubt it, RDF is just a model, which can be written
in a number of different formats (not only XML). The key here is that
each document embeds the references to the ontologies/prefixes it uses,
which makes documents self-describing and extensible out of the
box.
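
To make the "self-described" point concrete: because the prefix declarations travel with the document, any consumer can expand a short name into a globally unique identifier without outside knowledge. A rough sketch follows; the regular expression only handles simple "@prefix" lines like those in the example and is in no way a full Turtle parser, and the namespace URLs are the placeholders used above.

```python
import re

# Matches lines such as: @prefix spdx: <http://spdx.org/ontology#>.
PREFIX_RE = re.compile(r"@prefix\s+(\w+):\s*<([^>]*)>\s*\.")


def read_prefixes(turtle_text):
    """Collect the prefix -> namespace declarations found in the text."""
    return dict(PREFIX_RE.findall(turtle_text))


def expand(qname, prefixes):
    """Expand 'prefix:local' into a full identifier."""
    prefix, _, local = qname.partition(":")
    return prefixes[prefix] + local


doc = """\
@prefix dep11: <http://www.debian.org/whatever/dep11#>.
@prefix spdx: <http://spdx.org/ontology#>.
"""

prefixes = read_prefixes(doc)
print(expand("spdx:license", prefixes))
```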

Note that the same rationale stands for all metadata to be eventually
published on the Web by Debian servers.

Hope this helps.

Best regards,

[0] http://www.w3.org/RDF/
[1] http://www.w3.org/TeamSubmission/turtle/
-- 
Olivier BERGER <olivier.berger@it-sudparis.eu>
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 2048R/5819D7E8
Research Engineer - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)

