Re: Advocating the use of RDF for Debian's published metadata - Was: Re: Proposal for additional metadata in Debian archives (DEP-11)
2011/10/17 Olivier Berger <email@example.com>:
> I'm not subscribed to ftpmasters, so feel free to CC me in response, and
> I hope firstname.lastname@example.org is the proper place for this to be debated (even
> though -project already holds some bits too).
Unfortunately, ftpmasters don't have a mailing list. I hope it is okay
to CC ftpmasters; if it is *not* okay, please send me a short mail.
I haven't received any reply from ftpmasters on this yet.
> On Friday, 14 October 2011 at 19:08 +0200, Matthias Klumpp wrote:
>> AppStream uses XML to store metadata. Because we don't use XML
>> anywhere in Debian, DEP-11 uses a well-known RFC822-style format.
> May I suggest implementing some (standardized) variant of RDF to
> represent this metadata?
> I think it would help here, to adopt standards for more interoperability
> of Debian's metadata with others'.
> The "package metadata" could even be delivered on the Web of Data
> (Linked Open Data), right from the Debian servers, so that anyone
> could build applications that consume such metadata.
> If RDF/XML (as seems to be proposed by SPDX, to be verified once the
> Linux Foundation site is back) is not suitable, then another format
> would be great as long as it relies on some explicit prefix+suffix
> combination, in order to allow for extensibility, for instance some
> JSON variant of RDF, or Turtle.
I would like this very much. The DEP-11 proposal is extensible too, but
it has a few limitations if someone decides to extend it in the future.
The reason for proposing an RFC822-style format for this data is that
the format is already well known inside Debian and we have nothing
else using XML yet. Because RFC822 is already widely used, it should be
easier for ftpmasters to implement.
RDF would work too, as long as it stores the same information as
described in the DEP-11 proposal. (But as far as I can see, it was
designed to do that)
> If a package can both be described with some generic purpose
> "ontology"/standard/schema (for instance the one you envisioned
> initially in DEP 11), and also, depending on context (embedded or
> science, for instance) with another set of metadata (spdx or whatever
> else), you'd be able to mix in the same file, metadata relating to
> different contexts.
This sounds like overkill to me... Better to pick only one format
instead of mixing things.
> Still, I'm not sure RFC822-style is perfectly compatible with the RDF
> habit of separating prefix and suffix with a colon character ':'. Maybe
> '_' could act as such a separator (I must say I haven't checked the RFC
> for allowed tokens in the grammar)?
We don't have prefix/suffix yet, because we haven't seen a need for it...
> Let's try with an example (btw, the DEP
> http://wiki.debian.org/AppStreamDebianProposal *lacks* examples IMHO) :
Right, maybe I should add one soon :P
> In turtle representation format for RDF, one would have a document that
> looks like this :
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
> @prefix dep11: <http://www.debian.org/whatever/dep11#>.
> @prefix debbugs: <http://www.debian.org/whatever/depxx#>.
> @prefix spdx: <http://spdx.org/ontology#>.
> a dep11:DebianPackage;
> dep11:application "Iceweasel";
> dep11:package "iceweasel";
> spdx:license "MPL-1.1";
> debbugs:bugs <http://bugs.debian.org/iceweasel>.
> (Maybe I didn't understand very well the Application and Package
> meanings in your DEP11 proposal, btw.)
This looks very clean and extensible :) A package is the thing you
install via Synaptic/apt-get/aptitude, while an application is
everything which has a desktop-file and appears in the application
menu (at least that's how it is defined at the moment).
A "component" is something a package provides, e.g. a shared library.
For example, package "libgee2" provides the component "libgee.so.2" of
type shared library. The same applies to Python modules, Plasma engines,
GNOME Shell extensions, etc.
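
To make the three terms concrete, here is a minimal Python sketch of how
they relate. The class names, the field names and the "shared-library"
type string are my own assumptions for illustration; none of them are
taken from the DEP-11 text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """Something a package provides (hypothetical model, not from DEP-11)."""
    name: str   # e.g. "libgee.so.2"
    kind: str   # e.g. "shared-library"; the type vocabulary is assumed


@dataclass
class Package:
    """The thing installed via Synaptic/apt-get/aptitude."""
    name: str
    components: List[Component] = field(default_factory=list)

    def provides(self, kind: str) -> List[str]:
        """Return the names of all components of the given kind."""
        return [c.name for c in self.components if c.kind == kind]


@dataclass
class Application:
    """Something with a desktop file that appears in the application menu."""
    name: str
    package: Package  # the package that ships the application


libgee = Package("libgee2", [Component("libgee.so.2", "shared-library")])
iceweasel = Application("Iceweasel", Package("iceweasel"))
```

With this model, libgee.provides("shared-library") gives back the
library name, and an application always points at the package that
ships it.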
> Anyway, as you can see, here we could have several "domains" of metadata
> sources (ontologies / prefixes) to describe the same package combined in
> a single document.
> In RFC822-style, this could be something like :
> DEP11_Application: Iceweasel
> DEP11_Package: iceweasel
> spdx_license: MPL-1.1
> debbugs_bugs: http://bugs.debian.org/iceweasel
> But clearly, not reinventing the wheel should be a goal, and adopting
> existing standards for meta-data representation would be my choice, i.e.
> Semantic Web standards (namely RDF).
Agreed. Your proposal looks very clean. But again the question is: do
we want RDF in Debian? This is mainly a policy decision and has
nothing to do with technical details.
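
On the '_' separator: as far as I can tell, RFC 822 allows any printable
ASCII character except space and ':' in a field name, so the prefixed
fields quoted above should parse with ordinary tools. Below is a rough
Python sketch, using only the standard library, that reads such a record
and expands each prefix into a full property IRI. The namespace table is
an assumption: it just reuses the placeholder IRIs from the Turtle
example, and lower-casing the prefix and suffix is my own convention.

```python
from email.parser import HeaderParser

# Hypothetical prefix -> namespace table, reusing the placeholder IRIs
# from the Turtle example quoted in this thread.
NAMESPACES = {
    "dep11": "http://www.debian.org/whatever/dep11#",
    "spdx": "http://spdx.org/ontology#",
    "debbugs": "http://www.debian.org/whatever/depxx#",
}

RECORD = """\
DEP11_Application: Iceweasel
DEP11_Package: iceweasel
spdx_license: MPL-1.1
debbugs_bugs: http://bugs.debian.org/iceweasel
"""


def expand(record):
    """Return (property IRI, value) pairs for every prefixed field."""
    pairs = []
    for field_name, value in HeaderParser().parsestr(record).items():
        # Split on the first '_' only, so suffixes may contain '_' too.
        prefix, sep, suffix = field_name.partition("_")
        namespace = NAMESPACES.get(prefix.lower())
        if not sep or namespace is None:
            continue  # unprefixed or unknown prefix: skipped in this sketch
        pairs.append((namespace + suffix.lower(), value))
    return pairs


for iri, value in expand(RECORD):
    print(iri, "=", value)
```

So the RFC822 syntax could carry the same prefix/suffix information;
what it lacks is a place inside the file to declare what each prefix
means, which is what the @prefix lines give you in Turtle.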
> Again, in case you'd doubt it, RDF is just a model, which can be written
> in a number of different formats (not only XML), but the key here is the
> embedded identification of the reference of the ontologies/prefixes
> which render the documents self described and extensible, out of the
For us, it is necessary that APT can process this data (support will be
implemented if DEP-11 is accepted) and that parts of it can be written
into a Xapian DB for fast searching. Both would work perfectly well
with any format.
It would be very nice if ftpmasters could tell us whether they would
accept a new format in the archive or whether we should stay with
RFC822, which is already used for nearly everything else.
> Note that the same rationale stands for all metadata to be eventually
> published on the Web by Debian servers.
> Hope this helps.
Thank you for the information... I think RDF would be much more "open"
for other people and apps to use, as the data wouldn't be in a
Debian-specific format. (I can't imagine yet what others would do with
this data, but if more people used RDF, e.g. other distributors too,
having it all in one standardized and extensible format would be
valuable.)
>  http://www.w3.org/RDF/
>  http://www.w3.org/TeamSubmission/turtle/