Re: Advocating the use of RDF for Debian's published metadata
>>>>> Matthias Klumpp <firstname.lastname@example.org> writes:
> It would be very nice, if ftpmasters could tell if they would accept
> a new format in the archive or if we should stay with RFC822 which is
> used for nearly everything else already.
>> Note that the same rationale stands for all metadata to be
>> eventually published on the Web by Debian servers.
>> Hope this helps.
> Thank you for the information... I think RDF would be much more
> "open" for other people and apps to use, as the data wouldn't be in a
> Debian-specific format. (I can't imagine yet what others would do
> with this data, but if more people would use RDF, e.g. other
> distributors too, having it all in one standardized and extensible
> format would be something valuable)
Well, having this data aligned with the RDF model will help
interoperability, I guess.
One application I have in mind is that it becomes possible to
query the Debian Packages and Sources databases using the
powerful SPARQL language. In particular, one may quickly check
whether there are any packages that depend transitively on A
while also depending directly on B. (Yes, grep-dctrl(1)
helps, but it's not quite as powerful a language as the recent
edition of SPARQL, not to mention that it's yet another query
language to learn.)
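To make that query concrete, here is a rough Python sketch of what it
would compute. The graph and the package names are made up, and this is
plain Python rather than SPARQL; with SPARQL 1.1 property paths I would
expect the same thing to read roughly as "?p deb:depends+ A . ?p
deb:depends B", where deb:depends is a hypothetical predicate that no
one has defined yet.

```python
# Toy dependency graph: package -> set of immediate dependencies.
# All package names here are made up for illustration.
DEPENDS = {
    "app": {"libfoo", "B"},
    "libfoo": {"libbar"},
    "libbar": {"A"},
    "tool": {"A"},
    "other": {"B"},
    "A": set(),
    "B": set(),
}


def transitive_deps(graph, package):
    """Return every package reachable from `package` via one or more edges."""
    seen = set()
    stack = list(graph.get(package, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, ()))
    return seen


def query(graph, a, b):
    """Packages that depend on `a` transitively and on `b` immediately."""
    return sorted(
        pkg
        for pkg, direct in graph.items()
        if b in direct and a in transitive_deps(graph, pkg)
    )


print(query(DEPENDS, "A", "B"))
```

Here only "app" qualifies: it lists B directly and reaches A through
libfoo and libbar.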
However, I believe that it's infeasible to change the native
format of the aforementioned databases: it isn't going to be
easy to implement, and it may place a considerable burden on
both the Debian users and maintainers.
Thus, my opinion is that there should be a tool performing
conversion from Debian's native database format to some RDF
representation. In particular, rdfproc(1) could become such a
tool, provided that Raptor is extended to parse RFC 822.
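The parsing half of such a tool is the easy part. As a minimal sketch,
assuming the usual stanza layout (blank line between stanzas, leading
whitespace for continuation lines), something like the following would
do. It is plain Python with no relation to Raptor's actual parser
interface, and the function name and sample data are mine.

```python
SAMPLE = """\
Package: foo
Version: 1.0
Depends: a, b
Description: example package
 A longer description.

Package: bar
Version: 0.2
Depends: a | b
"""


def parse_stanzas(text):
    """Split RFC 822-style control data into a list of {field: value} dicts."""
    stanzas = []
    current = {}
    last_field = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line terminates the current stanza.
            if current:
                stanzas.append(current)
            current, last_field = {}, None
        elif line[0] in " \t":
            # Continuation line: append to the previous field's value.
            if last_field is not None:
                current[last_field] += "\n" + line.strip()
        else:
            name, _, value = line.partition(":")
            last_field = name.strip()
            current[last_field] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


for stanza in parse_stanzas(SAMPLE):
    print(stanza["Package"], "->", stanza.get("Depends"))
```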
That being said, I don't see such a conversion as a simple and
straightforward process. In particular, should a package
stanza be transformed into a named node (named as per Package:)
or a blank node (with Package: as an explicit relation)? Both
Depends: a, b and Depends: a | b may have an RDF list in the
object position, but how does one distinguish between them? And
how should a line such as Depends: a (>> 0.1) be expressed?
Should the package names be encoded as string literals, or
should they be transformed into URIs instead? There are quite a
few choices to be made by whoever volunteers for this.
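Just to show that the questions above have at least one workable
answer, here is a sketch of one possible mapping, not a proposal. It
assumes package names become URIs under a made-up base, each
comma-separated group becomes a blank node, alternatives hang off that
node, and a versioned dependency gets its own blank node. Every
predicate name and the base URI are hypothetical, and architecture
qualifiers and the like are not handled.

```python
import re

# Hypothetical namespace; nothing like this is published by Debian.
BASE = "http://example.org/debian/"

# A package name, optionally followed by "(operator version)".
_TERM = re.compile(
    r"^\s*([a-z0-9][a-z0-9+.\-]*)\s*"
    r"(?:\(\s*(<<|<=|=|>=|>>)\s*([^)\s]+)\s*\))?\s*$"
)


def parse_depends(value):
    """Parse a Depends: value into a list (AND) of lists (OR) of terms."""
    groups = []
    for group in value.split(","):
        alternatives = []
        for term in group.split("|"):
            m = _TERM.match(term)
            if m is None:
                raise ValueError("cannot parse dependency term: %r" % term)
            alternatives.append(m.groups())  # (name, operator, version)
        groups.append(alternatives)
    return groups


def to_triples(package, value):
    """Emit (subject, predicate, object) tuples for one Depends: field."""
    subject = BASE + "package/" + package
    triples = []
    blank = 0
    for alternatives in parse_depends(value):
        blank += 1
        group = "_:b%d" % blank  # one blank node per comma-separated group
        triples.append((subject, BASE + "depends", group))
        for name, op, version in alternatives:
            target = BASE + "package/" + name
            if op is None:
                triples.append((group, BASE + "anyOf", target))
            else:
                blank += 1
                constraint = "_:b%d" % blank  # blank node for the constraint
                triples.append((group, BASE + "anyOf", constraint))
                triples.append((constraint, BASE + "package", target))
                triples.append((constraint, BASE + "operator", op))
                triples.append((constraint, BASE + "version", version))
    return triples


for field in ("a, b", "a | b", "a (>> 0.1)"):
    print(field)
    for triple in to_triples("foo", field):
        print("   ", triple)
```

With this mapping, "a, b" yields two group nodes with one alternative
each, while "a | b" yields a single group node with two alternatives,
which is one way to tell them apart.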
These questions have been on my TODO list for some time (filed
under Category: nifty hack, as I have yet to see any serious
practical uses for such a thing), but I'm short of spare time
these days, and probably won't be able to do much, apart from
participating in the discussions on this subject.
FSF associate member #7257