
Bug#749416: RFP: libhdt-it -- Library for RDF HDT file manipulation



Package: wnpp
Severity: wishlist

* Package name    : libhdt-it
  Version         : 1.0rc1
* URL             : http://www.rdfhdt.org/
* License         : LGPL
  Programming Lang: C++
  Description     : Library for RDF HDT file manipulation


RDF HDT (Header, Dictionary, Triples) is a compact data structure and
binary serialization format for RDF that keeps big datasets compressed
to save space while maintaining search and browse operations.

This is achieved by organizing and representing the RDF graph in terms
of three main components: Header, Dictionary and Triples. The Header
includes extensible metadata required to describe the RDF data set and
its organization. The Dictionary gathers
all the terms present in the RDF graph in a manner that permits rapid
search and high levels of compression. The Triples component
represents the structure of relationships of the RDF graph in a
compressed form.
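
To illustrate the Dictionary/Triples split described above, here is a
minimal C++ sketch. It is a toy under stated assumptions, not the
libhdt API and not the real HDT on-disk layout: the names Dictionary,
IdTriple and ToyStore are hypothetical, a single shared dictionary is
used for all terms, and nothing is actually compressed. It only shows
the idea that each term is stored once and the graph itself is kept as
integer IDs that can still be searched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy dictionary: maps each distinct RDF term to a small
// integer ID and back. IDs start at 1; 0 means "term not present".
class Dictionary {
public:
    // Return the existing ID for a term, or assign the next free one.
    std::uint32_t encode(const std::string& term) {
        auto it = ids_.find(term);
        if (it != ids_.end()) return it->second;
        std::uint32_t id = static_cast<std::uint32_t>(terms_.size()) + 1;
        ids_.emplace(term, id);
        terms_.push_back(term);
        return id;
    }

    // Look up a term without inserting it; returns 0 if unknown.
    std::uint32_t lookup(const std::string& term) const {
        auto it = ids_.find(term);
        return it == ids_.end() ? 0 : it->second;
    }

    // Map an ID back to its term.
    const std::string& decode(std::uint32_t id) const {
        assert(id >= 1 && id <= terms_.size());
        return terms_[id - 1];
    }

    std::size_t size() const { return terms_.size(); }

private:
    std::unordered_map<std::string, std::uint32_t> ids_;
    std::vector<std::string> terms_;
};

// A triple stored purely as dictionary IDs.
struct IdTriple {
    std::uint32_t s, p, o;
};

// Hypothetical toy store: a dictionary plus a list of ID triples.
struct ToyStore {
    Dictionary dict;
    std::vector<IdTriple> triples;

    void add(const std::string& s, const std::string& p,
             const std::string& o) {
        triples.push_back({dict.encode(s), dict.encode(p), dict.encode(o)});
    }

    // Objects of all triples matching a given subject and predicate,
    // found by comparing integer IDs rather than strings.
    std::vector<std::string> objects(const std::string& s,
                                     const std::string& p) const {
        std::vector<std::string> result;
        const std::uint32_t sid = dict.lookup(s);
        const std::uint32_t pid = dict.lookup(p);
        if (sid == 0 || pid == 0) return result;  // unknown term
        for (const IdTriple& t : triples) {
            if (t.s == sid && t.p == pid) result.push_back(dict.decode(t.o));
        }
        return result;
    }
};
```

In this sketch a term that occurs in many triples is stored only once,
and a lookup scans integers linearly. The real format goes further by
compressing both components and organizing the Triples so that
searches do not need a full scan.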

It has previously been discussed whether HDT makes sense merely as yet
another serialization format, where performance is a relatively minor
issue, or whether it should be used to store RDF data and query it
directly.

Recent work by Ruben Verborgh et al. at http://linkeddatafragments.org/
makes a convincing case for the latter, in which case the C++
libraries would be excellent to have in Debian.

Cheers,

Kjetil

