
ELIXIR tools registry participation - richer metadata for our packages



Dear all,

This is Steffen and Tim, sharing a desk at the Copenhagen Hackathon
of the ELIXIR-DK Catalog of resources in computational biology.
There is general excitement about the collection of tools
associated with Debian Med and its derivatives, and we are
here to

 * help the ELIXIR folks to fill their database
   - avoid redundancies
   - render the catalog immediately functional to see Debian packages
 * help ourselves
   - gain extra expressiveness in our own descriptions
     o by adopting the EDAM ontology [1]
     o to have separate annotations for the packages as a whole
       and individual tools (selected binaries in /usr/bin)
   - have some extra visibility
   - find additional users (bare metal and virtualised)
   - explain to the world how inviting Debian is to have one's
     software redistributed

Catalog entries are meant to be provided by the maintainers of
the software tools in the ELIXIR network. For resources (binaries)
provided through the Linux distros, we could certainly just fall
back to the information we already have, but we would like you
(this list) to comment on the extension of the
  debian/upstream/metadata
file to also accommodate structured references to semantic
catalogs like the EDAM ontology. There is a related effort by
Matus to annotate the DebTags. The format we have in mind looks like this:

Ontology: http://prefix.of.ontology.org
 feature_name: ontological_description_of_that_feature
 another_feature: id<blank>human_readable
 scope: <list of binaries> | summary
  feature_name: ...
  another_feature: ...

The features may differ between ontologies. We had a look at bowtie to see how it goes and we ended up with:

Ontology: http://edamontology.org
 topic: topic_0622 Genomics
 scope: summary
   function: operation_3212 Genome indexing (Burrows-Wheeler)
   function: operation_0292 Sequence alignment generation
   input: data_2975 Nucleic acid sequence (raw)
   input: format_1929 FASTA
   input: format_1930 FASTQ
   output: data_1383 Sequence alignment (nucleic acid)
   output: format_2573 SAM
 scope: bowtie-build
   function: operation_3212 Genome indexing (Burrows-Wheeler)
   input: data_2975 Nucleic acid sequence (raw)
   input: format_1929 FASTA
   output: data_3210 Genome index
   output: ??? Bowtie index format EBWT
   output: ??? Bowtie long index format EBWTL
 scope: bowtie-inspect
   function: operation_1813 Sequence retrieval
   function: operation_0304 Metadata retrieval
   function: operation_0228 Data index analysis
   input: data_3210 Genome index
   input: ??? Bowtie index format EBWT
   input: ??? Bowtie long index format EBWTL
   output: data_2975 Nucleic acid sequence (raw)
   output: format_1929 FASTA
   output: format_1964 plain text format (unformatted)
 scope: bowtie
   function: operation_0350 Sequence database search (by sequence using word-based methods)
   function: operation_0292 Sequence alignment generation
   input: data_3210 Genome index
   input: data_2975 Nucleic acid sequence (raw)
   input: format_1964 plain text format (unformatted)
   input: format_1929 FASTA
   input: format_1930 FASTQ
   output: data_1383 Sequence alignment (nucleic acid)
   output: data_0867 Sequence alignment report
   output: format_2573 SAM
   output: ??? Bowtie alignment report format
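
To get a feeling for how such a stanza could be consumed by a tool, here is a minimal Python sketch of a reader. It is only an illustration under assumptions that are not part of the proposal: the function name parse_ontology_stanza is made up, a "scope:" value is treated as one name (not split into a list of binaries), indentation deeper than the "scope:" line is taken to mean "belongs to that scope", and an id of "???" is stored as None.

```python
from collections import defaultdict


def parse_ontology_stanza(text):
    """Parse one 'Ontology:' stanza of the proposed format into a dict.

    Hypothetical helper: the layout rules used here are assumptions
    derived from the bowtie example, not a settled specification.
    """
    result = {"ontology": None, "features": defaultdict(list), "scopes": {}}
    current = result["features"]  # mapping that receives key/value pairs
    scope_indent = None           # indentation of the active 'scope:' line
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        key, _, value = raw.strip().partition(":")
        value = value.strip()
        if key == "Ontology":
            result["ontology"] = value
            continue
        if key == "scope":
            scope_indent = indent
            current = result["scopes"].setdefault(value, defaultdict(list))
            continue
        if scope_indent is not None and indent <= scope_indent:
            # Back at stanza level: the feature describes the whole package.
            current, scope_indent = result["features"], None
        # A value is 'id<blank>human_readable'; '???' marks a missing term.
        term_id, _, label = value.partition(" ")
        current[key].append((None if term_id == "???" else term_id, label.strip()))
    return result


sample = """\
Ontology: http://edamontology.org
 topic: topic_0622 Genomics
 scope: bowtie-build
   function: operation_3212 Genome indexing (Burrows-Wheeler)
   output: ??? Bowtie index format EBWT
"""
parsed = parse_ontology_stanza(sample)
print(parsed["scopes"]["bowtie-build"]["output"])
```

Keeping the unresolved "???" entries as None would make it easy to list the terms that still need to be requested from the ontology maintainers.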

Or, if we want to reduce the level of detail to just the summary, this could be compressed to:

Ontology: http://edamontology.org
 topic: topic_0622 Genomics
 function: operation_3212 Genome indexing (Burrows-Wheeler)
 function: operation_0292 Sequence alignment generation
 input: data_2975 Nucleic acid sequence (raw)
 input: format_1929 FASTA
 input: format_1930 FASTQ
 output: data_1383 Sequence alignment (nucleic acid)
 output: format_2573 SAM
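
The compression itself could be done mechanically. The following Python sketch keeps the top-level features and lifts everything under "scope: summary" to the top level, dropping the per-binary scopes. The function name compress_to_summary is hypothetical, and the indentation rule (lines indented deeper than a "scope:" line belong to it) is an assumption read off the example, not a fixed part of the format.

```python
def compress_to_summary(text):
    """Reduce a detailed stanza to its package-level summary.

    Hypothetical helper: keeps 'Ontology:', the top-level features and
    the features under 'scope: summary'; all other scopes are dropped.
    """
    out = []
    in_scope = False   # currently inside some 'scope:' block
    keep = False       # that block is the 'summary' scope
    scope_indent = 0
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        stripped = raw.strip()
        if stripped.startswith("Ontology:"):
            out.append(stripped)
            continue
        if stripped.startswith("scope:"):
            in_scope, scope_indent = True, indent
            keep = stripped.partition(":")[2].strip() == "summary"
            continue
        if in_scope and indent <= scope_indent:
            in_scope = False  # dedent ends the scope block
        if not in_scope or keep:
            out.append(" " + stripped)
    return "\n".join(out)


detailed = """\
Ontology: http://edamontology.org
 topic: topic_0622 Genomics
 scope: summary
   function: operation_0292 Sequence alignment generation
   output: format_2573 SAM
 scope: bowtie-build
   output: data_3210 Genome index
"""
compressed = compress_to_summary(detailed)
print(compressed)
```

With something like this, maintainers would only need to write the detailed form, and the short form could be generated where less detail is wanted.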

If the list likes this approach, we can continue annotating a bit more and amend our task pages accordingly.
Some tools and suites (e.g. EMBOSS) have existing annotations from other projects that we can inherit.
We are not yet sure what this means in practice, e.g. for the Ultimate Debian Database. @Charles, can you
direct us, please?

Best regards from Copenhagen

Steffen and Tim

[1] EDAM ontology - http://bioportal.bioontology.org/ontologies/EDAM
