ELIXIR tools registry participation - richer metadata for our packages
Dear all,
This is Steffen and Tim, sharing a desk at the Copenhagen Hackathon
of the ELIXIR-DK Catalog of resources in computational biology.
There is general excitement about the collection of tools
associated with Debian Med and its derivatives, and we are
here to:
* help the ELIXIR folks fill their database
  - avoid redundancies
  - make the catalog immediately able to show Debian packages
* help ourselves
  - gain extra expressiveness in our own descriptions
    o by adopting the EDAM ontology [1]
    o by having separate annotations for the package as a whole
      and for individual tools (selected binaries in /usr/bin)
  - gain some extra visibility
  - find additional users (bare metal and virtualised)
  - show the world how inviting Debian is as a way to have one's
    software redistributed
Catalog entries are meant to be provided by the maintainers of
the software tools in the ELIXIR network. For resources (binaries)
provided through the Linux distros, we could certainly just fall
back to the information we already have, but we would like you
(this list) to comment on extending the
debian/upstream/metadata
file so that it can also hold structured references to semantic
catalogs such as the EDAM ontology. There is a related effort by
Matus to annotate DebTags. The format we have in mind looks like this:
Ontology: http://prefix.of.ontology.org
feature_name: ontological_description_of_that_feature
another_feature: id<blank>human_readable
scope: <list of binaries> | summary
feature_name: ...
another_feature: ...
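To make the intended grouping concrete, here is a minimal Python sketch of how a consumer (for instance a script feeding the catalog) might read this format. The function name parse_annotations and the rules it applies are our own assumptions, not an existing Debian or ELIXIR tool: one "key: value" pair per line, "Ontology:" opens a new block, "scope:" names one or more binaries (or "summary") that the following feature lines apply to, features before any "scope:" line count as summary, and "???" marks a term that the ontology does not have yet.

```python
from collections import defaultdict


def parse_annotations(text):
    """Hypothetical reader for the proposed 'feature: id label' format.

    Returns {ontology_url: {scope: {feature: [(term_id, label), ...]}}}.
    A term_id of None means the line used '???' (no ontology term yet).
    """
    result = {}
    scopes = ["summary"]  # assumed default scope until a 'scope:' line appears
    block = None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skip blank or free-text lines
        # Split at the first colon only, so URLs such as http://... stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Ontology":
            block = result.setdefault(value, defaultdict(lambda: defaultdict(list)))
            scopes = ["summary"]
        elif block is None:
            raise ValueError(f"feature {key!r} appears before any Ontology line")
        elif key == "scope":
            # A scope may list several binaries, separated by spaces or commas.
            scopes = value.replace(",", " ").split()
        else:
            term, _, label = value.partition(" ")
            entry = (None if term == "???" else term, label)
            for scope in scopes:
                block[scope][key].append(entry)
    return result
```

Used on a few of the bowtie lines, the summary and the per-binary annotations end up in separate buckets:

```python
example = """\
Ontology: http://edamontology.org
topic: topic_0622 Genomics
scope: bowtie-build
output: data_3210 Genome index
output: ??? Bowtie index format EBWT
"""
edam = parse_annotations(example)["http://edamontology.org"]
print(edam["summary"]["topic"])        # [('topic_0622', 'Genomics')]
print(edam["bowtie-build"]["output"])  # [('data_3210', 'Genome index'), (None, 'Bowtie index format EBWT')]
```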
The set of features may differ between ontologies. We tried this on bowtie and ended up with:
Ontology: http://edamontology.org
topic: topic_0622 Genomics
scope: summary
function: operation_3212 Genome indexing (Burrows-Wheeler)
function: operation_0292 Sequence alignment generation
input: data_2975 Nucleic acid sequence (raw)
input: format_1929 FASTA
input: format_1930 FASTQ
output: data_1383 Sequence alignment (nucleic acid)
output: format_2573 SAM
scope: bowtie-build
function: operation_3212 Genome indexing (Burrows-Wheeler)
input: data_2975 Nucleic acid sequence (raw)
input: format_1929 FASTA
output: data_3210 Genome index
output: ??? Bowtie index format EBWT
output: ??? Bowtie long index format EBWTL
scope: bowtie-inspect
function: operation_1813 Sequence retrieval
function: operation_0304 Metadata retrieval
function: operation_0228 Data index analysis
input: data_3210 Genome index
input: ??? Bowtie index format EBWT
input: ??? Bowtie long index format EBWTL
output: data_2975 Nucleic acid sequence (raw)
output: format_1929 FASTA
output: format_1964 plain text format (unformatted)
scope: bowtie
function: operation_0350 Sequence database search (by sequence using word-based methods)
function: operation_0292 Sequence alignment generation
input: data_3210 Genome index
input: data_2975 Nucleic acid sequence (raw)
input: format_1964 plain text format (unformatted)
input: format_1929 FASTA
input: format_1930 FASTQ
output: data_1383 Sequence alignment (nucleic acid)
output: data_0867 Sequence alignment report
output: format_2573 SAM
output: ??? Bowtie alignment report format
Or, if we want to reduce the level of detail to just the summary, this could be compressed to:
Ontology: http://edamontology.org
topic: topic_0622 Genomics
function: operation_3212 Genome indexing (Burrows-Wheeler)
function: operation_0292 Sequence alignment generation
input: data_2975 Nucleic acid sequence (raw)
input: format_1929 FASTA
input: format_1930 FASTQ
output: data_1383 Sequence alignment (nucleic acid)
output: format_2573 SAM
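The compressed form is simply the "scope: summary" block of the detailed version with the scope line dropped, so one could be derived mechanically from the other. A minimal Python sketch of that filter, assuming (as in our bowtie example) that "summary" never shares a scope line with binary names; the name summary_only is hypothetical:

```python
def summary_only(text):
    """Hypothetical helper: keep the Ontology line and the summary-scope
    features of a detailed annotation, dropping per-binary scopes."""
    out = []
    in_summary = True  # features before any 'scope:' line are treated as summary
    for raw in text.splitlines():
        line = raw.strip()
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Ontology":
            out.append(line)
            in_summary = True
        elif sep and key.strip() == "scope":
            in_summary = value.strip() == "summary"
        elif line and in_summary:
            out.append(line)
    return "\n".join(out)
```

Applied to the detailed bowtie annotation above, this should yield exactly the compressed nine-line version.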
If the list likes this approach, we can continue annotating more packages and amend our task pages accordingly.
Some tools and suites (e.g. EMBOSS) already have annotations from other projects that we can inherit.
We are not yet sure what this means in practice, e.g. for the Ultimate Debian Database. @Charles, can you
point us in the right direction, please?
Best regards from Copenhagen
Steffen and Tim
[1] EDAM ontology - http://bioportal.bioontology.org/ontologies/EDAM