Re: ELIXIR tools registry participation - richer metadata for our packages

To: debian-med@lists.debian.org
Subject: Re: ELIXIR tools registry participation - richer metadata for our packages
From: Sascha Steinbiss <sascha@steinbiss.name>
Date: Fri, 21 Nov 2014 12:08:58 +0000
Message-id: <546F2B5A.7040509@steinbiss.name>
Reply-to: sascha@steinbiss.name
In-reply-to: <8C33D34D431A974EAEA913101DDA19440781235B42@nerckwmbc.ad.nerc.ac.uk>
References: <trinity-3c191cd1-e8aa-47e8-85f9-70a9681df86f-1416497366507@3capp-gmx-bs54> <8C33D34D431A974EAEA913101DDA19440781235B42@nerckwmbc.ad.nerc.ac.uk>

Dear Steffen and Tim,

this sounds cool, and in the long run it should help people find the
right tool for a job! I for one would prefer the long representation,
with one entry per binary. If I can find the time, I would even help
update the metadata files with this kind of information once the format
of the upstream/metadata file you are proposing is stable and documented.

Cheers
Sascha

On 20/11/2014 17:37, Booth, Timothy G. wrote:
> Dear all,
> 
> This is Steffen and Tim sharing a desk at the Copenhagen Hackathon
> of the ELIXIR-DK Catalog of resources in computational biology.
> There is general excitement about the collection of tools that
> are associated with Debian Med and its derivatives, and we are
> here to
> 
>  * help the ELIXIR folks to fill their database
>    - avoid redundancies
>    - make the catalog immediately usable for finding Debian packages
>  * help ourselves
>    - gain extra expressiveness in our own descriptions
>      o by adopting the EDAM ontology [1]
>      o by having separate annotations for the packages as a whole
>        and individual tools (selected binaries in /usr/bin)
>    - have some extra visibility
>    - find additional users (bare metal and virtualised)
>    - explain to the world how inviting Debian is to have one's
>      software redistributed
> 
> Catalog entries are meant to be provided by the maintainers of
> the software tools in the ELIXIR network. For resources (binaries)
> provided through the Linux distros, we could certainly just fall
> back to the information we already have, but we would like you
> (this list) to comment on extending the
>   debian/upstream/metadata
> file to also accommodate structured references to semantic
> catalogs like the EDAM ontology. There is a related effort by
> Matus to annotate the DebTags. The format we have in mind looks
> like this:
> 
> Ontology: http://prefix.of.ontology.org
>  feature_name: ontological_description_of_that_feature
>  another_feature: id<blank>human_readable
>  scope: <list of binaries> | summary
>   feature_name: ...
>   another_feature: ...
> 
> The features may differ between ontologies. We had a look at bowtie to see how it goes and we ended up with:
> 
> Ontology: http://edamontology.org
>  topic: topic_0622 Genomics
>  scope: summary
>    function: operation_3212 Genome indexing (Burrows-Wheeler)
>    function: operation_0292 Sequence alignment generation
>    input: data_2975 Nucleic acid sequence (raw)
>    input: format_1929 FASTA
>    input: format_1930 FASTQ
>    output: data_1383 Sequence alignment (nucleic acid)
>    output: format_2573 SAM
>  scope: bowtie-build
>    function: operation_3212 Genome indexing (Burrows-Wheeler)
>    input: data_2975 Nucleic acid sequence (raw)
>    input: format_1929 FASTA
>    output: data_3210 Genome index
>    output: ??? Bowtie index format EBWT
>    output: ??? Bowtie long index format EBWTL
>  scope: bowtie-inspect
>    function: operation_1813 Sequence retrieval
>    function: operation_0304 Metadata retrieval
>    function: operation_0228 Data index analysis
>    input: data_3210 Genome index
>    input: ??? Bowtie index format EBWT
>    input: ??? Bowtie long index format EBWTL
>    output: data_2975 Nucleic acid sequence (raw)
>    output: format_1929 FASTA
>    output: format_1964 plain text format (unformatted)
>  scope: bowtie
>    function: operation_0350 Sequence database search (by sequence using word-based methods)
>    function: operation_0292 Sequence alignment generation
>    input: data_3210 Genome index
>    input: data_2975 Nucleic acid sequence (raw)
>    input: format_1964 plain text format (unformatted)
>    input: format_1929 FASTA
>    input: format_1930 FASTQ
>    output: data_1383 Sequence alignment (nucleic acid)
>    output: data_0867 Sequence alignment report
>    output: format_2573 SAM
>    output: ??? Bowtie alignment report format
> 
> Or, if we want to reduce the level of detail to just the summary, this could be compressed to:
> 
> Ontology: http://edamontology.org
>  topic: topic_0622 Genomics
>  function: operation_3212 Genome indexing (Burrows-Wheeler)
>  function: operation_0292 Sequence alignment generation
>  input: data_2975 Nucleic acid sequence (raw)
>  input: format_1929 FASTA
>  input: format_1930 FASTQ
>  output: data_1383 Sequence alignment (nucleic acid)
>  output: format_2573 SAM
> 
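A minimal sketch, not part of the proposal itself, of how the long form
could be read and reduced to the compressed summary form. It assumes
that package-level features (such as topic) come before the first
scope line, that indentation depth can be ignored, and that the scope
value is an opaque key (the separator for a list of binaries is not yet
defined). The function names parse_ontology_block and
compress_to_summary are hypothetical, and the sample text is an abridged
version of the bowtie example.

```python
from typing import List, Optional, Tuple

# (feature name, term id, human-readable label), e.g.
# ("input", "format_1929", "FASTA"). Unknown ids stay as "???".
Feature = Tuple[str, str, str]


def parse_ontology_block(text: str) -> dict:
    """Parse one 'Ontology:' block of the proposed format into a dict."""
    block = {"ontology": None, "features": [], "scopes": {}}
    current: Optional[List[Feature]] = None  # list currently being filled
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon only, so URLs in the value survive.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {raw!r}")
        key, value = key.strip(), value.strip()
        if key == "Ontology":
            block["ontology"] = value
            current = block["features"]
        elif key == "scope":
            # Assumption: the scope value is used verbatim as the key.
            current = block["scopes"].setdefault(value, [])
        elif current is None:
            raise ValueError("feature found before any 'Ontology:' line")
        else:
            # id<blank>human_readable
            term_id, _, label = value.partition(" ")
            current.append((key, term_id, label.strip()))
    return block


def compress_to_summary(block: dict) -> str:
    """Emit the short form: package-level features plus 'scope: summary'."""
    summary = block["scopes"].get("summary", [])
    lines = [f"Ontology: {block['ontology']}"]
    for key, term_id, label in block["features"] + summary:
        lines.append(f" {key}: {term_id} {label}")
    return "\n".join(lines)


# Abridged from the bowtie example in the mail.
LONG_FORM = """\
Ontology: http://edamontology.org
 topic: topic_0622 Genomics
 scope: summary
   function: operation_0292 Sequence alignment generation
   input: format_1929 FASTA
   output: format_2573 SAM
 scope: bowtie-build
   function: operation_3212 Genome indexing (Burrows-Wheeler)
   output: ??? Bowtie index format EBWT
"""

if __name__ == "__main__":
    print(compress_to_summary(parse_ontology_block(LONG_FORM)))
```

Keeping the per-binary scopes in the file and deriving the summary view
from them would let both representations coexist without maintaining
them separately.
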
> If the list likes this approach, then we can continue annotating a bit more and amend our task pages accordingly.
> Some tools and suites (e.g. EMBOSS) have existing annotations from other projects that we can inherit.
> We are not yet sure what this means in practice, e.g. for the Ultimate Debian Database. @Charles, can you
> direct us, please?
> 
> Best regards from Copenhagen
> 
> Steffen and Tim
> 
> [1] EDAM ontology - http://bioportal.bioontology.org/ontologies/EDAM
> 


