
Re: ELIXIR tools registry participation - richer metadata for our packages



Hi,

I'd like to add to this old thread some results from a later meeting in
Copenhagen: our Debian Med sprint last week, kindly sponsored by DTU.

I can confirm that I have imported the EDAM data files from Debian Med
packages into UDD.  You can easily query all this data with a script I
provided on GitHub:

    https://github.com/bio-tools/biotoolsConnect/blob/master/DebianMed/edam.sh

Feel free to run this script on any Linux machine with a psql client.

If you lack such a machine but are a member of the Debian Med team,
you can do

    rsync edam.sh alioth.debian.org:
    ssh alioth.debian.org
    ./edam.sh

I'd be happy if some of the EDAM people could confirm that this works
for them.

Kind regards

      Andreas.

On Thu, Nov 20, 2014 at 05:37:10PM +0000, Booth, Timothy G. wrote:
> Dear all,
> 
> This is Steffen and Tim sharing a desk at the Copenhagen Hackathon
> of the ELIXIR-DK Catalog of resources in computational biology.
> There is general excitement about the collection of tools
> associated with Debian Med and its derivatives, and we are
> here to
> 
>  * help the ELIXIR folks to fill their database
>    - avoid redundancies
>    - render the catalog immediately functional to see Debian packages
>  * help ourselves
>    - gain extra expressiveness in our own descriptions
>      o by adopting the EDAM ontology [1]
>      o to have separate annotations for the packages as a whole
>        and individual tools (selected binaries in /usr/bin)
>    - have some extra visibility
>    - find additional users (bare metal and virtualised)
>    - explain to the world how inviting Debian is to have one's
>      software redistributed
> 
> Catalog entries are meant to be provided by the maintainers of
> the software tools in the ELIXIR network. For resources (binaries)
> provided through the Linux distros, we could certainly just fall
> back to the information we already have, but we would like you
> (this list) to comment on the extension of the
>   debian/upstream/metadata
> file to also accommodate structured references to semantic
> catalogs like the EDAM ontology. There is a related effort by
> Matus to annotate the DebTags. The format we have in mind looks like
> 
> Ontology: http://prefix.of.ontology.org
>  feature_name: ontological_description_of_that_feature
>  another_feature: id<blank>human_readable
>  scope: <list of binaries> | summary
>   feature_name: ...
>   another_feature: ...
> 
> The features may differ between ontologies. We had a look at bowtie to see how it goes and we ended up with:
> 
> Ontology: http://edamontology.org
>  topic: topic_0622 Genomics
>  scope: summary
>    function: operation_3212 Genome indexing (Burrows-Wheeler)
>    function: operation_0292 Sequence alignment generation
>    input: data_2975 Nucleic acid sequence (raw)
>    input: format_1929 FASTA
>    input: format_1930 FASTQ
>    output: data_1383 Sequence alignment (nucleic acid)
>    output: format_2573 SAM
>  scope: bowtie-build
>    function: operation_3212 Genome indexing (Burrows-Wheeler)
>    input: data_2975 Nucleic acid sequence (raw)
>    input: format_1929 FASTA
>    output: data_3210 Genome index
>    output: ??? Bowtie index format EBWT
>    output: ??? Bowtie long index format EBWTL
>  scope: bowtie-inspect
>    function: operation_1813 Sequence retrieval
>    function: operation_0304 Metadata retrieval
>    function: operation_0228 Data index analysis
>    input: data_3210 Genome index
>    input: ??? Bowtie index format EBWT
>    input: ??? Bowtie long index format EBWTL
>    output: data_2975 Nucleic acid sequence (raw)
>    output: format_1929 FASTA
>    output: format_1964 plain text format (unformatted)
>  scope: bowtie
>    function: operation_0350 Sequence database search (by sequence using word-based methods)
>    function: operation_0292 Sequence alignment generation
>    input: data_3210 Genome index
>    input: data_2975 Nucleic acid sequence (raw)
>    input: format_1964 plain text format (unformatted)
>    input: format_1929 FASTA
>    input: format_1930 FASTQ
>    output: data_1383 Sequence alignment (nucleic acid)
>    output: data_0867 Sequence alignment report
>    output: format_2573 SAM
>    output: ??? Bowtie alignment report format
> 
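To see how a consumer might read such a stanza, here is a minimal
Python sketch. It is only an illustration of the proposed layout, not
an agreed parser: the name parse_annotation is hypothetical, and it
assumes that every feature line following a "scope:" line belongs to
that scope (indentation is not inspected) and that features before the
first scope describe the package as a whole.

```python
from collections import defaultdict

# Shortened excerpt of the bowtie example, used only as test input.
SAMPLE = """\
Ontology: http://edamontology.org
 topic: topic_0622 Genomics
 scope: summary
   function: operation_3212 Genome indexing (Burrows-Wheeler)
   input: format_1929 FASTA
 scope: bowtie-build
   function: operation_3212 Genome indexing (Burrows-Wheeler)
   output: data_3210 Genome index
"""


def parse_annotation(text):
    """Parse one stanza into {'ontology': url, 'scopes': {...}}.

    'scopes' maps a scope name to {feature: [(term_id, label), ...]}.
    Package-level features (before any scope line) use the key None.
    Assumption: a feature belongs to the most recent 'scope:' line.
    """
    ontology = None
    scopes = defaultdict(lambda: defaultdict(list))
    scope = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split at the first colon only, so URLs in the value survive.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Ontology":
            ontology = value
        elif key == "scope":
            scope = value
        else:
            # "id<blank>human_readable": the first blank separates them.
            term_id, _, label = value.partition(" ")
            scopes[scope][key].append((term_id, label.strip()))
    return {
        "ontology": ontology,
        "scopes": {name: dict(feats) for name, feats in scopes.items()},
    }


if __name__ == "__main__":
    parsed = parse_annotation(SAMPLE)
    for name, features in parsed["scopes"].items():
        print(name, dict(features))
```

Terms that have no EDAM id yet (the "???" entries) would simply come
back with "???" as their id, which makes them easy to list.
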
> Or, if we want to reduce the level of detail to just the summary, this could be compressed to:
> 
> Ontology: http://edamontology.org
>  topic: topic_0622 Genomics
>  function: operation_3212 Genome indexing (Burrows-Wheeler)
>  function: operation_0292 Sequence alignment generation
>  input: data_2975 Nucleic acid sequence (raw)
>  input: format_1929 FASTA
>  input: format_1930 FASTQ
>  output: data_1383 Sequence alignment (nucleic acid)
>  output: format_2573 SAM
> 
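The compression from the detailed form to the summary form could be
done mechanically. The following Python sketch shows one way, under
the assumption that the compressed form consists of the package-level
features plus everything under "scope: summary", with all per-binary
scopes dropped. The name compress_to_summary is hypothetical, and
like the format itself this is a proposal rather than a fixed rule.

```python
# Shortened excerpt of the detailed bowtie stanza, used as test input.
DETAILED = """\
Ontology: http://edamontology.org
 topic: topic_0622 Genomics
 scope: summary
   function: operation_0292 Sequence alignment generation
   output: format_2573 SAM
 scope: bowtie-build
   output: data_3210 Genome index
"""


def compress_to_summary(text):
    """Return the stanza reduced to package-level and summary features.

    Assumption: features listed after a 'scope:' line belong to that
    scope until the next 'scope:' line; indentation is not inspected.
    """
    out = []
    keep = True  # lines before the first scope are package-level
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Ontology:"):
            out.append(line)
            keep = True
        elif line.startswith("scope:"):
            # Keep only the features of the 'summary' scope.
            keep = line.partition(":")[2].strip() == "summary"
        elif keep:
            # Re-indent by one blank, as in the compressed example.
            out.append(" " + line)
    return "\n".join(out)


if __name__ == "__main__":
    print(compress_to_summary(DETAILED))
```
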
> If the list likes this approach, then we can continue annotating a bit more and amend our task pages for it all.
> Some tools and suites (eg. EMBOSS) have existing annotations from other projects that we can inherit.
> We are not yet confident about what this effectively means e.g. for the Ultimate Debian Database. @Charles, can you
> direct us, please?
> 
> Best regards from Copenhagen
> 
> Steffen and Tim
> 
> [1] EDAM ontology - http://bioportal.bioontology.org/ontologies/EDAM
> 

-- 
http://fam-tille.de

