Aw: Re: debian med packages in the ELIXIR registry

To: "Hervé Ménager" <hmenager@pasteur.fr>
Cc: debian-med@lists.debian.org
Subject: Aw: Re: debian med packages in the ELIXIR registry
From: "Steffen Möller" <steffen_moeller@gmx.de>
Date: Wed, 4 Feb 2015 14:22:42 +0100
Message-id: <trinity-24ceebbe-c21e-429c-9fde-c4bfc4cb892b-1423056162434@3capp-gmx-bs67>
In-reply-to: <20150204104344.GD5688@an3as.eu>
References: <CAKMTKZJ2rrvQ66bbU_fvLsY_LGsPjLixry1JoC3-7BZSuu969A@mail.gmail.com>, <20150204104344.GD5688@an3as.eu>

Hello Andreas,

> Hi Hervé,
> 
> thanks for your explanation.  When I wrote my previous mail in response
> to Steffen I had not yet read this, since I usually read
> thread-based.

I had also replied to the initial thread while Hervé's introduction
arrived in my inbox :)
 
> On Wed, Feb 04, 2015 at 08:46:07AM +0000, Hervé Ménager wrote:
> > Dear all,
> > 
> > As some of you already know, the ELIXIR registry (
> > http://elixir-registry.cbs.dtu.dk) is a project that aims at gathering an
> > extensive list of bioinformatics tools and services, and publishing them as a
> > web-based database where users can search and locate relevant
> > bioinformatics resources. There is work, initiated by Steffen Möller and
> > Tim Booth, to enable the automatic registration of debian-med packages as
> > resources in this registry. I worked a bit with Steffen and Tim on this
> > interface during the debian-med sprint.
> 
> Thanks for working on this.

And please also all have a look at
https://lists.debian.org/debian-med/2014/11/msg00070.html
  
> > The way it is currently done, a new metadata file, called edam (for now), in
> > the "upstream" directory of the package source, contains additional
> > information not present in other files such as metadata, control, etc. The
> > script parses all these files to produce a JSON file used to register the
> > debian package in the registry. However, there are, as Andreas pointed out
> > today, at least two problems with this approach:
> > - it includes the creation of an additional file to store information,
> > potentially breaking downstream operations where this file is unexpected,
> 
> I think this is not a problem.  As Charles said in my response he just
> needed to do a "minimum diff upload" to get an RC bug fix accepted by
> the release team.  Dumping another file into debian/upstream would
> otherwise be considered harmless.  My major point was that the effort
> itself will be more successful if you point people to it in advance to
> let them contribute to this effort as well.

So far the placement was only technically motivated, not socially, and, yes,
of course, sure, certainly, ...

> > - it requires parsing multiple files with different formats, a long and
> > cumbersome task...
> > As a complete stranger to the debian packaging process, I would like
> > your opinion on these points:
> > - should we create this additional file or add the information to
> > another existing file? The goal here is obviously to reduce the
> > number of files which have to be edited, while minimizing the risk of
> > breaking anything in the packaging architecture.
> 
> The debian/upstream dir itself is quite new and not yet used by many
> teams.  Charles started it for injecting publication data, and over
> time it has been documented in the Wiki[1].  I'm personally not sure
> whether we should invent a new file (edam) there or whether it is fine
> to use the already existing metadata file for this kind of information.
> It mainly depends on the planned application and the way it should be
> maintained.

I expressed my strong preference for the separate file. At the sprint
I supported the idea of allowing multiple such files when there are
multiple packages that differ significantly in what they provide.
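
As a minimal sketch of what such a script could do with a separate file:
it reads one simplified control stanza, takes the edam annotation as an
already loaded dictionary, and merges both into one JSON record. The
package, the helper names and the output keys ("name", "summary",
"homepage", "topics") are assumptions for illustration only, not the
actual schema of the registry or the format of the edam file.

```python
import json


def parse_control_stanza(text):
    """Parse one 'Key: value' stanza; indented lines continue the previous field."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and key is not None:
            # Continuation line of a multi-line field such as Description.
            fields[key] += "\n" + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def build_registry_entry(control_text, edam):
    """Merge control fields and EDAM annotation into one record (hypothetical keys)."""
    control = parse_control_stanza(control_text)
    description = control.get("Description", "")
    return {
        "name": control.get("Package", ""),
        "summary": description.split("\n", 1)[0],
        "homepage": control.get("Homepage", ""),
        "topics": list(edam.get("topics", [])),
    }


# Hypothetical package; a real debian/control has separate source and binary stanzas.
CONTROL = """\
Package: example-aligner
Homepage: http://example.org/aligner
Description: hypothetical sequence aligner
 Longer description that continues
 on indented lines.
"""
EDAM = {"topics": ["Sequence analysis"]}

entry = build_registry_entry(CONTROL, EDAM)
print(json.dumps(entry, indent=2, sort_keys=True))
```

With one annotation per binary package, the same function could simply
be called once per package.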

> > - should we try, rather than parsing these files, to retrieve
> > the information from the UDD? I'd personally prefer this option to the
> > "parse n files" one, but it would also require to add the new information
> > to the DB.
> 
> At some point in time the n files need to be parsed.  However, this
> is a solved problem for debian/upstream/metadata.  For the moment I
> just extract the Reference data from it but I also intend to take over
> fields Cite-As, Funding and others.  The decision which field is parsed
> is, from my point of view, application-driven:  I needed to put the
> citation data (field "Reference") online on the tasks pages and thus
> I spent my time doing the needed work.  So if we have a reasonable
> application for further data we should invent a sensible table layout
> and import these data.  From my point of view we can thus put edam
> data right into the metadata file (*after* documenting it on the Wiki
> page[1]) or we can add another file (*and* create a corresponding Wiki
> page).  It would be simple to also gather these additional files in
> the same job in which other machine readable files are processed.

Only now, after the sprint, is something emerging that can be documented, IMHO.

> What continuously remains unclear to me is for what purpose we gather
> these data.  The following random questions are popping up in my mind:
> 
>   0. Is it just fun to collect metadata?

EDAM to me is a simple language to describe what our packages can
help with. It is somewhat rewarding to prepare such a formal
description ... but only for the first few packages. The larger
motivation lies in using those terms to describe workflows and then
find tools for the job. Actually chaining those tools up with the
correct command line options to process the data properly is yet another
task.

>   1. Do we just gather them to help the EDAM database get even more
>      metadata than we have (like descriptions, dependencies, etc.)?
>      That's fine but then we should provide them in the best possible
>      form *for* EDAM to be accessed (whatever this might be).

Our subversion and git repositories, or the source packages, are
perfectly acceptable.
 
>   2. Do we want to base installation methods on a certain set of
>      EDAM fields?  (I remember times when it was possible to install
>      packages based on DebTags but I can't find this any more :-()

Yes. That, and I envision containers (VMs, Docker, Cloud instances)
exploiting the annotation.

>   3. Do we want to change our Debian Med task design on EDAM tags?

The consistency across our blends is more important than any fancy
gimmicks, I tend to think.  But if it fits, I would not mind. I hope
that a suitable presentation of the availability of Debian packages
for particular tasks/software at a central page like the ELIXIR catalog
would help us more than any fiddling with our package presentation.

> I think we should make up our minds about what exactly we want to
> approach to finally enhance the user experience.

My prime ambition is to come straight from the external-to-Debian
catalog of software in computational biology to "us", and this shall
mean Bio-Linux, Ubuntu and Debian alike. We still need to decide upon
a landing page for any such external pointer to one or more Debian
packages. An early stage could return the "apt-get install" command
plus descriptions of our packages, or a pointer to a VM featuring
those packages, or ... I see many options.
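
To illustrate the "apt-get install" variant, here is a rough sketch that
selects the packages annotated with a given EDAM topic and renders the
command plus the package descriptions. The package names, descriptions
and the landing_text helper are made up for illustration, and the
annotations are assumed to be available as a plain dictionary.

```python
def landing_text(topic, packages):
    """Render an install command and descriptions for packages tagged with topic."""
    matches = sorted(
        name for name, info in packages.items() if topic in info["topics"]
    )
    if not matches:
        return "No packages annotated with topic: " + topic
    lines = ["apt-get install " + " ".join(matches), ""]
    for name in matches:
        lines.append(name + ": " + packages[name]["description"])
    return "\n".join(lines)


# Hypothetical packages and annotations.
PACKAGES = {
    "example-aligner": {
        "description": "hypothetical sequence aligner",
        "topics": ["Sequence analysis"],
    },
    "example-viewer": {
        "description": "hypothetical alignment viewer",
        "topics": ["Data visualisation"],
    },
    "example-assembler": {
        "description": "hypothetical genome assembler",
        "topics": ["Sequence analysis"],
    },
}

page = landing_text("Sequence analysis", PACKAGES)
print(page)
```

The same selection could just as well feed a container or VM recipe
instead of a text page.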

Best,

Steffen

Follow-Ups:
- Re: debian med packages in the ELIXIR registry
  - From: Andreas Tille <andreas@an3as.eu>
- Re: Re: debian med packages in the ELIXIR registry
  - From: Charles Plessy <plessy@debian.org>

References:
- debian med packages in the ELIXIR registry
  - From: Hervé Ménager <herve.menager@gmail.com>
- Re: debian med packages in the ELIXIR registry
  - From: Andreas Tille <andreas@an3as.eu>
