
Re: debian med packages in the ELIXIR registry

Hi Hervé,

thanks for your explanation.  When I wrote my previous mail in response
to Steffen I had not yet read this since I'm reading usually

On Wed, Feb 04, 2015 at 08:46:07AM +0000, Hervé Ménager wrote:
> Dear all,
> As some of you already know, the ELIXIR registry (
> http://elixir-registry.cbs.dtu.dk) is a project that aims at gathering an
> extensive list of bioinformatics tools and services, and publish them as a
> web-based database where users can search and locate relevant
> bioinformatics resources. There is a work, initiated by Steffen Möller and
> Tim Booth, to enable the automatic registration of debian-med packages as
> resources in this registry. I worked a bit with Steffen and Tim on this
> interface during the debian-med sprint.

Thanks for working on this.

> The way it is currently done a new metadata file, called edam (for now), in
> the "upstream" directory of the package source, contains additional
> information not present in other files such as metadata, control, etc. The
> script parses all these files to produce a JSON file used to register the
> debian package in the registry. However, there are, as Andreas pointed out
> today, at least two problems with this approach:
> - it includes the creation of an additional file to store information,
> potentially breaking downstream operations where this file is unexpected,

I think this is not a problem.  As Charles said in his response he just
needed to do a "minimum diff upload" to get an RC bug fix accepted by
the release team.  Dumping another file into debian/upstream would
otherwise be considered harmless.  My major point was that the effort
itself will be more successful if you point people to it in advance so
that they can contribute to it as well.

> - it requires parsing multiple files with different formats, long and
> cumbersome task...
> As a complete stranger to the debian packaging process, I would like
> your opinion on these points:
> - should we create this additional file or add the information in
> another existing files? The goal here is obviously to reduce the
> number of files which have to be edited, while minimizing the risk of
> breaking anything in the packaging architecture.

The debian/upstream dir itself is quite new and not yet used by many
teams.  Charles started it for injecting publication data and it is
now documented in the Wiki[1].  I'm personally not sure whether we
should invent a new file (edam) there or whether it is fine to use the
already existing metadata file for this kind of information.  It mainly
depends on the planned application and the way it should be maintained.

> - should we try, rather than parsing these files, to retrieve
> the information from the UDD? I'd personally prefer this option to the
> "parse n files" one, but it would also require to add the new information
> to the DB.

At some point in time the n files need to be parsed.  However, this
is a solved problem for debian/upstream/metadata.  For the moment I
just extract the Reference data from it but I also intend to take over
the fields Cite-As, Funding and others.  The decision which fields are
parsed is, from my point of view, application-driven:  I needed to put
the citation data (field "Reference") online on the tasks pages and
thus I spent my time doing the needed work.  So if we have a reasonable
application for further data we should invent a sensible table layout
and import these data.  From my point of view we can thus put edam
data right into the metadata file (*after* documenting it on the Wiki
page[1]) or we can add another file (*and* create a corresponding Wiki
page).  It would also be simple to gather these additional files in
the same job that processes the other machine readable files.
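
To illustrate the last point, a gathering job of that kind could be
sketched roughly as follows (in Python).  This is only a sketch under
assumptions: the file name "edam", the field "topic" and all function
names are hypothetical, and the real metadata file is YAML, so a real
job would use a proper YAML parser instead of the simplified top-level
"Key: value" reader used here.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical list of machine readable files to gather per package.
UPSTREAM_FILES = ("metadata", "edam")


def top_level_fields(text):
    """Return the top-level 'Key: value' scalar fields of a file.

    Simplification: indented lines, list items and comments are skipped,
    so nested YAML structures such as Reference are ignored here.
    """
    fields = {}
    for line in text.splitlines():
        if not line or line[0] in " \t#-" or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if value.strip():
            fields[key.strip()] = value.strip()
    return fields


def gather(source_dir):
    """Collect fields from every known file found in debian/upstream."""
    upstream = Path(source_dir) / "debian" / "upstream"
    record = {}
    for name in UPSTREAM_FILES:
        path = upstream / name
        if path.is_file():
            record[name] = top_level_fields(path.read_text(encoding="utf-8"))
    return record


def demo():
    """Build a throwaway source tree and gather its upstream files."""
    with tempfile.TemporaryDirectory() as tmp:
        upstream = Path(tmp) / "debian" / "upstream"
        upstream.mkdir(parents=True)
        (upstream / "metadata").write_text(
            "Name: Example\nReference:\n  Author: A. Author\n  Year: 2015\n",
            encoding="utf-8",
        )
        # The field name "topic" is an assumption, not a documented field.
        (upstream / "edam").write_text(
            "topic: Sequence analysis\n", encoding="utf-8"
        )
        return gather(tmp)


if __name__ == "__main__":
    print(json.dumps(demo(), indent=2, sort_keys=True))
```

Whether the extra data lives in the metadata file or in a separate
file, the job only needs one more entry in the list of files to read.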

What still remains unclear to me is for what purpose we gather
these data.  The following random questions come to mind:

  0. Is it just fun to collect metadata?

  1. Do we just gather them to help the EDAM database get even more
     metadata than we have (like descriptions, dependencies, etc.)?
     That's fine but then we should provide them in the best possible
     form *for* EDAM to be accessed (whatever this might be).

  2. Do we want to base installation methods on a certain set of
     EDAM fields?  (I remember times when it was possible to install
     packages based on DebTags but I can't find this any more :-()

  3. Do we want to change our Debian Med task design on EDAM tags?

I think we should make up our minds about what exactly we want to
achieve to finally enhance the user experience.

> Thanks a lot in advance for sharing your remarks and opinion on this.

Thanks also to you for your contribution to the sprint.  It was nice to
get to know you.

Kind regards


[1] https://wiki.debian.org/UpstreamMetadata

