
Re: About storing edam files in Debian packaging



Hi Andreas,


On 07/02/2017 15:54, Andreas Tille wrote:
> Hi Matus and Steffen,
>
> I was again stumbling upon totally broken edam yaml files (bowties and
> bowtie2 was totally broken, I fixed freecontact and clustalo which had
> "topics" instead of "topic" and were breaking UDD import due to this).
Firstly, thanks for caring.
> To verify how many other edam files might be affected I did some
>
>    find . -name "*.edam" -exec yamllint \{\} \; > ../yamllint_result
>
> in the UDD importer dir which is featuring all those files.  I have
> attached the result here - please note that the packages mentioned
> above are fixed meanwhile *syntactically*.
>
> When trying to fix the syntax I also noted that you sometimes are
> using input / inputs or output / outputs - so there is no consistent
> use of the keywords.
There are different ways to look at this. Input and Output could be used
in the singular, but an array of inputs/outputs is always expected. That
is why we added the "s" when we recently sat over this in Bucharest. I am
not completely sure about topic.
> From my point of view the data quality
Please be a bit more quiet on this front. You expressed your concern about
syntactical compatibility with the parser you helped with - taken. For what
we were after, that parser was not used, as helpful as it is and is likely
to become.
> makes these files pretty useless.
No.
> My question is:
>
>   1. Are these data really used?
Not at the moment.
>   2. What is the plan to use the data?
It is the path into bio.tools. And then it shall help with the assembly
of workflows.
>   3. How can we enhance the data quality?
>      (lintian comes to mind as we are using it for upstream files)
For your syntactical concerns, the files should be without warnings or
errors of yamllint already and are parsed with what Hervé, Matúš and I
crafted. Data quality for me primarily concerns the description of
topic, function, input and output with the right selection of terms from
the EDAM ontology.
> At Debian Med sprint you both asked me to add an additional column in
> UDD storing the version of the Ontology.  I admit I'm not really
> motivated to work on bad quality data
Please do not presidentially repeat something exaggerated all over.
> with unknown use case.  I'd assume
> if there would be any consumers of these data they would have thrown
> errors.  If this is the case, I wonder why you did not contact the
> authors of the edam files to fix these (or just fix these yourself).
The ball is with bio.tools to follow our idea for an integration we
prepared or to come up with a different one. From my perspective there
is nothing for us to fix upfront, basically for some subset of the
reasons you describe. The parser we used tolerates, in a human-compatible
way, both "input" and "inputs" (which fixes your syntax concerns).
If the import is not happening then all the edam files go again, with
"s" or not (which is your concern for a use case). Grant them another
six to eight months.
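
To illustrate what tolerating both "input" and "inputs" could look like,
here is a minimal sketch. It is not the parser mentioned above; the names
PLURAL and normalize are made up for this example, and it assumes the
edam file has already been loaded into plain Python dicts and lists.
"topic" is left untouched on purpose, since its form is not settled.

```python
# Hypothetical sketch, not the parser referred to in this mail.
# Assumption: the YAML has already been loaded into dicts and lists.
PLURAL = {"input": "inputs", "output": "outputs"}


def normalize(node):
    """Rename singular keys to the plural form and wrap lone values in a list."""
    if isinstance(node, list):
        return [normalize(item) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        canonical = PLURAL.get(key, key)
        value = normalize(value)
        # An array of inputs/outputs is always expected, so wrap single values.
        if canonical in PLURAL.values() and not isinstance(value, list):
            value = [value]
        result[canonical] = value
    return result
```

If a file carried both the singular and the plural key at the same level,
this sketch would simply keep whichever comes last; a real importer would
probably want to report that instead.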

Steffen

