
Re: "Entry: NA" in debian/upstream/metadata

Hey all again, and thanks for your thoughts Andrius and Andreas!

On 2021-03-03 09:36, Andreas Tille wrote:
Hi Andrius,

On 2021-03-03 08:54, Andrius Merkys wrote:
Dear Matus,

On 2021-03-02 19:56, Matus Kalas wrote:
I'd suggest hearing from the folks who have done most of the work
of manually including those IDs, and letting them approve/decide.


Steffen et al., your opinions on this matter?

I can imagine that for purely practical reasons in the process of the
manual curation, it might make sense to allow explicitly:
 - Name: OMICtools
   Entry: N/A (Meaning: I have checked and there was no record)
 - Name: bio.tools
   Entry: "" (Meaning: I or someone else should check this out;
          or perhaps: I checked but wasn't conclusive yet)

The latter might be useful for contributors who aren't used to all those
IDs, to make them more visible (including where the gaps are). But on
the other hand, if those are clearly present in an upstream/metadata
template and well explained in the documentation of upstream/metadata,
then it is not necessary, and I'd tend to prefer your suggestion, Andrius.

To me, three flavors of "unknown" look like overkill. Most of the
metadata in Debian does not even have the two flavors of "unknown":
missing Bug-Submit field in d/u/metadata, Homepage in d/control and
Upstream-Contact in d/copyright means that this piece of information is
either nonexistent or simply not entered (for example, due to the lack
of time). Thus I am not sure whether the added value is worth the
infrastructure/effort here. But again, this is solely my opinion,
certainly not aimed at reflecting those of the people who enter and use
the data in d/u/metadata.

I wrote the UDD importer for the metadata files and thus look at the
data as a "consumer" of the provided information.  From this side those
different meanings of unknown are all turned into "ignore this value".
So in this respect differentiating between those unknowns is basically
helpful for those who edit the metadata files. Flagging something as "I
was here and have checked" is probably kind of helpful.  However, it
might perfectly be that some registry will include that specific
software later and re-checking makes sense.
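
The consumer-side view described above can be sketched roughly as follows. This is only an illustration, not the actual UDD importer code: the function name, the set of placeholder spellings, and the sample registry entries (including the made-up ID) are assumptions, and the input is taken to be the already-parsed "Registry" list from d/u/metadata.

```python
# Placeholder spellings assumed for illustration; the thread mentions
# "NA", "N/A" and the empty string "".
UNKNOWN_MARKERS = {"", "na", "n/a"}


def usable_entries(registry):
    """Return {registry name: entry} for entries that carry a real ID.

    Every flavor of "unknown" (missing Entry, empty string, NA, N/A)
    is dropped, i.e. turned into "ignore this value".
    """
    result = {}
    for item in registry or []:
        name = item.get("Name")
        entry = item.get("Entry")
        if name is None or entry is None:
            continue  # field missing or left empty
        text = str(entry).strip()
        if text.lower() in UNKNOWN_MARKERS:
            continue  # explicit "unknown" placeholder
        result[name] = text
    return result


# Sample data; "example-id" is a made-up identifier.
registry = [
    {"Name": "OMICtools", "Entry": "N/A"},
    {"Name": "bio.tools", "Entry": ""},
    {"Name": "SciCrunch", "Entry": "example-id"},
]
print(usable_entries(registry))
```

With such a filter the distinction between the placeholders never reaches the users of the data; it only matters to whoever edits the file.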

For this reason I was recommending not to make those simple things too
complex, since making them complex just drains time from the people who
are working on it, with no visible effect for the users.

If the three-flavor option is preferred, I would also suggest adding a
date field to each entry to signal at which point in time the registry
was inspected.

As I wrote above, the later addition of some software to some registry
can spoil the different meanings of unknown.  This could be cured by
such a date field, but I don't think its value justifies the time
drained from the people maintaining that extra field.  Thus I do not
think we should do this.

We definitely don't need a date; git blame does that, also in the form of the Blame button in Salsa, and without any possibility of inconsistency.

Thanks a lot for your work on this




There is one closely related issue, which we just briefly touched upon with Steffen and Hervé in a telcon: What to do with those "NA" packages that are missing in e.g. bio.tools?

The registration in bio.tools (and surely also SciCrunch) could be automated, but there are at least a couple of things needing human curation:

- Which src packages represent one tool (often e.g. libs | language bindings form separate Debian pkgs). How to mark this and where? Is there an existing Debian mechanism? Or do we need to abuse the d/u/metadata "Entry" for that, before they're added? (3rd or 4th flavour of info then 😀 ; btw. git branches could help here 😉 ; and not in google spreadsheet perhaps 😜 as it has to be machine-readable)

- Choosing an available, reasonable biotoolsID and tool name. Ideally the tool name and biotoolsID are identical, with the ID in all lowercase and spaces removed/replaced.

- Any other things needing human curation?
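
For the second point, deriving a candidate ID from a tool name might look roughly like the sketch below. It only implements the "all lowercase, spaces removed/replaced" idea from the list above; the function name and the `replacement` parameter are hypothetical, and whether the result is actually available or acceptable in bio.tools would still need a human check.

```python
import re


def candidate_biotools_id(tool_name, replacement=""):
    """Suggest an ID from a tool name: lowercase, whitespace removed or replaced.

    This is only a starting point for curation; it does not check
    availability or any further rules a registry may impose.
    """
    # Collapse each run of whitespace into the chosen replacement string.
    collapsed = re.sub(r"\s+", replacement, tool_name.strip())
    return collapsed.lower()


print(candidate_biotools_id("My Tool"))       # spaces removed
print(candidate_biotools_id("My Tool", "_"))  # spaces replaced
```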

Thank you all, I'm very happy seeing this progressing!

P.S.: Could you please leave all the contents in when replying to the thread, so that others can reply to previously mentioned points without having to read every single email in the thread and possibly breaking its linearity? I agree that it's not ecological to broadcast the same text all around the globe again and again, but there are other solutions than emails that handle that without compromising. Many thanks!
