
Re: "Entry: NA" in debian/upstream/metadata

Am 03.03.21 um 17:39 schrieb Matus Kalas:
> Hey all again, and thanks for your thoughts Andrius and Andreas!
> On 2021-03-03 09:36, Andreas Tille wrote:
>> Hi Andrius,
>> On 2021-03-03 08:54, Andrius Merkys wrote:
>>> Dear Matus,
>>> On 2021-03-02 19:56, Matus Kalas wrote:
>>>> I'd suggest hearing from the folks who have done most of the work
>>>> with manually including those IDs, and letting them approve/decide.
>>> Absolutely!
> Steffen et al., your opinions on this matter?

Sorry for being late on this.

So, "NA" indeed means "hey, I checked but this was not found". This
information should not be lost.

An empty entry, as if from a template, does not have the same meaning.
Whether it is NA (which is how R expects it, and which I found likely to
be easier to parse) or N/A, I would not bother making all these changes
and would just leave it. Indeed, in the Excel sheet I am using N/A.

As it happens, we had a quick exchange of thoughts on Zoom today, and I
tend to think that the general idea is that these NAs have to disappear,
i.e. these entries should be added to bio.tools.

>>>> I can imagine that for purely practical reasons in the process of the
>>>> manual curation, it might make sense to allow explicitly:
>>>>  - Name: OMICtools
>>>>    Entry: N/A        (Meaning: I have checked and there was no record)
>>>>  - Name: bio.tools
>>>>    Entry: ""         (Meaning: I or someone else should check this out;
>>>>                       or perhaps: I checked but wasn't conclusive yet)
>>>> The latter might be useful for contributors who aren't used to all those
>>>> IDs, to make them more visible (including where the gaps are). But on
>>>> the other hand, if those are well present in an upstream/metadata
>>>> template and very clear in the documentation of upstream/metadata, then
>>>> it is not necessary and I'd then tend to like your suggestion Andrius.
>>> To me, three flavors of "unknown" look like overkill. Most of the
>>> metadata in Debian does not even have the two flavors of "unknown": a
>>> missing Bug-Submit field in d/u/metadata, Homepage in d/control or
>>> Upstream-Contact in d/copyright means that this piece of information is
>>> either nonexistent or simply not entered (for example, due to the lack
>>> of time). Thus I am not sure whether the added value is worth the
>>> infrastructure/effort here. But again, this is solely my opinion,
>>> certainly not aimed at reflecting those of the people who enter and use
>>> the data in d/u/metadata.

Hm.  I see the following:

 * empty - nobody cared, yet
 * "N/A" or "NA" or "<N/A>" or "<NA>" - checked but not found. I would
   prefer the latter two but do not really care; they may be too
   difficult in YAML since < is a special character.
 * "<rejected>" - bio.tools decided against referencing that package. We
   are likely to see a few of these in the near future.
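
To make the distinction concrete, here is a minimal sketch of how a
consumer of d/u/metadata might tell these states apart. It assumes
Python and an Entry value that has already been read from the YAML file,
so how the markers are quoted in YAML is outside its scope. The names
EntryStatus and classify_entry are hypothetical and not part of any
existing Debian tooling. Matching is deliberately case-sensitive, on the
assumption that a real identifier spelled exactly "NA" is very unlikely.

```python
from enum import Enum


class EntryStatus(Enum):
    """Hypothetical states for a registry Entry in d/u/metadata."""
    UNCHECKED = "unchecked"  # empty: nobody cared, yet
    NOT_FOUND = "not_found"  # checked, but no record in the registry
    REJECTED = "rejected"    # registry decided against referencing it
    PRESENT = "present"      # anything else is treated as an identifier


# Assumption: these four spellings all mean "checked but not found".
NOT_FOUND_MARKERS = {"NA", "N/A", "<NA>", "<N/A>"}
REJECTED_MARKER = "<rejected>"


def classify_entry(value):
    """Map a parsed Entry value to one of the states above."""
    if value is None:
        return EntryStatus.UNCHECKED
    text = str(value).strip()
    if not text:
        return EntryStatus.UNCHECKED
    if text in NOT_FOUND_MARKERS:
        return EntryStatus.NOT_FOUND
    if text == REJECTED_MARKER:
        return EntryStatus.REJECTED
    return EntryStatus.PRESENT
```

With this, replacing "NA" by an empty string would turn NOT_FOUND into
UNCHECKED, which is exactly the loss of information I would like to
avoid.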

>> <all easy for Andreas>
>>> If the three-flavors option is preferred, I would also suggest adding
>>> date fields for each entry to signal at which point in time the
>>> registry was inspected.
>> As I wrote above, the later addition of some software to some registry
>> can spoil the different meanings of unknown.  This could be cured by
>> such a date field, but I don't think its value outweighs the time it
>> drains from people maintaining that extra field.  Thus I do not think
>> we should do this.
> We definitely don't need a date, git blame does that. Also in the form
> of the Blame button in Salsa. Without a possibility for inconsistency.

This may be material for another paper: Means to synchronize between
volunteer databases.

 * Provenance is accepted
 * data transfer status - this is not yet happening routinely, but this
is what we are doing here.

@Andrius - If I do not need to be involved and if no information is
lost, then I promise to be very happy with whatever you come up with,
whatever this may be. The chance of having a reference named "NA",
though, especially in all caps, is darn close to zero, and I wish you
would invest/sink your valuable time into something else.



>> -- 
>> http://fam-tille.de
>>> Best,
>>> Andrius
> There is one closely related issue, which we just briefly touched upon
> with Steffen and Hervé in a telcon: What to do with those "NA"
> packages that are missing in e.g. bio.tools?
> The registration in bio.tools (and surely also SciCrunch) could be
> automated, but there are at least a couple of things needing human
> curation:
>   - Which src packages represent one tool (often e.g. libs | language
> bindings form separate Debian pkgs). How to mark this and where? Is
> there an existing Debian mechanism? Or do we need to abuse the
> d/u/metadata "Entry" for that, before they're added? (3rd or 4th
> flavour of info then 😀 ; btw. git branches could help here 😉 ; and
> not in google spreadsheet perhaps 😜 as it has to be machine-readable)
>   - Choosing an available, reasonable biotoolsID and tool name.
> Ideally tool name and biotoolsID are identical, with the ID in all
> lowercase and spaces removed/replaced.
>   - Any other things needing human curation?
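
The ID convention mentioned above could be sketched roughly as follows.
This is only an illustration in Python: the function name
suggest_biotools_id and the default hyphen separator are assumptions,
and the example tool name is made up. Whether a suggested ID is still
available in bio.tools would need a separate check that is not shown.

```python
import re


def suggest_biotools_id(tool_name, separator="-"):
    """Derive a candidate ID from a tool name.

    Lowercases the name and replaces each run of whitespace with
    `separator`. Pass separator="" to remove spaces instead of
    replacing them. The separator choice is an assumption, not a
    documented bio.tools rule.
    """
    collapsed = re.sub(r"\s+", separator, tool_name.strip())
    return collapsed.lower()


# Hypothetical tool name, used only for illustration.
candidate = suggest_biotools_id("My Example Tool")
```

A human would still have to decide whether the candidate is reasonable
and which source packages it should cover.
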
> Thank you all, I'm very happy seeing this progressing!
> Matus
> P.S.: Could you please leave all the contents in when replying to the
> thread, so that others can reply to previously mentioned points
> without having to read every single email in the thread and possibly
> breaking linearity of it? I agree that it's not ecological to
> broadcast the same text all around the globe again and again, but
> there are other solutions than emails that handle that without
> compromising. Many thanks!
