
Re: "Entry: NA" in debian/upstream/metadata



On 2021-03-03 18:39, Matus Kalas wrote:
> Hey all again, and thanks for your thoughts Andrius and Andreas!
> 
> On 2021-03-03 09:36, Andreas Tille wrote:
>> Hi Andrius,
>>
>> On 2021-03-03 08:54, Andrius Merkys wrote:
>>> Dear Matus,
>>>
>>> On 2021-03-02 19:56, Matus Kalas wrote:
>>>> I'd suggest hearing from the folks who have done most of the work
>>>> of manually including those IDs, and letting them approve/decide.
>>>
>>> Absolutely!
> 
> Steffen et al., your opinions on this matter?
> 
>>>
>>>> I can imagine that for purely practical reasons in the process of the
>>>> manual curation, it might make sense to allow explicitly:
>>>>  - Name: OMICtools
>>>>    Entry: N/A        (Meaning: I have checked and there was no record)
>>>>  - Name: bio.tools
>>>>    Entry: ""         (Meaning: I or someone else should check this out;
>>>> or perhaps: I checked but wasn't conclusive yet)
>>>>
>>>> The latter might be useful for contributors who aren't used to all
>>>> those
>>>> IDs, to make them more visible (including where the gaps are). But on
>>>> the other hand, if those are well present in an upstream/metadata
>>>> template and very clear in the documentation of upstream/metadata, then
>>>> it is not necessary and I'd then tend to like your suggestion Andrius.
>>>
>>> To me, three flavors of "unknown" look like overkill. Most of the
>>> metadata in Debian does not even have the two flavors of "unknown":
>>> missing Bug-Submit field in d/u/metadata, Homepage in d/control and
>>> Upstream-Contact in d/copyright means that this piece of information is
>>> either nonexistent or simply not entered (for example, due to the lack
>>> of time). Thus I am not sure whether the added value is worth the
>>> infrastructure/effort here. But again, this is solely my opinion,
>>> certainly not aimed at reflecting those of the people who enter and use
>>> the data in d/u/metadata.
>>
>> I wrote the UDD importer for the metadata files and thus look at the
>> data as a "consumer" of the provided information.  From this side those
>> different meanings of unknown are all turned into "ignore this value".
>> So in this respect differentiating between those unknowns is basically
>> helpful for those who edit the metadata files.  Flagging something as "I
>> was here and have checked" is probably somewhat helpful.  However, it
>> might well be that some registry will include that specific
>> software later, so re-checking makes sense.
>>
>> For this reason I was recommending not to make those simple things too
>> complex, since making it complex just drains time from the people who are
>> working on it with no visible effect for the users.
>>
>>>
>>> If the three-flavor option is preferred, I would also suggest adding
>>> date fields for each entry to signal at which point in time the registry
>>> was inspected.
>>
>> As I wrote above, later addition of some software to some registry can
>> spoil the different meanings of unknown.  This could be cured by such a
>> date field, but I don't think its value outweighs the time drained
>> from people maintaining that extra field.  Thus I do not think we
>> should do this.
> 
> We definitely don't need a date, git blame does that. Also in the form
> of the Blame button in Salsa. Without a possibility for inconsistency.

Agree.

>> Thanks a lot for your work on this
>>
>>      Andreas.
>>
>> -- 
>> http://fam-tille.de
>>>
>>> Best,
>>> Andrius
> 
> There is one closely related issue, which we just briefly touched upon
> with Steffen and Hervé in a telcon: What to do with those "NA" packages
> that are missing in e.g. bio.tools?
> 
> The registration in bio.tools (and surely also SciCrunch) could be
> automated, but there are at least a couple of things needing human
> curation:
> 
>   - Which src packages represent one tool (often e.g. libs | language
> bindings form separate Debian pkgs). How to mark this and where? Is
> there an existing Debian mechanism? Or do we need to abuse the
> d/u/metadata "Entry" for that, before they're added? (3rd or 4th flavour
> of info then 😀 ; btw. git branches could help here 😉 ; and not in
> google spreadsheet perhaps 😜 as it has to be machine-readable)

Maybe a separate field could be introduced for that? I would prefer
leaving "Entry" for IDs only, so that a URL inside the registry can
be formed in a straightforward manner. Imposing internal structure
on fields (i.e., abusing "Entry") both hurts machine-readability and
risks namespace collisions. Should there be a need for free-form
storage of information, I would rather introduce a "Comment" field
for each entry, where a maintainer could store anything they believe
is important about that entry.
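
To illustrate why keeping "Entry" as a bare ID helps, here is a minimal
sketch in Python. It is not the UDD importer; the placeholder spellings,
the URL template, the ID "exampletool" and the "Comment" field are all
assumptions made for the example.

```python
# Placeholder spellings treated as "no usable ID" (an assumption based on
# the values discussed in this thread: "NA", "N/A" and the empty string).
PLACEHOLDERS = {"", "NA", "N/A"}

# Hypothetical URL templates; verify each registry's real URL scheme.
URL_TEMPLATES = {
    "bio.tools": "https://bio.tools/{entry}",
}


def registry_urls(registry):
    """Map registry name -> URL for every entry that carries a real ID."""
    urls = {}
    for item in registry:
        name = item.get("Name")
        entry = str(item.get("Entry") or "").strip()
        if entry.upper() in PLACEHOLDERS:
            continue  # every flavour of "unknown" collapses to "ignore"
        template = URL_TEMPLATES.get(name)
        if template is not None:
            urls[name] = template.format(entry=entry)
    return urls


# An already-parsed Registry list; "exampletool" is a made-up ID and
# "Comment" is the proposed free-form field, not an existing one.
example = [
    {"Name": "OMICtools", "Entry": "NA"},
    {"Name": "bio.tools", "Entry": "exampletool"},
    {"Name": "SciCrunch", "Entry": "", "Comment": "not checked yet"},
]
print(registry_urls(example))
```

Because "Entry" holds nothing but the ID, forming the URL is a single
string substitution, and the free-form remark never has to be parsed.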

>   - Choosing an available, reasonable biotoolsID and tool name. Ideally
> tool name and biotoolsID are identical, with the ID in all lower case and
> spaces removed/replaced.
> 
>   - Any other things needing human curation?
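
The ID convention described above (lower case, spaces removed or
replaced) could be sketched as follows. The function name and the default
replacement character are assumptions, and the sketch does not check
whether the resulting ID is still available in bio.tools, which remains a
curation step.

```python
import re


def suggest_biotools_id(tool_name, spaces="_"):
    """Lower-case a tool name and replace runs of whitespace.

    Pass spaces="" to remove whitespace instead of replacing it.
    """
    collapsed = re.sub(r"\s+", spaces, tool_name.strip())
    return collapsed.lower()


print(suggest_biotools_id("My Example Tool"))
print(suggest_biotools_id("My Example Tool", spaces=""))
```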
> 
> 
> 
> Thank you all, I'm very happy seeing this progressing!
> Matus
> 
> 
> P.S.: Could you please leave all the contents in when replying to the
> thread, so that others can reply to previously mentioned points without
> having to read every single email in the thread and possibly breaking
> linearity of it? I agree that it's not ecological to broadcast the same
> text all around the globe again and again, but there are other solutions
> than emails that handle that without compromising. Many thanks!

OK!

Best,
Andrius

