Bug#960154: Feed UDD with just-in-time packaging hints from Lintian
On 13/04/21 at 18:45 +0200, Mattia Rizzolo wrote:
> [ Adding lucas@ to CC since he is the main person behind UDD after all ]
>
> On Sun, Apr 11, 2021 at 12:45:14PM -0700, Felix Lechner wrote:
> > On Sat, May 9, 2020 at 5:33 PM Mattia Rizzolo <mattia@debian.org> wrote:
> > > have lintian decide on a nice machine-parsable (text!) format
> > > then udd will adapt its importer.
> >
> > As you know, both of these already happened several months ago.
>
> Indeed, I consider that done by now.
>
> > I have
> > not commented here because I am still chewing on a related, but much
> > harder problem:
>
> I'd have probably used a different bug, but guess we'll cope.
>
> > Lintian will soon cease to run blindly across the archive and instead
> > produce packaging hints on demand, as uploads are received by the
> > archive. There is no batch process anymore that will produce files for
> > the entire archive the way you expect. Instead, Lintian's new website
> > https://lintian.debian.*net* offers a JSON interface [1] to get up to
> > date information similar to DAKweb. [2]
>
> So, if we really go down this route, I think we need to:
>
> * Have the importer able to run a full import of everything, which means
> looping through all sources (which means running some ~30k HTTP GETs)
> and storing them.
> * Figure out a way for UDD to know it needs to check the status of a
> package. This likely means a job that compares the set of known
> (package, version, suite) (is the tuple right?) with what is available
> in the lintian table: if something is missing query the lintian
> website for new data.
> * perhaps have the lintian website *push* new data to udd.d.o. I'm
> conflicted if this should be just a trigger ("hey I've just processed
> this, check it out yourself") or if it should carry the actual data as
> well. I'm sure you'd like a HTTP post or such, but I can tell you
> that we'd likely prefer something through SSH.
>
>
> Since after all you did look at udd several times, I believe you should
> already be able to implement the first 2?
>
>
>
> All this said, I still don't understand why you wouldn't be able to
> provide a view of everything. Since you set up that API, couldn't you
> have an endpoint with *all* packages and everything, like the current
> dump? That sounds much more trivial than what you are proposing…
From the UDD point of view, I would very much prefer to get a full dump,
something I can import every few hours, rather than having to deal with a
stream of updates or with querying a per-package API.

Currently the full import (which runs twice a day) takes about 10 minutes
(and I don't remember whether it has been optimized, so there might be room
for improvement).
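
For reference, the comparison job described in the second point of the
quoted list (known tuples versus what is already in the lintian table)
could be sketched roughly as below. This is only an illustration: the Key
tuple, the missing_keys function and the sample data are made up for the
example and are not UDD's actual schema or importer code, and whether
(package, version, suite) is the right key is still an open question.

```python
from typing import Iterable, NamedTuple


class Key(NamedTuple):
    # Hypothetical key; the thread itself asks whether this tuple is right.
    package: str
    version: str
    suite: str


def missing_keys(known: Iterable[Key], imported: Iterable[Key]) -> list[Key]:
    """Return keys known to the archive but absent from the lintian table."""
    # Plain set difference, sorted so the result is deterministic.
    return sorted(set(known) - set(imported))


# Made-up sample data standing in for the two sides of the comparison.
known = [
    Key("lintian", "2.104.0", "unstable"),
    Key("udd", "0.1", "unstable"),
]
imported = [Key("udd", "0.1", "unstable")]

# Everything in this list would need a per-package query to the website.
todo = missing_keys(known, imported)
```

With a full dump, none of this bookkeeping would be needed on the UDD side.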
Lucas