
Bug#966649: Request for feedback on upload_history re-implementation



On Thu, Aug 20, 2020, 05:45 Lucas Nussbaum <lucas@debian.org> wrote:
Hi Asheesh,

Hi! :)



I think that the changes compared to the current table structure should
be minimized, to avoid rewriting all the tools that use this data.
Improvements are welcomed of course, but please don't make changes if
there's no good reason for them.

Good call. I'll prioritize that.


Did you confirm with DSA that parsing the online list archives is the
preferred way to go? I fear that we will hit some HTTP rate limiting at
some point and will have to reconsider the implementation.

I haven't yet! I can do so. I will try to optimize the current approach first since I'm enthusiastic about it, but good call on checking with DSA.


How optimized is your code for running every few minutes? Ideally we
would like near-real-time updates of this data, which will require polling
the list archives (previously, email was received directly on
ullmann.debian.org via a special email address).

It's a good question. Let me update you about that once I've optimized further. I think I can get down to one HTTP call per run when nothing has changed (the mailing list index page) and two (index page plus message page) when there is a change.
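
To make the request count concrete, here is a minimal sketch of one polling cycle. It is not the actual code under discussion. The `poll_once` function, the `state` dict, the injected `fetch` callable, and the `msgNNNNN.html` link pattern are all assumptions made for illustration.

```python
import hashlib
import re
from urllib.parse import urljoin

# Assumed link format on the index page; the real archive markup may differ.
MSG_LINK = re.compile(r'href="(msg\d+\.html)"')


def poll_once(fetch, index_url, state):
    """Run one polling cycle and return (url, body) pairs for new messages.

    fetch: callable that takes a URL and returns the page body as a str
    state: dict kept between runs, holding 'index_digest' and 'seen'
    """
    index_html = fetch(index_url)  # request 1: always made
    digest = hashlib.sha256(index_html.encode("utf-8")).hexdigest()
    if digest == state.get("index_digest"):
        return []  # index unchanged, so no further requests

    seen = state.setdefault("seen", set())
    new_pages = []
    for name in MSG_LINK.findall(index_html):
        if name not in seen:
            url = urljoin(index_url, name)
            new_pages.append((url, fetch(url)))  # one request per new message
            seen.add(name)
    state["index_digest"] = digest
    return new_pages
```

With a single new message this makes two requests, and with no change it makes one. The very first run against an existing index would fetch every message listed, so a historic import would need its own pacing.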

Running every 2 min (say) would mean 24*30 = 720 requests per day, which seems well below any rate limit I can think of, but obviously 0 unnecessary requests is nicer. It's a good topic to discuss with DSA, and I can do that.
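
The arithmetic behind that estimate, as a small sketch (the function name and its parameters are only illustrative):

```python
def requests_per_day(interval_minutes, requests_per_run=1):
    """Daily request count for a poller that runs every `interval_minutes`."""
    runs_per_day = 24 * 60 // interval_minutes
    return runs_per_day * requests_per_run


print(requests_per_day(2))     # 720: index page only, nothing changed
print(requests_per_day(2, 2))  # 1440: upper bound if every run saw one new message
```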

Even if the inbound email is used for fresh data, historic data needs to come from somewhere. I think the email archives on the web are a good place to import those, based on my preference to develop in a context that doesn't require any special setup.

Hope you're doing well!

Asheesh.
