
Re: DebianBug links (was Re: Complex conversion issues)



On Sat, 2025-07-26 at 17:56 +0100, Andrew Sayers wrote:
[...]
> If the current solution doesn't cache, we're presumably looking to replace
> a solution that hits the BTS every time someone visits a page with a BTS
> link on it.  That suggests...
> 
> a) users will see any caching as a regression

I don't think users will mind if a few bug links on the wiki are
outdated by a maximum of 1 hour.

> b) the BTS admins will see any caching as an improvement

As I said before, the BTS should be able to handle any load coming its
way, so that shouldn't be a concern.

> c) if we have to give up and use a JS solution, browsers could
>    contact the BTS directly, avoiding the need for the Perl script?

I don't think JS can use the SOAP API directly. Even if there is a
library for it, bundling a JS library isn't an option when the Perl
script achieves the same thing.

> > > Speaking of caching, it would be nice to have a solution that's updated
> > > regularly without spamming the BTS server, but I don't see a
> > > "bugs updated since <date>" request in the SOAP interface.
> > > Anyone object to me submitting a wishlist bug against debbugs?
> > > Or am I better off asking on #debbugs instead?
> 
> I wasn't aware of the UDD before, and my question is answered with e.g.:
> 
>     https://udd.debian.org/bugs/?merged=ign&fnewerval=7&flastmodval=7&rc=1&sortby=last_modified&sorto=desc&format=json

I'm thinking of direct database access to allow for more complex/niche
queries: https://udd-mirror.debian.net/ and
https://udd.debian.org/schema/udd.html

> > > maytham explained on IRC that scary transclusion has a setting to control
> > > how often MW polls the service[12].  That seems like a good plan if we have
> > > to use the existing debbugs API, but if debbugs is upgraded to list recent
> > > changes, it would be nice to push those to the site a bit faster.
> > > How about a solution like this:
> > > 
> > > 1. when the service is queried, it returns the result then edits
> > >    Template:Debbugs/<number> with the same result
> > > 2. the service polls debbugs every 60 seconds for recent updates,
> > >    and updates any existing Template:Debbugs/<number>
> > > 3. Template:DebianBug uses Template:Debbugs/<number> if it exists,
> > >    or else scary-transcludes the service
> > > 
> > > ... which would update links within a minute, without putting much load
> > > on either the BTS or wiki servers.
> > 
> > Wouldn't that just push caching away from the wiki's builtin system to
> > the wiki pages? Also seems like it would cause *more* traffic by
> > checking debbugs every 60 seconds, when MediaWiki only fetches
> > information on demand and caches information for up to 1 hour (by
> > default).
> 
> I'm not sure I understand the distinction - surely wiki pages *are* the wiki's
> builtin caching system?

At least for External Data, it handles caching in a separate table in
the database, which is superior to maintaining all this information by
hand as pages in the wiki. See [17].

> For clarity, here are some terms I'll try to use consistently in this thread:
> 
> * a "pull-based solution" is something like scary transclusion, which 
>   makes one small request per record per hour (or day, or whatever)

I think this is the best approach. It's not a regression since the
current wiki already does this, and MediaWiki will be able to cache
information to reduce page load times.

> * a "polling-based solution" is something like the script I'm proposing,
>   which makes one big request total per minute (or hour or whatever)
>   and scans that request for matches
> * a "push-based solution" would be something like a webhook where the remote
>   server notifies us when events occur

With these, we're just making a local copy of the data, which I don't
understand the purpose of. The BTS is accessible and doesn't have any
issues handling the requests, and the UDD is probably even faster and
also does not have any traffic problems.

> A pull-based solution would check at most once per hour per bug, whereas a
> polling-based solution would guarantee exactly one redundant request site-wide
> per minute.  So it's not immediately obvious which have higher total traffic.
> 
> You mentioned before that you had access to the current wiki's log
> files - could you look for requests to /cgi-bin/bugstatus to see how much the
> current wiki is accessing the BTS, and how many unique bugs it's asking about?

I checked the Apache access logs and couldn't find any requests to the
bugstatus endpoint. Either I'm looking in the wrong place or requests to
it are not logged.
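
Without numbers from the logs, the comparison can only be sketched. Here
is a rough back-of-the-envelope version in Python. The function names and
the figure of 25 distinct bugs viewed per hour are made up for
illustration, not measurements from the wiki:

```python
def pull_requests_per_hour(distinct_bugs_viewed: int, cache_expiry_s: int) -> float:
    """Upper bound for a pull-based setup: every bug that is actually
    viewed is refetched at most once per cache expiry window."""
    return distinct_bugs_viewed * 3600 / cache_expiry_s


def polling_requests_per_hour(poll_interval_s: int) -> float:
    """A polling setup makes one request per interval, whether or not
    anyone is looking at a bug link."""
    return 3600 / poll_interval_s


# Hypothetical numbers: 25 distinct bugs viewed per hour, 1-hour cache,
# compared with polling once every 60 seconds.
pull = pull_requests_per_hour(distinct_bugs_viewed=25, cache_expiry_s=3600)
poll = polling_requests_per_hour(poll_interval_s=60)
print(f"pull: at most {pull:.0f} requests/hour, polling: {poll:.0f} requests/hour")
```

Under these assumptions, pulling with a 1-hour cache makes fewer requests
than polling every minute as long as fewer than 60 distinct bugs are
viewed per hour. The polling requests would each be larger, too.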

> > The BTS handles high traffic well, so I don't think this is an issue
> > that needs to be handled, as well as the fact caching already happens.
> > 
> > The caching period can be decreased, and InterWiki comes with a
> > maintenance script to clear all of its cache if needed.
> 
> Reducing the cache time to one minute would work as well as polling once per
> minute, but I would expect it to trigger frequent re-renders of those pages.
> If we start tweaking the timeout, we should keep an eye on MW server load.
> 
> > > Finally, how about making the template returned by the service look like:
> > > 
> > > {{
> > >    {{{1}}}
> > >    |summary=...
> > >    |pending=...
> > >    |id=...
> > >    |severity=...
> > >    ...
> > > }}
> > > 
> > > You could then call it like `{{raw:wiki:debbugs|<number>|MyHandler}}`,
> > > which would in turn call Template:MyHandler with the relevant parameters.
> > 
> > Do you mean add the ability to fetch different parameters from the bug
> > system? I don't think this is necessary since no wiki pages currently do
> > this and it would only duplicate information. Except maybe the bug title
> > can be in the link text when an option is passed?
> 
> Having now looked at External Data, I was suggesting a homebrew version of
> #display_external_table - let's look at that instead :)

This functionality could be used for TODO trackers like
https://wiki.debian.org/Javascript/Nodejs/Tasks/electron maybe?

One place where I think this could be really useful is the "Debian
Status" column at https://wiki.debian.org/FreedomBox/LeavingTheCloud ,
which relies solely on manual updates to the packaging status.

> > Yet another interesting possibility that doesn't require running another
> > service is the External Data[13] extension, which can pretty much
> > achieve the same thing by accessing the BTS SOAP API directly and
> > fetching information. It also supports caching[14] and allows for
> > different caching expiry times for different URLs and hosts.
> > 
> > It can even fetch data from databases[15], which opens the possibility
> > for querying the UDD mirror (which is a PostgreSQL database).
> > 
> > We can do some really cool stuff with this :)
> 
> Agreed!
> 
> External Data is still a pull-based solution, but if circumstances conspired to
> need polling (or even pushing), we'd just make a mechanism to purge individual
> page caches when values changed.
> 
> At a pinch, it might even be possible to get data *out* of MediaWiki this way.
> For example, I previously mentioned creating a table of ToDo items.
[...]

I don't think we really need any magic or external service for things
like todo lists. At most, maybe the Page Forms extension to do what the
current wiki does, where form values are used to add text to the page or
create a new page e.g. [16]. If we needed some way to query data, then
it would be Cargo all the way rather than setting up another service.

Overall, I really, really don't think it's worth our time to try and
minimise traffic to the BTS and store a copy of data that is readily
accessible, outside of enabling caching and setting an expiry time (1
hour seems fine to me) to ensure pages don't take too long to load.

It's not essential that information on the wiki is updated the moment
some change is received by the BTS, so I don't think some
synchronization mechanism or reducing cache time is necessary.
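
To make "fetch on demand, cache with an expiry time" concrete, here is a
minimal Python sketch of the idea. It is not how MediaWiki or External
Data implement their caches; the class name, the fetch callback and the
injectable clock are all hypothetical and only there for illustration:

```python
import time
from typing import Callable, Dict, Tuple


class ExpiringCache:
    """Fetch-on-demand cache: a value is refetched only once the stored
    copy is at least ttl_s seconds old."""

    def __init__(
        self,
        fetch: Callable[[int], str],
        ttl_s: float = 3600,  # 1 hour, the expiry time suggested above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        # bug id -> (time the value was fetched, cached value)
        self._entries: Dict[int, Tuple[float, str]] = {}

    def get(self, bug_id: int) -> str:
        now = self._clock()
        entry = self._entries.get(bug_id)
        if entry is not None and now - entry[0] < self._ttl_s:
            return entry[1]  # still fresh: no request to the backend
        value = self._fetch(bug_id)  # missing or expired: fetch once
        self._entries[bug_id] = (now, value)
        return value
```

Nothing is fetched for bugs nobody looks at, and a bug that is viewed
often still costs at most one backend request per expiry window.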

Here's what I've put together on my local MW:
( If we use Scribunto, which is basically Lua scripting in MW, then the
  template could probably be written much more cleanly. )

# /etc/mediawiki/LocalSettings.php
wfLoadExtension( 'ExternalData' );
$wgExternalDataSources['udd'] = [
	'server' => 'udd-mirror.debian.net',
	'type' => 'postgres',
	'name' => 'udd',
	'user' => 'udd-mirror',
	'password' => 'udd-mirror',
];

# Template:DebianBug
<includeonly>[https://bugs.debian.org/{{{id|{{{1}}}}}} {{#ifeq:
  {{#external_value:status
   |db=udd
   |from=all_bugs
   |where=id='{{{id|{{{1}}}}}}'
   |limit=1
  }}
  |done
  |<s><nowiki>#</nowiki>{{{id|{{{1}}}}}}</s>
  |<nowiki>#</nowiki>{{{id|{{{1}}}}}}}}]</includeonly>

# Demo 
Bug {{DebianBug|1076281}} is closed, but {{DebianBug|977964}} remains open.

Attached is a screenshot of what it renders as.
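
For anyone who doesn't read template syntax, the logic of
Template:DebianBug above is roughly the following. This is only a Python
paraphrase for illustration: the function name is made up, and the status
string is assumed to come from the all_bugs query shown in the template:

```python
def render_debian_bug(bug_id: int, status: str) -> str:
    """Return wikitext for an external link to the bug, with the link
    text struck through when the bug's status is 'done'."""
    link_text = f"<nowiki>#</nowiki>{bug_id}"
    if status == "done":
        link_text = f"<s>{link_text}</s>"
    return f"[https://bugs.debian.org/{bug_id} {link_text}]"


# Any status other than "done" (including an empty result) renders a
# plain, non-struck-through link, matching the #ifeq fallback branch.
print(render_debian_bug(1076281, "done"))
print(render_debian_bug(977964, "pending"))
```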

Thanks for the discussion, the points you're raising are very interesting ones.
--
Maytham

> > > > > [2] https://www.mediawiki.org/wiki/Extension:Gadgets
> > > > [9]  https://salsa.debian.org/Maytha8/iwservice
> > > > [10] https://www.mediawiki.org/wiki/Manual:$wgEnableScaryTranscluding
> > > [11] https://wiki.debian.org/DebbugsSoapInterface
> > > [12] https://www.mediawiki.org/wiki/Manual:$wgTranscludeCacheExpiry
> > [13] https://www.mediawiki.org/wiki/Extension:External_Data
> > [14] https://www.mediawiki.org/wiki/Extension:External_Data/Caching_data
> > [15] https://www.mediawiki.org/wiki/Extension:External_Data/Databases
[16] https://wiki.debian.org/ReleasePartyTrixie#Add_your_city
[17] https://www.mediawiki.org/wiki/Extension:External_Data/Caching_data

Attachment: Image-7NM792.png
Description: PNG image


