Re: pragma supplementation-page
On Mon, Sep 08, 2025 at 03:24:33PM +0100, Jonathan Dowland wrote:
> On Mon Sep 8, 2025 at 12:50 PM BST, Andrew Sayers wrote:
> > I've been playing around with `bin/get-interesting-strings.pl` today.
> > I'll make it easier to use and add it to the README once I've slept on it,
> > but for now you need to create a `data` symlink in the repo's base directory,
> > pointing to the dump's `data` directory. Then `make interesting-strings.txt`
> > will create a tab-separated value file with interesting snippets from the wiki.
> > The HEAD commit adds /Discussion links, and finds 1,059 of them :s
>
> Argh that's a lot.
>
> Eyeballing the list, many (not sure *how* many) are translations, with the
> Discussion link embedded in a table with the translation links (the
> "translation header").
>
> Current best practice for the translation header is for translated pages to
> transclude it from the parent page. But, implementing that for existing
> pages is more work than just fixing the Discussion link: it means first
> making sure the parent page has the header markers, then replacing the table
> in the translated pages with the transclusion.
That's a good point, but relative links in <<Include>>d blocks are interpreted
relative to the page they were included from, not the including page - for
example, the discussion link on it/Aptitude points to Aptitude/Discussion,
but the same link on es/Aptitude points to es/Aptitude/Discussion, because
the former uses an <<Include>> while the latter copy/pastes the header.
So long as we check that e.g. es/Aptitude/Discussion doesn't exist,
I figure it should be safe to change that link.
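For the "doesn't already exist" part, here's a rough, untested sketch of that
check against the dump (via the `data` symlink mentioned above). It assumes the
usual MoinMoin 1.x layout - page names with "/" quoted as "(2f)", a `current`
file holding the revision number, and the text under revisions/ - which is
worth double-checking against our dump; the script itself is just something I
made up:

    #!/usr/bin/perl
    # Rough sketch: does a given page really exist in the dump?
    # A deleted page keeps its directory but has no file for the
    # revision named in `current`, so check the revision file itself.
    use strict;
    use warnings;

    my $pages_dir = "data/pages";

    sub page_exists {
        my ($page) = @_;
        ( my $quoted = $page ) =~ s{/}{(2f)}g;
        my $dir = "$pages_dir/$quoted";
        open my $fh, '<', "$dir/current" or return 0;   # never created
        chomp( my $rev = <$fh> );
        return -e "$dir/revisions/$rev";
    }

    for my $page (@ARGV) {
        printf "%s: %s\n", $page,
            page_exists($page) ? "exists - leave its link alone"
                               : "missing - should be safe to repoint";
    }

e.g. running it with "es/Aptitude/Discussion" as the argument would tell us
whether that page is really there.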
> I guess we'd also need to check for any discrepancies in the list of
> languages in the translation headers. I suppose it would not be impossible
> for a parent page to be missing a link to a translation.
>
> Which is more pragmatic: updating/fixing these translation headers now, or
> teaching our conversion script to ignore the whole translation header
> (since, iirc, it's not necessary at all on Mediawiki)?
Short answer - the first is more pragmatic, but it will need a different approach.
Any solution involves teaching a script to replace translation headers;
doing it now just means we have the opportunity to undo our mistakes :)
To edit that many pages, with dependencies between them, how about the
following (rough sketches of steps 1-4 follow the list):
1. generate a big JSON document like this from the existing dump:
    {
        "Aptitude": {
            "rev": <page-revision>,
            "source": "... original contents ..."
        },
        "es/Aptitude": {
            "rev": <page-revision>,
            "source": "... original contents ..."
        },
        ...
    }
2. write a Perl script to generate all the new versions we need
* outputs a new JSON document with an extra "dest" key per page
* mm2mw.pl is also in Perl, so the two scripts can share code
3. write some more sanity-checking scripts
* e.g. something to `diff <source> <dest>` for each page
4. write something to POST the new page contents and check the response
* MoinMoin seems to check the revision on POST - will need to confirm,
  but we can probably handle edit conflicts easily enough
* possible solutions:
  * paste the whole document into GreaseMonkey
  * use Selenium to remote-control the browser
  * export cookies from Firefox and give them to curl
5. gather all the POST results and update the JSON document
* update the source contents for pages where the edit was accepted
* download the latest revision of pages where the edit was rejected
6. for complex edits (e.g. changing the English page before its translations),
go to 2
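To make those steps a bit more concrete, here are some rough, untested
sketches. First, step 1 - building the JSON document from the dump. Same
data/pages layout assumption as the existence check above, and taking the page
list on the command line is just a placeholder (presumably we'd feed it
whatever get-interesting-strings.pl found):

    #!/usr/bin/perl
    # Step 1 sketch: build { page => { rev => ..., source => ... } } from the dump.
    use strict;
    use warnings;
    use JSON::PP;

    my $pages_dir = "data/pages";
    my $json      = JSON::PP->new->utf8->canonical->pretty;
    my %doc;

    for my $page (@ARGV) {    # e.g. Aptitude es/Aptitude it/Aptitude
        ( my $quoted = $page ) =~ s{/}{(2f)}g;
        my $dir = "$pages_dir/$quoted";
        chomp( my $rev = slurp("$dir/current") );
        $doc{$page} = {
            rev    => $rev,
            source => slurp("$dir/revisions/$rev"),
        };
    }

    print $json->encode(\%doc);

    sub slurp {
        my ($file) = @_;
        open my $fh, '<:encoding(UTF-8)', $file or die "$file: $!";
        local $/;
        return scalar <$fh>;
    }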
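Step 2 would then read that document on stdin and add a "dest" per page. The
rewrite() below is only a placeholder - it repoints a relative /Discussion link
at the parent page, and only on pages that look like translations - so the real
rules (and anything we borrow from mm2mw.pl) would go in its place:

    #!/usr/bin/perl
    # Step 2 sketch: add a "dest" key to every page in the JSON document.
    use strict;
    use warnings;
    use JSON::PP;

    my $json = JSON::PP->new->utf8->canonical->pretty;
    my $doc  = $json->decode( do { local $/; <STDIN> } );

    for my $page ( keys %$doc ) {
        $doc->{$page}{dest} = rewrite( $page, $doc->{$page}{source} );
    }

    print $json->encode($doc);

    # Placeholder: on a translated page, point the relative /Discussion
    # link at the parent page's Discussion page instead.
    sub rewrite {
        my ( $page, $source ) = @_;
        if ( $page =~ m{^[a-z]{2}(?:_[A-Z]{2})?/(.+)$} ) {
            my $parent = $1;
            $source =~ s{\[\[/Discussion(?=[|\]])}{[[$parent/Discussion}g;
        }
        return $source;
    }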
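The step 3 checks could then be as simple as dumping each page's "source" and
"dest" to temporary files and shelling out to diff:

    #!/usr/bin/perl
    # Step 3 sketch: show a unified diff of source vs dest for each page.
    use strict;
    use warnings;
    use JSON::PP;
    use File::Temp qw(tempfile);

    my $doc = JSON::PP->new->utf8->decode( do { local $/; <STDIN> } );

    for my $page ( sort keys %$doc ) {
        my ( $src_fh, $src_file ) = tempfile();
        my ( $dst_fh, $dst_file ) = tempfile();
        binmode $_, ':encoding(UTF-8)' for $src_fh, $dst_fh;
        print {$src_fh} $doc->{$page}{source};
        print {$dst_fh} $doc->{$page}{dest};
        close $_ for $src_fh, $dst_fh;
        print "=== $page ===\n";
        system( 'diff', '-u', $src_file, $dst_file );
        unlink $src_file, $dst_file;
    }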
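And for step 4, here's the rough shape of the "export cookies from Firefox"
option, using LWP instead of curl so it can live alongside the other scripts.
The form field names (action, rev, savetext, comment, button_save) are my
guesses at MoinMoin's edit form and definitely need checking against the real
thing - in particular I think the form also carries a hidden anti-CSRF "ticket"
field, which we'd have to fetch from the edit page first:

    #!/usr/bin/perl
    # Step 4 sketch: POST one page's new text back to the wiki, reusing a
    # login session exported from Firefox in Netscape cookies.txt format.
    # Form field names are guesses and need verifying against the live
    # edit form (including its hidden "ticket" field).
    use strict;
    use warnings;
    use Encode qw(encode_utf8);
    use LWP::UserAgent;
    use HTTP::Cookies::Netscape;

    my ( $page, $rev, $dest_file ) = @ARGV;

    my $dest = do {
        open my $fh, '<:encoding(UTF-8)', $dest_file or die "$dest_file: $!";
        local $/;
        <$fh>;
    };

    my $ua = LWP::UserAgent->new(
        cookie_jar => HTTP::Cookies::Netscape->new( file => 'cookies.txt' ),
    );

    my $response = $ua->post(
        "https://wiki.debian.org/$page",
        {
            action      => 'edit',
            rev         => $rev,    # lets MoinMoin spot edit conflicts
            savetext    => encode_utf8($dest),
            comment     => 'fix Discussion link in translation header',
            button_save => 'Save Changes',
        },
    );

    # feed this back into step 5: accepted edits update "source",
    # rejected ones trigger a re-download of the latest revision
    print $response->code, " $page\n";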