
Re: pragma supplementation-page



On Mon, Sep 08, 2025 at 03:24:33PM +0100, Jonathan Dowland wrote:
> On Mon Sep 8, 2025 at 12:50 PM BST, Andrew Sayers wrote:
> > I've been playing around with `bin/get-interesting-strings.pl` today.
> > I'll make it easier to use and add it to the README once I've slept on it,
> > but for now you need to create a `data` symlink in the repo's base directory,
> > pointing to the dump's `data` directory.  Then `make interesting-strings.txt`
> > will create a tab-separated value file with interesting snippets from the wiki.
> > The HEAD commit adds /Discussion links, and finds 1,059 of them :s
> 
> Argh that's a lot.
> 
> Eyeballing the list, many (not sure *how* many) are translations, with the
> Discussion link embedded in a table with the translation links (the
> "translation header").
> 
> Current best practice for the translation header is for translated pages to
> transclude it from the parent page. But, implementing that for existing
> pages is more work than just fixing the Discussion link: it means first
> making sure the parent page has the header markers, then replacing the table
> in the translated pages with the transclusion.

That's a good point, but relative links in <<Include>>d blocks are interpreted
relative to the original page (the one being included), not the page doing
the including. For example, the discussion link on it/Aptitude points to
Aptitude/Discussion, but the same link on es/Aptitude points to
es/Aptitude/Discussion, because the former uses an <<Include>> while the
latter copy/pastes the header.

So long as we check that e.g. es/Aptitude/Discussion doesn't exist,
I figure it should be safe to change that link.
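
Here's a rough sketch of that check, in Python just for brevity. It assumes
translated pages are always named `<lang>/<Page>`, and both `LANGS` and
`discussion_target` are made-up names for illustration, not anything in the
repo:

```python
from typing import Optional

# Hypothetical list of language prefixes; the real wiki has many more.
LANGS = {"es", "it"}

def discussion_target(page: str, all_pages: set) -> Optional[str]:
    """Return the parent page's Discussion page if the link on `page`
    can be retargeted safely, else None."""
    lang, sep, parent = page.partition("/")
    if not sep or lang not in LANGS:
        return None  # not a translated page, nothing to change
    if page + "/Discussion" in all_pages:
        return None  # a translated Discussion page exists, leave for a human
    return parent + "/Discussion"
```

With `{"Aptitude", "es/Aptitude", "it/Aptitude/Discussion"}` as the page list,
es/Aptitude would be retargeted to Aptitude/Discussion and it/Aptitude would
be left alone.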
 
> I guess we'd also need to check for any discrepancies in the list of
> languages in the translation headers. I suppose it would not be impossible
> for a parent page to be missing a link to a translation.
> 
> Which is more pragmatic: updating/fixing these translation headers now, or
> teaching our conversion script to ignore the whole translation header
> (since, iirc, it's not necessary at all on Mediawiki)?

Short answer - the first is more pragmatic, but will need a different approach.

Any solution involves teaching a script to replace translation headers;
doing it now just means we have the opportunity to undo our mistakes :)

To edit that many pages, with dependencies between them, how about:

1. generate a big JSON document like this from the existing dump:
   {
     "Aptitude": {
       "rev": <page-revision>,
       "source": "... original contents ..."
     },
     "es/Aptitude": {
       "rev": <page-revision>,
       "source": "... original contents ..."
     },
     ...
   }
2. write a Perl script to generate all the new versions we need
 * outputs a new JSON document with an extra "dest" key per page
 * mm2mw.pl is in Perl, so it may need to borrow from this script
3. write some more sanity-checking scripts
 * e.g. something to `diff <source> <dest>` for each page
4. write something to POST the new page contents and check the response
 * MoinMoin seems to check the revision on POST - will need to check,
   but we can probably handle edit conflicts easily enough
 * possible solutions:
   * paste the whole document into GreaseMonkey
   * use Selenium to remote-control the browser
   * export cookies from Firefox and give them to curl
5. gather all the POST results and update the JSON document
 * update the source contents for pages where the edit was accepted
 * download the latest revision of pages where the edit was rejected
6. for complex edits (e.g. changing the English page before its translations),
   go to 2
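
To make steps 2 and 3 a bit more concrete, here's a minimal sketch of the
data flow. It's in Python rather than Perl purely to keep it short, and
`add_dest`, `page_diffs` and the `transform` callback are hypothetical names.
The only thing taken from the plan above is the JSON shape (a "rev" and
"source" per page, plus the extra "dest" key):

```python
import difflib

def add_dest(pages, transform):
    """Step 2: return a new document with an extra "dest" key per page.

    `transform(name, source)` is a stand-in for whatever rewrites a page.
    """
    return {
        name: {**entry, "dest": transform(name, entry["source"])}
        for name, entry in pages.items()
    }

def page_diffs(pages):
    """Step 3: yield (name, unified diff) for every page that would change."""
    for name, entry in sorted(pages.items()):
        if entry["source"] == entry["dest"]:
            continue  # untouched pages don't need a POST
        diff = difflib.unified_diff(
            entry["source"].splitlines(keepends=True),
            entry["dest"].splitlines(keepends=True),
            fromfile=f"{name} (rev {entry['rev']})",
            tofile=f"{name} (proposed)",
        )
        yield name, "".join(diff)
```

Loading the document with `json.load`, running `add_dest`, and eyeballing
the output of `page_diffs` before anything gets POSTed would cover the
"undo our mistakes" part. Pages with an empty diff can be skipped in step 4.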

