Re: pragma supplementation-page
On Mon, Sep 08, 2025 at 03:24:33PM +0100, Jonathan Dowland wrote:
> On Mon Sep 8, 2025 at 12:50 PM BST, Andrew Sayers wrote:
> > I've been playing around with `bin/get-interesting-strings.pl` today.
> > I'll make it easier to use and add it to the README once I've slept on it,
> > but for now you need to create a `data` symlink in the repo's base directory,
> > pointing to the dump's `data` directory. Then `make interesting-strings.txt`
> > will create a tab-separated value file with interesting snippets from the wiki.
> > The HEAD commit adds /Discussion links, and finds 1,059 of them :s
>
> Argh that's a lot.
>
> Eyeballing the list, many (not sure *how* many) are translations, with the
> Discussion link embedded in a table with the translation links (the
> "translation header").
>
> Current best practice for the translation header is for translated pages to
> transclude it from the parent page. But, implementing that for existing
> pages is more work than just fixing the Discussion link: it means first
> making sure the parent page has the header markers, then replacing the table
> in the translated pages with the transclusion.
That's a good point, but relative links in <<Include>>d blocks are interpreted
relative to the page they were included from, not the including page - for
example, the discussion link on it/Aptitude points to Aptitude/Discussion,
but the same link on es/Aptitude points to es/Aptitude/Discussion, because
the former uses an <<Include>> while the latter copy/pastes the header.
So long as we check that e.g. es/Aptitude/Discussion doesn't exist,
I figure it should be safe to change that link.
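For the "doesn't already exist" part, here's a rough, untested sketch of that
check against the dump (via the `data` symlink mentioned above). It assumes the
usual MoinMoin 1.x layout - page names with "/" quoted as "(2f)", a `current`
file holding the revision number, and the text under revisions/ - which is
worth double-checking against our dump; the script itself is just something I
made up:

    #!/usr/bin/perl
    # Rough sketch: does a given page really exist in the dump?
    # A deleted page keeps its directory but has no file for the
    # revision named in `current`, so check the revision file itself.
    use strict;
    use warnings;

    my $pages_dir = "data/pages";

    sub page_exists {
        my ($page) = @_;
        ( my $quoted = $page ) =~ s{/}{(2f)}g;
        my $dir = "$pages_dir/$quoted";
        open my $fh, '<', "$dir/current" or return 0;   # never created
        chomp( my $rev = <$fh> );
        return -e "$dir/revisions/$rev";
    }

    for my $page (@ARGV) {
        printf "%s: %s\n", $page,
            page_exists($page) ? "exists - leave its link alone"
                               : "missing - should be safe to repoint";
    }

e.g. running it with "es/Aptitude/Discussion" as the argument would tell us
whether that page is really there.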
> I guess we'd also need to check for any discrepancies in the list of
> languages in the translation headers. I suppose it would not be impossible
> for a parent page to be missing a link to a translation.
>
> Which is more pragmatic: updating/fixing these translation headers now, or
> teaching our conversion script to ignore the whole translation header
> (since, iirc, it's not necessary at all on Mediawiki)?
Short answer - the first is more pragmatic, but it will need a different approach.
Any solution involves teaching a script to replace translation headers;
doing it now just means we have the opportunity to undo our mistakes :)
To edit that many pages, with dependencies between them, how about the
following (rough sketches of steps 1-4 follow the list):
1. generate a big JSON document like this from the existing dump:
    {
        "Aptitude": {
            "rev": <page-revision>,
            "source": "... original contents ..."
        },
        "es/Aptitude": {
            "rev": <page-revision>,
            "source": "... original contents ..."
        },
        ...
    }
2. write a Perl script to generate all the new versions we need
* outputs a new JSON document with an extra "dest" key per page
* mm2mw.pl is also in Perl, so the two scripts can share code
3. write some more sanity-checking scripts
* e.g. something to `diff <source> <dest>` for each page
4. write something to POST the new page contents and check the response
* MoinMoin seems to check the revision on POST - will need to confirm,
  but we can probably handle edit conflicts easily enough
* possible solutions:
  * paste the whole document into GreaseMonkey
  * use Selenium to remote-control the browser
  * export cookies from Firefox and give them to curl
5. gather all the POST results and update the JSON document
* update the source contents for pages where the edit was accepted
* download the latest revision of pages where the edit was rejected
6. for complex edits (e.g. changing the English page before its translations),
go to 2
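To make those steps a bit more concrete, here are some rough, untested
sketches. First, step 1 - building the JSON document from the dump. Same
data/pages layout assumption as the existence check above, and taking the page
list on the command line is just a placeholder (presumably we'd feed it
whatever get-interesting-strings.pl found):

    #!/usr/bin/perl
    # Step 1 sketch: build { page => { rev => ..., source => ... } } from the dump.
    use strict;
    use warnings;
    use JSON::PP;

    my $pages_dir = "data/pages";
    my $json      = JSON::PP->new->utf8->canonical->pretty;
    my %doc;

    for my $page (@ARGV) {    # e.g. Aptitude es/Aptitude it/Aptitude
        ( my $quoted = $page ) =~ s{/}{(2f)}g;
        my $dir = "$pages_dir/$quoted";
        chomp( my $rev = slurp("$dir/current") );
        $doc{$page} = {
            rev    => $rev,
            source => slurp("$dir/revisions/$rev"),
        };
    }

    print $json->encode(\%doc);

    sub slurp {
        my ($file) = @_;
        open my $fh, '<:encoding(UTF-8)', $file or die "$file: $!";
        local $/;
        return scalar <$fh>;
    }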
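Step 2 would then read that document on stdin and add a "dest" per page. The
rewrite() below is only a placeholder - it repoints a relative /Discussion link
at the parent page, and only on pages that look like translations - so the real
rules (and anything we borrow from mm2mw.pl) would go in its place:

    #!/usr/bin/perl
    # Step 2 sketch: add a "dest" key to every page in the JSON document.
    use strict;
    use warnings;
    use JSON::PP;

    my $json = JSON::PP->new->utf8->canonical->pretty;
    my $doc  = $json->decode( do { local $/; <STDIN> } );

    for my $page ( keys %$doc ) {
        $doc->{$page}{dest} = rewrite( $page, $doc->{$page}{source} );
    }

    print $json->encode($doc);

    # Placeholder: on a translated page, point the relative /Discussion
    # link at the parent page's Discussion page instead.
    sub rewrite {
        my ( $page, $source ) = @_;
        if ( $page =~ m{^[a-z]{2}(?:_[A-Z]{2})?/(.+)$} ) {
            my $parent = $1;
            $source =~ s{\[\[/Discussion(?=[|\]])}{[[$parent/Discussion}g;
        }
        return $source;
    }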
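The step 3 checks could then be as simple as dumping each page's "source" and
"dest" to temporary files and shelling out to diff:

    #!/usr/bin/perl
    # Step 3 sketch: show a unified diff of source vs dest for each page.
    use strict;
    use warnings;
    use JSON::PP;
    use File::Temp qw(tempfile);

    my $doc = JSON::PP->new->utf8->decode( do { local $/; <STDIN> } );

    for my $page ( sort keys %$doc ) {
        my ( $src_fh, $src_file ) = tempfile();
        my ( $dst_fh, $dst_file ) = tempfile();
        binmode $_, ':encoding(UTF-8)' for $src_fh, $dst_fh;
        print {$src_fh} $doc->{$page}{source};
        print {$dst_fh} $doc->{$page}{dest};
        close $_ for $src_fh, $dst_fh;
        print "=== $page ===\n";
        system( 'diff', '-u', $src_file, $dst_file );
        unlink $src_file, $dst_file;
    }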
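And for step 4, here's the rough shape of the "export cookies from Firefox"
option, using LWP instead of curl so it can live alongside the other scripts.
The form field names (action, rev, savetext, comment, button_save) are my
guesses at MoinMoin's edit form and definitely need checking against the real
thing - in particular I think the form also carries a hidden anti-CSRF "ticket"
field, which we'd have to fetch from the edit page first:

    #!/usr/bin/perl
    # Step 4 sketch: POST one page's new text back to the wiki, reusing a
    # login session exported from Firefox in Netscape cookies.txt format.
    # Form field names are guesses and need verifying against the live
    # edit form (including its hidden "ticket" field).
    use strict;
    use warnings;
    use Encode qw(encode_utf8);
    use LWP::UserAgent;
    use HTTP::Cookies::Netscape;

    my ( $page, $rev, $dest_file ) = @ARGV;

    my $dest = do {
        open my $fh, '<:encoding(UTF-8)', $dest_file or die "$dest_file: $!";
        local $/;
        <$fh>;
    };

    my $ua = LWP::UserAgent->new(
        cookie_jar => HTTP::Cookies::Netscape->new( file => 'cookies.txt' ),
    );

    my $response = $ua->post(
        "https://wiki.debian.org/$page",
        {
            action      => 'edit',
            rev         => $rev,    # lets MoinMoin spot edit conflicts
            savetext    => encode_utf8($dest),
            comment     => 'fix Discussion link in translation header',
            button_save => 'Save Changes',
        },
    );

    # feed this back into step 5: accepted edits update "source",
    # rejected ones trigger a re-download of the latest revision
    print $response->code, " $page\n";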