On Mon Sep 8, 2025 at 12:50 PM BST, Andrew Sayers wrote:
I've been playing around with `bin/get-interesting-strings.pl` today. I'll make it easier to use and add it to the README once I've slept on it, but for now you need to create a `data` symlink in the repo's base directory, pointing to the dump's `data` directory. Then `make interesting-strings.txt` will create a tab-separated value file with interesting snippets from the wiki. The HEAD commit adds /Discussion links, and finds 1,059 of them :s
Argh, that's a lot. Eyeballing the list, many of them (I'm not sure *how* many) are translations, where the Discussion link is embedded in a table alongside the translation links (the "translation header").
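To get a rough idea of how many of the hits sit in a translation header, something like the following could help. It is only a sketch and rests on assumptions I haven't checked against the script: that each row of interesting-strings.txt is a page name, a tab, then the snippet, and that a translation header can be recognised by the `Translation(s)` label. Both the column layout and the `TRANSLATION_MARKER` value are hypothetical. It will undercount if the Discussion link is on a different line of the table from the label.

```python
# Hypothetical marker: assumes every translation header contains this label.
TRANSLATION_MARKER = "Translation(s)"


def classify_discussion_hits(tsv_text):
    """Split /Discussion hits into those inside a translation header and the rest.

    Assumes (hypothetically) each TSV row is: page name <TAB> snippet.
    Returns two lists of page names: (in_header, elsewhere).
    """
    in_header, elsewhere = [], []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        # Skip malformed rows and rows that don't mention a Discussion link.
        if len(fields) < 2 or "/Discussion" not in fields[1]:
            continue
        page, snippet = fields[0], fields[1]
        if TRANSLATION_MARKER in snippet:
            in_header.append(page)
        else:
            elsewhere.append(page)
    return in_header, elsewhere
```

Running `len()` on each returned list would give the two counts.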
Current best practice is for translated pages to transclude the translation header from the parent page. However, retrofitting that onto existing pages is more work than just fixing the Discussion link: it means first making sure the parent page has the header markers, then replacing the table in each translated page with the transclusion.
I guess we'd also need to check for discrepancies between the lists of languages in the translation headers. It's not inconceivable that a parent page is missing a link to one of its translations.
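A minimal sketch of that discrepancy check, under assumptions of my own: header links use the `[[target|label]]` form, a translation of `Foo` is a page named like `fr/Foo` or `pt_BR/Foo`, and we already have a list of all page names from the dump. The regexes and the `missing_from_header` helper are hypothetical, not part of the existing scripts.

```python
import re

# Hypothetical link syntax: [[target]] or [[target|label]].
LINK_RE = re.compile(r"\[\[([^|\]]+)(?:\|[^\]]*)?\]\]")
# Hypothetical language-prefix convention, e.g. "fr" or "pt_BR".
LANG_RE = re.compile(r"[a-z]{2,3}(?:_[A-Z]{2})?")


def split_translation(name):
    """Return (lang, base) for names like "fr/Foo", else (None, name)."""
    prefix, sep, rest = name.partition("/")
    if sep and LANG_RE.fullmatch(prefix):
        return prefix, rest
    return None, name


def header_languages(header, page):
    """Languages that the given translation header links to for `page`."""
    langs = set()
    for target in LINK_RE.findall(header):
        lang, base = split_translation(target)
        if lang and base == page:
            langs.add(lang)
    return langs


def missing_from_header(header, page, all_pages):
    """Translations that exist as pages but are not linked from the header."""
    existing = set()
    for name in all_pages:
        lang, base = split_translation(name)
        if lang and base == page:
            existing.add(lang)
    return existing - header_languages(header, page)
```

Running this over each parent page and comparing against the headers of its translations would flag the mismatches worth a human look.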
Which is more pragmatic: updating/fixing these translation headers now, or teaching our conversion script to ignore the whole translation header (since, IIRC, it isn't needed at all on MediaWiki)?
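If we go the "ignore it" route, the conversion step could be as simple as dropping the header before converting. This sketch assumes, hypothetically, that the header is a single line of the form `~-[[DebianWiki/EditorGuide#translation|Translation(s)]]: ... -~`. Pages where it is a multi-line table would need a different pattern, so `HEADER_RE` and `strip_translation_header` are illustrative only.

```python
import re

# Hypothetical pattern: assumes the translation header is one line like
# ~-[[DebianWiki/EditorGuide#translation|Translation(s)]]: ... -~
# MULTILINE lets ^/$ match per line; "\n?" also removes the line break.
HEADER_RE = re.compile(
    r"^~-\[\[DebianWiki/EditorGuide#translation\|.*-~[ \t]*$\n?",
    re.MULTILINE,
)


def strip_translation_header(moin_text):
    """Remove single-line translation headers (and any Discussion link in them)."""
    return HEADER_RE.sub("", moin_text)
```

A side effect is that the Discussion links inside those headers disappear too, which is what we'd want if MediaWiki handles talk pages itself.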
--
Please do not CC me for listmail.

Jonathan Dowland
jmtd@debian.org
https://jmtd.net