Re: Conversion edge cases
On Sun, Jul 27, 2025 at 09:01:15PM +0100, Andrew Sayers wrote:
> I haven't looked at the conversion scripts yet, so I'm not sure how
> this maps to the work that's already been done.
I've now had a quick look through mm2mw.pl, and found more things
to nag people about :)
A few issues are easier to explain in code than in an e-mail;
please see this MR[0] for those.
We've talked before about maybe doing the occasional database reset,
but I suspect we'll need to reset it several times a day for a while.
Guillem has made a good attempt to handle the edge cases we'd want fixed
automatically, but the initial import will no doubt turn up a bunch of
edge cases we haven't thought of, which we'll want to fix before release.
`sub ConvertToMWName` in mm2mw.pl just does `s/Category(.*)/Category:$1/`,
which doesn't capture the complexity of the wiki.
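To make that concrete, here's a rough Python equivalent of the substitution. It's my own translation, not code from mm2mw.pl. Perl's `s///` without `/g` replaces at most once, which `count=1` mirrors. The rule fires on any name containing "Category", not just ones that start with it:

```python
import re


def convert_to_mw_name_current(name: str) -> str:
    """Approximates the Perl s/Category(.*)/Category:$1/ (one replacement)."""
    return re.sub(r"Category(.*)", r"Category:\1", name, count=1)


# Names taken from the lists further down this mail.
for page in [
    "CategoryPackaging",                # -> Category:Packaging (intended)
    "WikiCategory",                     # -> WikiCategory: (mangled)
    "ReproducibleBuilds/UserCategory",  # -> ReproducibleBuilds/UserCategory:
    "Brasil/CategoryDocumentos",        # -> Brasil/Category:Documentos
    "Packaging",                        # unchanged
]:
    print(f"{page} -> {convert_to_mw_name_current(page)}")
```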
First, it doesn't cover some important cases:
* we should do something like `s/(.*)\/Discussion$/Talk:$1/`
* we should put user pages in the `User:` namespace
  (I don't see a list of usernames in the dump, and I bet some people
  will have removed CategoryHomepage from their page)
* I would argue we *shouldn't* do `s/(.*)PageFragment/Template:$1/`,
  because transcluding a page fragment works fine in MediaWiki,
  and is subtly different to a template (e.g. parameters are not expected)
* I would also argue we *shouldn't* do `s/(.*)Template$/Template:$1/`;
  these aren't templates in the MW sense, so they'd need manual conversion
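Put together, the rules above might look roughly like the hypothetical Python sketch below. It is not a patch against mm2mw.pl. The `known_users` and `known_categories` sets are assumptions: the dump doesn't seem to contain a user list, so they'd have to be built some other way. Categories are decided by membership rather than by matching the name.

```python
import re
from typing import AbstractSet

DISCUSSION_SUFFIX = re.compile(r"^(.*)/Discussion$")


def convert_to_mw_name(
    name: str,
    known_users: AbstractSet[str],
    known_categories: AbstractSet[str],
) -> str:
    """Hypothetical replacement for ConvertToMWName (illustration only)."""
    # Foo/Discussion -> Talk:Foo, as suggested above. (MediaWiki also has
    # per-namespace talk spaces such as "Category talk:", not handled here.)
    match = DISCUSSION_SUFFIX.match(name)
    if match:
        return f"Talk:{match.group(1)}"

    # User home pages go into the User: namespace (known_users is assumed).
    if name in known_users:
        return f"User:{name}"

    # Only pages known to be real categories get the Category: prefix.
    if name in known_categories and name.startswith("Category"):
        return "Category:" + name[len("Category"):]

    # Everything else, including *PageFragment and *Template pages,
    # keeps its original name.
    return name


# Example with made-up inputs:
# convert_to_mw_name("CategoryHomepage", {"SomeUser"}, {"CategoryHomepage"})
# returns "Category:Homepage".
```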
Second, here are some category edge cases...
These pages match /Category/ but not /(^|\/)Category/:
ReproducibleBuilds/UserCategory
WikiCategory
(note: e.g. Brasil/CategoryDocumentos =~ /\/Category/ but !~ /^Category/)
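For anyone comparing the patterns, this small Python sketch shows which of the three regexes each example name matches. Python's `re` behaves like Perl for patterns this simple, and the labels are the Perl spellings used in this mail.

```python
import re
from typing import List

# Perl-style label -> equivalent Python pattern.
PATTERNS = {
    r"/Category/": re.compile(r"Category"),
    r"/(^|\/)Category/": re.compile(r"(^|/)Category"),
    r"/^Category/": re.compile(r"^Category"),
}


def matching_patterns(page: str) -> List[str]:
    """Return the labels of every pattern that matches the page name."""
    return [label for label, rx in PATTERNS.items() if rx.search(page)]


for page in [
    "CategoryPackaging",
    "Brasil/CategoryDocumentos",
    "WikiCategory",
    "ReproducibleBuilds/UserCategory",
]:
    print(page, matching_patterns(page))
```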
These non-category pages match /^Category/ and are in CategoryCategory:
CategoryPortalTemplate
CategoryTemplate
These pages match /Category/ but are not in CategoryCategory:
CategoryAntiSpam
CategoryBugs
CategoryDatabase
CategoryDebConf22
CategoryDebianInstaller
CategoryDeveloper
CategoryFrCommandLineInterface
CategoryFrCommunity
CategoryFrDesktopEnvironment
CategoryFrGame
CategoryFrHardware
CategoryFrKernel
CategoryFrQuickInstall
CategoryFrQuickIntroduction
CategoryFrQuickPackageManagement
CategoryFrSoftware
CategoryFrSystemAdministration
CategoryGamepad
CategoryHandheld
CategoryLocalization
CategoryLts
CategoryMIDI
CategoryMail
CategoryMultimedia
CategoryPackageManagement
CategoryProgramming
CategoryProposedDeletion/Discussion
CategoryRelease
CategoryVersionControlSystem
CategoryVirtualization
CategoryWebBrowser
CategoryWifi
InstallingDebianOn/Ratings/CategoryEtchRatedThreeStars
ReproducibleBuilds/UserCategory
These pages' latest revisions match /CategoryCategory/,
but their names do not match /Category/:
BackLink
DebianPeru/Eventos
de/DebianWiki
el/DebianWiki
id/DebianWiki
DebianWiki/Administration
DebianWiki/EditorGuide
es/DebianWiki/EditorGuide
fr/DebianWiki/EditorGuide
id/DebianWiki/EditorGuide
pt_BR/DebianWiki/EditorGuide
pt_PT/DebianWiki/EditorGuide
ru/DebianWiki/EditorGuide
uk/DebianWiki/EditorGuide
zh_CN/DebianWiki/EditorGuide
HelpOnMacros
MigrationStatus
WikiTag
es/WikiTag
pt_BR/WikiTag
ru/WikiTag
uk/WikiTag
ccts
henriquetourinho
nodiscc
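Lists like these can be regenerated by comparing the name-based guess against page content. Here's a hypothetical Python sketch, assuming `latest` maps each page name to the text of its latest revision. It treats a page as "in CategoryCategory" when that text contains the word CategoryCategory, which is only a crude stand-in for however the conversion really parses categories.

```python
import re
from typing import List, Mapping, Tuple

NAME_RX = re.compile(r"Category")
MEMBERSHIP_RX = re.compile(r"\bCategoryCategory\b")


def category_disagreements(
    latest: Mapping[str, str],
) -> Tuple[List[str], List[str]]:
    """Return (name_only, content_only), each sorted by page name.

    name_only:    name matches /Category/ but the text lacks CategoryCategory
    content_only: text mentions CategoryCategory but the name lacks "Category"
    """
    name_only: List[str] = []
    content_only: List[str] = []
    for page, text in sorted(latest.items()):
        by_name = bool(NAME_RX.search(page))
        by_content = bool(MEMBERSHIP_RX.search(text))
        if by_name and not by_content:
            name_only.append(page)
        elif by_content and not by_name:
            content_only.append(page)
    return name_only, content_only


# Made-up sample data, not taken from the real dump.
sample = {
    "CategoryNoMembership": "Pages about something.\n",
    "ExampleHelpPage": "Help text.\n----\nCategoryCategory\n",
    "CategoryExample": "Intro.\n----\nCategoryCategory\n",
}
print(category_disagreements(sample))
```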
These pages match /^Category/ but are redirects:
CategoryBugs
CategoryCommandLineInterface
CategoryCommunity
CategoryDatabase
CategoryDebianDevelopment
CategoryDebianInstaller
CategoryDebianOn
CategoryDesktopEnvironment
CategoryDeveloper
CategoryFrCommandLineInterface
CategoryFrCommunity
CategoryFrDesktopEnvironment
CategoryFrGame
CategoryFrHardware
CategoryFrKernel
CategoryFrQuickInstall
CategoryFrQuickIntroduction
CategoryFrQuickPackageManagement
CategoryFrSoftware
CategoryFrSystemAdministration
CategoryGame
CategoryHandheld
CategoryHardware
CategoryL10nFrUpdate
CategoryLocalization
CategoryMIDI
CategoryModem
CategoryMultimedia
CategoryNetwork
CategoryNetworkApplication
CategoryOfficeApplication
CategoryOpenPGP
CategoryPackageManagement
CategoryPackaging
CategoryProgramming
CategoryQuickInstall
CategoryRelease
CategorySoftware
CategorySound
CategorySystemAdministration
CategorySystemRescue
CategorySystemSecurity
CategoryVersionControlSystem
CategoryVirtualization
CategoryWebBrowser
CategoryXWindowSystem
Speaking for a moment as a significant contributor to the Packaging page
(which CategoryPackaging redirects to), I'm not a fan of the way the wiki
conflates hand-written "portal" pages with auto-generated "category" pages.
I'd be very much in favour of making the redirect pages into categories
that link to the associated page instead.
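As a rough idea of what that could mean for the converter, here's a hypothetical Python sketch. It assumes redirects use MoinMoin's `#redirect` processing instruction in the `#` lines at the top of a page. The function names and the wording of the generated category text are placeholders, not anything mm2mw.pl does today.

```python
import re
from typing import Optional, Tuple

# Matches a "#redirect Target" processing-instruction line (case-insensitive).
REDIRECT_RX = re.compile(r"#redirect\s+(\S.*?)\s*$", re.IGNORECASE)


def redirect_target(moin_text: str) -> Optional[str]:
    """Return the redirect target if the page's leading # lines contain one."""
    for line in moin_text.splitlines():
        if not line.startswith("#"):
            break  # processing instructions only appear at the top
        match = REDIRECT_RX.match(line)
        if match:
            return match.group(1)
    return None


def redirect_to_category(name: str, moin_text: str) -> Optional[Tuple[str, str]]:
    """Turn a CategoryFoo redirect into (MediaWiki title, wikitext), else None."""
    target = redirect_target(moin_text)
    if target is None or not name.startswith("Category"):
        return None
    title = "Category:" + name[len("Category"):]
    # Placeholder text: a real category page would say whatever editors want,
    # plus MediaWiki's automatic list of member pages.
    body = f"See [[{target}]] for an overview of this topic.\n"
    return title, body


print(redirect_to_category("CategoryPackaging", "#redirect Packaging\n"))
```

Because the category and the portal page end up in different namespaces, `Category:Packaging` and `Packaging` can coexist without one having to redirect to the other.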
[0] https://salsa.debian.org/guillem/mm2mw/-/merge_requests/1