Re: Conversion edge cases



On Sun, Jul 27, 2025 at 09:01:15PM +0100, Andrew Sayers wrote:
> I haven't looked at the conversion scripts yet, so I'm not sure how
> this maps to the work that's already been done.

I've now had a quick look through mm2mw.pl, and found more things
to nag people about :)

A few issues are easier to explain in code than in an e-mail -
please see this MR[0] for those.

We've talked before about maybe doing the occasional database reset,
but I suspect we'll need to reset it several times a day for a while.
Guillem has made a good attempt to cover the edge cases we'd want to handle
automatically, but the initial import will no doubt turn up a bunch of
edge cases we haven't thought of, and that we'll want to fix before release.

`sub ConvertToMWName` in mm2mw.pl just does `s/Category(.*)/Category:$1/`,
which doesn't capture the complexity of the wiki.

First, it doesn't cover some important cases:

* we should do something like s/(.*)\/Discussion$/Talk:$1/
* we should put user pages in the `User:` namespace
  (I don't see a list of usernames in the dump, and I bet some people
  will have removed CategoryHomepage from their page)
* I would argue we *shouldn't* do s/(.*)PageFragment/Template:$1/,
  because transcluding a page fragment works fine in MediaWiki,
  and is subtly different to a template (e.g. parameters are not expected)
* I would also argue we *shouldn't* do s/(.*)Template$/Template:$1/ -
  these aren't templates in the MW sense, they'd need manual conversion

Second, here are some category edge cases...

These pages match /Category/ but not /(^|\/)Category/:
	ReproducibleBuilds/UserCategory
	WikiCategory

(note: e.g. Brasil/CategoryDocumentos =~ /\/Category/ but !~ /^Category/)
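
To make the difference between the three patterns concrete, here is a small
Python sketch that reports which of them a page name matches. The function
name and the dictionary keys are made up for illustration.

```python
import re

def category_matches(name):
    """Report which of the three category patterns a page name matches."""
    return {
        # /Category/ : "Category" anywhere in the name
        "anywhere": bool(re.search(r"Category", name)),
        # /(^|\/)Category/ : "Category" at the start of a path component
        "component_start": bool(re.search(r"(^|/)Category", name)),
        # /^Category/ : "Category" at the start of the whole name
        "page_start": bool(re.search(r"^Category", name)),
    }

for page in ("WikiCategory", "Brasil/CategoryDocumentos", "CategoryBugs"):
    print(page, category_matches(page))
```

WikiCategory only matches the first pattern, Brasil/CategoryDocumentos
matches the first two, and CategoryBugs matches all three.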

These non-category pages match /^Category/ and are in CategoryCategory:
	CategoryPortalTemplate
	CategoryTemplate

These pages match /Category/ but are not in CategoryCategory:
	CategoryAntiSpam
	CategoryBugs
	CategoryDatabase
	CategoryDebConf22
	CategoryDebianInstaller
	CategoryDeveloper
	CategoryFrCommandLineInterface
	CategoryFrCommunity
	CategoryFrDesktopEnvironment
	CategoryFrGame
	CategoryFrHardware
	CategoryFrKernel
	CategoryFrQuickInstall
	CategoryFrQuickIntroduction
	CategoryFrQuickPackageManagement
	CategoryFrSoftware
	CategoryFrSystemAdministration
	CategoryGamepad
	CategoryHandheld
	CategoryLocalization
	CategoryLts
	CategoryMIDI
	CategoryMail
	CategoryMultimedia
	CategoryPackageManagement
	CategoryProgramming
	CategoryProposedDeletion/Discussion
	CategoryRelease
	CategoryVersionControlSystem
	CategoryVirtualization
	CategoryWebBrowser
	CategoryWifi
	InstallingDebianOn/Ratings/CategoryEtchRatedThreeStars
	ReproducibleBuilds/UserCategory

These pages contain /CategoryCategory/ in the latest revision,
but do not match /Category/:

	BackLink
	DebianPeru/Eventos
	de/DebianWiki
	el/DebianWiki
	id/DebianWiki
	DebianWiki/Administration
	DebianWiki/EditorGuide
	es/DebianWiki/EditorGuide
	fr/DebianWiki/EditorGuide
	id/DebianWiki/EditorGuide
	pt_BR/DebianWiki/EditorGuide
	pt_PT/DebianWiki/EditorGuide
	ru/DebianWiki/EditorGuide
	uk/DebianWiki/EditorGuide
	zh_CN/DebianWiki/EditorGuide
	HelpOnMacros
	MigrationStatus
	WikiTag
	es/WikiTag
	pt_BR/WikiTag
	ru/WikiTag
	uk/WikiTag
	ccts
	henriquetourinho
	nodiscc

These pages match /^Category/ but are redirects:
	CategoryBugs
	CategoryCommandLineInterface
	CategoryCommunity
	CategoryDatabase
	CategoryDebianDevelopment
	CategoryDebianInstaller
	CategoryDebianOn
	CategoryDesktopEnvironment
	CategoryDeveloper
	CategoryFrCommandLineInterface
	CategoryFrCommunity
	CategoryFrDesktopEnvironment
	CategoryFrGame
	CategoryFrHardware
	CategoryFrKernel
	CategoryFrQuickInstall
	CategoryFrQuickIntroduction
	CategoryFrQuickPackageManagement
	CategoryFrSoftware
	CategoryFrSystemAdministration
	CategoryGame
	CategoryHandheld
	CategoryHardware
	CategoryL10nFrUpdate
	CategoryLocalization
	CategoryMIDI
	CategoryModem
	CategoryMultimedia
	CategoryNetwork
	CategoryNetworkApplication
	CategoryOfficeApplication
	CategoryOpenPGP
	CategoryPackageManagement
	CategoryPackaging
	CategoryProgramming
	CategoryQuickInstall
	CategoryRelease
	CategorySoftware
	CategorySound
	CategorySystemAdministration
	CategorySystemRescue
	CategorySystemSecurity
	CategoryVersionControlSystem
	CategoryVirtualization
	CategoryWebBrowser
	CategoryXWindowSystem

Speaking for a moment as a significant contributor to the Packaging page
(which CategoryPackaging redirects to), I'm not a fan of the way the wiki
conflates hand-written "portal" pages with auto-generated "category" pages.
I'd be very much in favour of making the redirect pages into categories
that link to the associated page instead.

[0] https://salsa.debian.org/guillem/mm2mw/-/merge_requests/1