Conversion edge cases



I got hold of the wiki dump and had a thorough look for conversion edge cases.
If you're interested in mm -> mw conversion, hopefully this will be some help.
If not, do yourself a favour and skip this laundry list of oddities ;)

= Initial notes =

The dump doesn't include some internal pages (like EventStats/UserAgents),
which hopefully don't need to be converted.

I only looked at the most recent version of each page.

I produced this with some simple regexps and a lot of eyeballing -
false positives are likely and false negatives are entirely possible.

I haven't looked at the conversion scripts yet, so I'm not sure how
this maps to the work that's already been done.

= Moin internal pages =

I think these internal MoinMoin pages shouldn't be directly converted:

InterWikiMap
HelpOnDebianMacros
HelpOnHeadlines
HelpOnMacros
LocalBadContent (see https://moinmo.in/AntiSpamGlobalSolution)
FortuneCookies (database for the <<RandomQuote>> macro)
GermanDict (example dictionary used in HelpOnMacros)

= Processing instructions =

See https://moinmo.in/HelpOnProcessingInstructions

== ## comment ==

In MoinMoin, a line beginning with "##" is a comment.  In MediaWiki, a line
beginning with "#" is a numbered list item.

A converter should take the following:

{{{
## This is a comment, not a list
}}}

And convert it to e.g.:

{{{
<!-- This is a comment, not a list -->
}}}
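
A minimal Python sketch of that conversion.  It assumes that only lines
starting with "##" are comments, so "#keyword" processing instructions pass
through untouched, and it makes no attempt to skip "##" lines inside {{{ }}}
blocks.  The function name is made up for illustration.

```python
import re

# Assumption: a MoinMoin comment is any line that starts with "##".
COMMENT_RE = re.compile(r"^##\s?(.*)$")


def convert_comments(text):
    """Turn MoinMoin '##' comment lines into HTML comments."""
    out = []
    for line in text.split("\n"):
        match = COMMENT_RE.match(line)
        if match:
            out.append("<!-- " + match.group(1).strip() + " -->")
        else:
            # Not a comment (this includes "#format", "#acl", etc.).
            out.append(line)
    return "\n".join(out)
```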

== #acl ==

The following pages use access control lists -
how will they be handled in the new wiki?

"ACL Trusted:read,write,revert All:read"
	HelpContents
"ACL Trusted:read,write,revert Known:read,write,revert"
	FrontPage
	el/FrontPage
"acl AtlasJobinson:read Known:read,write,delete,revert,admin"
	zh_CN/FrontPage
"acl FilipusKlutiero:read Known:read,write,delete,revert,admin"
	/es/KDE
	KDE
	SummerOfCode2009/KDE-based-packagemanager
"acl FilipusKlutiero:read Trusted:read,write,delete,revert Known:read,write,delete,revert All:read"
	Teams/Publicity
"acl OutreachAdminGroup:admin,read,write,delete,revert All:read"
	SummerOfCode2019
"acl OutreachAdminGroup:read,write,delete,revert All:read"
	OutreachAdminGroup
	OutreachGroup
"acl OutreachAdminGroup:read,write,delete,revert OutreachGroup:read,write,revert All:read"
	SummerOfCode2019/ApprovedProjects
"acl Trusted:read,write,delete,revert Known:read,write,delete,revert All:read"
	SummerOfCode2019/UnApprovedProjects
"acl xts:read Trusted:read,write,delete,revert Known:read,write,delete,revert All:read"
	DebianHosting

== #deprecated ==

I don't see any evidence of this being used on the site.

== #format ==

Some pages use "#FORMAT" / "#format" to indicate the parser the page should use.
"wiki" is widely used (and does nothing); the rest are only used on these pages:

"FORMAT plain"
	InterWikiMap
	LocalBadContent
	TheDebianChoirOfComplaintsBirminghamLyrics
	TheDebianChoirOfComplaintsFinnishLyrics

"FORMAT IRC"
	DebianEdu/HowTo/TranslatePackages
	DebianEdu/Meeting/20060119/log
	DebianEdu/Meeting/20060206/log
	DebianEdu/Meeting/20060306/log
	DebianEdu/Meeting/20060320/log
	DebianEdu/Meeting/20060403/log
	DebianEdu/Meeting/20060410-WebInt/log
	DebianEdu/Meeting/20060501/log
	DebianEdu/Meeting/20060529/log
	DebianEdu/Meeting/20061016/log
	DebianEdu/Meeting/20061030/log
	DebianEdu/Meeting/20061113/log
	DebianEdu/Meeting/20070329/log
	DebianEdu/Meeting/20070426/log
	DebianEdu/Meeting/20070510/log
	DebianEdu/Meeting/20070520/log
	DebianEdu/Meeting/20070813/log
	DebianEdu/Meeting/20090812/log
	DebianEdu/Meeting/20091015/log
	DebianEdu/Meeting/20101108/log
	DebianEdu/MeetingAdminTool/Log
"FORMAT IRC " (with trailing space)
	DebianEdu/Meeting/20060220/log
"FORMAT irc"
	DebianEdu/Meeting/20091218/log
"FORMAT java"
	JschDocumentation/SampleCodes
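
The "IRC" / "IRC " / "irc" variants above differ only in case and trailing
whitespace, so a converter probably wants to normalise the value before
deciding what to do with it.  A small Python sketch, with a made-up function
name:

```python
import re

# Matches "#format NAME" in any case, tolerating trailing spaces or tabs.
FORMAT_RE = re.compile(r"^#format[ \t]+(\S+)[ \t]*$", re.IGNORECASE)


def format_name(line):
    """Return the lower-cased parser name from a '#format' line, or None."""
    match = FORMAT_RE.match(line)
    return match.group(1).lower() if match else None
```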

== #language ==

The following pages have a bare "#language" (with no value) on a line of its own:

"language"
	PortsSparc
	XFreeConfig
	zh_CN/Opera

About 365 pages have a "#language pt-br" instead of "pt_BR",
one page has "#language zh-TW" instead of "zh_TW", and 495 pages
have other "#language"s that don't match their namespace
(e.g. CategoryDebianJP is in Japanese).

I can provide the page names if there's interest, but think about
the amount of work you're volunteering for ;)

On the MediaWiki side, {{ROOTPAGENAME}} should give the language part
of a page's name, but it behaved the same as {{PAGENAME}}
when I tried it on my wiki.

If you want to keep "#language" instructions,
here's an untested draft of a possible solution:

1. define Template:LanguageFromPageName with contents
   {{#titleparts: {{ROOTPAGENAME}} | 1 }}
2. look for pages where the "#language" matches the namespace,
   after downcasing and doing s/[-_]//
3. replace the language string with something like
   #language {{LanguageFromPageName}}
4. if the language is "en" and the namespace doesn't look like
   a language, consider deleting it?
5. otherwise use the language specified on the page
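
Steps 2, 4 and 5 could be sketched in Python roughly as follows.  This is just
as untested against the real dump as the draft above.  The function names,
the returned labels and the LOOKS_LIKE_LANG pattern are all invented for
illustration, and "namespace" is taken to mean the part of the page name
before the first "/".

```python
import re

# Hypothetical test for "looks like a language": "fr", "pt_BR", "zh-TW", ...
LOOKS_LIKE_LANG = re.compile(r"^[a-z]{2}(?:[-_][A-Za-z]{2})?$")


def normalise(code):
    """Lower-case a language code and drop '-' and '_' (step 2)."""
    return re.sub(r"[-_]", "", code.lower())


def language_action(page_name, language):
    """Suggest what to do with a page's '#language' value."""
    # Assumption: the namespace is everything before the first "/".
    prefix = page_name.split("/", 1)[0] if "/" in page_name else ""
    if prefix and normalise(prefix) == normalise(language):
        return "use-template"       # steps 2-3: replace with the template
    if language.lower() == "en" and not LOOKS_LIKE_LANG.match(prefix):
        return "consider-deleting"  # step 4
    return "keep"                   # step 5: use the page's own value
```

For example, language_action("pt_BR/Network", "pt-br") returns "use-template",
because both sides normalise to "ptbr".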

== #pragma ==

"#pragma description" is widely used.

"#pragma hide-title true" is used on FrontPage and its localisations,
plus the following pages:

	DanielFruhauf
	DebianMed/Meeting
	DebianMed/Meeting/Luebeck2011
	DebianMed/Meeting/Southport2012
	el/FrontPage-old
	el/FrontPage-squeeze
	ja/Debian-Onsen
	ja/Debian-Onsen/2008
	ja/Debian-Onsen/2011

... I'm not sure it does anything though?

"#pragma keywords" is widely used.

"#pragma section-numbers 0" and "#pragma section-numbers off" are used
on several pages, but don't seem to do anything.  For example, see
BOINC/ServerGuide (`#pragma section-numbers 0`) and ApacheCassandra
(`#pragma section-numbers off`).  Both pages have numbered tables
of contents.

Some pages have "#pragma section-numbers 2" or even "... 3", but again
these don't seem to do anything.

"#pragma supplementation-page on" is only used on "DebConf/25/BrestByTrain".
It doesn't seem to do anything.

== #redirect ==

These are widely used, but note that MoinMoin uses "#redirect PAGENAME",
whereas MediaWiki prefers "#redirect [[PAGENAME]]" with square brackets.
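
A Python sketch of the bracket-adding step, assuming the redirect sits on a
line of its own and the target is everything after the keyword.  The function
name is made up.

```python
import re

# "#redirect TARGET" in any case; trailing whitespace is ignored.
REDIRECT_RE = re.compile(r"^#redirect[ \t]+(.+?)[ \t]*$", re.IGNORECASE)


def convert_redirect(line):
    """Rewrite '#redirect PAGENAME' as '#redirect [[PAGENAME]]'."""
    match = REDIRECT_RE.match(line)
    if not match:
        return line
    target = match.group(1)
    # Leave targets alone if they are already bracketed.
    if target.startswith("[[") and target.endswith("]]"):
        return "#redirect " + target
    return "#redirect [[" + target + "]]"
```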

== #refresh ==

This seems to be disabled on the current wiki, and a quick search doesn't show
a MediaWiki alternative.  I guess convert these to comments?

Uses on the site:

"REFRESH 0 http://wiki.debconf.org/wiki/Videoteam/Sprint2011"
	Sprints/2011/VideoSprint
"REFRESH 0 https://wiki.ubuntu.com/JonathanFerguson"
	JonathanFerguson
"REFRESH 5 [PkgSplit]"
	Making_Multiple_Packages_From_A_Single_Source_Tarball
"refresh 5 vi/FrontPage"
	vi

= Macros =

See https://moinmo.in/HelpOnMacros
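
Usage lists like the ones in this section can be approximated with a short
Python sketch along these lines.  This is not the script used to produce the
lists, just an illustration, and the function name is made up.  It keeps
"Name" and "Name()" as separate keys, since that distinction shows up below.

```python
import re
from collections import Counter

# <<Name>> or <<Name(args)>>; args may contain ")" as long as ")>>" ends it.
MACRO_RE = re.compile(r"<<(\w+)(?:\((.*?)\))?>>")


def count_macros(text):
    """Count macro calls in page text, keyed by 'Name' or 'Name(args)'."""
    counts = Counter()
    for match in MACRO_RE.finditer(text):
        name, args = match.group(1), match.group(2)
        # args is None for <<Name>> and "" for <<Name()>>.
        key = name if args is None else "%s(%s)" % (name, args)
        counts[key] += 1
    return counts
```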

== AbandonedPages, AdvancedSearch ==

I don't see any evidence of these being used on the site.

== <<Action>> ==

Action is used in the following places:

"Action(edit, Edit this section)"
	EricVeirasGalisson/ChangeLog
"Action(info, This page info)"
	PeterParshutov/MacrosTests
"Action(recoverpass)"
	DebianWiki/EditorGuide
	es/DebianWiki/EditorGuide
	id/DebianWiki/EditorGuide
	pt_BR/DebianWiki/EditorGuide
	ru/DebianWiki/EditorGuide
	uk/DebianWiki/EditorGuide
	zh_CN/DebianWiki/EditorGuide
"Action(recoverpass,Recuperar palavra-passe)"
	pt_PT/DebianWiki/EditorGuide
"Action(subscribe)"
	DebianWiki/Privacy
"Action(userprefs,preferências do utilizador,sub=prefs)"
	pt_PT/DebianWiki/EditorGuide
"Action(userprefs,sub=prefs)"
	DebianWiki/EditorGuide
	es/DebianWiki/EditorGuide
	fr/DebianWiki/EditorGuide
	id/DebianWiki/EditorGuide
	pt_BR/DebianWiki/EditorGuide
	ru/DebianWiki/EditorGuide
	uk/DebianWiki/EditorGuide
	zh_CN/DebianWiki/EditorGuide

== <<Anchor>> ==

This is widely used.

== <<AttachList>> ==

This is widely used, but never with an argument.

== <<BR>> ==

This is widely used.

== <<Date>> ==

Used on several pages, always with a datetime argument of the form
YYYY-MM-DDTHH:MM:SSZ.

== <<DateTime>> ==

The following pages use DateTime without an argument, causing the pages
to be re-rendered on every refresh:

"DateTime"
	Derivatives/Census/Apertis
	Derivatives/Census/Finnix
	InstallingDebianOn/HP/Mini210/wheezy
	de/InstallingDebianOn/HP/Mini210/Wheezy
"DateTime()"
	Derivatives/Census/BunsenLabs
	Derivatives/Census/Inquisitor
	Derivatives/Census/Q4OS
	Derivatives/Census/Whonix
	git-pbuilder/Discussion

Apart from that, this is used on several pages, always with a datetime argument
of the form YYYY-MM-DDTHH:MM:SSZ or YYYY-MM-DDTHH:MM:SS+TTTT.
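
Both forms can be checked and parsed with the Python standard library, which
may be handy if the converter needs to render these as fixed dates.  A sketch
with a made-up function name.  Note that "%z" only accepts a literal "Z" from
Python 3.7 onwards.

```python
from datetime import datetime, timezone


def parse_moin_datetime(value):
    """Parse 'YYYY-MM-DDTHH:MM:SSZ' or '...+TTTT' into a UTC datetime.

    Raises ValueError if the argument matches neither form.
    """
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
    return parsed.astimezone(timezone.utc)
```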

== <<DebianCodename>>, <<DebianCodenameLink>>, <<DebianDate>>, <<DebianVersion>> ==

These are widely used, with the following arguments:

	oldoldstable
	oldstable
	stable
	testing
	unstable
	experimental

== <<FootNote>> ==

This is widely used.

== <<FullSearch>> ==

The following are too widely-used to list:

"FullSearch" (on pages matching "^Category")

"FullSearch(Category.*)" and "FullSearch(category:Category.*)"

"FullSearch(language:.* Category.*)" and "FullSearch(language:.* category:Category.*)"

I'd recommend handling the explicit "language:.*" in the same way as "#language", above.


Here are the examples that don't match any of the above patterns:

"FullSearch" (excluding pages matching "^Category")
	DebianColombia
	DebianGuatemala

"FullSearch(category:NetworkApplication)"
	BOINC/Server
"FullSearch(cipuxfirstlevelpage)"
	CipUX

"FullSearch(language:en CategoryNetwork -category:CategoryNetworkApplication)"
	Network
	fa/Network
	it/Network
	pt_BR/Network
"FullSearch(language:it CategoryNetwork -category:CategoryNetworkApplication)"
	it/Network
"FullSearch(language:uk CategoryNetwork -category:CategoryNetworkApplication)"
	uk/Network
"FullSearch(linkto:CleanMe)"
	CleanMe
"FullSearch(linkto:FixMe)"
	FixMe
"FullSearch(linkto:ToDo)"
	ToDo
"FullSearch(linkto:UseBTS)"
	UseBTS
"FullSearch(t:re:^InstallingDebianOn/MSI/.*)"
	fr/InstallingDebianOn/MSI
"FullSearch(t:re:^fr/InstallingDebianOn/MSI/.*)"
	fr/InstallingDebianOn/MSI
"FullSearch(title:Font)"
	Fonts
	fr/Fonts
	ko/Fonts
	pt_BR/Fonts
"FullSearch(title:Ruby)"
	Ruby
	it/Ruby
	ja/Ruby

== <<FullSearchCached>> ==

The widely-used <<FullSearch>> examples occur here too:

"FullSearchCached" (on pages matching "^Category")

"FullSearchCached(Category.*)" and "FullSearchCached(category:Category.*)"

"FullSearchCached(language:.* Category.*)" and "FullSearchCached(language:.* category:Category.*)"

Also, this minor variation occurs quite often:

"FullSearchCached(Category.* language:.*)" and "FullSearchCached(category:Category.* language:.*)"


Here are the examples that don't match any of the above patterns:

"FullSearchCached(CategoryPermalink -title:CategoryPermalink -title:DebianWiki -category:CategoryHomepage)"
	CategoryPermalink
"FullSearchCached(CategoryProposedDeletion -title:CategoryProposedDeletion -title:DebianWiki)"
	CategoryProposedDeletion
"FullSearchCached(CategoryProprietarySoftware -title:CategoryProprietarySoftware -category:CategoryHomepage)"
	CategoryProprietarySoftware
"FullSearchCached(category:CategoryGame category:CategoryNotNative language:en)"
	fr/Game
	pt_BR/Game
"FullSearchCached(category:CategoryGameAdventure language:en)"
	Game
	it/Game
	pt_BR/Game
	uk/Game
"FullSearchCached(category:CategoryGameAdventure language:fr)"
	fr/Game
"FullSearchCached(manpages.debian.net)"
	DebianWiki/ObsoleteText
"FullSearchCached(re:^Proposals/[^/]*$)"
	Proposals
"FullSearchCached(regex:http[^s])"
	DebianWiki/ObsoleteText
"FullSearchCached(t:re:^FooBar.*)"
	MigrationStatus
"FullSearchCached(t:re:^InstallingDebianOn/MSI/.*)"
	InstallingDebianOn/MSI

== <<GetText>> ==

Widely used, with the following arguments:

	Anmelden
	BenutzerEinstellungen
	Comments
	Delete Cache
	Edit
	General Page Infos
	Get Info
	Get info
	Info
	Informations
	Login
	Mail me my account data
	Notification
	Page
	Preferences
	RecentChanges
	Settings
	The following users subscribed to this page:
	UserPreferences
	subscribe
	设置

== <<GetText2>>, <<GetVal>>, <<GoTo>>, <<HighlighterList>> ==

I don't see any evidence of these being used on the site.

== <<Hits>> ==

Used twice, with no arguments:

"Hits"
	PeterParshutov
	PeterParshutov/MacrosTests

== <<Icon>> ==

Widely used, with the following arguments:

	admon-caution.png
	admon-important.png
	admon-note.png
	admon-tip.png
	admon-warning.png
	alert.png
	angry.png
	attach.png
	attention.png
	biggrin.png
	checkmark.png
	devil.png
	home
	icon-info.png
	idea.png
	image
	moin-search.png
	star_off.png
	star_offn.png
	star_on.png

== <<Include>> ==

Widely used, often with "from" and "to" arguments.  Several pages use other
arguments; for example, DebianDay/2015 has:
<<Include(^DebianDay/2015/.*$,,to=\"\\[?\\[?Category\",sort=ascending, editlink)>>

== <<MailTo>> ==

Widely used, seems to show the mangled text to logged-out users and a nice
e-mail address when logged in.  See e.g. KevinCoyner.

== <<MessageId>> // <<MID>> ==

Widely used.

== <<MonthCalendar>> ==

Used in a few places, without arguments:

"MonthCalendar"
	FreedomBox/ProgressCalls
	LocalGroups/DebianRheinMain
	Teams/pkg-xmpp/BoF
	de/Community/XMPPDebianUserGerman

== <<Navigation>> ==

Used in a few places:

"Navigation(children)" and "Navigation(children,<maxdepth>)"
	Widely used
"Navigation(scheme=children)"
	Javascript/Nodejs/Tasks
"Navigation(siblings)" and "Navigation(siblings,<maxdepth>)"
	Widely used
"Navigation(slides)"
	Brasil/GUD/RS/FotosConfraternizacaoDebianRsDezembro2009/100_debian_rs_fim_ano_2009
	Brasil/GUD/RS/FotosConfraternizacaoDebianRsDezembro2009/200_debian_rs_fim_ano_2009
	Brasil/GUD/RS/FotosDebianDayBrasil2009RsPortoAlegre/100OficinaEmpacotamento1
	Brasil/GUD/RS/FotosDebianDayBrasil2009RsPortoAlegre/200OficinaEmpacotamento2
	Brasil/GUD/RS/FotosDebianDayBrasil2009RsPortoAlegre/300PalestrasTarde1
	Brasil/GUD/RS/FotosDebianRsColegioMesquita20091125/100TurmaAnimada
	Brasil/GUD/RS/FotosDebianRsColegioMesquita20091125/200Curiosidade
	Brasil/GUD/RS/FotosDebianRsColegioMesquita20091125/300SalaCheia
"Navigation(slideshow)"
	Brasil/GUD/RS/FotosConfraternizacaoDebianRsDezembro2009
	Brasil/GUD/RS/FotosDebianDayBrasil2009RsPortoAlegre
	Brasil/GUD/RS/FotosDebianRsColegioMesquita20091125

== <<NewPage>> ==

Widely used, in several different ways - here are some examples:

"NewPage(,Add an event,DebianPeru/Eventos)"
	DebianPeru/Eventos
"NewPage(DebianDay/Template,Add my city!,DebianDay/2017)"
	DebianDay/2017
"NewPage(InstallingDebianOn/ComputerTemplate, , InstallingDebianOn/Zotac)"
	InstallingDebianOn/Zotac
"NewPage(Lintian/Tags/Template, Edit, Lintian/Tags)"
	Lintian/Tags/Contributing

(there are many more examples, which seem to be variations on these themes)

== <<OrphanedPages>> ==

I don't see any evidence of this being used on the site.

== <<PageCount>> ==

Used in a few places:

"PageCount"
	PageCount
"PageCount(exists)"
	PageCount
	PeterParshutov/MacrosTests

== <<PageList>> ==

Widely used, with both plain text and regular expression arguments.

== <<PageSize>>, <<RandomPage>> ==

I don't see any evidence of these being used on the site.

== <<RandomQuote>> ==

Used in a few places:

"RandomQuote"
	PeterMello/FortuneCookies
"RandomQuote()"
	PeterParshutov/MacrosTests
"RandomQuote(PeterMello/FortuneCookies)"
	PeterMello

== <<RecentChanges>>, <<StatsChart>>, <<SystemAdmin>>, <<SystemInfo>> ==

I don't see any evidence of these being used on the site.

== <<TableOfContents>> ==

Widely used, without arguments, with "()" and with various numeric arguments.

== <<TeudView>> ==

I don't see any evidence of this being used on the site.

== <<TitleIndex>> ==

Used once:

"TitleIndex"
	Brasil/IndiceTitulos

== <<TitleSearch>> ==

Used once:

"TitleSearch"
	DevelopersCorner

== <<WikiConfig>>, <<WikiConfigHelp>>, <<WordIndex>> ==

I don't see any evidence of these being used on the site.


= Parsers =

See https://moinmo.in/HelpOnParsers

== creole, ReST ==

I don't see any evidence of these being used on the site.

== XML/XSLT/DocBook ==

"#!XML" is used on MobileBroadbandProviderInfo

== wiki ==

Used on the following pages as just "#!wiki":

	BSP/2018/10/de/Karlsruhe
	BSP/2019/01/ca/Montreal
	BSP/2019/01/us/Austin
	BSP/2019/04/ca/Toronto
	BSP/2023/02/ca/Montreal
	Migrate32To64Bit
	ru/Migrate32To64Bit

Widely used to create the following admonitions
(I haven't checked which of them actually work):

	"wiki Debian"
	"wiki blue"
	"wiki blue/dotted"
	"wiki blue/solid"
	"wiki caution"
	"wiki codearea"
	"wiki comXment"
	"wiki comment"
	"wiki comment Потым дапішам:"
	"wiki comment/dashed"
	"wiki comment/dotted"
	"wiki comment/dotted "
	"wiki comment/solid"
	"wiki dashed"
	"wiki debian"
	"wiki dotted"
	"wiki green/solid"
	"wiki important"
	"wiki nota"
	"wiki note"
	"wiki note "
	"wiki pink/solid"
	"wiki red/solid"
	"wiki red/solid\tcaution\t\tcomment/dashed"
	"wiki solid"
	"wiki table-of-contents"
	"wiki tabular"
	"wiki tip"
	"wiki warning"
	"wiki warning "
	"wiki warning Attention"
	"wiki white/solid"
	"wiki yellow"
	"wiki yellow/solid"

== CSV ==

Used in a few places:

"#!CSV autofilter=Processor,RAM,Hard_disk,Battery,Coreboot_tested"
	InstallingDebianOn/Acer/C710-2615-Chromebook
"#!CSV autofilter=Region,RAM,Hard_disk"
	InstallingDebianOn/Acer/C740_Chromebook
"#!csv"
	Hugepages
"#!csv ,"
	Teams/RoboticsTeam/DhROS
"#!csv delimiter=,"
	ShengjingZhu/Go
	VGAPassthrough

== diff, cplusplus, python, java, pascal, irc/irssi parsers ==

(these are apparently now just shortcuts for `#!highlight <name>`)

Used in a few places:

"#!cplusplus numbers=off"
	Games/Tux4Kids
"#!diff"
	M68k/OpenJDK
"#!diff numbers=disable"
	MigrationStatus
"#!irc"
	IRC/debian-tech/Logs/20050927-multiarch
	IRC/debian-tech/Logs/20051009-releasequalification
	IRC/debian-tech/Logs/20051120-python-policy
	IRC/debian-tech/Logs/20051230-archivequalification
	ReportbugWeb
	Wiki(e382b5e383b3e38389e3839ce38383e382afe382b9)
	multiarch/irc-20040225
	rly-free-software
"#!java"
	zh_CN/DebianWiki/WikiSandBox
"#!makefile"
	Python/LibraryStyleGuide
	mozilla-devscripts

== highlight ==

Widely used, with the following lexers, sometimes also with
`numbers=disable`, `numbers=off` or even `numbers=no`:

Ada
Bash
Kconfig
Make
apache
basemake
bash
c
clojure
console
control
debcontrol
diff
elisp
ini
irc
java
javascript
json
make
makefile
man
moin
mysql
perl
php
python
sh
shell
sources.list
sourceslist
sql
text
verilog
xslt
yaml
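
If the target wiki has the SyntaxHighlight extension installed (an
assumption), the opening line of these blocks could be rewritten with a
Python sketch like the one below.  The function name is made up.  It treats
a missing "numbers=" option as "no line numbers", which is a simplification,
and the closing "}}}" would need converting to "</syntaxhighlight>"
separately.

```python
import re

HIGHLIGHT_RE = re.compile(r"^\{\{\{#!highlight[ \t]+(\S+)(.*)$", re.IGNORECASE)
NUMBERS_RE = re.compile(r"\bnumbers=(\S+)", re.IGNORECASE)
# The three "off" spellings seen on the site.
NUMBERS_OFF = {"disable", "off", "no"}


def convert_highlight_opening(line):
    """Rewrite '{{{#!highlight LEXER ...' as a <syntaxhighlight> tag."""
    match = HIGHLIGHT_RE.match(line)
    if not match:
        return line
    # Assumption: lower-casing is enough to fold "Bash"/"bash" and so on.
    lexer = match.group(1).lower()
    numbers = NUMBERS_RE.search(match.group(2))
    show_lines = bool(numbers) and numbers.group(1).lower() not in NUMBERS_OFF
    return '<syntaxhighlight lang="%s"%s>' % (lexer, " line" if show_lines else "")
```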


= Interwiki links =

The following interwiki links are used (several items from the InterWikiMap
don't seem to be used at all):

NAME		COUNT
AliothList	2
Bug		38
DebPkg		782
Debian		6
DebianBTSUser	27
DebianBug	3075
DebianIRC	6
DebianList	57
DebianMachine	16
DebianMan	2195
DebianPackage	320
DebianPkg	13958
DebianPkgShots	6
DebianPts	545
DebianWiki	19
LaunchpadBug	1
UbuntuBug	2
UbuntuWiki	28
WikiPedia	1098
Wikipedia	1
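
Counts of this kind can be approximated with a Python sketch such as the
following.  It is not the method used for the table above.  It only looks at
bracketed links of the form [[Prefix:target]] or [[Prefix:target|label]], and
the function name and the set of known prefixes are placeholders.  Matching
is case-sensitive, so "WikiPedia" and "Wikipedia" are counted separately.

```python
import re
from collections import Counter

# [[Prefix:target]] or [[Prefix:target|label]]; bare links are not handled.
INTERWIKI_RE = re.compile(r"\[\[([A-Za-z]+):[^\]|]+(?:\|[^\]]*)?\]\]")


def count_interwiki(text, known_prefixes):
    """Count bracketed interwiki links whose prefix is in known_prefixes."""
    counts = Counter()
    for match in INTERWIKI_RE.finditer(text):
        prefix = match.group(1)
        # Filters out things like [[https://...]] and [[attachment:...]].
        if prefix in known_prefixes:
            counts[prefix] += 1
    return counts
```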

