
Bug#567781: Conversion of English pages to Unicode, via HTML entities.



Dear Gerfried and WWW team,

my proposal is the following:

1) Bring the English pages into a state where the files are the same
   regardless of whether the encoding is ISO-8859-1, ASCII or UTF-8.

2) Serve the English pages as UTF-8 instead of ISO-8859-1.

3) If necessary, convert entities to accented characters.
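
To make steps 1) and 3) concrete, here is a minimal Python sketch of the two
text transformations.  It is only an illustration under my own assumptions:
the function names are hypothetical, it is not part of smart-change, and it
does not touch the translation-check headers.  Step 1) turns every non-ASCII
character into an HTML entity, so the file is pure ASCII and therefore
identical in all three encodings.  Step 3) turns those entities back into
accented characters, while leaving markup entities such as &amp; alone.

```python
import re
from html.entities import codepoint2name, name2codepoint


def to_entities(text: str) -> str:
    """Step 1 (sketch): replace each non-ASCII character with an entity."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # plain ASCII is kept as-is
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")  # HTML 4 named entity
        else:
            out.append("&#%d;" % cp)  # fall back to a decimal reference
    return "".join(out)


# Matches named entities and decimal references; hex references are ignored.
_ENTITY = re.compile(r"&(#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def from_entities(text: str) -> str:
    """Step 3 (sketch): turn entities for non-ASCII characters back into text."""
    def repl(match):
        name = match.group(1)
        if name.startswith("#"):
            cp = int(name[1:])
        else:
            cp = name2codepoint.get(name)
        # Keep unknown entities and ASCII ones (&amp;, &lt;, ...) untouched.
        if cp is None or cp < 128:
            return match.group(0)
        return chr(cp)
    return _ENTITY.sub(repl, text)


def is_encoding_neutral(text: str) -> bool:
    """True if the text yields the same bytes as ISO-8859-1 and as UTF-8."""
    try:
        return text.encode("iso-8859-1") == text.encode("utf-8")
    except UnicodeEncodeError:
        return False
```

Running from_entities(to_entities(page)) should give back the original text
as long as the page contains no entities for non-ASCII characters to begin
with; a real conversion would still need the side-effect check per page.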

Apart from 2), I am offering to do up to 100% of the work, depending on how
much others would like to participate.

I think that, by and large, the accented characters currently in the English
pages occur in the names of people or places, as most of the pages that need
to be converted are in the directories users, events, vote, security and News.
I think that the three-step conversion I propose will not interfere with
spellchecking the pages that are actively worked on.  Note that, in any case,
the pages in vote, security and News usually do not have new content added.

I propose to use smart-change in steps 1) and 3), so that the translators
are not disturbed.  I already made a test in February for one page in
devel/debian-med, and it worked – see commit ID 2rdf5isFrcBQZ66v.  Please note
that I am the contributor of the English version and of the only translation of
that page: I took great care not to disturb others' work.

It is true that in 1) and 3) there is a risk of side effects. I will look for
them and revert them. 

I would like to repeat that I did my best to think about the translators, and
never proposed anything that would bump their translation-check headers,
because I propose to use smart-change.

Technically, how do you intend to convert the English pages to Unicode at
DebCamp without bumping the translation-check headers?

Have a nice day,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan


