
Re: how to move to UTF-8 ? (was: An encoding problem)



On Thu, Jul 30, 2009 at 01:35:25PM +0200, Jens Seidel wrote:
> On Thu, Jul 30, 2009 at 01:05:40PM +0200, Simon Paillard wrote:
> > On Wed, Jul 29, 2009 at 06:27:02PM +0200, Frans Pop wrote:
> > > > Moving the website to UTF-8 would allow us to get rid of such issues.
> > Could you please describe the steps you have performed and how ?
> > 
> > For what we have identified:
> > - recode wml files (using recode from recode package)
> > find . -type d -exec recode latin1..utf8 {} \;
> 
> "-type d" ? This works for directories?

Obviously "-type f"
 
> I would restrict this to *.wml files.

Indeed, that's better: converting the po files from latin1 to UTF-8 two
times is not a good idea.

> Some files such as *.inc files need to be handled as well, same for
> text files (some describe mailing list purposes, ...).

.src files as well (vote results, l10n stats)
./MailingLists/desc/
./devel/debian-jr/
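
Since a second latin1-to-UTF-8 pass over an already converted file would
garble it, it may help to list first which files still need converting.
A minimal sketch, assuming iconv is available and that only *.wml, *.inc
and *.src files matter; needs_recode and list_latin1_files are
hypothetical helper names, not part of any existing script:

```shell
#!/bin/sh
# Sketch only: the helper names below are made up for illustration.

# Succeeds if the file is NOT valid UTF-8, i.e. it still needs converting.
# Pure ASCII files count as valid UTF-8 and are therefore skipped.
needs_recode() {
    ! iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Print every *.wml, *.inc and *.src file below directory $1
# that is not valid UTF-8 yet. Other files (*.png, *.pdf, ...) are ignored.
list_latin1_files() {
    find "$1" -type f \( -name '*.wml' -o -name '*.inc' -o -name '*.src' \) |
    while IFS= read -r f; do
        if needs_recode "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling `list_latin1_files .` from the top of the working copy prints the
remaining candidates and changes nothing, so it is safe to run repeatedly.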

> Let's avoid converting *.png, *.pdf files, OK?

They are ignored by recode (and most of them are in the english
directory).
 
> Or use iconv ...

It has the disadvantage of actually emptying the file if the output file
is the same as the input file (I know, I could use sponge or some
temporary file...).
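
The temporary-file variant could look roughly like this. It is only a
sketch: to_utf8_inplace is a hypothetical helper name, and ISO-8859-1 as
the source charset is an assumption that has to hold for every file
passed to it.

```shell
#!/bin/sh
# Sketch only: to_utf8_inplace is a made-up helper name.
# Converts file $1 from ISO-8859-1 (assumed) to UTF-8 without truncating
# it: iconv writes to a temporary file first, and the original is only
# overwritten once the conversion has succeeded.
to_utf8_inplace() {
    src=$1
    tmp=$(mktemp "${src}.XXXXXX") || return 1
    if iconv -f ISO-8859-1 -t UTF-8 "$src" > "$tmp"; then
        # cat into the original keeps its permissions, unlike mv
        cat "$tmp" > "$src"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Unlike `iconv ... file > file`, where the shell truncates the file before
iconv reads it, the original stays untouched if the conversion fails.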
  
> > - update the .wmlrc file
> > 	-D CUR_LOCALE=fr_FR.UTF-8
> > 	-D CHARSET=utf-8
> > 
> > - convert charset of po files
> > cd po ; for file in *po ; do msgconv -t UTF-8 -o "$file" "$file" ; done
> 
> That should be optional ... (but the strings need to be convertible into
> UTF-8).

msgconv *does* convert the strings to UTF-8, it's not only about the
header.
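
To check afterwards that both the header and the strings of a po file
ended up in UTF-8, something like the following might do. It is a sketch
that does not call gettext at all; po_is_utf8 is a hypothetical helper
name.

```shell
#!/bin/sh
# Sketch only: po_is_utf8 is a made-up helper name.
# Succeeds if the po file $1 declares charset=UTF-8 in its header
# AND its whole content is valid UTF-8.
po_is_utf8() {
    LC_ALL=C grep -qi '^"Content-Type:.*charset=UTF-8' "$1" &&
        iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

For example, `for f in po/*po ; do po_is_utf8 "$f" || echo "$f" ; done`
would print the files that still look unconverted.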
  
> > - some references to ISO-8859-15 (or old coding) in webpages about
> >   website.
> >   * devel/website/examples.wml et
> 
> s/pour le/for/ ???

(yes :-)
 
> >     international/french/web.wml
> >   * pour la traduction, international/french/traduire.wml
> > - *.UTF-8 locale on www-master -> OK, checked
> > - redirections pages with specified charset
> >   (devel/debian-installer/gtk-frontend.wml and distrib/cd.wml)
> > 
> > Do you see something else ?
> 
> This should be all except:
> Warn all users that the working copy should be clean before an update, as
> otherwise there will be many conflicts.

Frans did change some HTML entities to proper Unicode, but I don't know
which method was used.
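
One possible way to do such a replacement, which is not necessarily what
Frans did, is a plain sed filter over a handful of entities. The entity
list below is only an illustrative subset and entities_to_utf8 is a
hypothetical helper name; markup entities such as &amp; and &lt; are
deliberately left alone.

```shell
#!/bin/sh
# Sketch only: entities_to_utf8 is a made-up helper name and the entity
# list is an illustrative subset, not a complete table.
# Reads HTML/WML on stdin, writes it to stdout with a few named
# entities replaced by the corresponding UTF-8 characters.
entities_to_utf8() {
    sed -e 's/&eacute;/é/g' \
        -e 's/&egrave;/è/g' \
        -e 's/&agrave;/à/g' \
        -e 's/&ccedil;/ç/g'
}
```

This only makes sense on files that are already UTF-8, so it would have
to run after the charset conversion.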

-- 
Simon Paillard

