
Re: DDTSS suggestions



On Tue, 14 Jun 2011 07:47, helix84 wrote:
On Tue, Jun 14, 2011 at 09:08, Martijn van Oosterhout<kleptog@gmail.com>  wrote:
What I want to know is where they come from and how they are
inserted. The comment says "rosetta", but is it a program doing the
talking, or are people typing directly? If I add a check to reject
invalidly encoded input, are users going to see it? If not, it may be
better to simply "fix" them (that is, replace broken characters by
question marks).
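
The two options above (reject vs. "fix") can be sketched in Python as
follows. This is only an illustration, assuming the expected encoding is
UTF-8; the function names are made up for the example and are not taken
from the DDTSS code.

```python
def decode_strict(raw: bytes) -> str:
    """Reject invalid input: raises UnicodeDecodeError on malformed UTF-8."""
    return raw.decode("utf-8")


def decode_lenient(raw: bytes) -> str:
    """'Fix' invalid input: undecodable bytes end up as question marks.

    The decoder first substitutes U+FFFD for each malformed sequence,
    which is then mapped to '?'. Note that a genuine U+FFFD already
    present in the input would also become '?'.
    """
    return raw.decode("utf-8", errors="replace").replace("\ufffd", "?")
```

With the strict variant the caller has to catch the exception and show
an error to whoever submitted the text; the lenient variant never fails
but silently loses the original characters.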

I don't know where they come from but I have a better suggestion how
to deal with them.
iconv is able to convert between character encodings, even
approximating characters which don't exist in the target encoding.
This is done via utf-8//TRANSLIT as the target encoding. But you have
to tell iconv the input encoding, which we don't know. Luckily, there
is a parser which can guess the input encoding:
http://chardet.feedparser.org/
This is better than just replacing characters with question marks
because it should usually get the encoding right (unless the input is
extremely short), and if it doesn't, translators have to do it again
anyway.
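
A rough sketch of the guess-then-convert idea, in Python rather than
with the iconv command line. It assumes the third-party chardet package
is installed and falls back to a fixed encoding otherwise; the function
name and the latin-1 fallback are choices made for this example only.
It covers just the guessing and decoding part, not iconv's //TRANSLIT
approximation.

```python
try:
    import chardet  # third-party detector; optional in this sketch
except ImportError:
    chardet = None


def to_utf8_text(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode bytes of unknown encoding into text, guessing if needed."""
    # Valid UTF-8 is accepted as-is; no guessing required.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # Ask chardet for its best guess; it may return None for the encoding.
    guess = None
    if chardet is not None:
        guess = chardet.detect(raw).get("encoding")

    # Decode with the guess (or the fallback), replacing anything that
    # still cannot be decoded instead of raising.
    try:
        return raw.decode(guess or fallback, errors="replace")
    except LookupError:
        # The guessed encoding name is unknown to Python's codec registry.
        return raw.decode(fallback, errors="replace")
```

The result can then be stored with `.encode("utf-8")`. As noted above,
the guess is unreliable for very short strings.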

Regards,
~~helix84


Maybe this is part of Canonical better supporting upstream (an issue that has gained importance lately); at least there have been talks about it.

If this is due to some sort of automated Rosetta/Launchpad-->DDTSS/Debian scripting thingie, then it's probably better to tell the Rosetta/Launchpad folks that there's an encoding issue.

Just thoughts,

Sveinn í Felli

