
Re: Questions regarding utf-8



Hi, John Darrington wrote:

> Given a text file, it will attempt to guess the natural language in
> which it was written. I'm sure it would be fairly simple to modify it to
> guess the charset.  If you point me to a reasonably large set of example
> files, I'll see what I can do.

You could use your existing samples, which hopefully include a number of
non-ASCII characters, recode them to UTF-8, and then try a few other
encodings -- German text would typically be in Latin-1, Latin-9
(ISO-8859-15), or one of the Windows- or Mac-specific charsets for
Western or Central Europe.
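
One way to read that suggestion, as a rough Python sketch rather than a
finished tool: decode each UTF-8 sample once, then re-encode it into every
candidate charset to get a labelled test set for a charset guesser. The
names make_samples and CANDIDATE_ENCODINGS are made up for illustration,
and the codec list is only my guess at which charsets are worth covering.

```python
# Candidate charsets for West/Central European text (an assumed list;
# extend or trim it to match the files you expect to see).
CANDIDATE_ENCODINGS = [
    "latin-1",      # ISO-8859-1
    "iso8859-15",   # Latin-9, adds the euro sign
    "cp1252",       # Windows, Western Europe
    "cp1250",       # Windows, Central Europe
    "mac_roman",    # Mac, Western Europe
    "mac_latin2",   # Mac, Central Europe
]


def make_samples(utf8_bytes, encodings=CANDIDATE_ENCODINGS):
    """Return {encoding name: re-encoded bytes} for one UTF-8 sample.

    Encodings that cannot represent every character of the sample
    are skipped instead of producing lossy output.
    """
    text = utf8_bytes.decode("utf-8")
    samples = {}
    for name in encodings:
        try:
            samples[name] = text.encode(name)
        except UnicodeEncodeError:
            continue
    return samples


if __name__ == "__main__":
    sample = "Grüße aus Köln".encode("utf-8")
    for name, data in make_samples(sample).items():
        print(name, data)
```

Each resulting byte string carries its known encoding as a label, so the
guesser's answer can be checked against it.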

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
-- 
Dimensions will always be expressed in the least usable term.
EXAMPLE: Velocity will be expressed in furlongs per fortnight.


