Re: Questions regarding utf-8



On Thu, May 08, 2003 at 07:50:50PM -0400, Bob Hilliard wrote:
>      Some third-party dictionaries, such as foldoc and The Jargon File
> occasionally include 8 bit characters, such as 0xe7 for c-cedilla.  In
> order to fix these easily, I would like to know:
> 
>      1.  How can I determine what character encoding is used in a
>          document without manually scanning the entire file?

Hm, I can't think of an easy way.  Maybe someone knows of available
tools to do that.

>      2.  What is the best available filter to convert from encoding X
>          to 7 bit ASCII?

That seems to be recode.  You can convert directly to UTF-8, too.
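
For comparison, the same kind of conversion can be sketched in Python
rather than with recode.  This assumes the input is ISO-8859-1, where
0xe7 is c-cedilla; the function names are hypothetical.  The ASCII
version strips accents via Unicode decomposition, which is only one
possible way to get down to 7 bits:

```python
import unicodedata


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (assumes the input is Latin-1)."""
    return data.decode("latin-1").encode("utf-8")


def latin1_to_ascii(data: bytes) -> bytes:
    """Reduce ISO-8859-1 bytes to 7 bit ASCII by dropping accents."""
    text = data.decode("latin-1")
    # NFKD splits c-cedilla into 'c' plus a combining cedilla; the
    # combining mark is not ASCII and is discarded by errors="ignore".
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore")
```

Characters without an ASCII base letter are dropped silently by the
second function, so the result is worth checking by eye.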

>      3.  What is the difference between utf-8 and en_US.utf8?

UTF-8 is a multibyte encoding of Unicode, whereas en_US.utf8 is a
locale: US English conventions with UTF-8 as the character encoding.
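
To make the distinction concrete: a POSIX locale name has the form
language[_territory][.codeset][@modifier], so the encoding is only one
component of it.  A small Python sketch that splits such a name (the
function parse_locale is hypothetical, not a library call):

```python
def parse_locale(name: str) -> dict:
    """Split 'language[_territory][.codeset][@modifier]' into its parts."""
    rest, _, modifier = name.partition("@")
    rest, _, codeset = rest.partition(".")
    language, _, territory = rest.partition("_")
    return {
        "language": language,
        "territory": territory or None,
        "codeset": codeset or None,   # the encoding part, e.g. 'utf8'
        "modifier": modifier or None,
    }


# The encoding by itself just maps characters to bytes:
# c-cedilla (U+00E7) is the two bytes 0xC3 0xA7 in UTF-8.
C_CEDILLA_UTF8 = "\u00e7".encode("utf-8")
```

For en_US.utf8 this gives language "en", territory "US" and codeset
"utf8".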

-- 
Andreas Bombe <bombe@informatik.tu-muenchen.de>    DSA key 0x04880A44


