
Questions regarding UTF-8

     The Dict Protocol (RFC 2229) specifies that databases shall be
encoded in UTF-8.  Since US-ASCII is a subset of UTF-8, pure 7-bit
ASCII is acceptable for the databases.
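The subset property can be checked quickly in Python (a sketch; Python is just a convenient choice here). It shows that every 7-bit byte decodes to the same character whether it is read as ASCII or as UTF-8. It also shows that c-cedilla becomes the two bytes 0xc3 0xa7 in UTF-8 but the single byte 0xe7 in ISO-8859-1 (Latin-1). Latin-1 is used only as an example of an 8-bit encoding.

```python
# Every 7-bit ASCII byte (0x00-0x7F) decodes to the same character
# whether the data is read as ASCII or as UTF-8.
ascii_bytes = bytes(range(0x80))
assert ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# Characters outside ASCII differ: UTF-8 needs two or more bytes,
# while a single-byte encoding such as ISO-8859-1 uses one 8-bit byte.
c_cedilla = "\u00e7"  # LATIN SMALL LETTER C WITH CEDILLA
print(c_cedilla.encode("utf-8"))    # b'\xc3\xa7'
print(c_cedilla.encode("latin-1"))  # b'\xe7'
```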

     Some third-party dictionaries, such as foldoc and The Jargon File,
occasionally include 8-bit characters, such as the byte 0xe7 for
c-cedilla.  In order to fix these easily, I would like to know:

     1.  How can I determine what character encoding is used in a
         document without manually scanning the entire file?

     2.  What is the best available filter to convert from encoding X
         to 7-bit ASCII?

     3.  What is the difference between UTF-8 and en_US.utf8?
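For questions 1 and 2, here is a rough Python sketch, not a definitive tool. No method can reliably identify an encoding from the bytes alone, but a program can do the scanning. Pure ASCII is easy to confirm, and text that decodes cleanly as UTF-8 is very likely UTF-8. Anything else is assumed here to be ISO-8859-1 (Latin-1), which is only a guess for these dictionaries. The conversion to 7-bit ASCII uses Unicode NFKD decomposition to split an accented letter into a base letter plus a combining mark, then drops the mark. Characters with no ASCII base become "?". The names guess_encoding and to_ascii are hypothetical. On a GNU system, `iconv -f ISO-8859-1 -t ASCII//TRANSLIT` performs a comparable transliteration from the command line, and `file --mime-encoding` gives a similar heuristic guess.

```python
import unicodedata


def guess_encoding(data: bytes) -> str:
    """Heuristic guess: 'ascii', 'utf-8', or an assumed 8-bit fallback.

    Not authoritative: any byte string decodes "successfully" as
    ISO-8859-1, so that fallback is only an assumption.
    """
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")  # strict decoding raises on invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"  # assumed encoding for 8-bit dictionary text


def to_ascii(text: str, replacement: str = "?") -> str:
    """Transliterate text to 7-bit ASCII by stripping accents via NFKD."""
    out = []
    for ch in unicodedata.normalize("NFKD", text):
        if ord(ch) < 0x80:
            out.append(ch)           # already ASCII
        elif unicodedata.combining(ch):
            continue                 # drop combining marks, e.g. the cedilla
        else:
            out.append(replacement)  # no ASCII equivalent
    return "".join(out)


# Hypothetical sample: c-cedilla stored as the single Latin-1 byte 0xe7.
raw = b"gar\xe7on"
enc = guess_encoding(raw)
print(enc)                        # iso-8859-1
print(to_ascii(raw.decode(enc)))  # garcon
```

Because every byte sequence decodes as ISO-8859-1, the fallback never fails. A file in some other 8-bit encoding would therefore be silently misread rather than rejected.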

     Pointers to the appropriate documentation would be very welcome,
since I feel a need to become more knowledgeable about this subject.


  |_)  _  |_    Robert D. Hilliard        <hilliard@debian.org>
  |_) (_) |_)   1294 S.W. Seagull Way     <bob@bobhilliard.net>
                Palm City, FL 34990 USA   GPG Key ID: 390D6559 
