Questions regarding utf-8

To: debian-devel@lists.debian.org
Subject: Questions regarding utf-8
From: Bob Hilliard <hilliard@debian.org>
Date: Thu, 08 May 2003 19:50:50 -0400
Message-id: <87issl2e1x.fsf@lian.bobhilliard.net>

     The Dict Protocol (RFC 2229) provides that databases shall be
encoded in utf-8.  Since US ASCII is a subset of utf-8, pure ASCII is
acceptable for the databases.
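
     The subset relationship can be checked directly.  The following is
a minimal Python sketch, not anything taken from RFC 2229; the helper
name is_pure_ascii and the sample string are made up for illustration.

```python
def is_pure_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit range 0x00-0x7f."""
    return all(b < 0x80 for b in data)


# Hypothetical sample entry; any pure-ASCII byte string behaves the same.
sample = b"dictionary entry"

# Pure ASCII decodes to the same text under both codecs, which is why
# an ASCII-only database already satisfies a UTF-8 requirement.
print(is_pure_ascii(sample))
print(sample.decode("ascii") == sample.decode("utf-8"))
```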

     Some third-party dictionaries, such as foldoc and The Jargon File,
occasionally include 8-bit characters, such as 0xe7 for c-cedilla.  In
order to fix these easily, I would like to know:

     1.  How can I determine what character encoding is used in a
         document without manually scanning the entire file?

     2.  What is the best available filter to convert from encoding X
         to 7-bit ASCII?

     3.  What is the difference between utf-8 and en_US.utf8?
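
     One possible approach to questions 1 and 2 is sketched below in
Python.  It is only a sketch under stated assumptions: the function
names are hypothetical, and the fallback to ISO-8859-1 is a guess based
on 0xe7 being c-cedilla in that encoding.  An encoding cannot be
determined with certainty from the bytes alone; the most a program can
do is rule out encodings in which the data is invalid.

```python
import unicodedata


def guess_encoding(data: bytes) -> str:
    """Return a best guess: 'ascii', 'utf-8', or the assumed fallback."""
    for candidate in ("ascii", "utf-8"):
        try:
            data.decode(candidate)
            return candidate
        except UnicodeDecodeError:
            continue
    # Assumption: anything else is treated as ISO-8859-1.  This cannot be
    # verified, because every byte sequence is valid ISO-8859-1.
    return "iso-8859-1"


def to_utf8(data: bytes) -> bytes:
    """Re-encode the data as UTF-8, using the guessed source encoding."""
    return data.decode(guess_encoding(data)).encode("utf-8")


def to_ascii(data: bytes) -> bytes:
    """Reduce to 7-bit ASCII by stripping accents (lossy)."""
    text = data.decode(guess_encoding(data))
    # NFKD splits an accented letter into base letter + combining mark;
    # encoding with errors="ignore" then drops everything non-ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore")


raw = b"fa\xe7ade"          # 0xe7 is not valid UTF-8 in this position
print(guess_encoding(raw))  # iso-8859-1
print(to_utf8(raw))         # b'fa\xc3\xa7ade'
print(to_ascii(raw))        # b'facade'
```

     Note that to_ascii silently drops any character that has no ASCII
base letter, so converting to UTF-8 instead preserves more of the
original text.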

     Pointers to the appropriate documentation would be very welcome,
since I feel a need to become more knowledgeable about this subject.

Regards,   

Bob
-- 
   _
  |_)  _  |_    Robert D. Hilliard        <hilliard@debian.org>
  |_) (_) |_)   1294 S.W. Seagull Way     <bob@bobhilliard.net>
                Palm City, FL 34990 USA   GPG Key ID: 390D6559
