
Re: [Groff] Re: groff: radical re-implementation



Hi,

At Thu, 19 Oct 2000 10:40:35 +0200 (CEST),
Werner LEMBERG <wl@gnu.org> wrote:

> Note that such an encoding request has to determine the encoding *and*
> character set of a document (similar to Emacs).
(snip)
> Examples:
>   .\" -*- charset: JIS-X-0208; encoding: EUC -*-
>   .\" -*- charset: JIS-X-0208; encoding: ISO-2022 -*-

No.  Specifying only 'encoding' is sufficient, because the name of
an encoding already determines which charsets it uses.

For example, there is no encoding named 'EUC'.  'EUC' is a generic
name for the EUC-based encodings (EUC-JP, EUC-KR, EUC-CN, and
EUC-TW).  'EUC' also denotes a method of building an encoding out of
at most four ISO 2022-compliant charsets.

Yes, ISO-2022 is the name of an encoding.  It comprises many charsets,
such as ISO-8859-*, ISO-646-*, JIS-X-0208, KS-X-1001, GB-2312, and
so on.  ISO-2022 has many subsets, such as ISO-2022-JP and
ISO-2022-KR; the EUC encodings are also subsets of ISO-2022.

Thus, specifying the encoding ISO-2022-JP automatically implies that
the charsets are US-ASCII, JIS X 0201 (LeftHalf), JIS X 0208-1978,
and JIS X 0208-1983.  Specifying the encoding EUC-KR automatically
implies that the charsets are US-ASCII and KS X 1001.
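A minimal sketch of the point above, in Python: whoever reads the
`-*- ... -*-` line needs only the encoding name, because a table keyed
by that name yields the charsets.  The table ENCODING_CHARSETS, the
set GENERIC_NAMES, the regular expression, and the function
charsets_for are all hypothetical illustrations (not part of groff),
and the table holds only the two examples given in this message.

```python
import re

# Hypothetical table: each concrete encoding name implies its charsets.
# Only the two examples from this message are filled in.
ENCODING_CHARSETS = {
    "ISO-2022-JP": ["US-ASCII", "JIS X 0201 (LeftHalf)",
                    "JIS X 0208-1978", "JIS X 0208-1983"],
    "EUC-KR": ["US-ASCII", "KS X 1001"],
}

# Hypothetical: family names that do not identify a single encoding.
GENERIC_NAMES = {"EUC"}

# Matches an Emacs-style declaration such as
#   .\" -*- encoding: EUC-KR -*-
MODELINE = re.compile(r"-\*-.*?\bencoding:\s*([A-Za-z0-9_.-]+).*?-\*-")


def charsets_for(line):
    """Return the charsets implied by the encoding declared in `line`."""
    m = MODELINE.search(line)
    if m is None:
        raise ValueError("no encoding declaration found")
    name = m.group(1).upper()
    if name in GENERIC_NAMES:
        # 'EUC' names a family (EUC-JP, EUC-KR, ...), not one encoding.
        raise ValueError(name + " is a family of encodings, not an encoding")
    return ENCODING_CHARSETS[name]
```

No separate 'charset' field is consulted: the encoding name alone
selects the entry.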


> troff shouldn't notice encoding issues at all and just accept UTF-8.

Yes.
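If troff accepts only UTF-8, the conversion from a legacy encoding can
happen in a separate step before troff sees the input.  A minimal
sketch in Python, assuming a hypothetical helper named to_utf8 and
relying on the standard codecs module, which ships codecs such as
euc_jp, euc_kr, and iso2022_jp:

```python
import codecs


def to_utf8(data: bytes, encoding: str) -> bytes:
    """Decode `data` from a legacy encoding and re-encode it as UTF-8."""
    # codecs.lookup normalizes names like "EUC-JP" and raises
    # LookupError if Python has no codec by that name.
    codec = codecs.lookup(encoding)
    text = data.decode(codec.name)
    return text.encode("utf-8")
```

Reading standard input as bytes and writing the result to standard
output would turn this into a filter placed in front of troff, so that
troff itself never has to deal with EUC-JP or ISO-2022-JP.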

---
Tomohiro KUBOTA <kubota@debian.org>
http://surfchem0.riken.go.jp/~kubota/


