Re: [Groff] Re: groff: radical re-implementation
At Thu, 19 Oct 2000 10:40:35 +0200 (CEST),
Werner LEMBERG <firstname.lastname@example.org> wrote:
> Note that such an encoding request has to determine the encoding *and*
> character set of a document (similar to Emacs).
> .\" -*- charset: JIS-X-0208; encoding: EUC -*-
> .\" -*- charset: JIS-X-0208; encoding: ISO-2022 -*-
No, specifying only 'encoding' is sufficient, because the encoding
already determines which charsets are used. For example, there is
no encoding named 'EUC'.
'EUC' is a generic name for the EUC-based encodings (EUC-JP, EUC-KR,
EUC-CN, and EUC-TW). 'EUC' also names a method for building an
encoding out of at most four ISO-2022-compliant charsets.
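To make the "at most four charsets" point concrete, here is a minimal
sketch that splits EUC-JP bytes into units tagged with the ISO-2022
graphic set (G0 to G3) each one comes from. It assumes the usual EUC-JP
assignment (G0 US-ASCII, G1 JIS X 0208, G2 JIS X 0201 katakana via SS2,
G3 JIS X 0212 via SS3); the function name `euc_jp_units` is hypothetical
and only classifies bytes, it does not decode them.

```python
# Hypothetical helper: split an EUC-JP byte string into (code_set, bytes).
# EUC-JP combines four ISO-2022 graphic sets:
#   G0: US-ASCII            -> one byte 0x00-0x7F
#   G1: JIS X 0208          -> two bytes, each 0xA1-0xFE
#   G2: JIS X 0201 katakana -> SS2 (0x8E) + one byte
#   G3: JIS X 0212          -> SS3 (0x8F) + two bytes
SS2, SS3 = 0x8E, 0x8F


def euc_jp_units(data: bytes) -> list[tuple[str, bytes]]:
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            width, code_set = 1, "G0"
        elif b == SS2:
            width, code_set = 2, "G2"
        elif b == SS3:
            width, code_set = 3, "G3"
        elif 0xA1 <= b <= 0xFE:
            width, code_set = 2, "G1"
        else:
            raise ValueError(f"invalid EUC-JP lead byte 0x{b:02X} at offset {i}")
        if i + width > len(data):
            raise ValueError(f"truncated character at offset {i}")
        units.append((code_set, data[i:i + width]))
        i += width
    return units


if __name__ == "__main__":
    sample = "Aあｱ".encode("euc_jp")  # ASCII, JIS X 0208, half-width katakana
    print(euc_jp_units(sample))
    # [('G0', b'A'), ('G1', b'\xa4\xa2'), ('G2', b'\x8e\xb1')]
```

EUC-KR is built the same way but only uses G0 (US-ASCII) and G1
(KS X 1001).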
Yes, ISO-2022 is the name of an encoding. It covers many charsets,
such as ISO-8859-*, ISO-646-*, JIS-X-0208, KS-X-1001, GB-2312, and
so on. There are many subsets of ISO-2022, such as ISO-2022-JP and
ISO-2022-KR, and the EUC encodings are also subsets of ISO-2022.
Thus, specifying the encoding as ISO-2022-JP automatically implies
that the charsets are US-ASCII, JIS X 0201 (left half),
JIS X 0208-1978, and JIS X 0208-1983. Likewise, specifying EUC-KR
implies that the charsets are US-ASCII and KS X 1001.
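A small sketch of why the encoding name alone is enough: once it is
known, both the charset list and the byte layout follow. The tables
`ENCODING_CHARSETS` and `PYTHON_CODEC` and the helper `encode_as` are
hypothetical illustrations written from the text above; the only real
API used is Python's standard `str.encode` with its built-in
`euc_jp`, `iso2022_jp`, and `euc_kr` codecs.

```python
# Hypothetical lookup: encoding name -> charsets it implies.
ENCODING_CHARSETS = {
    "ISO-2022-JP": ["US-ASCII", "JIS X 0201 (left half)",
                    "JIS X 0208-1978", "JIS X 0208-1983"],
    "EUC-JP": ["US-ASCII", "JIS X 0208",
               "JIS X 0201 (katakana)", "JIS X 0212"],
    "EUC-KR": ["US-ASCII", "KS X 1001"],
}

# Python's standard codec names for the same encodings.
PYTHON_CODEC = {"ISO-2022-JP": "iso2022_jp", "EUC-JP": "euc_jp", "EUC-KR": "euc_kr"}


def encode_as(text: str, encoding: str) -> bytes:
    # No separate charset argument is needed: the codec fixes it.
    return text.encode(PYTHON_CODEC[encoding])


if __name__ == "__main__":
    # The same JIS X 0208 character yields different bytes per encoding.
    print(encode_as("あ", "EUC-JP"))       # b'\xa4\xa2'
    print(encode_as("あ", "ISO-2022-JP"))  # b'\x1b$B$"\x1b(B'
    # A KS X 1001 character in EUC-KR.
    print(encode_as("가", "EUC-KR"))       # b'\xb0\xa1'
    print(ENCODING_CHARSETS["EUC-KR"])     # ['US-ASCII', 'KS X 1001']
```

Note that the ISO-2022-JP output carries its own escape sequences
(ESC $ B to switch to JIS X 0208, ESC ( B to return to ASCII), which is
the ISO-2022 way of signalling charsets inside the byte stream.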
> troff shouldn't notice encoding issues at all and just accept UTF-8.
Tomohiro KUBOTA <email@example.com>