
Bug#99324: Default charset should be UTF-8



On Fri, Jun 08, 2001 at 09:39:24PM +0900, Fumitoshi UKAI wrote:
...
> 
> First of all, JIS means Japanese Industrial Standards; it's not only for
> character sets/encodings.  JIS covers many industrial standards,
> such as screw sizes.
> 
> Anyway, in this context, I assume the JIS you mean is JIS X0208. This is just
> a character set, not an encoding.  We usually use JIS X0208 with ASCII in
> ISO 2022 encoding.  When ASCII->G0 and JIS X0208->G1 and G0->GL G1->GR,
> we call it EUC-JP (precisely, supplementary character sets are used
> for G2/G3).  In Japanese Linux environments, we usually use EUC-JP, because
> it's the simplest encoding for Japanese for now.
> When initially ASCII->G0 and G0->GL, and we switch from ASCII to JIS X0208
> with ESC $ B and switch back with ESC ( B, we call it JIS 7bit encoding
> or commonly ISO-2022-JP.  We use this encoding for Internet message for

Well... it seems to be a stateful encoding scheme.
While that is OK-ish for text documents and mail messages,
it is definitely not suitable for file names and similar uses.
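
To illustrate what "stateful" means here, a small sketch using Python's
built-in iso2022_jp codec (the sample character and variable names are
just my choice for the illustration): the same two bytes mean different
things depending on which escape sequence came before them.

```python
# Encode one hiragana character in ISO-2022-JP.
encoded = "あ".encode("iso2022_jp")

# The codec wraps the two JIS X0208 bytes in the shift sequences
# ESC $ B (switch to JIS X0208) and ESC ( B (switch back to ASCII).
shift_in = b"\x1b$B"
shift_out = b"\x1b(B"
payload = encoded[len(shift_in):-len(shift_out)]

# Without the preceding escape sequence the decoder is still in its
# initial ASCII state, so the very same bytes decode as two ASCII
# characters instead of one Japanese character.
out_of_context = payload.decode("iso2022_jp")

print(encoded)
print(payload, "->", repr(out_of_context))
```

So a program that looks at those bytes in isolation (say, a path
component cut out of a longer string) cannot know what they mean.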

UTF-8 is stateless: from any point in the middle of the stream you can
always tell where the current character starts, what it is, and what
the previous character was.


-- 
 -----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!


