Re: Fwd: Romanianization according to the rules of the Romanian Academy
Hi,
man utf-8:
Programmers accustomed to single-byte encodings such as US-ASCII or ISO
8859 have to be aware that two assumptions made so far are no longer
valid in UTF-8 locales. Firstly, a single byte does not necessarily
correspond any more to a single character. Secondly, since modern ter- [...]
* All UCS characters > 0x7f are encoded as a multi-byte sequence
consisting only of bytes in the range 0x80 to 0xfd, so no ASCII byte can [...]
* The first byte of a multi-byte sequence which represents a single
non-ASCII UCS character is always in the range 0xc0 to 0xfd and
indicates how long this multi-byte sequence is. All further bytes in a
multi-byte sequence are in the range 0x80 to 0xbf. This allows easy
resynchronization and makes the encoding stateless and robust against
missing bytes.
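
To make the two properties quoted above concrete, here is a minimal Python sketch. The helper names (is_continuation, sequence_length, resync) are made up for illustration and are not part of any library. The sketch assumes the modern 4-byte limit on UTF-8 sequences rather than the 0xfd upper bound in the man page text, and it does not check for overlong or otherwise invalid sequences.

```python
def is_continuation(byte: int) -> bool:
    """True for bytes 0x80..0xbf, which never start a character."""
    return 0x80 <= byte <= 0xBF


def sequence_length(lead: int) -> int:
    """Sequence length implied by the first byte (assumes at most 4 bytes)."""
    if lead < 0x80:
        return 1  # plain ASCII
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    raise ValueError(f"0x{lead:02x} cannot start a sequence in this sketch")


def resync(data: bytes, pos: int) -> int:
    """Skip continuation bytes until a byte that can start a character."""
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


text = "țară"                 # 4 characters, 2 of them non-ASCII
data = text.encode("utf-8")   # 6 bytes: c8 9b 61 72 c4 83

print(len(text), len(data))
print(sequence_length(data[0]))

# Start in the middle of the first character and recover:
start = resync(data, 1)
print(data[start:].decode("utf-8"))
```

Starting at byte offset 1 lands inside the two-byte sequence for "ț"; resync skips the continuation byte and decoding resumes cleanly at "a". That is the resynchronization property the man page describes.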
man charsets:
For most people who use ISO-8859 character sets, this means that the
characters outside of ASCII are now coded with two bytes. This tends to
expand ordinary text files by only one or two percent. For Russian or [...]
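
The size effect is easy to check. The short Python sketch below is only an illustration: the sample strings are mine, and it assumes ISO 8859-16 for the Romanian text and ISO 8859-5 for the Cyrillic text as the single-byte encodings to compare against.

```python
def sizes(text: str, legacy_codec: str) -> tuple[int, int]:
    """Return (bytes in the single-byte encoding, bytes in UTF-8)."""
    return len(text.encode(legacy_codec)), len(text.encode("utf-8"))


romanian = "Academia Română"   # 15 characters, 2 outside ASCII
cyrillic = "Привет"            # 6 characters, all outside ASCII

ro_legacy, ro_utf8 = sizes(romanian, "iso8859_16")
ru_legacy, ru_utf8 = sizes(cyrillic, "iso8859_5")

print(ro_legacy, ro_utf8)   # only the two accented letters grow
print(ru_legacy, ru_utf8)   # every letter takes two bytes in UTF-8
```

Mostly-ASCII text grows only by the number of accented letters, while text written entirely in a non-Latin script roughly doubles.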
man ascii:
ASCII is the American Standard Code for Information Interchange. It is
a 7-bit code. Many 8-bit codes (such as ISO 8859-1, the Linux default
character set) contain ASCII as their lower half. The international
counterpart of ASCII is known as ISO 646.
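
The "lower half" statement can be verified directly. This is a small sketch in Python using only the standard codecs. It also shows that the same 128 bytes are valid UTF-8, which is why pure ASCII files need no conversion.

```python
# All 128 values that fit in 7 bits.
ascii_bytes = bytes(range(0x80))

as_ascii = ascii_bytes.decode("ascii")
as_latin1 = ascii_bytes.decode("iso8859_1")
as_utf8 = ascii_bytes.decode("utf-8")

# The three decodings agree on the lower half.
print(as_ascii == as_latin1 == as_utf8)

# A byte from the upper half is a letter in ISO 8859-1 but not ASCII.
upper = bytes([0xE9])
print(upper.decode("iso8859_1"))
try:
    upper.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)
```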