
Re: Fwd: Romanization according to the rules of the Romanian Academy


man utf-8:  
     Programmers accustomed to single-byte encodings such as US-ASCII or ISO
       8859 have to be aware that two assumptions made so far  are  no  longer
       valid  in  UTF-8  locales.  Firstly, a single byte does not necessarily
       correspond any more to a single character. Secondly, since modern  ter-

       * All  UCS  characters > 0x7f are encoded as a multi-byte sequence con-
         sisting only of bytes in the range 0x80 to 0xfd, so no ASCII byte can
       * The first byte of a multi-byte sequence  which  represents  a  single
         non-ASCII UCS character is always in the range 0xc0 to 0xfd and indi-
         cates how long this multi-byte sequence is. All further  bytes  in  a
         multi-byte  sequence  are in the range 0x80 to 0xbf. This allows easy
         resynchronization and makes the encoding stateless and robust against
         missing bytes.
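
To make the byte ranges quoted above concrete, here is a minimal Python sketch. The function names (sequence_length, is_continuation, resync) are made up for this illustration and are not part of any library. The ranges follow the quoted man page, which describes the original UTF-8 scheme with sequences of up to six bytes; current UTF-8 (RFC 3629) stops at four bytes, so lead bytes 0xf8 to 0xfd do not occur in valid modern text.

```python
def sequence_length(first_byte: int) -> int:
    """Return the sequence length announced by a lead byte.

    Ranges follow the quoted man page (up to 6 bytes); this is an
    illustration, not a validating decoder.
    """
    if first_byte < 0x80:
        return 1  # plain ASCII, a single byte
    if 0xC0 <= first_byte <= 0xDF:
        return 2
    if 0xE0 <= first_byte <= 0xEF:
        return 3
    if 0xF0 <= first_byte <= 0xF7:
        return 4
    if 0xF8 <= first_byte <= 0xFB:
        return 5  # only in the original scheme, not in RFC 3629
    if 0xFC <= first_byte <= 0xFD:
        return 6  # only in the original scheme, not in RFC 3629
    raise ValueError("not a lead byte: 0x%02x" % first_byte)


def is_continuation(byte: int) -> bool:
    """True for bytes in the range 0x80 to 0xbf."""
    return 0x80 <= byte <= 0xBF


def resync(data: bytes, pos: int) -> int:
    """Skip forward from pos to the next byte that starts a character."""
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


# "a", Romanian s-comma (U+0219) and the euro sign (U+20AC)
data = "a\u0219\u20ac".encode("utf-8")
lengths = [sequence_length(data[0]), sequence_length(data[1]), sequence_length(data[3])]
print(data.hex(), lengths, resync(data, 2))
```

Starting in the middle of a character (offset 2 here) only costs skipping the remaining continuation bytes, which is what the man page means by easy resynchronization.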
man charsets

       For most people who use ISO-8859 character sets, this  means  that  the
       characters outside of ASCII are now coded with two bytes. This tends to
       expand ordinary text files by only one or two percent. For  Russian  or
man ascii

       ASCII is the American Standard Code for Information Interchange.  It is
       a  7-bit  code. Many 8-bit codes (such as ISO 8859-1, the Linux default
       character set) contain ASCII as their lower  half.   The  international
       counterpart of ASCII is known as ISO 646.
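
The two claims quoted from charsets and ascii can be checked with Python's standard codecs. This is only a sketch; the variable names are made up for the illustration. It confirms that the 128 ASCII bytes decode identically under ASCII, ISO 8859-1 and UTF-8, and that every character in the upper half of ISO 8859-1 takes two bytes in UTF-8.

```python
# The 7-bit range 0x00 to 0x7f
ascii_range = bytes(range(0x80))

# ASCII is the lower half of ISO 8859-1 and is unchanged in UTF-8
same_in_all = (
    ascii_range.decode("ascii")
    == ascii_range.decode("iso-8859-1")
    == ascii_range.decode("utf-8")
)

# The upper half of ISO 8859-1 (0x80 to 0xff), one byte per character there
upper_half = bytes(range(0x80, 0x100)).decode("iso-8859-1")

# Sizes of those same characters once encoded as UTF-8
utf8_sizes = {len(ch.encode("utf-8")) for ch in upper_half}

print(same_in_all, utf8_sizes)
```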
