
Re: Questions regarding utf-8



On Fri, May 09, 2003 at 10:29:28AM +0200, Nicolas Bertolissio wrote:
> On Friday, 9 May 2003, Sebastian Rittau wrote:

> > iconv is the generic libc-based converter:
> > 
> >   iconv --from-code=SRC-ENCODING --to-code=DST-ENCODING SRC-FILE >DST-FILE
> No, according to my tests (I'm trying to work around this conversion
> problem for the DDTS), iconv is a bijective function: it cannot
> convert if some characters don't exist in the destination encoding.

I forgot to mention TRANSLIT (thanks, Andreas). A UTF-8 to ISO-8859
conversion (and the reverse transformation) can't be bijective, since
each ISO-8859 encoding is limited to 2^8 different characters, while
Unicode defines at least 2^16 different code points. (About 2^20 with
the supplementary planes, i.e. 17 * 2^16 = 1,114,112; 2^31 is the
absolute limit of UCS-4.)
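
To make the counting argument concrete, here is a minimal sketch in
Python (my choice for illustration, it is not what iconv does
internally). It assumes ISO-8859-1 as the target encoding, and the
helper name fits_latin1 is hypothetical. A strict conversion fails as
soon as one character has no counterpart in the smaller set:

```python
import sys

# Sizes of the two character sets being compared.
LATIN1_SIZE = 2 ** 8                 # ISO-8859-1: one byte per character
UNICODE_SIZE = sys.maxunicode + 1    # code points known to this Python build


def fits_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")    # strict mode raises on unmappable characters
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(LATIN1_SIZE, UNICODE_SIZE)
    print(fits_latin1("\u00e9crit"))   # e-acute is part of ISO-8859-1
    print(fits_latin1("\u20ac"))       # the euro sign is not
```

The other direction always succeeds, because every ISO-8859-1
character has a Unicode code point, but most code points are never
produced, so that direction is not surjective.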

With TRANSLIT you at least get a valid mapping, though it is not
reversible, since it is not injective either (it can't be: more than
2^8 source characters have to share at most 2^8 targets).
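
A small sketch of why such a mapping has no inverse. Python's codecs
have no equivalent of iconv's TRANSLIT, so this substitutes
errors="replace" as a stand-in lossy mapping (an assumption for
illustration only; the replacements iconv actually picks differ). The
function name lossy_to_latin1 is hypothetical:

```python
def lossy_to_latin1(text: str) -> bytes:
    """Encode to ISO-8859-1, replacing unmappable characters with '?'."""
    return text.encode("iso-8859-1", errors="replace")


if __name__ == "__main__":
    euro = lossy_to_latin1("\u20ac")     # euro sign
    arrow = lossy_to_latin1("\u2192")    # rightwards arrow
    # Two distinct inputs collapse onto the same output byte,
    # so no function can map the result back to the original.
    print(euro == arrow)
    # Characters that do exist in the target pass through unchanged.
    print(lossy_to_latin1("caf\u00e9"))
```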

 - Sebastian


