Re: Postgres - Unicode - Problem

To: Debian Developers <debian-devel@lists.debian.org>, Debian PostgreSQL Liste <debian-postgresql@cochrane.atnet.at>
Subject: Re: Postgres - Unicode - Problem
From: Ulrich Eckhardt <uli@doommachine.dyndns.org>
Date: Fri, 13 Jun 2003 08:19:29 +0200
Message-id: <200306130819.29992.uli@doommachine.dyndns.org>
Reply-to: Ulrich Eckhardt <doomster@knuut.de>
In-reply-to: <Pine.LNX.4.44.0306111521080.28918-100000@wr-linux02.rki.ivbb.bund.de>
References: <Pine.LNX.4.44.0306111521080.28918-100000@wr-linux02.rki.ivbb.bund.de>

On Wednesday 11 June 2003 15:23, Andreas Tille wrote:
> CREATE DATABASE $dbtocreate ENCODING 'unicode';

I seem to remember that pg also accepts something like 'UTF8' as an encoding 
name. 'Unicode' on its own is in most places just a buzzword: it names a 
character set, not a byte encoding, and the same Unicode text can be stored 
in several encodings (UTF8, UTF16, UCS4, ...). Naming the exact encoding 
would be much clearer here.
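
To make that concrete, here is a small sketch in Python (my choice for the 
illustration only, nothing pg-specific) that prints the bytes of one and the 
same character, U+00F6, in several encodings:

```python
# One Unicode character, U+00F6 (o with diaeresis), in several byte encodings.
ch = "\u00f6"

encodings = ["utf-8", "utf-16-be", "utf-32-be", "iso8859-1"]
encoded = {name: ch.encode(name) for name in encodings}

for name, raw in encoded.items():
    # Show each encoded form as space-separated hex bytes.
    print(f"{name:10} {' '.join(f'{b:02x}' for b in raw)}")
```

The character is the same in every case; only the byte sequence differs, 
which is why the encoding name matters more than the word 'Unicode'.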

> INSERT INTO i18n_translations(lang, orig, trans) values
> 	('de_DE', 'public', 'öffentlich');
>
> ERROR:  Unicode >= 0x10000 is not supported

So it looks like the backend can only handle code points below 0x10000, i.e. 
what fits into UCS2 (or UTF16 without surrogate pairs). The question is how 
it interpreted the command so that it ended up with a code point >= 0x10000. 
Possible ways:
- UCS4: every char takes four bytes, but then the commands before this one 
should already have failed.
- UCS2/UTF16: two bytes per char (plus surrogate pairs for UTF16), otherwise 
the same as above.
- UTF8: one byte per ASCII char, but multibyte chars are rather common. I'm 
not sure how it could interpret this, but try saving the file as UTF8 (and 
_not_ as ISO8859-1, which many editors[1] silently do).
- ASCII: if the code uses a 'signed char', the umlaut might end up as a 
negative code point, which wraps around to a huge value once it is treated 
as unsigned and triggers the above error.
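
For the UTF8 case, one way this could happen (a guess, not checked against 
the pg sources): in ISO8859-1 the 'ö' is the single byte 0xF6, and a UTF8 
decoder reads a byte of the form 11110xxx as the start of a four-byte 
sequence, which is how code points >= 0x10000 are encoded. A minimal Python 
sketch of that; the helper utf8_sequence_length is made up for illustration:

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length announced by a UTF8 lead byte (0 if none)."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: plain ASCII
    if lead >> 5 == 0b110:
        return 2  # 110xxxxx: U+0080..U+07FF
    if lead >> 4 == 0b1110:
        return 3  # 1110xxxx: U+0800..U+FFFF
    if lead >> 3 == 0b11110:
        return 4  # 11110xxx: U+10000 and above
    return 0      # continuation byte or not a lead byte at all


# The same word, once saved as ISO8859-1 and once as UTF8.
latin1_bytes = "\u00f6ffentlich".encode("iso8859-1")
utf8_bytes = "\u00f6ffentlich".encode("utf-8")

print(hex(latin1_bytes[0]), utf8_sequence_length(latin1_bytes[0]))  # 0xf6 4
print(hex(utf8_bytes[0]), utf8_sequence_length(utf8_bytes[0]))      # 0xc3 2
```

If this is what happens, a file saved as ISO8859-1 but declared as Unicode 
would look to the server as if it contained a character >= 0x10000.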

As a last thing, there is the possibility (albeit small) that you cannot use 
this in a script but only via some 'real' API (but I might be drifting into 
obscure speculations here).

good luck
Ulrich Eckhardt

[1] apt-get install yudit
That is a rather capable editor that understands several encodings.
