
Re: Postgres - Unicode - Problem



On Wednesday 11 June 2003 15:23, Andreas Tille wrote:
> CREATE DATABASE $dbtocreate ENCODING 'unicode';

I seem to remember that pg also offers an encoding name like UTF8. The point 
is that 'Unicode' is in most places just a buzzword. Especially in this case, 
naming the exact encoding would be much better, as Unicode can be represented 
in several encodings.

> INSERT INTO i18n_translations(lang, orig, trans) values
> 	('de_DE', 'public', 'öffentlich');
>
> ERROR:  Unicode >= 0x10000 is not supported

So, this looks like it can only take UCS2 or UTF16. However, the question is 
how it interpreted the command so that it ended up with a character whose 
codepoint is >= 0x10000.
Possible ways:
- UCS4: here, one char uses four bytes, but then the commands before this one 
should already have failed
- UCS2/UTF16: two bytes per char (plus surrogate sequences for UTF16), 
otherwise the same as above
- UTF8: one byte per char for plain ASCII, with multibyte chars for everything 
else being rather common. I'm not sure how it could interpret this, but try 
saving it as UTF8 (and _not_ ISO8859-1, which many editors[1] silently do).
- ASCII: using a 'signed char', they might end up with a negative codepoint 
for the umlaut, which wraps around to a huge value and triggers the above 
error.
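
To make the UTF8 vs. ISO8859-1 point concrete, here is a small Python sketch. 
It assumes (this is a guess, not something confirmed in this thread) that the 
script was saved as ISO8859-1 while the server reads it as UTF8. The helper 
utf8_sequence_length is a made-up name for illustration; it only looks at the 
bit pattern of the lead byte and ignores the tighter limits of newer UTF8 
definitions.

```python
# Sketch: the same character 'ö' (U+00F6) in different encodings.
text = "\u00f6"

latin1 = text.encode("iso-8859-1")  # a single byte, 0xF6
utf8 = text.encode("utf-8")         # two bytes, 0xC3 0xB6
utf16 = text.encode("utf-16-be")    # two bytes, 0x00 0xF6


def utf8_sequence_length(lead: int) -> int:
    """Sequence length announced by a UTF8 lead byte, judged purely by its
    bit pattern. Returns 0 for bytes that cannot start a sequence.
    (Hypothetical helper, for illustration only.)"""
    if lead < 0x80:
        return 1          # 0xxxxxxx: plain ASCII
    if 0xC0 <= lead <= 0xDF:
        return 2          # 110xxxxx
    if 0xE0 <= lead <= 0xEF:
        return 3          # 1110xxxx
    if 0xF0 <= lead <= 0xF7:
        return 4          # 11110xxx: codepoints >= 0x10000
    return 0              # continuation byte or invalid


for name, data in (("ISO8859-1", latin1), ("UTF8", utf8)):
    # Show what a UTF8 decoder would make of the first byte of each form.
    print(name, data.hex(), "-> lead byte announces",
          utf8_sequence_length(data[0]), "byte(s)")
```

If that assumption holds, the lone 0xF6 byte from an ISO8859-1 file looks to a 
UTF8 decoder like the start of a four-byte sequence, i.e. a codepoint 
>= 0x10000, which would fit the error message. The properly encoded form 
starts with 0xC3 and announces only two bytes.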

As a last thing, there is the possibility (albeit small) that you cannot use 
this in a script but only via some 'real' API (but I might be drifting into 
obscure speculations here).

good luck
Ulrich Eckhardt


[1] apt-get install yudit
That is a rather capable editor that understands several encodings. 


