From: | Jean-Michel POURE <jm(dot)poure(at)freesurf(dot)fr> |
---|---|
To: | "Kevin McPherson" <kevinmcp(at)en-tranz(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: [HACKERS] Unicode ready? |
Date: | 2002-04-02 20:56:55 |
Message-ID: | 200204022056.g32Kuthk031747@www1.translationforge |
Lists: | pgsql-general pgsql-hackers |
On Tuesday 2 April 2002 12:53, you wrote:
> Is PostgreSQL unicode compliant/ready?
> Does it store/export text in Unicode wide-character format, or single
> character strings?
[By the way: there are several Unicode encodings (UTF-8, UTF-16, UCS-2).
UTF-8 is the most popular because each character is coded on 1 to 4 bytes
(1 to 3 within the Basic Multilingual Plane) and plain ASCII text is left
unchanged. Thus UTF-8 extracts can be read in a normal text editor.
Conversely, UTF-16 is coded in 16-bit units, so it can't be read easily.]
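To make the byte counts concrete, here is a minimal sketch in Python (my choice of language for illustration, not something used in this thread). The sample characters are arbitrary, and the helper name encoded_lengths is hypothetical. Note that characters outside the Basic Multilingual Plane take 4 bytes in UTF-8 and a surrogate pair in UTF-16.

```python
# Sample characters chosen only for illustration:
# plain ASCII, accented Latin, the Euro sign (BMP), and a musical symbol (outside the BMP).
samples = ["A", "é", "€", "\U0001D11E"]

def encoded_lengths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

for ch in samples:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} byte(s), UTF-16 = {utf16_len} byte(s)")
```

The ASCII letter stays a single byte in UTF-8, which is why ASCII-heavy UTF-8 files remain readable in an ordinary editor, while UTF-16 spends at least two bytes on every character.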
I guess your question was "Is PostgreSQL multi-byte safe and Unicode ready?"
1) Server side
a) PostgreSQL needs to be compiled with
--enable-recode
--enable-multibyte
b) Create a database with
CREATE DATABASE foo WITH ENCODING = 'UNICODE' (which means UTF-8 in PostgreSQL)
Several other multi-byte encodings are available. In the case of Unicode,
data is stored in UTF-8 format. String operations and searches work on whole
characters, not on individual 8-bit bytes.
2) Client side
By default, a connection uses the server encoding. But it is possible to
automatically recode a connection on the fly using:
SET CLIENT_ENCODING = Latin9 (this example recodes Unicode streams to Western
European with the Euro symbol). The setting applies per connection, so it is
possible to recode several streams at the same time.
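As an illustration of what recoding a UTF-8 stream to Latin9 means at the byte level, here is a small Python sketch. It only imitates the conversion on the client side and is not how the server implements it. Python names Latin9 "iso8859_15", and the helper recode_utf8_to_latin9 and the sample string are hypothetical.

```python
def recode_utf8_to_latin9(utf8_bytes):
    """Decode a UTF-8 byte stream and re-encode it as Latin9 (ISO 8859-15)."""
    return utf8_bytes.decode("utf-8").encode("iso8859_15")

# Sample text containing the Euro sign, which Latin9 can represent.
utf8_stream = "Prix : 10 €".encode("utf-8")
latin9_stream = recode_utf8_to_latin9(utf8_stream)

print(len(utf8_stream), utf8_stream)      # the Euro sign takes 3 bytes in UTF-8
print(len(latin9_stream), latin9_stream)  # and a single byte (0xA4) in Latin9
```

A character with no Latin9 equivalent would make the encode step raise an error in this sketch, which is the usual trade-off when recoding Unicode down to an 8-bit character set.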
3) ODBC interface
The current ODBC interface provides Unicode in UTF-8 encoding. But Microsoft
platforms need Unicode in UCS-2 encoding (e.g. Access 2K). Therefore,
you will be able to view data under OpenOffice but not Microsoft Office.
The new ODBC driver in CVS supports UCS-2.
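To show the difference between the two encodings the drivers deal with, here is a rough Python sketch of a UTF-8 to UCS-2 conversion. It is an assumption-laden illustration, not the ODBC driver's code: Python has no codec named UCS-2, so the sketch uses little-endian UTF-16, which is byte-for-byte identical for characters inside the Basic Multilingual Plane, and rejects anything outside it. The helper name utf8_to_ucs2 is hypothetical.

```python
def utf8_to_ucs2(utf8_bytes):
    """Convert UTF-8 bytes to little-endian UCS-2 (BMP characters only)."""
    text = utf8_bytes.decode("utf-8")
    # UCS-2 has no surrogate pairs, so code points above U+FFFF cannot be stored.
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the BMP cannot be represented in UCS-2")
    return text.encode("utf-16-le")

print(utf8_to_ucs2("é".encode("utf-8")))  # 2 bytes in UTF-8, 2 bytes in UCS-2
print(utf8_to_ucs2("€".encode("utf-8")))  # 3 bytes in UTF-8, 2 bytes in UCS-2
```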
4) Server side languages
Server-side languages are the traditional weakness of Unicode programming.
When writing code, you need to calculate the length of a string, crop the
left side of it, etc. In PHP, this is done using the special mbstring
library (the mb_* functions). Usually, this breaks your code because the
library provides additional function names that must replace the standard
string functions.
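The length and left-crop problem can be sketched in Python by comparing operations on raw UTF-8 bytes (what a byte-oriented string function sees) with operations on characters. The sample string and the helper names left_chars and left_bytes are hypothetical and only serve the illustration.

```python
text = "déjà vu"            # 7 characters
raw = text.encode("utf-8")  # 9 bytes, because "é" and "à" take 2 bytes each

def left_chars(s, n):
    """Character-aware crop: keep the first n characters."""
    return s[:n]

def left_bytes(b, n):
    """Byte-oriented crop: keep the first n bytes, possibly splitting a character."""
    return b[:n]

print(len(text), len(raw))
print(left_chars(text, 2))  # keeps "d" and "é" intact

try:
    # Cropping at 2 bytes cuts "é" in half, so the result is no longer valid UTF-8.
    print(left_bytes(raw, 2).decode("utf-8"))
except UnicodeDecodeError as exc:
    print("broken UTF-8 after byte-oriented crop:", exc)
```

A multi-byte safe function behaves like left_chars, which is why code written against byte-oriented functions has to be revisited when the data becomes UTF-8.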
This is not the case in PostgreSQL, where all PL/pgSQL functions are multi-byte
safe. Because of PHP instability, I ported several functions to PL/pgSQL.
PostgreSQL is a pure marvel.
For additional questions, please post to pgsql-general(at)postgresql(dot)org.
Cheers,
Jean-Michel POURE