From: | Dan Sugalski <dan(at)sidhe(dot)org> |
---|---|
To: | "Richard Connamacher" <rich(dot)n1(at)indieimage(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: UTF-8 question. |
Date: | 2004-09-17 01:12:58 |
Message-ID: | a06110407bd6fe9f9673a@[192.168.1.105] |
Lists: | pgsql-general |
At 8:39 PM -0400 9/16/04, Richard Connamacher wrote:
>I'm new to PostgreSQL, and from the looks of it, it's a great database,
>and I'll be using more of it in the future.
>
>I had a quick question if anyone could clear this up. The documentation
>for PostgreSQL (version 7.1, the version this server is using) says that
>it supports multibyte character encodings like Unicode (which implies
>UTF-16 encoding).
Don't confuse Unicode, the 'character set' and rules for characters,
in which text is a sequence of abstract integers (code points, U+0000
through U+10FFFF), with UTF-[8|16|32], which are ways to encode those
abstract integers into a stream of bytes someplace.
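A minimal Python sketch of that distinction, using the euro sign as an arbitrary example (the variable names are mine, not anything from the PostgreSQL docs): one abstract code point, three different byte streams depending on which encoding you pick.

```python
# One abstract code point, three different byte serializations.
ch = "\u20ac"            # EURO SIGN, chosen only as an illustration
code_point = ord(ch)     # the abstract integer Unicode assigns: 0x20AC

# Big-endian variants are used so that no byte-order mark is prepended.
encodings = {
    "utf-8": ch.encode("utf-8"),
    "utf-16-be": ch.encode("utf-16-be"),
    "utf-32-be": ch.encode("utf-32-be"),
}

for name, raw in encodings.items():
    print(f"U+{code_point:04X} as {name}: {raw.hex()} ({len(raw)} bytes)")
```

The code point is the same in every case; only the bytes on the wire differ.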
> Later on, the same page says that Unicode is
>represented using UTF-8 encoding. UTF-8 is the 8-bit version of Unicode.
>The multibyte version of Unicode is UTF-16.
>
>So, which is it? If I create a database using Unicode as the encoding,
>will the encoding be UTF-8 (singlebyte) or UTF-16 (multibyte)?
Erm... UTF-8 *is* a multibyte encoding. Up to 4 bytes per code point
as currently defined (RFC 3629); the original design allowed up to 6.
(And, last I checked, a really degenerate character made of several
code points can take up to 70 bytes, but my memory might be off
(could be 80).)
UTF-8, UTF-16, and UTF-32 will all encode Unicode characters just fine.
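To make the multibyte point concrete, here is a small Python sketch (the sample characters are arbitrary picks of mine). Python's UTF-8 codec follows the current definition, which caps a single code point at 4 bytes; the older design allowed longer sequences. A single displayed character can still take more bytes than that when it is built from a base letter plus combining marks.

```python
# UTF-8 is variable width: 1 to 4 bytes per code point.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
utf8_lengths = {f"U+{ord(c):04X}": len(c.encode("utf-8")) for c in samples}

for label, length in utf8_lengths.items():
    print(f"{label}: {length} byte(s) in UTF-8")

# One displayed character made of three code points:
# 'e' + COMBINING ACUTE ACCENT + COMBINING DOT BELOW.
combined = "e\u0301\u0323"
combined_bytes = len(combined.encode("utf-8"))
print(f"{len(combined)} code points, {combined_bytes} bytes in UTF-8")
```

Plain ASCII stays at one byte per character, which is probably why UTF-8 gets mistaken for a single-byte encoding.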
--
Dan
--------------------------------------it's like this-------------------
Dan Sugalski even samurai
dan(at)sidhe(dot)org have teddy bears and even
teddy bears get drunk