Re: UTF-8 question.

From: Dan Sugalski <dan(at)sidhe(dot)org>
To: "Richard Connamacher" <rich(dot)n1(at)indieimage(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: UTF-8 question.
Date: 2004-09-17 01:12:58
Message-ID: a06110407bd6fe9f9673a@[192.168.1.105]
Lists: pgsql-general

At 8:39 PM -0400 9/16/04, Richard Connamacher wrote:
>I'm new to PostgreSQL, and from the looks of it, it's a great database,
>and I'll be using more of it in the future.
>
>I had a quick question if anyone could clear this up. The documentation
>for PostgreSQL (version 7.1, the version this server is using) says that
>it supports multibyte character encodings like Unicode (which implies
>UTF-16 encoding).

Don't confuse Unicode, the 'character set' and rules for characters,
represented by a sequence of abstract integers (code points, U+0000
through U+10FFFF), with UTF-[8|16|32], which are ways to encode those
abstract integers into a stream of bytes someplace.
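
To make the distinction concrete, here is a minimal Python sketch. It is
only an illustration: the sample character and the helper name
`encodings` are my own choices, not anything from the PostgreSQL
documentation. It takes one code point and shows the different byte
sequences the three encodings produce for it.

```python
# Illustrative sketch: one abstract code point, three byte encodings.
# The sample character and helper name are assumptions for illustration.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The abstract integer Unicode assigns to this character.
code_point = ord(ch)


def encodings(text):
    """Return the bytes each UTF encoding produces (big-endian, no BOM)."""
    return {name: text.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}


encoded = encodings(ch)
for name, raw in encoded.items():
    print(f"U+{code_point:04X} as {name}: {raw.hex()} ({len(raw)} bytes)")
```

The code point is the same in every case; only the bytes used to store
it differ.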

> Later on, the same page says that Unicode is
>represented using UTF-8 encoding. UTF-8 is the 8-bit version of Unicode.
>The multibyte version of Unicode is UTF-16.
>
>So, which is it? If I create a database using Unicode as the encoding,
>will the encoding be UTF-8 (singlebyte) or UTF-16 (multibyte)?

Erm... UTF-8 *is* a multibyte encoding. Up to 4 bytes per code point
as currently defined (RFC 3629); the original design allowed up to 6,
if things get really degenerate. (And, last I checked, that means you
can have up to 70 bytes for really degenerate characters, but my
memory might be off; it could be 80.)

UTF-8, UTF-16, and UTF-32 will all encode Unicode characters just fine.
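
As a rough check of both points, the Python sketch below measures how
many bytes UTF-8 needs for a few sample characters and confirms that
each one survives a round trip through all three encodings. The sample
set and the helper names `utf8_length` and `round_trips` are
hypothetical, chosen just for this illustration.

```python
# Illustrative sketch: UTF-8 is variable-width, and all three UTF
# encodings can represent the same characters. Sample characters and
# helper names are assumptions for illustration.
samples = {
    "A": "\u0041",                  # ASCII range
    "e-acute": "\u00e9",            # Latin-1 range
    "euro sign": "\u20ac",          # elsewhere in the Basic Multilingual Plane
    "grinning face": "\U0001f600",  # outside the Basic Multilingual Plane
}


def utf8_length(ch):
    """Number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


def round_trips(ch):
    """True if the character survives encode/decode in every UTF encoding."""
    return all(ch.encode(enc).decode(enc) == ch for enc in ("utf-8", "utf-16", "utf-32"))


lengths = {name: utf8_length(ch) for name, ch in samples.items()}
for name, ch in samples.items():
    print(f"{name}: {lengths[name]} byte(s) in UTF-8, round trips: {round_trips(ch)}")
```

Only the ASCII character fits in a single byte; everything else takes
two or more, which is what makes UTF-8 a multibyte encoding.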
--
Dan

--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
dan(at)sidhe(dot)org                  have teddy bears and even
                                      teddy bears get drunk
