Re: Unicode support

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "- -" <crossroads0000(at)googlemail(dot)com>
Subject:	Re: Unicode support
Date:	2009-04-14 12:32:44
Message-ID:	200904141532.44618.peter_e@gmx.net
Lists:	pgsql-hackers

On Monday 13 April 2009 22:39:58 Andrew Dunstan wrote:
> Umm, but isn't that because your encoding is using one code point?
>
> See the OP's explanation w.r.t. canonical equivalence.
>
> This isn't about the number of bytes, but about whether or not we should
> count characters encoded as two or more combined code points as a single
> char or not.

Here is a test case that shows the problem, provided your terminal can
display combining characters (xterm appears to work):

SELECT U&'\00E9', char_length(U&'\00E9');
 ?column? | char_length
----------+-------------
 é        |           1
(1 row)

SELECT U&'\0065\0301', char_length(U&'\0065\0301');
 ?column? | char_length
----------+-------------
 é        |           2
(1 row)
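
For readers who want to reproduce the distinction outside the database, here is a minimal sketch in Python using only the standard unicodedata module. It is not from the thread and says nothing about how PostgreSQL implements or should implement char_length; the helper name count_base_characters is hypothetical, and skipping combining marks is only a rough stand-in for real grapheme-cluster counting.

```python
import unicodedata

# The two strings from the test case above.
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def count_base_characters(s: str) -> int:
    """Count code points that are not combining marks.

    Hypothetical helper: a rough approximation of "characters as the
    user sees them", not a full grapheme-cluster segmentation.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


# len() counts code points, which matches the char_length results above.
print(len(precomposed), len(decomposed))

# The two strings are canonically equivalent: NFC composes the pair
# into the single precomposed code point.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed)

# Ignoring combining marks gives the same count for both spellings.
print(count_base_characters(precomposed), count_base_characters(decomposed))
```

Normalizing before counting only helps when a precomposed form exists; a base letter with a combining mark that has no precomposed equivalent still counts as two code points after NFC.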

In response to

Re: Unicode support at 2009-04-13 19:39:58 from Andrew Dunstan

Responses

Re: Unicode support at 2009-04-14 15:49:45 from Greg Stark
