
Re: Unicode support

From:	Greg Stark <stark(at)enterprisedb(dot)com>
To:	Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc:	pgsql-hackers(at)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, - - <crossroads0000(at)googlemail(dot)com>
Subject:	Re: Unicode support
Date:	2009-04-14 15:49:45
Message-ID:	4136ffa0904140849h36bdb5adl8b4e765b1906c4ed@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Apr 14, 2009 at 1:32 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On Monday 13 April 2009 22:39:58 Andrew Dunstan wrote:
>> Umm, but isn't that because your encoding is using one code point?
>>
>> See the OP's explanation w.r.t. canonical equivalence.
>>
>> This isn't about the number of bytes, but about whether or not we should
>> count characters encoded as two or more combined code points as a single
>> char or not.
>
> Here is a test case that shows the problem (if your terminal can display
> combining characters (xterm appears to work)):
>
> SELECT U&'\00E9', char_length(U&'\00E9');
> ?column? | char_length
> ----------+-------------
> é | 1
> (1 row)
>
> SELECT U&'\0065\0301', char_length(U&'\0065\0301');
> ?column? | char_length
> ----------+-------------
> é | 2
> (1 row)

What's really at issue is "what is a string?". That is, is it a sequence
of characters or a sequence of code points? If it's the former then we
would also have to prohibit certain strings such as U&'\0301'
entirely. And we would have to make substr() pick out the right number
of code points, etc.
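
To make the distinction concrete, here is a rough sketch in Python rather than in PostgreSQL itself. The helper names code_point_length and combining_aware_length are made up for illustration, and treating every code point with a non-zero canonical combining class as part of the preceding character is only a crude stand-in for real grapheme-cluster segmentation.

```python
import unicodedata


def code_point_length(s: str) -> int:
    """Length when a string is a sequence of code points."""
    return len(s)


def combining_aware_length(s: str) -> int:
    """Approximate length when a string is a sequence of characters.

    Assumption: a code point with a non-zero canonical combining class
    belongs to the preceding character. This is a simplification, not
    full grapheme-cluster segmentation.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


precomposed = "\u00e9"   # U+00E9, one code point
decomposed = "e\u0301"   # U+0065 followed by U+0301 (combining acute)
lone_mark = "\u0301"     # a combining mark with no base character

# The two spellings are canonically equivalent but differ in code points.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

lengths = {
    "precomposed": (code_point_length(precomposed),
                    combining_aware_length(precomposed)),
    "decomposed": (code_point_length(decomposed),
                   combining_aware_length(decomposed)),
    "lone_mark": (code_point_length(lone_mark),
                  combining_aware_length(lone_mark)),
}

# Slicing by code point, as a code-point substr() would, drops the accent.
first_code_point = decomposed[:1]
```

Under the "sequence of characters" reading, the lone combining mark comes out with length zero, which is one way to see why such a string would have to be rejected, and a one-character substr() of the decomposed form would have to return two code points rather than one.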

--
greg

In response to

Re: Unicode support at 2009-04-14 12:32:44 from Peter Eisentraut

Responses

Re: Unicode support at 2009-04-14 16:26:41 from Tom Lane
Re: Unicode support at 2009-04-14 17:12:11 from Kevin Grittner
Re: Unicode support at 2009-04-14 18:19:21 from Peter Eisentraut
