
Re: Unicode string literals versus the world

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Sam Mason <sam(at)samason(dot)me(dot)uk>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Unicode string literals versus the world
Date:	2009-04-16 14:54:16
Message-ID:	17658.1239893656@sss.pgh.pa.us
Lists:	pgsql-hackers

Sam Mason <sam(at)samason(dot)me(dot)uk> writes:
> I'd never heard of UTF-16 surrogate pairs before this discussion and
> hence didn't realise that it's valid to have a surrogate pair in place
> of a single code point. The docs say that <D800 DF02> corresponds to
> U+10302, Python would appear to follow my intuitions in that:

> ord(u'\uD800\uDF02')

> results in an error instead of giving back 66306, as I'd expect. Is
> this a bug in Python, my understanding, or something else?

I might be wrong, but I think surrogate pairs are expressly forbidden in
all representations other than UTF-16/UCS-2. We definitely forbid them
when validating UTF-8 strings --- that's per an RFC recommendation.
It sounds like Python is doing the same.

regards, tom lane
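
For illustration, the arithmetic behind the <D800 DF02> example and the UTF-8 rejection described above can be sketched in Python 3. This is only a sketch under stated assumptions: the helper names combine_surrogates and utf8_accepts are made up for this example, and the behaviour shown is that of Python 3's built-in strict "utf-8" codec, not of PostgreSQL's validation code.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Map a UTF-16 high/low surrogate pair to the code point it encodes.

    Hypothetical helper, written only for this example.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf8_accepts(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes.

    Hypothetical helper, written only for this example.
    """
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


code_point = combine_surrogates(0xD800, 0xDF02)
print(hex(code_point), code_point)  # 0x10302 66306

# The code point itself has an ordinary four-byte UTF-8 encoding.
proper = chr(code_point).encode("utf-8")
print(proper, utf8_accepts(proper))

# Encoding each surrogate separately as a three-byte sequence gives
# ED A0 80 ED BC 82, which a strict UTF-8 decoder rejects.
surrogate_bytes = b"\xed\xa0\x80\xed\xbc\x82"
print(utf8_accepts(surrogate_bytes))
```

The pair is only meaningful as a UTF-16 encoding detail; in UTF-8 the same character is written directly as the four-byte sequence for U+10302.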

In response to

Re: Unicode string literals versus the world at 2009-04-16 14:34:40 from Sam Mason

Responses

Re: Unicode string literals versus the world at 2009-04-16 15:24:42 from Sam Mason
Re: Unicode string literals versus the world at 2009-04-16 15:34:27 from Andrew Dunstan
Re: Unicode string literals versus the world at 2009-04-16 15:50:30 from Marko Kreen
