Re: UTF16 surrogate pairs in UTF8 encoding

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Marko Kreen <markokr(at)gmail(dot)com>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: UTF16 surrogate pairs in UTF8 encoding
Date:	2010-09-08 14:01:36
Message-ID:	2195.1283954496@sss.pgh.pa.us
Lists:	pgsql-hackers

Marko Kreen <markokr(at)gmail(dot)com> writes:
> Although it does seem unnecessary.

The reason I asked for this to be spelled out is that ordinarily,
a backslash escape \nnn is a very low-level thing that will insert
exactly what you say. To me it's quite unexpected that the system
would editorialize on that to the extent of replacing two UTF16
surrogate characters by a single code point. That's necessary for
correctness because our underlying storage is UTF8, but it's not
obvious that it will happen. (As a counterexample, if our underlying
storage were UTF16, then very different things would need to happen
for the exact same SQL input.)
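
To make the arithmetic concrete, here is a minimal sketch in Python of how two UTF16 surrogates collapse into one code point, and how that code point then looks under each storage encoding. The function name combine_surrogates and the sample values are hypothetical, chosen only for illustration; this is not the backend's actual code.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one Unicode code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 bits; the pair addresses code points >= U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Sample input: the two escapes D83D and DE00, as they might be written in SQL.
code_point = combine_surrogates(0xD83D, 0xDE00)

# With UTF8 storage the pair must become a single 4-byte sequence.
utf8_bytes = chr(code_point).encode("utf-8")

# With (hypothetical) UTF16 storage the same input would stay as two 16-bit units.
utf16_bytes = chr(code_point).encode("utf-16-be")

print(f"U+{code_point:X}", utf8_bytes.hex(), utf16_bytes.hex())
```

The same two escapes end up as four bytes that look nothing like the input under UTF8, whereas under UTF16 they would be stored essentially as written, which is the counterexample above.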

I think a lot of people will have this same question when reading
this para, which is why I asked for an explanation there.

regards, tom lane

In response to

Re: UTF16 surrogate pairs in UTF8 encoding at 2010-09-08 10:45:37 from Marko Kreen

Responses

Re: UTF16 surrogate pairs in UTF8 encoding at 2010-09-08 14:23:45 from Marko Kreen
