Re: chr() is still too loose about UTF8 code points

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: chr() is still too loose about UTF8 code points
Date:	2014-05-16 17:52:43
Message-ID:	16802.1400262763@sss.pgh.pa.us
Lists:	pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> On Fri, May 16, 2014 at 11:05:08AM -0400, Tom Lane wrote:
>> I think this probably means we need to change chr() to reject code points
>> above 10ffff. Should we back-patch that, or just do it in HEAD?

> The compatibility risks resemble those associated with the fixes for bug
> #9210, so I recommend HEAD only:

> http://www.postgresql.org/message-id/flat/20140220043940.GA3064539@tornado.leadboat.com

While I'd be willing to ignore that risk so far as code points above
10ffff go, if we want pg_utf8_islegal to be happy then we will also
have to reject surrogate-pair code points. It's not beyond the realm
of possibility that somebody is intentionally generating such code
points with chr(), despite the dump/reload hazard. So now I agree
that this is sounding more like a major-version-only behavioral change.

regards, tom lane
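
For illustration, the two rejections discussed in this message (code points above 10ffff, and the surrogate range) can be sketched as a single range check. This is only a minimal sketch and not PostgreSQL's actual chr() code: the function name codepoint_is_valid_for_utf8 is hypothetical, and the surrogate bounds D800 through DFFF are taken from the Unicode standard, not from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a PostgreSQL function): returns true when a
 * code point could be encoded as well-formed UTF-8.
 */
static bool
codepoint_is_valid_for_utf8(uint32_t cp)
{
    /* Unicode defines no code points above U+10FFFF. */
    if (cp > 0x10FFFF)
        return false;

    /*
     * U+D800..U+DFFF are reserved for UTF-16 surrogates and must not
     * appear in UTF-8 as individual code points.
     */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;

    return true;
}
```

Under these assumptions, a caller such as chr() would raise an error whenever the check returns false, which is the behavioral change being weighed for HEAD only.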

In response to

Re: chr() is still too loose about UTF8 code points at 2014-05-16 17:39:09 from Noah Misch

Responses

Re: chr() is still too loose about UTF8 code points at 2014-05-16 18:07:47 from David G Johnston
