Re: encoding affects ICU regex character classification

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Jeremy Schneider <schneider(at)ardentperf(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: encoding affects ICU regex character classification
Date: 2023-12-18 20:39:05
Message-ID: 3a86ea75efc0a7dd1b040d3358356c901a9c154a.camel@j-davis.com
Lists: pgsql-hackers

On Fri, 2023-12-15 at 16:48 -0800, Jeremy Schneider wrote:
> This goes back to my other thread (which sadly got very little
> discussion): PostgreSQL really needs to be safe by /default/

Doesn't a built-in provider help create a safer option?

The built-in provider's version of Unicode will be consistent with
unicode_assigned(), which is a first step toward rejecting code points
that the provider doesn't understand. And by rejecting unassigned code
points, we get all kinds of Unicode compatibility guarantees that avoid
the kinds of change risks that you are worried about.
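
As a rough illustration of that idea (a sketch in Python, not PostgreSQL
code): the standard unicodedata module reports general category "Cn" for
code points that are unassigned in the Unicode version bundled with the
interpreter, which may differ from the version PostgreSQL uses. The names
all_assigned() and check_input() are hypothetical helpers invented for
this example; they only approximate what a unicode_assigned()-style check
followed by a rejection might look like.

```python
import unicodedata


def all_assigned(text: str) -> bool:
    """Return True if no code point in text has general category Cn.

    "Cn" means unassigned in the Unicode version that ships with this
    Python build (see unicodedata.unidata_version).
    """
    return all(unicodedata.category(ch) != "Cn" for ch in text)


def check_input(text: str) -> str:
    """Hypothetical gatekeeper: reject text with unassigned code points."""
    if not all_assigned(text):
        raise ValueError("input contains unassigned code points")
    return text


if __name__ == "__main__":
    print(unicodedata.unidata_version)   # Unicode version used for the check
    print(all_assigned("abc"))           # plain ASCII is assigned
    print(all_assigned("a\u0378"))       # U+0378 is a reserved, unassigned code point
```

Because only code points already known to the provider are accepted, a
later Unicode upgrade that assigns new characters cannot change how
previously stored text is classified.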

Regards,
Jeff Davis
