From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Built-in CTYPE provider |
Date: | 2024-07-04 21:26:41 |
Message-ID: | 20240704212641.c4.nmisch@google.com |
Lists: | pgsql-hackers |
On Wed, Jul 03, 2024 at 02:19:07PM -0700, Jeff Davis wrote:
> * Unless I made a mistake, the last three releases of Unicode (14.0,
> 15.0, and 15.1) all have the exact same behavior for UPPER() and
> LOWER() -- even for unassigned code points. It would be silly to
> promise to stay with 15.1 and then realize that moving to 16.0 doesn't
> create any actual problem.
I think you're saying that if some Unicode update changes the results of a
STABLE function but does not change the result of any IMMUTABLE function, we
may as well import that update. Is that about right? If so, I agree.
In addition to the options I listed earlier (error in pg_upgrade or document
that IMMUTABLE stands), I would be okay with a third option: decide here that
we'll not adopt a Unicode update in a way that changes a v17 IMMUTABLE
function result of the new provider. We don't need to write that in the
documentation, since it's implicit in IMMUTABLE. Delete the "stable within a
<productname>Postgres</productname> major version" documentation text.
> * While someone can pin libc+ICU to particular versions, it's
> impossible when using the official packages, and additionally requires
> using something like [1], which just became available last year. I
> don't think it's reasonable to put it forth as a matter-of-fact
> solution.
>
> * Let's keep some perspective: we've lived for a long time with ALL
> text indexes at serious risk of breakage. In contrast, the concerns you
> are raising now are about certain kinds of expression indexes over data
> containing certain unassigned code points. I am not dismissing that
> concern, but the builtin provider moves us in the right direction and
> let's not lose sight of that.
I see you're trying to help users get less breakage, and that's a good goal.
I agree $SUBJECT eliminates libc+ICU breakage, and libc+ICU breakage has hurt
plenty. However, you proposed to update Unicode data and give REINDEX as the
solution to breakage this causes. Unlike libc+ICU breakage, the packager has
no escape from that. That's a different kind of breakage proposition, and no
new PostgreSQL feature should do that. It's on a different axis from helping
users avoid libc+ICU breakage, and a feature doesn't get to credit helping on
one axis against a regression on the other axis. What am I missing here?
> Given that no code changes for v17 are proposed, I suggest that we
> refrain from making any declarations until the next version of Unicode
> is released. If the pattern holds, that will be around September, which
> still leaves time to make reasonable decisions for v18.
Soon enough, a Unicode release will add one character to regexp [[:alpha:]].
PostgreSQL will then need to decide what IMMUTABLE is going to mean. How does
that get easier in September?
Thanks,
nm