From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Built-in CTYPE provider |
Date: | 2024-07-01 23:03:52 |
Message-ID: | 20240701230352.2c.nmisch@google.com |
Lists: | pgsql-hackers |
On Mon, Jul 01, 2024 at 12:24:15PM -0700, Jeff Davis wrote:
> On Sat, 2024-06-29 at 15:08 -0700, Noah Misch wrote:
> > lower(), initcap(), upper(), and regexp_matches() are PROVOLATILE_IMMUTABLE.
> > Until now, we've delegated that responsibility to the user. The user is
> > supposed to somehow never update libc or ICU in a way that changes outcomes
> > from these functions.
>
> To me, "delegated" connotes a clear and organized transfer of
> responsibility to the right person to solve it. In that sense, I
> disagree that we've delegated it.
Good point.
> > Now that postgresql.org is taking that responsibility
> > for builtin C.UTF-8, how should we govern it? I think the above text and [1]
> > convey that we'll update the Unicode data between major versions, making
> > functions like lower() effectively STABLE. Is that right?
>
> Marking them STABLE is not a viable option, that would break a lot of
> valid use cases, e.g. an index on LOWER().
I agree.
> I don't think we need code changes for 17. Some documentation changes
> might be helpful, though. Should we have a note around LOWER()/UPPER()
> that users should REINDEX any dependent indexes when the provider is
> updated?
I agree the v17 code is fine. Today, a user can (with difficulty) choose
dependency libraries so regexp_matches() is IMMUTABLE, as marked. I don't
want $SUBJECT to be the ctype that, at some post-v17 version, can't achieve
that with unpatched PostgreSQL. Let's change the documentation to say this
provider uses a particular snapshot of Unicode data, taken around PostgreSQL
17. We plan never to change that data, so IMMUTABLE functions can rely on the
data. If we provide a newer Unicode data set in the future, we'll provide it
in such a way that DDL must elect the new data. How well would that suit your
vision for this feature? An alternative would be to make pg_upgrade reject
operating on a cluster that contains use of $SUBJECT.
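
To illustrate why a silent data change matters for an index on LOWER(), here is a minimal Python sketch. It is a toy model, not PostgreSQL code: the snapshot names, the mapping tables, and the ExpressionIndex class are all hypothetical, and the differing "X" entry is invented rather than taken from any real Unicode revision. The index records the snapshot chosen when it was built, roughly as DDL would elect a data set.

```python
import bisect

# Two made-up case-mapping snapshots (assumption: not real Unicode data).
# The "X" entry differs between them purely for illustration.
SNAPSHOTS = {
    "v1": {"A": "a", "X": "X"},  # "X" has no lowercase mapping in this snapshot
    "v2": {"A": "a", "X": "x"},  # a later snapshot adds one
}


def lower(text, snapshot):
    """Lowercase text using only the named snapshot's mapping table."""
    table = SNAPSHOTS[snapshot]
    return "".join(table.get(ch, ch) for ch in text)


class ExpressionIndex:
    """Sorted list of (lower(value), value), a toy index on LOWER(col)."""

    def __init__(self, values, snapshot):
        self.snapshot = snapshot  # elected once, at creation time
        self.entries = sorted((lower(v, snapshot), v) for v in values)

    def lookup(self, probe, snapshot=None):
        # By default the probe is lowercased with the index's own snapshot.
        key = lower(probe, snapshot or self.snapshot)
        i = bisect.bisect_left(self.entries, (key,))
        found = []
        while i < len(self.entries) and self.entries[i][0] == key:
            found.append(self.entries[i][1])
            i += 1
        return found


index = ExpressionIndex(["AX", "A"], "v1")
print(index.lookup("AX"))                 # ['AX']: data matches the index
print(index.lookup("AX", snapshot="v2"))  # []: data changed under the index
rebuilt = ExpressionIndex(["AX", "A"], "v2")
print(rebuilt.lookup("AX"))               # ['AX']: consistent after a rebuild
```

In this model, pinning the snapshot keeps lookups correct indefinitely, while swapping the data without rebuilding makes an existing row unreachable through the index.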