Re: Built-in CTYPE provider

From: Noah Misch <noah(at)leadboat(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Built-in CTYPE provider
Date: 2024-07-01 23:03:52
Message-ID: 20240701230352.2c.nmisch@google.com
Lists: pgsql-hackers

On Mon, Jul 01, 2024 at 12:24:15PM -0700, Jeff Davis wrote:
> On Sat, 2024-06-29 at 15:08 -0700, Noah Misch wrote:
> > lower(), initcap(), upper(), and regexp_matches() are PROVOLATILE_IMMUTABLE.
> > Until now, we've delegated that responsibility to the user.  The user is
> > supposed to somehow never update libc or ICU in a way that changes outcomes
> > from these functions.
>
> To me, "delegated" connotes a clear and organized transfer of
> responsibility to the right person to solve it. In that sense, I
> disagree that we've delegated it.

Good point.

> > Now that postgresql.org is taking that responsibility for builtin C.UTF-8,
> > how should we govern it?  I think the above text and [1] convey that we'll
> > update the Unicode data between major versions, making functions like
> > lower() effectively STABLE.  Is that right?
>
> Marking them STABLE is not a viable option, that would break a lot of
> valid use cases, e.g. an index on LOWER().

I agree.

> I don't think we need code changes for 17. Some documentation changes
> might be helpful, though. Should we have a note around LOWER()/UPPER()
> that users should REINDEX any dependent indexes when the provider is
> updated?

I agree the v17 code is fine. Today, a user can (with difficulty) choose
dependency libraries so regexp_matches() is IMMUTABLE, as marked. I don't
want $SUBJECT to be the ctype that, at some post-v17 version, can't achieve
that with unpatched PostgreSQL. Let's change the documentation to say this
provider uses a particular snapshot of Unicode data, taken around PostgreSQL
17. We plan never to change that data, so IMMUTABLE functions can rely on the
data. If we provide a newer Unicode data set in the future, we'll provide it
in such a way that DDL must elect the new data. How well would that suit your
vision for this feature? An alternative would be to make pg_upgrade reject
operating on a cluster that contains use of $SUBJECT.
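
To make the IMMUTABLE concern concrete, here is a minimal Python sketch, not
PostgreSQL code. The two mapping tables, the helper names, and the sample
character are hypothetical stand-ins chosen for illustration; they do not
describe real Unicode history. The sketch shows how an index keyed on lower()
stops matching once the case-mapping data changes underneath it, and how a
rebuild (the analogue of REINDEX) restores lookups.

```python
import bisect

# Hypothetical case-mapping snapshots (illustrative only, not real Unicode data).
OLD_SNAPSHOT = {"A": "a", "B": "b"}             # "É" has no lowercase mapping here
NEW_SNAPSHOT = {"A": "a", "B": "b", "É": "é"}   # a later data set adds one


def lower(text, snapshot):
    """Lowercase text using only the given mapping table."""
    return "".join(snapshot.get(ch, ch) for ch in text)


def build_index(rows, snapshot):
    """Sorted (lower(value), value) pairs, standing in for an expression index."""
    return sorted((lower(v, snapshot), v) for v in rows)


def index_lookup(index, key):
    """Binary-search the index for entries whose stored key equals key."""
    keys = [k for k, _ in index]
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    return [v for _, v in index[lo:hi]]


rows = ["AB", "ÉB"]
index = build_index(rows, OLD_SNAPSHOT)  # index built under the old data

# Same snapshot for index and query: the row is found.
found_same = index_lookup(index, lower("ÉB", OLD_SNAPSHOT))

# Data changed silently after the index was built: the row is missed.
found_changed = index_lookup(index, lower("ÉB", NEW_SNAPSHOT))

# Rebuilding the index under the new data restores consistency.
found_rebuilt = index_lookup(build_index(rows, NEW_SNAPSHOT),
                             lower("ÉB", NEW_SNAPSHOT))
```

Passing the snapshot explicitly mirrors the proposal above: the old data stays
fixed, and a newer data set applies only where it is elected explicitly.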
