Re: Built-in CTYPE provider

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Built-in CTYPE provider
Date: 2024-07-09 01:05:45
Message-ID: 20240709010545.8c.nmisch@google.com
Lists: pgsql-hackers

On Sat, Jul 06, 2024 at 04:19:21PM -0400, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > As a released feature, NORMALIZE() has a different set of remedies to choose
> > from, and I'm not proposing one. I may have sidetracked this thread by
> > talking about remedies without an agreement that pg_c_utf8 has a problem. My
> > question for the PostgreSQL maintainers is this:
>
> > textregexeq(... COLLATE pg_c_utf8, '[[:alpha:]]') and lower(), despite being
> > IMMUTABLE, will change behavior in some major releases. pg_upgrade does not
> > have a concept of IMMUTABLE functions changing, so index scans will return
> > wrong query results after upgrade. Is it okay for v17 to release a
> > pg_c_utf8 planned to behave that way when upgrading v17 to v18+?
>
> I do not think it is realistic to define "IMMUTABLE" as meaning that
> the function will never change behavior until the heat death of the
> universe. As a counterexample, we've not worried about applying
> bug fixes or algorithm improvements that change the behavior of
> "immutable" numeric computations.

True. There's a continuum from "releases can change any IMMUTABLE function"
to "index integrity always wins, even if a function is as wrong as 1+1=3".
I'm less concerned about the recent "Incorrect results from numeric round"
thread, even though it's proposing to back-patch. I'm thinking about these
aggravating factors for $SUBJECT:

- $SUBJECT is planning an annual cadence of this kind of change.

- We already have ICU providing collation support for the same functions.
  Unlike $SUBJECT, ICU integration gives packagers control over when to accept
  corruption at pg_upgrade time.

- SQL Server, DB2 and Oracle do their Unicode updates in a non-corrupting way.
  (See Jeremy Schneider's reply concerning DB2 and Oracle.)

- lower() and regexp are more popular in index expressions than
  high-digit-count numeric calculations.

> I'd say a realistic policy is "immutable means we don't intend to
> change it within a major release". If we do change the behavior,
> either as a bug fix or a major-release improvement, that should
> be release-noted so that people know they have to rebuild dependent
> indexes and matviews.

It sounds like you're very comfortable with $SUBJECT proceeding in its current
form. Is that right?
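
To make the failure mode concrete, here is a minimal sketch of an expression index going stale when an "immutable" function changes between releases. It is a toy model, not PostgreSQL code: lower_v1, lower_v2, build_index, index_scan and seq_scan are hypothetical stand-ins, and the ASCII-only versus full case mapping is only an assumed example of a behavior change, not a claim about what any pg_c_utf8 release does.

```python
def lower_v1(s: str) -> str:
    """Hypothetical 'old release' lower(): maps only ASCII letters."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def lower_v2(s: str) -> str:
    """Hypothetical 'new release' lower(): also maps non-ASCII letters."""
    return s.lower()


def build_index(rows, fn):
    """Toy expression index: maps fn(value) to the matching row ids."""
    index = {}
    for rowid, value in enumerate(rows):
        index.setdefault(fn(value), []).append(rowid)
    return index


def index_scan(index, fn, needle):
    """Look up fn(needle) in a previously built index."""
    return index.get(fn(needle), [])


def seq_scan(rows, fn, needle):
    """Recompute fn() for every row, ignoring any index."""
    key = fn(needle)
    return [i for i, v in enumerate(rows) if fn(v) == key]


rows = ["\u00c9COLE", "plain"]          # first value starts with U+00C9
index = build_index(rows, lower_v1)     # index built before the upgrade

# After the upgrade, queries evaluate the new function but the index
# still holds keys computed by the old one.
stale = index_scan(index, lower_v2, "\u00e9cole")
truth = seq_scan(rows, lower_v2, "\u00e9cole")

# Rebuilding the index with the new function brings the two back in line.
reindexed = index_scan(build_index(rows, lower_v2), lower_v2, "\u00e9cole")

print(stale, truth, reindexed)
```

In this model the index scan silently misses a row that a sequential scan finds, and the two agree again only after the index is rebuilt, which is the step a release note would have to tell users to perform.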
