From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Built-in CTYPE provider |
Date: | 2024-07-09 01:05:45 |
Message-ID: | 20240709010545.8c.nmisch@google.com |
Lists: | pgsql-hackers |

On Sat, Jul 06, 2024 at 04:19:21PM -0400, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > As a released feature, NORMALIZE() has a different set of remedies to choose
> > from, and I'm not proposing one. I may have sidetracked this thread by
> > talking about remedies without an agreement that pg_c_utf8 has a problem. My
> > question for the PostgreSQL maintainers is this:
>
> > textregexeq(... COLLATE pg_c_utf8, '[[:alpha:]]') and lower(), despite being
> > IMMUTABLE, will change behavior in some major releases. pg_upgrade does not
> > have a concept of IMMUTABLE functions changing, so index scans will return
> > wrong query results after upgrade. Is it okay for v17 to release a
> > pg_c_utf8 planned to behave that way when upgrading v17 to v18+?
>
> I do not think it is realistic to define "IMMUTABLE" as meaning that
> the function will never change behavior until the heat death of the
> universe. As a counterexample, we've not worried about applying
> bug fixes or algorithm improvements that change the behavior of
> "immutable" numeric computations.

True. There's a continuum from "releases can change any IMMUTABLE function"
to "index integrity always wins, even if a function is as wrong as 1+1=3".
I'm less concerned about the recent "Incorrect results from numeric round"
thread, even though it's proposing to back-patch. I'm thinking about these
aggravating factors for $SUBJECT:

- $SUBJECT is planning an annual cadence of this kind of change.
- We already have ICU providing collation support for the same functions.
Unlike $SUBJECT, ICU integration gives packagers control over when to accept
corruption at pg_upgrade time.
- SQL Server, DB2 and Oracle do their Unicode updates in a non-corrupting way.
(See Jeremy Schneider's reply concerning DB2 and Oracle.)
- lower() and regexp are more popular in index expressions than
high-digit-count numeric calculations.

> I'd say a realistic policy is "immutable means we don't intend to
> change it within a major release". If we do change the behavior,
> either as a bug fix or a major-release improvement, that should
> be release-noted so that people know they have to rebuild dependent
> indexes and matviews.

It sounds like you're very comfortable with $SUBJECT proceeding in its current
form. Is that right?
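
For readers who want the upgrade hazard above made concrete, here is a toy sketch in Python. It is not PostgreSQL code. `lower_v17` and `lower_v18` are hypothetical stand-ins for an "immutable" function whose case mapping changes between major releases, and `ExpressionIndex` is an invented, simplified model of a btree on `lower(col)`. The specific characters affected are an assumption chosen for illustration, not the actual pg_c_utf8 differences between Unicode versions.

```python
import bisect


def lower_v17(s: str) -> str:
    # Hypothetical "old" mapping: only ASCII letters are lowercased.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def lower_v18(s: str) -> str:
    # Hypothetical "new" mapping after a Unicode update: full str.lower().
    return s.lower()


class ExpressionIndex:
    """Sorted (fn(value), row_id) pairs, a simplified expression index."""

    def __init__(self, rows, fn):
        self.entries = sorted((fn(v), i) for i, v in enumerate(rows))

    def lookup(self, key):
        # Binary search for the first entry with this key, then scan matches.
        pos = bisect.bisect_left(self.entries, (key, -1))
        found = []
        while pos < len(self.entries) and self.entries[pos][0] == key:
            found.append(self.entries[pos][1])
            pos += 1
        return found


rows = ["ÉCOLE", "Apple", "école"]

# Index built before the upgrade, using the old function.
index = ExpressionIndex(rows, lower_v17)

# After the upgrade, the query side evaluates the new function.
probe = lower_v18("ÉCOLE")

stale_index_scan = index.lookup(probe)
seq_scan = [i for i, v in enumerate(rows) if lower_v18(v) == probe]
rebuilt_index_scan = ExpressionIndex(rows, lower_v18).lookup(probe)

print("stale index scan:  ", stale_index_scan)
print("sequential scan:   ", seq_scan)
print("rebuilt index scan:", rebuilt_index_scan)
```

In this sketch the index scan silently misses a row that a sequential scan finds, and rebuilding the index (the remedy named in the quoted text) brings the two back into agreement.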