Re: [18] Policy on IMMUTABLE functions and Unicode updates

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org, Daniel Verite <daniel(at)manitou-mail(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeremy Schneider <schneider(at)ardentperf(dot)com>
Subject: Re: [18] Policy on IMMUTABLE functions and Unicode updates
Date: 2024-07-24 18:10:45
Message-ID: e753e0e3-dc99-44f6-8ad7-100597cc6e7e@eisentraut.org
Lists: pgsql-hackers

On 24.07.24 14:20, Robert Haas wrote:
> On Wed, Jul 24, 2024 at 12:42 AM Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>> Fair enough. My argument was, that topic is distinct from the topic of
>> this thread.
>
> OK, that's fair. But I think the solutions are the same: we complain
> all the time about glibc and ICU shipping collations and not
> versioning them. We shouldn't make the same kinds of mistakes. Even if
> ctype is less likely to break things than collations, it still can,
> and we should move in the direction of letting people keep the v17
> behavior for the foreseeable future while at the same time having a
> way that they can also get the new behavior if they want it (and the
> new behavior should be the default).

Versioning is possibly part of the answer, but I think it would be
different versioning from the collation version.

The collation versions are in principle designed to change rarely. Some
languages' rules might change once in twenty years, some never. Maybe
you have a database mostly in English and a few tables in, I don't know,
Swedish (unverified examples). Most of the time nothing happens during
upgrades, but one time in many years you need to reindex the Swedish
tables, and the system starts warning you about that as soon as you
access the Swedish tables. (Conversely, if you never actually access
the Swedish tables, then you don't get warned about it.)

If we wanted a similar versioning system for the Unicode updates, it
would be separate. We'd write the Unicode version that was current when
the system catalogs were initialized into, say, a pg_database column.
And then at run time, when someone runs, say, the normalize() function or
some regular expression character classification, we'd check what the
version of the currently compiled-in Unicode tables is, and we'd
issue a warning if it differs from the recorded one.
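
To make the shape of that check concrete, here is a minimal sketch. It is
in Python rather than backend C purely for brevity, and it is not a
proposed implementation. The function names and the recorded_version
argument (standing in for the hypothetical pg_database column) are
assumptions for illustration; unicodedata.unidata_version plays the role
of the compiled-in Unicode tables.

```python
import unicodedata
import warnings

# Stand-in for "the Unicode version of the compiled-in tables".
COMPILED_UNICODE_VERSION = unicodedata.unidata_version


def check_unicode_version(recorded_version,
                          compiled_version=COMPILED_UNICODE_VERSION):
    """Compare the version recorded at initialization time with the
    version the running code provides. Warn and return False on a
    mismatch; return True otherwise."""
    if recorded_version == compiled_version:
        return True
    warnings.warn(
        f"database was initialized with Unicode {recorded_version}, "
        f"but the server provides Unicode {compiled_version}",
        UserWarning,
        stacklevel=2,
    )
    return False


def normalize_checked(text, recorded_version, form="NFC"):
    """Hypothetical wrapper: run the version check, then normalize."""
    check_unicode_version(recorded_version)
    return unicodedata.normalize(form, text)
```

The check only warns; the function still runs with the current tables,
which mirrors how the collation version warnings leave the decision to
the user.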

A possible problem is that the Unicode version changes in practice with
every major PostgreSQL release, so this approach would end up warning
users after every upgrade. To avoid that, we'd probably need to keep
support for multiple Unicode versions around, as has been suggested in
this thread already.
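
If several versions were kept around, the run-time decision could look
roughly like the sketch below (again Python for brevity). The function
name, the set of supported versions, and the fallback policy are all
assumptions for illustration, not something agreed in this thread.

```python
def select_unicode_version(recorded, supported, newest):
    """Pick which Unicode tables to use for a database.

    recorded:  version stored when the catalogs were initialized
    supported: versions whose tables this build still ships
    newest:    version used when the recorded one is unavailable

    Returns (version_to_use, needs_warning).
    """
    if recorded in supported:
        # Old behavior is still available, so nothing changes and
        # there is nothing to warn about.
        return recorded, False
    # The recorded version has been dropped; fall back and warn.
    return newest, True


# Illustrative version numbers only.
shipped = {"15.1", "16.0"}
print(select_unicode_version("15.1", shipped, "16.0"))
print(select_unicode_version("14.0", shipped, "16.0"))
```

With this policy a warning appears only once the recorded version is no
longer shipped, rather than after every major upgrade.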
