Re: [18] Policy on IMMUTABLE functions and Unicode updates

From: Joe Conway <mail(at)joeconway(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, LaurenzAlbe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: [18] Policy on IMMUTABLE functions and Unicode updates
Date: 2024-07-16 18:57:10
Message-ID: 7639208b-0c8c-4846-aaa4-0f584ed9b2eb@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/16/24 13:42, Jeff Davis wrote:
> The IMMUTABLE marker for functions is quite simple on the surface, but
> could be interpreted a few different ways, and there's some historical
> baggage that makes it complicated.
>
> There are a number of ways in which IMMUTABLE functions can change
> behavior:
>
> 1. Updating or moving to a different OS affects all collations that use
> the libc provider (other than "C" and "POSIX", which don't actually use
> libc). LOWER(), INITCAP(), UPPER() and pattern matching are also
> affected.
>
> 2. Updating ICU affects the collations that use the ICU provider.
> ICU_UNICODE_VERSION(), LOWER(), INITCAP(), UPPER() and pattern matching
> are also affected.
>
> 3. Moving to a different database encoding may affect collations that
> use the "C" or "POSIX" locales in the libc provider (NB: those locales
> don't actually use libc).
>
> 4. A PG Unicode update may change the results of functions that depend
> on Unicode. For instance, NORMALIZE(), UNICODE_ASSIGNED(), and
> UNICODE_VERSION(). Or, if using the new builtin provider's "C.UTF-8"
> locale in version 17, LOWER(), INITCAP(), UPPER(), and pattern matching
> (NB: collation itself is not affected -- always code point order).
>
> 5. If a well-defined IMMUTABLE function produces the wrong results, we
> may fix the bug in the next major release.
>
> 6. The GUC extra_float_digits can change the results of floating point
> text output.
>
> 7. A UDF may be improperly marked IMMUTABLE. A particularly common
> variant is a UDF without search_path specified, which is probably not
> truly IMMUTABLE.
>
> (more I'm sure, please add to list...)
>
>
> #1 and #2 have been discussed much more than the rest, but I think it's
> worthwhile to enumerate the other problems even if the impact is a lot
> lower.
>
>
> Noah seemed particularly concerned[1] about #4, so I'll start off by
> discussing that. Here's a brief history (slightly confusing because the
> PG and Unicode versions are similar numbers):
>
> PG13: Unicode 13.0 and NORMALIZE() is first exposed as a SQL function
> PG15: Unicode updated to 14.0
> PG16: Unicode updated to 15.0
> PG17: Unicode updated to 15.1, UNICODE_ASSIGNED(), UNICODE_VERSION()
> and builtin "C.UTF-8" locale are introduced
>
> To repeat, these Unicode updates do not affect collation itself, they
> affect affect NORMALIZE(), UNICODE_VERSION(), and UNICODE_ASSIGNED().
> If using the builtin "C.UTF-8" locale, they also affect LOWER(),
> INITCAP(), UPPER(), and pattern matching. (NB: the builtin collation
> provider hasn't yet gone through any Unicode update.)
>
> There are two alternative philosophies:
>
> A. By choosing to use a Unicode-based function, the user has opted in
> to the Unicode stability guarantees[2], and it's fine to update Unicode
> occasionally in new major versions as long as we are transparent with
> the user.
>
> B. IMMUTABLE implies some very strict definition of stability, and we
> should never again update Unicode because it changes the results of
> IMMUTABLE functions.
>
> We've been following (A), and that's the defacto policy today[3][4].
> Noah and Laurenz argued[5] that the policy starting in version 18
> should be (B). Given that it's a policy decision that affects more than
> just the builtin collation provider, I'd like to discuss it more
> broadly outside of that subthread.

On the general topic, we have these definitions in the fine manual:

8<-----------------
A VOLATILE function can do anything, ... A query using a volatile
function will re-evaluate the function at every row where its value is
needed.

A STABLE function cannot modify the database and is guaranteed to return
the same results given the same arguments for all rows within a single
statement...

An IMMUTABLE function cannot modify the database and is guaranteed to
return the same results given the same arguments forever.
8<-----------------

As Jeff points out, the IMMUTABLE definition has never really been true.
Even the STABLE is not quite right, as there are at least some STABLE
functions that will return the same value for multiple statements if
they are within a transaction block (e.g. "now()" -- TBH I don't
remember offhand if that is true for all stable functions).

In any case, there is quite a gap between "forever" and "single
statement". Perhaps we need to have more volatility categories, with
guarantees that lie somewhere between the two, and allow those to be
used like we do IMMUTABLE except with appropriate warning labels. E.g.
something ("STABLE_VERSION"?) to mean "forever within a major version
lifetime" and something ("STABLE_SYSTEM?") to mean "as long as you don't
upgrade your OS".

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2024-07-16 19:04:13 recovery test error
Previous Message Aleksander Alekseev 2024-07-16 18:14:35 Re: [PATCH] Refactor pqformat.{c,h} and protocol.h