Re: [18] Policy on IMMUTABLE functions and Unicode updates

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org, Daniel Verite <daniel(at)manitou-mail(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, LaurenzAlbe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: [18] Policy on IMMUTABLE functions and Unicode updates
Date: 2024-07-16 19:33:55
Message-ID: CAKFQuwaw1TP--7NdOeX3vr_76nuyqGSc8+-1qh=5XSm8yRyg4g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 16, 2024 at 11:57 AM Joe Conway <mail(at)joeconway(dot)com> wrote:

>
> > There are two alternative philosophies:
> >
> > A. By choosing to use a Unicode-based function, the user has opted in
> > to the Unicode stability guarantees[2], and it's fine to update Unicode
> > occasionally in new major versions as long as we are transparent with
> > the user.
> >
> > B. IMMUTABLE implies some very strict definition of stability, and we
> > should never again update Unicode because it changes the results of
> > IMMUTABLE functions.
> >
> > We've been following (A), and that's the de facto policy today[3][4].
> > Noah and Laurenz argued[5] that the policy starting in version 18
> > should be (B). Given that it's a policy decision that affects more than
> > just the builtin collation provider, I'd like to discuss it more
> > broadly outside of that subthread.
>
> On the general topic, we have these definitions in the fine manual:
>
> 8<-----------------
> A VOLATILE function can do anything, ... A query using a volatile
> function will re-evaluate the function at every row where its value is
> needed.
>
> A STABLE function cannot modify the database and is guaranteed to return
> the same results given the same arguments for all rows within a single
> statement...
>
> An IMMUTABLE function cannot modify the database and is guaranteed to
> return the same results given the same arguments forever.
> 8<-----------------
>
> As Jeff points out, the IMMUTABLE definition has never really been true.
>

> Even the STABLE is not quite right, as there are at least some STABLE
> functions that will return the same value for multiple statements if
> they are within a transaction block (e.g. "now()" -- TBH I don't
> remember offhand if that is true for all stable functions).
>

Under-specification here doesn't make the meaning of STABLE incorrect. We
don't have anything that guarantees stability at transaction scope, because
I don't think it can be guaranteed there without considering whether the
transaction is READ COMMITTED, REPEATABLE READ, or SERIALIZABLE. The
function itself can promise more, but the marker seems correctly scoped for
how the system uses it in statement optimization.
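A toy model may make that scoping concrete. The Python sketch below assumes a simplified, hypothetical executor (not PostgreSQL's) that is permitted, though not required, to reuse one result of an argument-free non-volatile call for the rest of a statement. `make_counter` is a made-up stand-in for something like now() that changes between calls. The reuse is confined to one statement; nothing carries over to the next one.

```python
import itertools
from typing import Callable, List


def make_counter() -> Callable[[], int]:
    """Hypothetical stand-in for a time-dependent function such as now():
    each call returns the next integer."""
    counter = itertools.count()
    return lambda: next(counter)


def run_statement(rows: List[object], func: Callable[[], int],
                  volatility: str) -> List[int]:
    """Evaluate func once per row within a single toy 'statement'.

    VOLATILE: re-evaluated at every row.
    STABLE/IMMUTABLE: this toy executor reuses the first result for the
    rest of the statement. The cached value is a local variable, so it
    is discarded when the statement ends.
    """
    results: List[int] = []
    have_value = False
    cached = 0
    for _ in rows:
        if volatility == "volatile":
            results.append(func())
        else:
            if not have_value:
                cached = func()
                have_value = True
            results.append(cached)
    return results
```

Running two STABLE statements back to back with the same counter gives one value per statement ([0, 0, 0], then [1, 1, 1]), while a VOLATILE run gives a fresh value per row. Whether anything could be reused across statements in a transaction is exactly what the marker does not promise.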

> [...] and allow those to be
> used like we do IMMUTABLE except with appropriate warning labels. E.g.
> something ("STABLE_VERSION"?) to mean "forever within a major version
> lifetime" and something ("STABLE_SYSTEM?") to mean "as long as you don't
> upgrade your OS".
>
>
I'd be content cutting "forever" down to "within a given server
configuration". Then just note that immutable functions can depend
implicitly on external server characteristics, so when moving data between
servers, immutable functions may need to be re-evaluated. That's not so bad
for indexes, which can be rebuilt, but it's more problematic for generated
values.
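A minimal Python sketch of why moving to a different server configuration can require re-evaluation. The two case-mapping tables are made up and are NOT real Unicode data; "Q" simply gains a lowercase mapping in the newer table so the effect is visible. An expression index built under the old table keeps old keys, so after the change an index lookup and a sequential scan disagree until the index is rebuilt.

```python
from typing import Dict, List

# Made-up case-mapping tables (not real Unicode data): "Q" only gets a
# lowercase mapping in the newer one.
CASE_MAP_OLD: Dict[str, str] = {"A": "a"}
CASE_MAP_NEW: Dict[str, str] = {"A": "a", "Q": "q"}


def lower_with(table: Dict[str, str], s: str) -> str:
    """Toy 'immutable' lower() whose result silently depends on the table."""
    return "".join(table.get(ch, ch) for ch in s)


def build_index(rows: Dict[int, str],
                table: Dict[str, str]) -> Dict[str, List[int]]:
    """Expression index on lower(value): keys are computed once, at build time."""
    index: Dict[str, List[int]] = {}
    for rowid, value in rows.items():
        index.setdefault(lower_with(table, value), []).append(rowid)
    return index


def index_lookup(index: Dict[str, List[int]], table: Dict[str, str],
                 probe: str) -> List[int]:
    """Index scan: only the probe is evaluated with the current table."""
    return list(index.get(lower_with(table, probe), []))


def seq_scan(rows: Dict[int, str], table: Dict[str, str],
             probe: str) -> List[int]:
    """Sequential scan: every row is re-evaluated with the current table."""
    key = lower_with(table, probe)
    return [rowid for rowid, value in rows.items()
            if lower_with(table, value) == key]
```

With rows {1: "Q", 2: "A"} indexed under CASE_MAP_OLD, a lookup for "Q" under CASE_MAP_NEW misses row 1 while a sequential scan finds it; rebuilding the index under the new table (the analogue of REINDEX) makes them agree again. A stored generated value would carry the same stale result, but it lives in the table data itself rather than in a derived structure.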

I'm not against adding metadata options here, but for internal functions,
comments and documentation can work. For user-defined functions, I have my
doubts about how trustworthy such markers would end up being.

For the original question, I suggest we continue behaving per "A" and work
on making it clearer to users what that means in terms of server upgrades.

If we do add metadata to reflect our reality, I'd settle on a generic
"STATIC" marker for functions that rely on real-world state, whether we
call directly into the system (e.g., hashing) or have chosen to manage
access to that state ourselves (e.g., Unicode).
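One way such a marker could be put to use is to find what needs attention after an external dependency changes. A toy Python sketch, assuming a catalog in which each function records its volatility and, for the proposed STATIC marker (which PostgreSQL does not have today), the name of the external state it relies on. `FuncInfo`, `IndexInfo`, the function names, and the "unicode" tag are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class FuncInfo:
    name: str
    # "immutable", "static", "stable" or "volatile"; "static" is the
    # hypothetical new marker.
    volatility: str
    # External state a "static" function relies on, e.g. "unicode".
    depends_on: Optional[str] = None


@dataclass(frozen=True)
class IndexInfo:
    name: str
    functions: Tuple[str, ...]  # functions called in the index expression


def indexes_to_recheck(indexes: List[IndexInfo],
                       catalog: Dict[str, FuncInfo],
                       changed: FrozenSet[str]) -> List[str]:
    """Names of indexes whose expressions call a 'static' function whose
    external dependency is in the changed set."""
    flagged: List[str] = []
    for idx in indexes:
        if any(catalog[f].volatility == "static"
               and catalog[f].depends_on in changed
               for f in idx.functions):
            flagged.append(idx.name)
    return flagged
```

For example, with a hypothetical `fold_case` marked static on "unicode" and an ordinary immutable `add_one`, a Unicode change flags only the index built on `fold_case`. The point of a single generic marker is that the same check works whether the state comes from the OS or from tables we ship ourselves.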

When we do take control, our goal should be to allow a given external
dependency version to exist across many PostgreSQL versions and to give the
DBA the choice of when to move individual databases from one version to the
next, possibly dropping support for a dependency version when the major
version it first appeared in reaches end of life. Not keeping up with
external dependency versions, just to save existing users some short-term
pain, punishes new users by permanently denying them a tool and puts us out
of step with those dependency development groups. Being able to deal with
that pain at a time other than the middle of a major version upgrade, one
database at a time, gives those existing users reasonable options.
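A toy Python model of that policy, purely illustrative. The major-version numbers and the "u1"/"u2"/"u3" dependency-version labels are made up; `introduced_in` maps each dependency version to the server major that first shipped it, and `live_majors` is the set of majors still supported.

```python
from typing import Dict, Set


def supported_versions(server_major: int,
                       introduced_in: Dict[str, int],
                       live_majors: Set[int]) -> Set[str]:
    """Dependency versions a server major accepts under this toy policy:
    a version is available from the major that introduced it onward, and
    is retired once that major is no longer supported."""
    return {dep for dep, first in introduced_in.items()
            if first <= server_major and first in live_majors}


def can_upgrade(db_dep_version: str, target_major: int,
                introduced_in: Dict[str, int],
                live_majors: Set[int]) -> bool:
    """True if a database pinned to db_dep_version can move to
    target_major without switching dependency versions at the same time."""
    return db_dep_version in supported_versions(
        target_major, introduced_in, live_majors)
```

With introduced_in = {"u1": 1, "u2": 2, "u3": 3} and majors 2 and 3 still supported, major 3 accepts "u2" and "u3". A database still on "u1" would first move to a newer dependency version on its own schedule, and only then take the server upgrade.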

David J.
