[18] Policy on IMMUTABLE functions and Unicode updates

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, LaurenzAlbe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: [18] Policy on IMMUTABLE functions and Unicode updates
Date: 2024-07-16 17:42:03
Message-ID: d75d2d0d1d2bd45b2c332c47e3e0a67f0640b49c.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The IMMUTABLE marker for functions is quite simple on the surface, but
could be interpreted a few different ways, and there's some historical
baggage that makes it complicated.

There are a number of ways in which IMMUTABLE functions can change
behavior:

1. Updating or moving to a different OS affects all collations that use
the libc provider (other than "C" and "POSIX", which don't actually use
libc). LOWER(), INITCAP(), UPPER() and pattern matching are also
affected.

2. Updating ICU affects the collations that use the ICU provider.
ICU_UNICODE_VERSION(), LOWER(), INITCAP(), UPPER() and pattern matching
are also affected.

3. Moving to a different database encoding may affect collations that
use the "C" or "POSIX" locales in the libc provider (NB: those locales
don't actually use libc).

4. A PG Unicode update may change the results of functions that depend
on Unicode. For instance, NORMALIZE(), UNICODE_ASSIGNED(), and
UNICODE_VERSION(). Or, if using the new builtin provider's "C.UTF-8"
locale in version 17, LOWER(), INITCAP(), UPPER(), and pattern matching
(NB: collation itself is not affected -- always code point order).

5. If a well-defined IMMUTABLE function produces the wrong results, we
may fix the bug in the next major release.

6. The GUC extra_float_digits can change the results of floating point
text output.

7. A UDF may be improperly marked IMMUTABLE. A particularly common
variant is a UDF without search_path specified, which is probably not
truly IMMUTABLE.

(more I'm sure, please add to list...)

#1 and #2 have been discussed much more than the rest, but I think it's
worthwhile to enumerate the other problems even if the impact is a lot
lower.

Noah seemed particularly concerned[1] about #4, so I'll start off by
discussing that. Here's a brief history (slightly confusing because the
PG and Unicode versions are similar numbers):

PG13: Unicode 13.0 and NORMALIZE() is first exposed as a SQL function
PG15: Unicode updated to 14.0
PG16: Unicode updated to 15.0
PG17: Unicode updated to 15.1, UNICODE_ASSIGNED(), UNICODE_VERSION()
and builtin "C.UTF-8" locale are introduced

To repeat, these Unicode updates do not affect collation itself, they
affect affect NORMALIZE(), UNICODE_VERSION(), and UNICODE_ASSIGNED().
If using the builtin "C.UTF-8" locale, they also affect LOWER(),
INITCAP(), UPPER(), and pattern matching. (NB: the builtin collation
provider hasn't yet gone through any Unicode update.)

There are two alternative philosophies:

A. By choosing to use a Unicode-based function, the user has opted in
to the Unicode stability guarantees[2], and it's fine to update Unicode
occasionally in new major versions as long as we are transparent with
the user.

B. IMMUTABLE implies some very strict definition of stability, and we
should never again update Unicode because it changes the results of
IMMUTABLE functions.

We've been following (A), and that's the defacto policy today[3][4].
Noah and Laurenz argued[5] that the policy starting in version 18
should be (B). Given that it's a policy decision that affects more than
just the builtin collation provider, I'd like to discuss it more
broadly outside of that subthread.

Regards,
Jeff Davis

[1] 
https://www.postgresql.org/message-id/20240629220857.fb.nmisch@google.com

[2]
https://www.unicode.org/policies/stability_policy.html

[3] 
https://www.postgresql.org/message-id/1d178eb1bbd61da1bcfe4a11d6545e9cdcede1d1.camel%40j-davis.com

[4]
https://www.postgresql.org/message-id/564325.1720297161%40sss.pgh.pa.us

[5]
https://www.postgresql.org/message-id/af82b292f13dd234790bc701933e9992ee07d4fa.camel%40cybertec.at

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2024-07-16 17:50:30 Re: [Proposal] Add foreign-server health checks infrastructure
Previous Message Dean Rasheed 2024-07-16 17:36:40 Re: Restrict EXPLAIN (ANALYZE) for RLS and security_barrier views