Re: Built-in CTYPE provider

From: Noah Misch <noah(at)leadboat(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Built-in CTYPE provider
Date: 2024-06-29 22:08:57
Message-ID: 20240629220857.fb.nmisch@google.com
Lists: pgsql-hackers

On Wed, Mar 20, 2024 at 05:13:26PM -0700, Jeff Davis wrote:
> On Tue, 2024-03-19 at 13:41 +0100, Peter Eisentraut wrote:
> > * v25-0002-Support-C.UTF-8-locale-in-the-new-builtin-collat.patch
> >
> > Looks ok.
>
> Committed.

> <varlistentry>
> + <term><literal>pg_c_utf8</literal></term>
> + <listitem>
> + <para>
> + This collation sorts by Unicode code point values rather than natural
> + language order. For the functions <function>lower</function>,
> + <function>initcap</function>, and <function>upper</function>, it uses
> + Unicode simple case mapping. For pattern matching (including regular
> + expressions), it uses the POSIX Compatible variant of Unicode <ulink
> + url="https://www.unicode.org/reports/tr18/#Compatibility_Properties">Compatibility
> + Properties</ulink>. Behavior is efficient and stable within a
> + <productname>Postgres</productname> major version. This collation is
> + only available for encoding <literal>UTF8</literal>.
> + </para>
> + </listitem>
> + </varlistentry>

lower(), initcap(), upper(), and regexp_matches() are PROVOLATILE_IMMUTABLE.
Until now, we've delegated that responsibility to the user. The user is
supposed to somehow never update libc or ICU in a way that changes outcomes
from these functions. Now that postgresql.org is taking that responsibility
for builtin C.UTF-8, how should we govern it? I think the above text and [1]
convey that we'll update the Unicode data between major versions, making
functions like lower() effectively STABLE. Is that right?
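
To make the concern concrete, here is a minimal Python sketch of what could go wrong if an IMMUTABLE function's results change across a major version while something built on the old results (say, an expression index on lower(col)) is carried over unchanged. Everything in it is an assumption for illustration: the two mapping tables are invented, "X" stands in for a character whose case mapping changes, and the dict is only a stand-in for an index, not how PostgreSQL stores one.

```python
# Hypothetical case-mapping tables for two major versions.
# These are invented for the sketch; they are not real Unicode changes.
OLD_LOWER = {"A": "a"}             # version N: "X" has no lowercase mapping yet
NEW_LOWER = {"A": "a", "X": "x"}   # version N+1: "X" gains a mapping


def lower_with(table, s):
    """Lowercase s using the given mapping table; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in s)


def build_index(table, rows):
    """Stand-in for an expression index on lower(col): maps lowered key -> rows."""
    index = {}
    for row in rows:
        index.setdefault(lower_with(table, row), []).append(row)
    return index


def index_lookup(index, table, key):
    """Probe the index with a key lowered by the mapping table currently in use."""
    return index.get(lower_with(table, key), [])


rows = ["AX"]
index = build_index(OLD_LOWER, rows)            # index built under version N

before = index_lookup(index, OLD_LOWER, "AX")   # probe with version N mapping
after = index_lookup(index, NEW_LOWER, "AX")    # probe with version N+1 mapping

print(before)  # the row is found: stored key and probe key agree
print(after)   # the row is missed: stored key is "aX", probe key is "ax"
```

In this sketch the stored key was computed once under the old table, so a probe computed under the new table no longer matches it until the index is rebuilt.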

(This thread had some discussion[2] that datcollversion/collversion won't
necessarily change when a major version changes lower() behavior.)

[1] https://postgr.es/m/7089acb3ebac0c1682a79c8bc16803cf06896fb9.camel@j-davis.com
[2] https://postgr.es/m/5a1ecc40539f36cac5b27a62739a45a49785ca54.camel@j-davis.com
