Re: [18] Policy on IMMUTABLE functions and Unicode updates

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Jeremy Schneider <schneider(at)ardentperf(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [18] Policy on IMMUTABLE functions and Unicode updates
Date: 2024-07-23 18:40:27
Message-ID: CA+TgmoZ+gaj6J=xAtiHeFA9v8pk79FbcOEiHKTMoJbzjZv4qwQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 23, 2024 at 1:03 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> One of my strongest motivations for PG_C_UTF8 was that there was still
> a use case for libc in PG16: the "C.UTF-8" locale, which is not
> supported at all in ICU. Daniel Vérité made me aware of the importance
> of this locale, which offers code point order collation combined with
> Unicode ctype semantics.
>
> With PG17, between ICU and the builtin provider, there's little
> remaining reason to use libc (aside from legacy).

I was really interested to read Jeremy Schneider's slide deck, to
which he linked earlier, wherein he explained that other major
databases default to something more like C.UTF-8. Maybe we need to
relitigate the debate about what our default should be in light of
those findings (but, if so, on another thread with a clear subject
line). But even if we were to decide to change the default, there are
lots and lots of existing databases out there that are using libc
collations. I'm not in a good position to guess how many of those
people actually truly care about language-specific collations. I'm
positive it's not zero, but I can't really guess how much more than
zero it is. Even if it were zero, though, the fact that so many
upgrades are done using pg_upgrade means that this problem will still
be around in a decade even if we changed the default tomorrow.

(I do understand that you wrote "aside from legacy" so I'm not
accusing you of ignoring the upgrade issues, just taking the
opportunity to be more explicit about my own view.)

Also, Noah has pointed out that C.UTF-8 introduces some
forward-compatibility hazards of its own, at least with respect to
ctype semantics. I don't have a clear view of what ought to be done
about that, but if we just replace a dependency on an unstable set of
libc definitions with a dependency on an equally unstable set of
PostgreSQL definitions, we're not really winning. Do we need to
version the new ctype provider?

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-07-23 19:26:55 Re: [18] Policy on IMMUTABLE functions and Unicode updates
Previous Message Marcos Pegoraro 2024-07-23 18:35:13 Useless toast