pgsql: Support PG_UNICODE_FAST locale in the builtin collation provider

From: Jeff Davis <jdavis(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Support PG_UNICODE_FAST locale in the builtin collation provider
Date: 2025-01-17 23:59:48
Message-ID: E1tYwFg-0029Dk-Lm@gemulon.postgresql.org
Lists: pgsql-committers

Support PG_UNICODE_FAST locale in the builtin collation provider.

The PG_UNICODE_FAST locale uses code point sort order (fast,
memcmp-based) combined with Unicode character semantics. The character
semantics are based on Unicode full case mapping.
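
As an illustration only (Python, not PostgreSQL's C implementation), the
sketch below shows why code point order can be computed with a bytewise
comparison: UTF-8 preserves code point order, so comparing the encoded
bytes, as memcmp does, gives the same ordering as comparing code points.
The sample strings are made up for this example.

```python
# Sample strings chosen for illustration; they are not from the commit.
words = ["zebra", "Zebra", "\u00e4pfel", "apple", "\u00df", "\u03a9mega", "\U0001F600"]

# Python compares str values code point by code point.
by_code_point = sorted(words)

# bytes objects compare byte by byte, like memcmp with a length tie-break.
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

print(by_code_point == by_utf8_bytes)
print(by_code_point)
```

This order is not linguistic: "Zebra" sorts before "apple" because U+005A
precedes U+0061.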

Full case mapping can map a single code point to multiple code points,
such as "ß" uppercasing to "SS". Additionally, it handles
context-sensitive mappings like the "final sigma", and it uses
titlecase mappings such as "Dž" when titlecasing (rather than plain
uppercase mappings).
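
The three behaviors above can be observed in Python, whose str methods
also apply Unicode full case mapping. This is a sketch of the semantics
only; it does not exercise PostgreSQL, and the variable names are chosen
for this example.

```python
# One code point expanding to two: U+00DF (sharp s) uppercases to "SS".
sharp_s_upper = "\u00df".upper()

# Final sigma: capital sigma U+03A3 lowercases to U+03C2 at the end of a
# word and to U+03C3 elsewhere.
sigma_lower = "\u038c\u03a3\u039f\u03a3".lower()

# Titlecase differs from uppercase for digraphs: U+01C6 has the titlecase
# form U+01C5 and the uppercase form U+01C4.
dz_title = "\u01c6".title()
dz_upper = "\u01c6".upper()

print(sharp_s_upper, sigma_lower, dz_title, dz_upper)
```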

Importantly, the uppercasing of "ß" as "SS" is specifically mentioned
by the SQL standard. In Postgres, UCS_BASIC uses plain ASCII semantics
for case mapping and pattern matching, so if we changed it to use the
PG_UNICODE_FAST locale, it would offer better compliance with the
standard. For now, though, do not change the behavior of UCS_BASIC.
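
To make the difference concrete, the sketch below contrasts an ASCII-only
uppercase with full case mapping. It is Python, not PostgreSQL code, and
ascii_upper is a hypothetical helper that models "plain ASCII semantics"
as mapping only a-z and leaving every other code point unchanged.

```python
def ascii_upper(text: str) -> str:
    """Uppercase only a-z; hypothetical helper modeling ASCII-only semantics."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in text)


word = "stra\u00dfe"

# ASCII-only mapping leaves U+00DF untouched.
ascii_result = ascii_upper(word)

# Full case mapping expands U+00DF to "SS".
full_result = word.upper()

print(ascii_result, full_result)
```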

Discussion: https://postgr.es/m/ddfd67928818f138f51635712529bc5e1d25e4e7.camel@j-davis.com
Discussion: https://postgr.es/m/27bb0e52-801d-4f73-a0a4-02cfdd4a9ada@eisentraut.org
Reviewed-by: Peter Eisentraut, Daniel Verite

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/d3d0983169130a9b81e3fe48d5c2ca4931480956

Modified Files
--------------
doc/src/sgml/charset.sgml                  |  29 +++++-
doc/src/sgml/ref/create_collation.sgml     |   3 +-
doc/src/sgml/ref/create_database.sgml      |   6 +-
doc/src/sgml/ref/initdb.sgml               |   4 +-
src/backend/regex/regc_pg_locale.c         |   6 +-
src/backend/utils/adt/pg_locale.c          |   7 +-
src/backend/utils/adt/pg_locale_builtin.c  |  12 ++-
src/bin/initdb/initdb.c                    |   6 +-
src/include/catalog/catversion.h           |   2 +-
src/include/catalog/pg_collation.dat       |   3 +
src/include/utils/pg_locale.h              |   1 +
src/test/regress/expected/collate.utf8.out | 160 +++++++++++++++++++++++++++++
src/test/regress/sql/collate.utf8.sql      |  60 +++++++++++
13 files changed, 283 insertions(+), 16 deletions(-)
