Re: Add CASEFOLD() function.

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>, Joe Conway <mail(at)joeconway(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add CASEFOLD() function.
Date: 2025-01-18 00:34:43
Message-ID: 7134f83a3242a1f04cf91c259ec98ecfe56367ed.camel@j-davis.com
Lists: pgsql-hackers

On Fri, 2025-01-10 at 16:27 -0800, Jeff Davis wrote:
> New patch series attached.

v5 attached.

This version is rebased over the Full Case Mapping support, and
supports Default Case Folding when using the PG_UNICODE_FAST collation.

That means that "ẞ", "ß", "SS", "Ss", and "ss" all fold to "ss"; and
"Σ", "σ", and "ς" all fold to "σ".
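To try these equivalences outside PostgreSQL, note that Python's `str.casefold()` implements Unicode full (Default) case folding. The sketch below assumes that it agrees with the patch's behavior under PG_UNICODE_FAST for these inputs. It is only an illustration, not the patch's C implementation, and `FOLD_EXAMPLES` is a hypothetical name.

```python
# Illustration only: Python's str.casefold() applies Unicode full
# (Default) case folding. It is assumed here to agree with the patch's
# behavior under the PG_UNICODE_FAST collation for these inputs.

# Each key is the expected folded form; every input in its list
# should fold to exactly that string.
FOLD_EXAMPLES = {
    "ss": ["ẞ", "ß", "SS", "Ss", "ss"],
    "σ": ["Σ", "σ", "ς"],
}

for target, inputs in FOLD_EXAMPLES.items():
    for s in inputs:
        folded = s.casefold()
        print(f"{s!r:>6} -> {folded!r}")
        assert folded == target
```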

CASEFOLD() is better than LOWER() (according to Unicode, anyway) for
caseless matching, and for use in an expression index to enforce
case-insensitive uniqueness without relying on ICU.
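To see why folding is a better key than lowercasing, the sketch below models a unique expression index as a Python set of keys and compares `str.lower()` with `str.casefold()` as the key function. The class `UniqueKeyIndex` and the helper `accepted` are hypothetical. They only mimic the effect of a unique index on something like `CASEFOLD(col)` versus `LOWER(col)`, assuming Python's folding matches the patch.

```python
from typing import Callable


class UniqueKeyIndex:
    """Hypothetical stand-in for a unique expression index: it stores
    key(value) and rejects any value whose key is already present."""

    def __init__(self, key: Callable[[str], str]) -> None:
        self.key = key
        self.keys: set[str] = set()

    def insert(self, value: str) -> bool:
        k = self.key(value)
        if k in self.keys:
            return False  # would be a uniqueness violation
        self.keys.add(k)
        return True


def accepted(key: Callable[[str], str], values: list[str]) -> list[str]:
    """Return the values that a unique index on key(value) would accept."""
    index = UniqueKeyIndex(key)
    return [v for v in values if index.insert(v)]


names = ["Straße", "STRASSE", "strasse"]
print("lower:   ", accepted(str.lower, names))
print("casefold:", accepted(str.casefold, names))
```

With `lower()` as the key, "Straße" and "STRASSE" are both accepted, because lowercasing leaves ß unchanged. With `casefold()`, all three strings fold to "strasse", so only the first insert succeeds.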

Additionally, the infrastructure in this patch (as well as commit
286a365b9c) could be used in the future for better case-insensitive
pattern matching, or for casefolding identifiers in the parser without
relying on libc.
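The pattern-matching use case can be illustrated with a caseless substring test that folds both the pattern and the text before comparing them. `caseless_contains` is a hypothetical helper, not an API from the patch, and this sketch again relies on Python's `str.casefold()`.

```python
def caseless_contains(text: str, pattern: str) -> bool:
    """Hypothetical helper: caseless substring test that folds both
    sides first, so "SS" in the pattern can match "ß" in the text."""
    return pattern.casefold() in text.casefold()


print(caseless_contains("Die Straße ist lang", "STRASSE"))  # prints True
print("strasse" in "Die Straße ist lang".lower())          # prints False
```

One consequence of this design is that folding can change string length (ß becomes "ss"). A match position in the folded text therefore does not map directly back to an offset in the original string. A real implementation would need to track that mapping, while this sketch only answers yes or no.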

I feel this is about ready for commit. The main point of discussion was
whether CASEFOLD() would do normalization and, if so, what the SQL API
would look like. I concluded upthread [1] that normalization is not
necessary to meet the Unicode Default Case Folding behavior, and that we
should leave normalization as a separate step. If anyone disagrees with
that reasoning, please let me know.
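Keeping normalization separate means that casefolding alone does not treat canonically equivalent strings as equal. For example, a precomposed "É" does not match "E" followed by a combining acute accent. The Python sketch below shows that gap, then applies the Unicode standard's canonical caseless match, NFD(toCasefold(NFD(X))), which wraps folding in normalization. In SQL, one would presumably get a similar effect by combining CASEFOLD() with PostgreSQL's existing normalize() function. The function names here are hypothetical.

```python
import unicodedata


def canonical_caseless_key(s: str) -> str:
    """Key per the Unicode canonical caseless match definition:
    NFD(toCasefold(NFD(s)))."""
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())


def canonical_caseless_match(x: str, y: str) -> bool:
    """True if x and y are equal after normalization plus case folding."""
    return canonical_caseless_key(x) == canonical_caseless_key(y)


precomposed = "\u00C9"  # É as a single code point
decomposed = "E\u0301"  # E followed by COMBINING ACUTE ACCENT

# Folding alone leaves the two forms different ...
print(precomposed.casefold() == decomposed.casefold())
# ... but wrapping folding in NFD normalization makes them match.
print(canonical_caseless_match(precomposed, decomposed))
```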

Regards,
Jeff Davis

[1]
https://www.postgresql.org/message-id/610a56de2bd958e96c149ca60420db30e7d51588.camel%40j-davis.com

Attachments (name, Content-Type, size):
  v5-0001-Add-support-for-Unicode-case-folding.patch   text/x-patch   651.0 KB
  v5-0002-Add-SQL-function-CASEFOLD.patch              text/x-patch    16.1 KB
