From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Joe Conway <mail(at)joeconway(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Add CASEFOLD() function. |
Date: | 2025-01-18 00:34:43 |
Message-ID: | 7134f83a3242a1f04cf91c259ec98ecfe56367ed.camel@j-davis.com |
Lists: | pgsql-hackers |
On Fri, 2025-01-10 at 16:27 -0800, Jeff Davis wrote:
> New patch series attached.
v5 attached.
This version is rebased over the Full Case Mapping support, and
supports Default Case Folding when using the PG_UNICODE_FAST collation.
That means that "ẞ", "ß", "SS", "Ss", and "ss" all fold to "ss"; and
"Σ", "σ", and "ς" all fold to "σ".
CASEFOLD() is better (according to Unicode, anyway) than LOWER() for
caseless matching, or in an expression index to enforce case-
insensitive uniqueness without relying on ICU.
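As a rough illustration of the folding results above, and of why folding suits
uniqueness checks better than lowercasing, here is a small sketch. It is not
taken from the patch: it uses Python's built-in str.casefold(), which applies
Unicode full case folding, as a stand-in for CASEFOLD() under PG_UNICODE_FAST.
The helper first_collision() is a hypothetical name made up for this example.

```python
# Stand-in only: Python's str.casefold() applies Unicode full case folding.
# This is not the code in the attached patches.

eszett_forms = ["ẞ", "ß", "SS", "Ss", "ss"]
sigma_forms = ["Σ", "σ", "ς"]

# Every variant collapses to a single folded form.
folded_eszett = {s.casefold() for s in eszett_forms}
folded_sigma = {s.casefold() for s in sigma_forms}

# Lowercasing leaves "ß" and "ss" distinct, so it is weaker for matching.
lowered_eszett = {s.lower() for s in eszett_forms}


def first_collision(names):
    """Return the first pair of names that are equal after folding, else None."""
    seen = {}
    for name in names:
        key = name.casefold()
        if key in seen:
            return seen[key], name
        seen[key] = name
    return None


print(folded_eszett)
print(folded_sigma)
print(lowered_eszett)
print(first_collision(["Straße", "Oslo", "STRASSE"]))
```

The collision check plays the role that a unique expression index on the
folded value would play in the database.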
Additionally, the infrastructure in this patch (as well as 286a365b9c)
can be used in the future for better case-insensitive pattern matching,
or casefolding identifiers in the parser without relying on libc.
I feel this is about ready for commit. The main point of discussion was
whether CASEFOLD() would do normalization, and if so, what the SQL API
would look like. I concluded upthread [1] that normalization is not needed
to meet the Unicode Default Case Folding behavior, and that we should
just leave normalization as a separate process. If anyone disagrees with
that reasoning, please let me know.
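To make the "separate process" point concrete, the sketch below shows that
case folding alone does not equate precomposed and decomposed forms of the
same text, and that an explicit normalization step does. Again this is only an
illustration in Python (str.casefold() and unicodedata.normalize()), not the
patch's code; the choice of NFC and the helper name fold_normalized() are
assumptions made for the example.

```python
import unicodedata

# Same visible text, two different code point sequences.
composed = "\u00c9cole"     # "É" as one precomposed code point
decomposed = "E\u0301cole"  # "E" followed by a combining acute accent

# Folding alone changes case but keeps the two encodings distinct.
fold_only_equal = composed.casefold() == decomposed.casefold()


def fold_normalized(text, form="NFC"):
    """Normalize first (a separate, explicit step), then case fold."""
    return unicodedata.normalize(form, text).casefold()


normalized_equal = fold_normalized(composed) == fold_normalized(decomposed)

print(fold_only_equal)
print(normalized_equal)
```

Callers who need canonical-equivalent matching would apply normalization
themselves before folding, which is the division of labor described above.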
Regards,
Jeff Davis
[1]
https://www.postgresql.org/message-id/610a56de2bd958e96c149ca60420db30e7d51588.camel%40j-davis.com
Attachment | Content-Type | Size |
---|---|---|
v5-0001-Add-support-for-Unicode-case-folding.patch | text/x-patch | 651.0 KB |
v5-0002-Add-SQL-function-CASEFOLD.patch | text/x-patch | 16.1 KB |