From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Joe Conway <mail(at)joeconway(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Add CASEFOLD() function. |
Date: | 2025-01-11 00:27:12 |
Message-ID: | 610a56de2bd958e96c149ca60420db30e7d51588.camel@j-davis.com |
Lists: | pgsql-hackers |
On Wed, 2025-01-08 at 15:19 -0800, Jeff Davis wrote:
> 3. Allow CASEFOLD() to break the normal form of the input string. The
> problem here is that the user may be surprised that the output is not
> normalized even when all of their data is normalized. It's not clear to
> me whether it still works for caseless matching -- it might if the
> string is in a consistent form, even if not normalized.
Looking at the Unicode standard again, it distinguishes between
"default caseless matching" and "canonical caseless matching". The
latter accounts for a few nuances that the former does not. See Unicode
16.0 section 3.13.4 D144 & D145. Using Default Caseless Matching
simplifies things quite a bit.
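A minimal sketch of the two definitions follows. It assumes Python's `str.casefold()` (documented as implementing Unicode default case folding) can stand in for toCasefold(), and `unicodedata.normalize("NFD", ...)` for NFD(). The function names are hypothetical, and the Unicode data version depends on the Python build, not necessarily 16.0.

```python
import unicodedata


def nfd(s: str) -> str:
    """Canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", s)


def default_caseless_match(x: str, y: str) -> bool:
    """D144: toCasefold(X) == toCasefold(Y)."""
    return x.casefold() == y.casefold()


def canonical_caseless_match(x: str, y: str) -> bool:
    """D145: NFD(toCasefold(NFD(X))) == NFD(toCasefold(NFD(Y)))."""
    return nfd(nfd(x).casefold()) == nfd(nfd(y).casefold())


if __name__ == "__main__":
    # Full case folding maps U+00DF (sharp s) to "ss".
    print(default_caseless_match("Straße", "STRASSE"))    # True
    # Precomposed U+00C5 vs. "A" + U+030A COMBINING RING ABOVE:
    # default matching compares folded code points, so mixed forms fail...
    print(default_caseless_match("\u00c5", "A\u030a"))    # False
    # ...while canonical matching decomposes first and succeeds.
    print(canonical_caseless_match("\u00c5", "A\u030a"))  # True
```

Default matching skips both normalization passes, so it is cheaper, but it only gives the expected answer when both sides are already in a consistent form.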
We could argue that it would be nice to have canonical caseless
matching, but that seems to be going above and beyond what Unicode
suggests. And normalization is expensive -- if we combine case folding
and normalization, there's no way for the user to avoid the cost. So
I'm changing my answer to #3, and we just document that it does not
preserve normalization. I believe this means that Peter and I are now
in agreement[1], though I'm not sure if his reasoning is the same.
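As an illustration of why the output cannot be promised to stay normalized: under full case folding, U+01F0 LATIN SMALL LETTER J WITH CARON (already NFC) folds to "j" + U+030C COMBINING CARON, which NFC would recompose. The sketch below uses Python's `str.casefold()` as a stand-in for the patch's CASEFOLD() and requires Python 3.8+ for `unicodedata.is_normalized()`; `fold_then_nfc` is a hypothetical helper. In SQL the equivalent opt-in would presumably be wrapping the result in PostgreSQL's existing `normalize()` function.

```python
import unicodedata


def fold_then_nfc(s: str) -> str:
    """Hypothetical opt-in helper: case fold, then pay for NFC explicitly."""
    return unicodedata.normalize("NFC", s.casefold())


if __name__ == "__main__":
    src = "\u01f0"  # LATIN SMALL LETTER J WITH CARON, already NFC
    folded = src.casefold()  # full folding: "j" + U+030C COMBINING CARON

    print(unicodedata.is_normalized("NFC", src))     # True
    print(unicodedata.is_normalized("NFC", folded))  # False: folding broke NFC
    print(fold_then_nfc(src) == src)                 # True: NFC recomposes it
```

Keeping the two steps separate means a caller who does not need normalized output never pays for normalization.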
New patch series attached.
Regards,
Jeff Davis
[1]
https://www.postgresql.org/message-id/8c384b0d-00f2-4515-8e60-ff7d0d4c093a%40eisentraut.org
Attachment | Content-Type | Size |
---|---|---|
v4-0001-Add-support-for-Unicode-case-folding.patch | text/x-patch | 571.0 KB |
v4-0002-Add-SQL-function-CASEFOLD.patch | text/x-patch | 15.2 KB |