From: | Joe Conway <mail(at)joeconway(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Add CASEFOLD() function. |
Date: | 2024-12-16 21:27:11 |
Message-ID: | 70a27f2e-3629-4fe8-8deb-4a0b987f6245@joeconway.com |
Lists: | pgsql-hackers |

On 12/16/24 12:49, Jeff Davis wrote:
> One question I have is whether we want this function to normalize the
> output.
>
> I believe most use cases would want the output normalized, because
> normalization differences (e.g. "a" U+0061 followed by "combining
> acute" U+0301 vs "a with acute" U+00E1) are more minor than differences
> in case.
>
> Of course, a user could wrap it with the normalize() function, but
> that's verbose and easy to forget. I'm also not sure that it can be
> made as fast as a combined function that does both.

Perhaps a one-arg version that always casefolds, and a two-arg version
that accepts nfc, nfd, or none (or something similar)?
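
As a rough sketch of that interface, in Python rather than C and purely
for illustration: the function name, the "none" default, and the choice
to fold before normalizing are all assumptions here, not a proposed
implementation.

```python
import unicodedata


def casefold(text, form="none"):
    """Hypothetical one-/two-argument casefold.

    With one argument, only case folding is applied. With a second
    argument (e.g. "nfc" or "nfd"), the folded text is also normalized.
    Folding first and normalizing second is an assumption of this sketch.
    """
    folded = text.casefold()
    if form.lower() == "none":
        return folded
    return unicodedata.normalize(form.upper(), folded)


# "A" followed by combining acute (U+0301) vs. precomposed "A with acute" (U+00C1)
decomposed = "A\u0301"
precomposed = "\u00c1"

# Case folding alone leaves the normalization difference in place.
print(casefold(decomposed) == casefold(precomposed))                # False
# Asking for a normal form makes the two spellings compare equal.
print(casefold(decomposed, "nfc") == casefold(precomposed, "nfc"))  # True
```

The one-arg call corresponds to "always casefolds"; the second argument
is where nfc, nfd, or none would go.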

> And a follow-up question: if it does normalize, the second parameter
> would be the requested normal form. But to accept the keyword forms
> (NFC, NFD in gram.y) rather than the string forms ('NFC', 'NFD') then
> we'd also need to add CASEFOLD to gram.y (like NORMALIZE). Is
> that a reasonable thing to do?

SQL:2023 seems to include the NORMALIZE syntax, but the only case
folding it considers is UPPER and LOWER. As such, I think CASEFOLD ought
to be a function but not part of the grammar.
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com