From: | Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org> |
Subject: | Re: Support LIKE with nondeterministic collations |
Date: | 2024-07-26 22:32:08 |
Message-ID: | CA+renyWd-_sAj3YqBRaQVOOMr5uQoeBcA3tjCSyQFzvnbGrMYA@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Jun 27, 2024 at 11:31 PM Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
> Here is an updated patch for this.
I took a look at this. I added some tests and found a few that I
believe give the wrong result. The new tests are included in the
attached patch, along with the results I expect. Here are the
failures (in each diff, the - line is the result I expect and the +
line is what the patch currently returns):
-- inner %% matches b then zero:
SELECT U&'cb\0061\0308' LIKE U&'c%%\00E4' COLLATE ignore_accents;
?column?
----------
- t
+ f
(1 row)
-- trailing _ matches two codepoints that form one char:
SELECT U&'cb\0061\0308' LIKE U&'cb_' COLLATE ignore_accents;
?column?
----------
- t
+ f
(1 row)
-- leading % matches zero:
SELECT U&'\0061\0308bc' LIKE U&'%\00E4bc' COLLATE ignore_accents;
?column?
----------
- t
+ f
(1 row)
-- leading % matches zero (with later %):
SELECT U&'\0061\0308bc' LIKE U&'%\00E4%c' COLLATE ignore_accents;
?column?
----------
- t
+ f
(1 row)
I think the 1st, 3rd, and 4th failures are all from % not backtracking
to match zero chars.
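To make that expectation concrete, here is a minimal Python sketch of the matching behavior I have in mind. It is not the patch's MatchText() code: `fold` and `like` are hypothetical helpers, `fold` (NFD decomposition, then dropping combining marks) is only a rough stand-in for comparing under ignore_accents, and `_` is deliberately left out. The point is that % must try every split point, starting with zero characters, and that a literal run must be compared against text prefixes of every length because equal strings can differ in length.

```python
import unicodedata


def fold(s: str) -> str:
    """Rough stand-in for ignore_accents (an assumption, not ICU):
    decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def like(text: str, pattern: str) -> bool:
    """Match text against a pattern made only of literals and %."""
    if pattern == "":
        return text == ""
    if pattern[0] == "%":
        # Try every split point, beginning with % consuming zero characters.
        return any(like(text[i:], pattern[1:]) for i in range(len(text) + 1))
    # Take the literal run up to the next % (or the end of the pattern).
    end = pattern.find("%")
    literal = pattern if end == -1 else pattern[:end]
    rest = "" if end == -1 else pattern[end:]
    # Equal strings may have different lengths, so try every text prefix.
    return any(
        fold(text[:i]) == fold(literal) and like(text[i:], rest)
        for i in range(len(text) + 1)
    )


# The 1st, 3rd, and 4th failing tests, expressed against this sketch.
print(like("cba\u0308", "c%%\u00e4"))
print(like("a\u0308bc", "%\u00e4bc"))
print(like("a\u0308bc", "%\u00e4%c"))
```

Under this model all three calls return True, which matches the results I expect from the SQL tests. The exhaustive search is exponential in the worst case, so it only illustrates the semantics, not a practical implementation.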
I'm not sure about the 2nd failure. Maybe my expectation is wrong, but
then why does the same test pass when the __ is leading rather than
trailing? Surely the two cases should be consistent.
> I have added some more documentation based on the discussions, including
> some examples taken directly from the emails here.
This looks good to me.
> One thing I have been struggling with a bit is the correct use of
> LIKE_FALSE versus LIKE_ABORT in the MatchText() code. I have made some
> small tweaks about this in this version that I think are more correct,
> but it could use another look. Maybe also some more tests to verify
> this one way or the other.
I haven't looked at this yet.
Yours,
--
Paul ~{:-)
pj(at)illuminatedcomputing(dot)com
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Support-LIKE-with-nondeterministic-collations.patch | application/octet-stream | 24.6 KB |