Re: Support LIKE with nondeterministic collations

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>
Subject: Re: Support LIKE with nondeterministic collations
Date: 2024-11-18 03:30:20
Message-ID: CACJufxHVcgt6ybYLX+R6YYcK=Hc0ctTD_wFfJvrR37yrjYyrww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 15, 2024 at 11:42 PM Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>
> On 15.11.24 05:26, jian he wrote:
> > /*
> > * Now build a substring of the text and try to match it against
> > * the subpattern. t is the start of the text, t1 is one past the
> > * last byte. We start with a zero-length string.
> > */
> > t1 = t
> > t1len = tlen;
> > for (;;)
> > {
> > int cmp;
> > CHECK_FOR_INTERRUPTS();
> > cmp = pg_strncoll(subpat, subpatlen, t, (t1 - t), locale);
> >
> > select '.foo.' LIKE '_oo' COLLATE ign_punct;
> > pg_strncoll's iteration of the first 4 argument values.
> > oo 2 foo. 0
> > oo 2 foo. 1
> > oo 2 foo. 2
> > oo 2 foo. 3
> > oo 2 foo. 4
> >
> > seems there is a shortcut/optimization.
> > if subpat don't have wildcard(percent sign, underscore)
> > then we can have less pg_strncoll calls?
>
> How would you do that? You need to try all combinations to find one
> that matches.
>

we can optimize when trailing (last character) is not wildcards.

SELECT 'Ha12foo' LIKE '%foo' COLLATE ignore_accents;
within the for loop
for(;;)
{
int cmp;
CHECK_FOR_INTERRUPTS();
....
}

pg_strncoll comparison will become
Ha12foo foo
a12foo foo
12foo foo
2foo foo
foo foo

it's safe because in MatchText we have:
else if (*p == '%')
{
while (tlen > 0)
{
if (GETCHAR(*t, locale) == firstpat || (locale && !locale->deterministic))
{
int matched = MatchText(t, tlen, p, plen, locale);
if (matched != LIKE_FALSE)
return matched; /* TRUE or ABORT */
}
NextChar(t, tlen);
}
}

please check attached.
> > minimum case to trigger error within GenericMatchText
> > since no related tests.
> > create table t1(a text collate case_insensitive, b text collate "C");
> > insert into t1 values ('a','a');
> > select a like b from t1;
>
> This results in
>
> ERROR: 42P22: could not determine which collation to use for LIKE
> HINT: Use the COLLATE clause to set the collation explicitly.
>
> which is the expected behavior.
>
sorry, didn't mention it clearly, i mean we can add it to the regress test.

Attachment Content-Type Size
v8-0001-LIKE-with-nondeterministic-collations-no-trail.no-cfbot application/octet-stream 2.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-11-18 03:43:27 Re: memory leak in pgoutput
Previous Message Zhijie Hou (Fujitsu) 2024-11-18 03:26:52 RE: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY