Re: Support POSITION with nondeterministic collations

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Euler Taveira <euler(at)eulerto(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support POSITION with nondeterministic collations
Date: 2025-02-21 11:48:21
Message-ID: 6107daa2-5cf7-4cf2-a526-626be1d15b18@eisentraut.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12.02.25 05:31, Euler Taveira wrote:
> On Mon, Dec 2, 2024, at 6:09 AM, Peter Eisentraut wrote:
>> On 26.08.24 08:09, Peter Eisentraut wrote:
>> > This patch allows using text position search functions with
>> > nondeterministic collations.  These functions are
>> >
>> > - position, strpos
>> > - replace
>> > - split_part
>> > - string_to_array
>> > - string_to_table
>> >
>> > which all use common internal infrastructure.
>>
>> > Some exploratory testing could be useful here.  The present test
>> > coverage was already quite helpful during development, but there is
>> > always the possibility that something was overlooked.
>
> I took a look at this patch.
>
>          * Most callers will require "greedy" semantics, meaning that
> we need
>          * to find the longest such substring, not the shortest.  For
> callers
>          * don't don't need greedy semantics, we can finish on the first
>
> s/don't don't/that don't/ ?

fixed

> -   Assert(len1 > 0);
>     Assert(len2 > 0);
>
> Is there a reason to remove this assert?

len1 is the length of the "haystack". The previous code did not call
text_position_setup() with an empty haystack, due to early exits at

/* Empty needle always matches at position 1 */

and

/* Otherwise, can't match if haystack is shorter than needle */

but the second one does not apply for nondeterministic collations, so we
have to deal with with zero-length haystacks.

>      * (With nondeterministic collations, the search was already
>      * multibyte-aware, so we don't need this.)
>
> s/was/is/

fixed

> The commit title could be changed to reflect that you are adding support for
> multiple functions. The POSITION gives the impression that it is only
> for the
> position() function. Something like
>
>     Support position search functions with nondeterministic collations

fixed

> I did a couple of tests (some are shown below) and I didn't find issues.

Thanks for the extra tests. I have committed this with the adjustments
mentioned above.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Rucha Kulkarni 2025-02-21 11:50:15 Doubts regarding pg_freespacemap extension
Previous Message Michael Paquier 2025-02-21 11:46:03 Re: Missing [NO] INDENT flag in XMLSerialize backward parsing