Re: Support regular expressions with nondeterministic collations

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support regular expressions with nondeterministic collations
Date: 2024-12-18 20:42:26
Message-ID: c10ed44c7e5dcbb7b4597889f02d029298f0c919.camel@j-davis.com
Lists: pgsql-hackers

On Wed, 2024-12-18 at 14:55 -0500, Tom Lane wrote:
> It would not actually be too
> hard I think to build the right regex, if we had the information
> available as to what all the case-variants are. The problem at the
> moment is that the existing code assumes that pg_wc_tolower and
> pg_wc_toupper together give us all the case variants, and that
> API can't cope with multi-glyph expansions.

That's doable. I can do that after refactoring the ctype logic to use a
method table.

I'll have to think about how the API should look though. The maximum
amount of expansion that can occur during case folding is from one
codepoint to 3, and the maximum number of case variants is also ~3, so
it could fill in a caller-supplied 3x3 array of pg_wchar. Somewhat
awkward in C, so I welcome better ideas.
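
To make the shape of such an API concrete, here is a minimal sketch of a
caller-supplied 3x3 array interface. Everything in it is an assumption for
illustration: the name demo_case_variants, the two size macros, the
zero-padding convention, and the local pg_wchar typedef (a stand-in so the
sketch compiles on its own) are hypothetical, not an existing or proposed
PostgreSQL API. The data is a toy table covering only ASCII letters and
U+00DF, whose Unicode case mappings are "ß", "Ss", and "SS".

```c
#include <assert.h>
#include <string.h>

/* Stand-in for PostgreSQL's pg_wchar so the sketch is self-contained. */
typedef unsigned int pg_wchar;

#define MAX_CASE_VARIANTS  3	/* rows: distinct case variants (assumed) */
#define MAX_CASE_EXPANSION 3	/* columns: codepoints per variant (assumed) */

/*
 * Hypothetical API: fill variants[][] with the case variants of c, one
 * variant per row, each row zero-padded.  Returns the number of rows used.
 * Toy data only: ASCII letters and U+00DF; anything else maps to itself.
 */
static int
demo_case_variants(pg_wchar c,
				   pg_wchar variants[MAX_CASE_VARIANTS][MAX_CASE_EXPANSION])
{
	assert(variants != NULL);
	memset(variants, 0,
		   sizeof(pg_wchar) * MAX_CASE_VARIANTS * MAX_CASE_EXPANSION);

	if (c == 0x00DF)
	{
		variants[0][0] = 0x00DF;	/* lowercase: U+00DF */
		variants[1][0] = 0x0053;	/* titlecase: "Ss" */
		variants[1][1] = 0x0073;
		variants[2][0] = 0x0053;	/* uppercase: "SS" */
		variants[2][1] = 0x0053;
		return 3;
	}

	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
	{
		variants[0][0] = c | 0x20;		/* ASCII lowercase */
		variants[1][0] = c & ~0x20u;	/* ASCII uppercase */
		return 2;
	}

	variants[0][0] = c;			/* no known case variants */
	return 1;
}
```

A caller would declare a pg_wchar array of MAX_CASE_VARIANTS by
MAX_CASE_EXPANSION, pass it in, and build one regex alternative per
returned row. Returning the row count and zero-padding each row is only
one possible convention; a real design might report per-row lengths
instead.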

Note: if the string is not normalized consistently with the
pattern, pattern matching in general won't work very well. This has
always been true, but as we make pattern matching smarter we should be
more clear about that point.
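
As a small illustration of the normalization point, the sketch below
compares the precomposed form of "é" (U+00E9, as in NFC) with its
decomposed form (U+0065 U+0301, as in NFD) codepoint by codepoint. The
helper name demo_codepoints_equal, the array names, and the local
pg_wchar typedef are hypothetical and exist only for this example; the
comparison is deliberately naive and is not how the regex engine works
internally.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PostgreSQL's pg_wchar so the sketch is self-contained. */
typedef unsigned int pg_wchar;

/* Naive codepoint-by-codepoint equality, with no normalization step. */
static int
demo_codepoints_equal(const pg_wchar *a, size_t alen,
					  const pg_wchar *b, size_t blen)
{
	size_t		i;

	assert(a != NULL && b != NULL);

	if (alen != blen)
		return 0;
	for (i = 0; i < alen; i++)
	{
		if (a[i] != b[i])
			return 0;
	}
	return 1;
}

/* The same visible character "é" in two normalization forms. */
static const pg_wchar demo_e_acute_nfc[] = {0x00E9};			/* precomposed */
static const pg_wchar demo_e_acute_nfd[] = {0x0065, 0x0301};	/* e + combining acute */
```

Comparing demo_e_acute_nfc (length 1) with demo_e_acute_nfd (length 2)
reports a mismatch even though both render as the same character, which
is why a pattern in one form can fail to match a string in the other.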

Regards,
Jeff Davis
