From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Support regular expressions with nondeterministic collations |
Date: | 2024-12-18 19:55:24 |
Message-ID: | 2808617.1734551724@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Mon, 2024-12-16 at 17:16 -0500, Tom Lane wrote:
>> The existing logic in the regex engine for case-insensitive matching
>> is to convert every letter to a bracket expression containing all
>> its case variants. For example, "a" becomes "[aA]" and "[xY1]"
>> becomes "[xXyY1]". This fails on "ß", so a better way would be
>> nice...
> We have a couple options:
> * create more complex regexes like "(ß|[sS][sS])"
> * case fold the pattern first, and then lazily case fold the string as
> we match against it
> The former sounds faster but the latter sounds simpler.
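The quoted description of the existing case-insensitive expansion can be sketched in Python for a literal string. This is only an illustration: the function names are hypothetical, and Python's own Unicode case mappings stand in for pg_wc_tolower/pg_wc_toupper. To mimic an API that returns exactly one character per call, any mapping that expands to several characters is dropped. That is why "ß" ends up as "[ß]" and can never match "SS".

```python
import re

def one_to_one_variants(ch: str) -> set[str]:
    # Assumption: mimic a tolower/toupper pair that must each return a single
    # character, so a multi-character mapping such as "ß" -> "SS" is discarded.
    return {ch} | {m for m in (ch.lower(), ch.upper()) if len(m) == 1}

def bracketize_literal(literal: str) -> str:
    # Replace every letter with a bracket expression of its case variants,
    # e.g. "a" -> "[Aa]"; non-letters are escaped and kept as they are.
    parts = []
    for ch in literal:
        if ch.isalpha():
            parts.append("[" + "".join(sorted(one_to_one_variants(ch))) + "]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

# "straße" becomes "[Ss][Tt][Rr][Aa][ß][Ee]": it matches "STRAßE" but not "STRASSE".
```

The sketch handles plain literals only. Merging variants into an existing bracket expression such as "[xY1]" follows the same idea.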
Yeah, the latter sounds really slow. It would not actually be too
hard, I think, to build the right regex, if we had the information
available as to what all the case-variants are. The problem at the
moment is that the existing code assumes that pg_wc_tolower and
pg_wc_toupper together give us all the case variants, and that
API can't cope with multi-glyph expansions.
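Building "the right regex" for a literal can be sketched as follows, assuming a source of case variants that also reports multi-character expansions. Here Python's str.upper() and str.casefold() play that role as a stand-in. This is not PostgreSQL's case-mapping data, and all function names are hypothetical. A letter with no expansions keeps its plain bracket expression. A letter such as "ß" becomes an alternation in the spirit of "(ß|[sS][sS])".

```python
import re

def _bracket(ch: str) -> str:
    # Bracket expression of the one-to-one case variants of ch, e.g. "[Ss]".
    variants = {ch} | {m for m in (ch.lower(), ch.upper()) if len(m) == 1}
    return "[" + "".join(sorted(variants)) + "]"

def letter_regex(ch: str) -> str:
    # Assumption: Python's upper()/casefold() stand in for a complete table of
    # case variants, including multi-character expansions like "ß" -> "ss".
    expansions = {m.casefold() for m in (ch.upper(), ch.casefold()) if len(m) > 1}
    if not expansions:
        return _bracket(ch)
    # One alternative for the single character, one per expansion, where each
    # character of the expansion gets its own bracket expression.
    alternatives = [_bracket(ch)]
    for exp in sorted(expansions):
        alternatives.append("".join(_bracket(c) for c in exp))
    return "(?:" + "|".join(alternatives) + ")"

def case_insensitive_regex(literal: str) -> str:
    # Letters get case-variant expansions; everything else is escaped verbatim.
    return "".join(letter_regex(c) if c.isalpha() else re.escape(c) for c in literal)

# letter_regex("ß") gives "(?:[ß]|[Ss][Ss])", so "straße" now also matches "STRASSE".
```

A bracket expression matches exactly one character. So once expansions are involved, an alternation group is needed instead of simply adding more members to the bracket.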
regards, tom lane