Re: simplify regular expression locale global variables

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: simplify regular expression locale global variables
Date: 2024-10-15 15:04:56
Message-ID: 3965498.1729004696@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Eisentraut <peter(at)eisentraut(dot)org> writes:
> but after the recent improvements to pg_locale_t handling, we don't need
> all three anymore. All the information we have is contained in
> pg_locale_t, so we just need to keep that one. This allows us to
> structure the locale-using regular expression code more similar to other
> locale-using code, mainly by provider, avoiding another layer that is
> specific only to the regular expression code. The first patch
> implements that.

I didn't read that patch in detail; somebody who's more familiar than
I with the recent locale-code changes ought to read it and confirm
that no subtle behavioral changes are sneaking in. But +1 for
the concept.

> The second patch removes a call to pg_set_regex_collation() that I think
> is unnecessary.

I think this is actively wrong. pg_regprefix is engaged in
determining whether there's a fixed prefix of the regex, which
at least involves a sort of symbolic execution. As an example,
whether '^x' has a fixed prefix surely depends on whether the locale
is case-insensitive. (It may be that we get such cases wrong today,
since pg_regprefix was written before we had ICU locales and I don't
know if anyone has revisited it with this in mind. But removing this
pg_set_regex_collation call is surely not going to make that better.
In any case, the gain of removing it must be microscopic.)
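
To make the '^x' example concrete, here is a toy sketch in Python. It is
not pg_regprefix or any part of the PostgreSQL regex code; the function
name fixed_prefix and the case_insensitive flag are hypothetical, and the
pattern handling is deliberately simplified (plain literals after a
leading '^', giving up conservatively on anything more complex). It only
shows that the same pattern can yield a different fixed prefix once case
folding is in play.

```python
# Toy model only: NOT PostgreSQL's pg_regprefix. All names are hypothetical.
METACHARS = set(".[]()*+?{}|\\$^")
QUANTIFIERS = set("*+?{")


def fixed_prefix(pattern: str, case_insensitive: bool) -> str:
    """Return a literal prefix that every match of `pattern` must start with.

    Conservative: returns a shorter (possibly empty) prefix whenever unsure.
    """
    # An unanchored pattern can match anywhere, so there is no fixed prefix.
    if not pattern.startswith("^"):
        return ""
    # Top-level alternation is not analyzed in this sketch; give up.
    if "|" in pattern:
        return ""

    body = pattern[1:]
    prefix = []
    for i, ch in enumerate(body):
        if ch in METACHARS:
            break  # first non-literal element ends the prefix
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if nxt in QUANTIFIERS:
            break  # a quantified character is not treated as fixed here
        if case_insensitive and ch.lower() != ch.upper():
            break  # 'x' could also match 'X', so it is no longer fixed
        prefix.append(ch)
    return "".join(prefix)


print(repr(fixed_prefix("^x", case_insensitive=False)))
print(repr(fixed_prefix("^x", case_insensitive=True)))
```

With case-sensitive matching the sketch reports 'x' as the prefix of '^x';
with case-insensitive matching it reports an empty prefix, because the
first character has more than one possible form. A real implementation
would have to consult the active collation to decide this, which is the
reason the locale needs to be set up before the prefix analysis runs.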

> (I don't have any plans to get rid of the remaining global variable.
> That would certainly be nice from an intellectual point of view, but
> fiddling this into the regular expression code looks quite messy. In
> any case, it's probably easier with one variable instead of three, if
> someone wants to try.)

Yeah. Those global variables are my fault. I did try hard to avoid
having them, but came to the same conclusion that it was not worth
contorting the regex code to pass a locale pointer through it.
Maybe if we ever completely give up on maintaining code similarity
with the Tcl version, we should just bull ahead and do that; but for
now I don't want to.

regards, tom lane
