pgsql: Improve regex compiler's arc moving/copying logic.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Improve regex compiler's arc moving/copying logic.
Date: 2021-08-17 16:00:12
Message-ID: E1mG1Vc-0000ph-Q9@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

Improve regex compiler's arc moving/copying logic.

The functions moveins(), copyins(), moveouts(), copyouts() are
required to preserve the invariant that there are no duplicate arcs in
the regex's NFA. Spencer's original implementation of them was O(N^2)
since it checked separately for a match to each source arc. In commit
579840ca0 I improved that by adding sort/merge logic to be used if
more than a few arcs are to be moved/copied. However, I now realize
that that missed a bet. At many call sites, the target state is newly
made and cannot have any existing in-arcs (respectively out-arcs)
that could be duplicates. So spending any cycles at all on checking
for duplicates is wasted effort; in these cases we can just blindly
move/copy all the source arcs. Add code paths to do that.

It turns out that for copyins()/copyouts(), *all* the call sites have
this property, making all the "improved" logic in them flat out
unreachable. Perhaps we'll need the full capability again someday,
so I just #ifdef'd those paths out rather than removing them entirely.

In passing, add a few test cases to improve code coverage in this
area as well as in regc_locale.c/regc_pg_locale.c.

Discussion: https://postgr.es/m/810272.1629064063@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/78a843f119ca7d4a6eb173a7ee3bed45889d48d8

Modified Files
--------------
src/backend/regex/regc_nfa.c | 66 +++++++++++-
.../modules/test_regex/expected/test_regex.out | 120 +++++++++++++++++++++
.../test_regex/expected/test_regex_utf8.out | 106 ++++++++++++++++++
src/test/modules/test_regex/sql/test_regex.sql | 32 ++++++
.../modules/test_regex/sql/test_regex_utf8.sql | 18 ++++
5 files changed, 338 insertions(+), 4 deletions(-)

Browse pgsql-committers by date

  From Date Subject
Next Message Tom Lane 2021-08-17 17:00:41 pgsql: Reduce assumptions about locale's behavior in new regex tests.
Previous Message Michael Meskes 2021-08-17 13:01:32 pgsql: Improved ECPG warning as suggested by Michael Paquier and remove