Re: Using regexp from table has unpredictable poor performance

From: Jack Christensen <jack(at)jncsoftware(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Using regexp from table has unpredictable poor performance
Date: 2021-08-25 21:21:55
Message-ID: CAMovtNpox7hwtRGqf+nU17o_K3oFCxj_snOXnC5CNA5LarvN-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

The optimizer was a bit too clever. It used the same plan for the LEFT
JOIN. But that put me on the right track. I tried a LATERAL join. But the
optimizer saw through that too and used the same plan. So I tried a
materialized CTE and that finally forced it to use a different plan. That
made it run in ~70ms -- about 18x faster. Thanks!

explain analyze
with r as materialized (
select * from matching_rules
where id >= 0 and id < 60
)
select r.id, i.id
from r
join items i on i.name ~ r.name_matches
;

QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Nested Loop (cost=2.78..714.20 rows=230 width=8) (actual
time=0.071..69.545 rows=702 loops=1)
Join Filter: (i.name ~ r.name_matches)
Rows Removed by Join Filter: 45298
CTE r
-> Seq Scan on matching_rules (cost=0.00..2.78 rows=46 width=26)
(actual time=0.007..0.047 rows=46 loops=1)
Filter: ((id >= 0) AND (id < 60))
Rows Removed by Filter: 6
-> CTE Scan on r (cost=0.00..0.92 rows=46 width=36) (actual
time=0.008..0.090 rows=46 loops=1)
-> Materialize (cost=0.00..23.00 rows=1000 width=27) (actual
time=0.000..0.081 rows=1000 loops=46)
-> Seq Scan on items i (cost=0.00..18.00 rows=1000 width=27)
(actual time=0.003..0.092 rows=1000 loops=1)
Planning Time: 0.206 ms
Execution Time: 69.633 ms

On Wed, Aug 25, 2021 at 4:05 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> On Wed, Aug 25, 2021 at 11:47:43AM -0500, Jack Christensen wrote:
> > I have items that need to be categorized by user defined matching rules.
> > Trusted users can create rules that include regular expressions. I've
> > reduced the problem to this example.
>
> > I use the following query to find matches:
> >
> > select r.id, i.id
> > from items i
> > join matching_rules r on i.name ~ r.name_matches;
> >
> > When there are few rules the query runs quickly. But as the number of
> rules
> > increases the runtime often increases at a greater than linear rate.
>
> Maybe it's because the REs are cached by RE_compile_and_cache(), but if you
> loop over the REs in the inner loop, then the caching is ineffecive.
>
> Maybe you can force it to join with REs on the outer loop by writing it as:
> | rules LEFT JOIN items WHERE rules.id IS NOT NULL,
> ..to improve performance, or at least test that theory.
>
> --
> Justin
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Vitalii Tymchyshyn 2021-08-25 22:16:26 Re: Using regexp from table has unpredictable poor performance
Previous Message Justin Pryzby 2021-08-25 21:05:00 Re: Using regexp from table has unpredictable poor performance