Re: Planning performance problem (67626.278ms)

From: Manuel Weitzman <manuelweitzman(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Jeremy Schneider <schnjere(at)amazon(dot)com>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Planning performance problem (67626.278ms)
Date: 2021-06-20 00:09:58
Message-ID: F06D0C00-DE75-4688-98B0-85A46E8F312C@gmail.com
Lists: pgsql-performance

Hello everyone,

> Apparently, the planner isn't reusing the data boundaries across alternative
> plans. It would be nicer if the planner remembered each column boundaries
> for later reuse (within the same planner execution).

I've written a very naive (and crappy) patch to show how adding
memoization to get_actual_variable_range() could help the planner in
scenarios with a large number of joins.

For the previous example,

> explain (analyze, buffers)
> select * from a
> join b b1 on (b1.a = a.a)
> join b b2 on (b2.a = a.a)
> where b1.a in (1,100,10000,1000000,1000001);

each time you add a join clause the planner has to read an extra ~55K
buffers and gets about 200 ms slower.

1 join
Planning:
Buffers: shared hit=9 read=27329
Planning Time: 101.745 ms
Execution Time: 0.082 ms

2 joins
Planning:
Buffers: shared hit=42 read=81988
Planning Time: 303.237 ms
Execution Time: 0.102 ms

3 joins
Planning:
Buffers: shared hit=94 read=136660
Planning Time: 508.947 ms
Execution Time: 0.155 ms

4 joins
Planning:
Buffers: shared hit=188 read=191322
Planning Time: 710.981 ms
Execution Time: 0.168 ms

After adding memoization, the cost in buffers stays roughly constant
(about 27K reads) and the latency deteriorates only marginally (as
expected) with each join.

1 join
Planning:
Buffers: shared hit=10 read=27328
Planning Time: 97.889 ms
Execution Time: 0.066 ms

2 joins
Planning:
Buffers: shared hit=7 read=27331
Planning Time: 100.589 ms
Execution Time: 0.111 ms

3 joins
Planning:
Buffers: shared hit=9 read=27329
Planning Time: 105.669 ms
Execution Time: 0.134 ms

4 joins
Planning:
Buffers: shared hit=132 read=27370
Planning Time: 155.716 ms
Execution Time: 0.219 ms
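
To illustrate the idea (this is not the attached patch), here is a
minimal standalone sketch of memoizing an expensive range lookup. All
names are hypothetical: probe_index_endpoints() merely simulates the
costly index probe, and the key (index OID plus sort operator) is an
assumption about what would identify a cached result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from PostgreSQL. */
#define RANGE_CACHE_SIZE 64

typedef struct RangeCacheEntry
{
    unsigned int index_oid;   /* which index was probed */
    unsigned int sort_op;     /* which ordering operator was used */
    bool         have_data;   /* false if the probe found nothing */
    long         min;
    long         max;
} RangeCacheEntry;

static RangeCacheEntry range_cache[RANGE_CACHE_SIZE];
static size_t range_cache_used = 0;
static int    probes = 0;     /* counts simulated expensive probes */

/* Simulates the expensive index probe with made-up values. */
bool probe_index_endpoints(unsigned int index_oid, unsigned int sort_op,
                           long *min, long *max)
{
    (void) sort_op;
    probes++;
    *min = 1;
    *max = (long) index_oid * 1000;
    return true;
}

int probe_count(void)
{
    return probes;
}

/* Forget all cached ranges, e.g. at the start of a new planner run. */
void reset_range_cache(void)
{
    range_cache_used = 0;
}

/* Return the cached range if present; otherwise probe once and remember. */
bool get_range_memoized(unsigned int index_oid, unsigned int sort_op,
                        long *min, long *max)
{
    assert(min != NULL && max != NULL);

    for (size_t i = 0; i < range_cache_used; i++)
    {
        RangeCacheEntry *e = &range_cache[i];

        if (e->index_oid == index_oid && e->sort_op == sort_op)
        {
            *min = e->min;
            *max = e->max;
            return e->have_data;
        }
    }

    long lo = 0, hi = 0;
    bool found = probe_index_endpoints(index_oid, sort_op, &lo, &hi);

    /* If the cache is full we simply skip caching; results stay correct. */
    if (range_cache_used < RANGE_CACHE_SIZE)
    {
        RangeCacheEntry *e = &range_cache[range_cache_used++];

        e->index_oid = index_oid;
        e->sort_op = sort_op;
        e->have_data = found;
        e->min = lo;
        e->max = hi;
    }

    *min = lo;
    *max = hi;
    return found;
}
```

Repeated calls with the same key hit the cache, so only the first call
pays for the probe. That matches the flat buffer counts above.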

I'd be happy to improve this patch into something better, though I'd
like suggestions on how to do it.
My idea is to create a local "memoization" struct instance within
standard_planner(). That would require passing a pointer all the way
down to get_actual_variable_range(), which I think would be quite
ugly if done just to improve the planner for this scenario.
Is there a better mechanism I could reuse from other modules (utils
or cache, for example)?
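
For concreteness, a rough standalone sketch of the "local struct passed
down by pointer" approach. It is only a model: PlanMemo, plan_query(),
estimate_join_clause() and lookup_column_max() are hypothetical names,
not PostgreSQL functions, and the lookup result is faked.

```c
#include <assert.h>
#include <stddef.h>

#define MEMO_SLOTS 16

/* Hypothetical per-planning-run cache; lives only as long as one plan_query() call. */
typedef struct PlanMemo
{
    unsigned int keys[MEMO_SLOTS];    /* column identifiers already looked up */
    long         maxima[MEMO_SLOTS];  /* remembered boundary per column */
    size_t       used;
    int          probes;              /* how many expensive lookups actually ran */
} PlanMemo;

/* Lowest level: consult the memo first, fall back to the (faked) expensive lookup. */
long lookup_column_max(PlanMemo *memo, unsigned int column_id)
{
    assert(memo != NULL);

    for (size_t i = 0; i < memo->used; i++)
        if (memo->keys[i] == column_id)
            return memo->maxima[i];

    memo->probes++;
    long value = (long) column_id * 100;   /* stand-in for the index probe */

    if (memo->used < MEMO_SLOTS)
    {
        memo->keys[memo->used] = column_id;
        memo->maxima[memo->used] = value;
        memo->used++;
    }
    return value;
}

/* Intermediate level: exists only to show the pointer being threaded through. */
long estimate_join_clause(PlanMemo *memo, unsigned int column_id)
{
    return lookup_column_max(memo, column_id);
}

/* Top level: owns the memo, so nothing leaks across planner runs. */
int plan_query(int njoins, unsigned int column_id)
{
    PlanMemo memo = {0};

    for (int i = 0; i < njoins; i++)
        (void) estimate_join_clause(&memo, column_id);

    return memo.probes;
}
```

The memo's lifetime is bounded by one planner run, so nothing needs
invalidating. The cost is the extra parameter on every function between
the top level and the lookup.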

Regards,
Manuel

Attachment Content-Type Size
actual_variable_range_memorization.diff application/octet-stream 5.7 KB
