From: | Manuel Weitzman <manuelweitzman(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, Jeremy Schneider <schnjere(at)amazon(dot)com>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de> |
Subject: | Re: Planning performance problem (67626.278ms) |
Date: | 2021-06-20 00:09:58 |
Message-ID: | F06D0C00-DE75-4688-98B0-85A46E8F312C@gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello everyone,
> Apparently, the planner isn't reusing the data boundaries across alternative
> plans. It would be nicer if the planner remembered each column boundaries
> for later reuse (within the same planner execution).
I've written a very naive (and crappy) patch to show how adding
memorization to get_actual_variable_range() could help the planner on
scenarios with a big number of joins.
For the previous example,
> explain (analyze, buffers)
> select * from a
> join b b1 on (b1.a = a.a)
> join b b2 on (b2.a = a.a)
> where b1.a in (1,100,10000,1000000,1000001);
each time you add a join clause the planner has to read an extra ~5[K]
buffers and gets about 200[ms] slower.
1 join
Planning:
Buffers: shared hit=9 read=27329
Planning Time: 101.745 ms
Execution Time: 0.082 ms
2 joins
Planning:
Buffers: shared hit=42 read=81988
Planning Time: 303.237 ms
Execution Time: 0.102 ms
3 joins
Planning:
Buffers: shared hit=94 read=136660
Planning Time: 508.947 ms
Execution Time: 0.155 ms
4 joins
Planning:
Buffers: shared hit=188 read=191322
Planning Time: 710.981 ms
Execution Time: 0.168 ms
After adding memorization the cost in buffers remains constant and the
latency deteriorates only marginally (as expected) with each join.
1 join
Planning:
Buffers: shared hit=10 read=27328
Planning Time: 97.889 ms
Execution Time: 0.066 ms
2 joins
Planning:
Buffers: shared hit=7 read=27331
Planning Time: 100.589 ms
Execution Time: 0.111 ms
3 joins
Planning:
Buffers: shared hit=9 read=27329
Planning Time: 105.669 ms
Execution Time: 0.134 ms
4 joins
Planning:
Buffers: shared hit=132 read=27370
Planning Time: 155.716 ms
Execution Time: 0.219 ms
I'd be happy to improve this patch into something better. Though I'd
like suggestions on how to do it:
I have this idea of creating a local "memorization" struct instance within
standard_planner(). That would require passing on a pointer down until
it reaches get_actual_variable_range(), which I think would be quite
ugly, if done just to improve the planner for this scenario.
Is there any better mechanism I could reuse from other modules? (utils
or cache, for example).
Regards,
Manuel
Attachment | Content-Type | Size |
---|---|---|
actual_variable_range_memorization.diff | application/octet-stream | 5.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2021-06-20 21:06:31 | Re: Planning performance problem (67626.278ms) |
Previous Message | Vijaykumar Jain | 2021-06-18 18:40:08 | Re: Estimating wal_keep_size |