Re: Planning performance problem (67626.278ms)

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: Manuel Weitzman <manuelweitzman(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Jeremy Schneider <schnjere(at)amazon(dot)com>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Planning performance problem (67626.278ms)
Date: 2021-06-20 23:23:40
Message-ID: CAEudQApVF7TX-EJXwBz4TUhH6NzNB-wXqhs4wj+qSe6Vw3H5-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Em dom., 20 de jun. de 2021 às 14:50, Manuel Weitzman <
manuelweitzman(at)gmail(dot)com> escreveu:

> Hello everyone,
>
> > Apparently, the planner isn't reusing the data boundaries across
> alternative
> > plans. It would be nicer if the planner remembered each column boundaries
> > for later reuse (within the same planner execution).
>
> I've written a very naive (and crappy) patch to show how adding
> memorization to get_actual_variable_range() could help the planner on
> scenarios with a big number of joins.
>
> For the previous example,
>
> > explain (analyze, buffers)
> > select * from a
> > join b b1 on (b1.a = a.a)
> > join b b2 on (b2.a = a.a)
> > where b1.a in (1,100,10000,1000000,1000001);
>
> each time you add a join clause the planner has to read an extra ~5[K]
> buffers and gets about 200[ms] slower.
>
> 1 join
> Planning:
> Buffers: shared hit=9 read=27329
> Planning Time: 101.745 ms
> Execution Time: 0.082 ms
>
> 2 joins
> Planning:
> Buffers: shared hit=42 read=81988
> Planning Time: 303.237 ms
> Execution Time: 0.102 ms
>
> 3 joins
> Planning:
> Buffers: shared hit=94 read=136660
> Planning Time: 508.947 ms
> Execution Time: 0.155 ms
>
> 4 joins
> Planning:
> Buffers: shared hit=188 read=191322
> Planning Time: 710.981 ms
> Execution Time: 0.168 ms
>
>
> After adding memorization the cost in buffers remains constant and the
> latency deteriorates only marginally (as expected) with each join.
>
> 1 join
> Planning:
> Buffers: shared hit=10 read=27328
> Planning Time: 97.889 ms
> Execution Time: 0.066 ms
>
> 2 joins
> Planning:
> Buffers: shared hit=7 read=27331
> Planning Time: 100.589 ms
> Execution Time: 0.111 ms
>
> 3 joins
> Planning:
> Buffers: shared hit=9 read=27329
> Planning Time: 105.669 ms
> Execution Time: 0.134 ms
>
> 4 joins
> Planning:
> Buffers: shared hit=132 read=27370
> Planning Time: 155.716 ms
> Execution Time: 0.219 ms
>
>
> I'd be happy to improve this patch into something better. Though I'd
> like suggestions on how to do it:
> I have this idea of creating a local "memorization" struct instance within
> standard_planner(). That would require passing on a pointer down until
> it reaches get_actual_variable_range(), which I think would be quite
> ugly, if done just to improve the planner for this scenario.
> Is there any better mechanism I could reuse from other modules? (utils
> or cache, for example).
>
Without going into the merits of whether this cache will be adopted or not,
I have some comments about the code.

1. Prefer to use .patch instead of .diff, it makes it easier for browsers
such as firefox to read and show the content automatically.
2. New struct?
Oid is unsigned int, lower than int64.
Better struct is:
+struct ActualVariableRangeCache {
+ int64 min_value; /* 8 bytes */
+ int64 max_value; /* 8 bytes */
+ Oid indexoid; /* 4 bytes */
+ bool has_min; /* 1 byte */
+ bool has_max; /*1 byte */
+};
Takes up less space.

3. Avoid use of type *long*, it is very problematic with 64 bits.
Windows 64 bits, long is 4 (four) bytes.
Linux 64 bits, long is 8 (eight) bytes.

4. Avoid C99 style declarations
for(unsigned long i = 0;)
Prefer:
size_t i;
for(i = 0;)
Helps backpatching to C89 versions.

regards,
Ranier Vilela

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Dean Gibson (DB Administrator) 2021-06-20 23:53:35 Re: Estimating wal_keep_size
Previous Message Tom Lane 2021-06-20 21:06:31 Re: Planning performance problem (67626.278ms)