Re: Adaptive Plan Sharing for PreparedStmt

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adaptive Plan Sharing for PreparedStmt
Date: 2021-05-25 11:02:14
Message-ID: CAKU4AWo2G7CkoPT90gzKe268hpcgaRZnXXm74P3AbsD3gNbCEw@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Thu, May 20, 2021 at 5:02 PM Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
wrote:

> Hi,
>
> On 5/20/21 5:43 AM, Andy Fan wrote:
> > Currently we are using a custom/generic strategy to handle the data skew
> > issue. However, it doesn't work well all the time. For example: SELECT *
> > FROM t WHERE a between $1 and $2. We assume the selectivity is 0.0025,
> > but users may provide a large range every time. Per our current strategy,
> > a generic plan with an index scan on a will be chosen every time. Oops.
> >
>
> Yeah, the current logic is rather simple, which is however somewhat on
> purpose, as it makes the planning very cheap. But it also means there's
> very little info to check/compare and so we may make mistakes.
>
> > I think Oracle's Adaptive Cursor Sharing approach should work. It first
> > calculates the selectivity with the real bind values and generates or
> > reuses plans based on the similarity of the selectivities. The
> > challenges I can think of now are: a) how to define the similarity;
> > b) how to adjust the similarity during the real run. For example,
> > suppose we say [1% ~ 10%] is similar, but then find that selectivity
> > 20% uses the same plan as 10%. What should be done here?
> >
>
> IMO the big question is how expensive this would be. Calculating the
> selectivities for real values (i.e. for each query) is not expensive,
> but it's not free either. So even if we compare the selectivities in
> some way and skip the actual query planning, it's still going to impact
> the prepared statements.
>

That's true if we need to do this every time. We may only need to do this
in cases where the estimation is likely to be wrong, like a > $1 or
a BETWEEN $1 AND $2. In such cases we currently just use a hard-coded
default selectivity.
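
To make this concrete, here is a rough sketch of the shape check I have
in mind. The helper itself is hypothetical (nothing like it exists in
core today); only the node types and list macros are the real ones:

#include "postgres.h"
#include "nodes/pg_list.h"
#include "nodes/primnodes.h"

/*
 * Hypothetical helper: return true if a qual has the shape
 * "Var <op> Param", i.e. a comparison of a plain column against a bind
 * value.  A real version would additionally check that <op> is a btree
 * inequality operator, since those are the cases where the planner
 * falls back to hard-coded defaults such as DEFAULT_RANGE_INEQ_SEL.
 */
static bool
qual_compares_var_to_param(Node *qual)
{
    OpExpr     *op;

    if (!IsA(qual, OpExpr))
        return false;
    op = (OpExpr *) qual;

    if (list_length(op->args) != 2)
        return false;

    /* Var on one side, Param (the bind value) on the other */
    return (IsA(linitial(op->args), Var) && IsA(lsecond(op->args), Param)) ||
           (IsA(linitial(op->args), Param) && IsA(lsecond(op->args), Var));
}

(A real version would also have to look through RelabelType nodes and
implicit coercions, but that's a detail.)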

> Also, we currently don't have any mechanism to extract the selectivities
> from the whole query - not sure how complex that would be, as it may
> involve e.g. join selectivities.
>

The idea in my mind is just to check the quals on base relations, like
t1.a > $1. So something like t1.a + t2.a > $1 would be ignored.
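
For collecting those per-relation selectivities, something like the
sketch below should do (the function is made up, but
clauselist_selectivity and the PlannerInfo fields are the existing
ones). Note that a multi-relation clause like t1.a + t2.a > $1 never
appears in baserestrictinfo, so it is skipped automatically:

#include "postgres.h"
#include "nodes/pathnodes.h"
#include "optimizer/optimizer.h"

/*
 * Hypothetical: fill sels[] (indexed by range-table index) with the
 * combined restriction selectivity of each base relation, computed
 * with the real bind values.
 */
static void
collect_baserel_selectivities(PlannerInfo *root, Selectivity *sels)
{
    for (Index rti = 1; rti < root->simple_rel_array_size; rti++)
    {
        RelOptInfo *rel = root->simple_rel_array[rti];

        sels[rti] = 1.0;        /* default: no restriction clauses */

        if (rel == NULL || rel->reloptkind != RELOPT_BASEREL)
            continue;

        sels[rti] = clauselist_selectivity(root,
                                           rel->baserestrictinfo,
                                           0, JOIN_INNER, NULL);
    }
}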

>
> As for how to define the similarity, I doubt there's a simple and
> sensible/reliable way to do that :-(
>
> I remember reading a paper about query planning in which the parameter
> space was divided into regions with the same plan. In this case the
> parameters are selectivities for all the query operations. So what we
> might do is this:
>
> 1) Run the first N queries and extract the selectivities / plans.
>
> 2) Build "clusters" of selectivities with the same plan.
>
> 3) Before running a query, see if the selectivities fall into one of
> the existing clusters. If yes, use the plan. If not, do regular
> planning, add it to the data set and repeat (2).
>
> I have no idea how expensive would this be, and I assume the "clusters"
> may have fairly complicated shapes (not simple convex regions).
>
>
Thanks for sharing this; we do have lots of things to do here. Your idea
should be a good place to start.
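
As a first cut, the matching step could be as simple as a relative
distance test against the selectivity vector remembered for each cached
plan. Everything below is hypothetical and only meant to show the
shape; how to pick the tolerance, and how to widen a cluster when a
"mismatched" selectivity still ends up with the same plan, are exactly
the open questions:

#include "postgres.h"

#include <math.h>

#include "nodes/pg_list.h"
#include "utils/plancache.h"

/* Hypothetical: one cached plan plus the selectivities it was built for */
typedef struct PlanCluster
{
    int         nrels;          /* length of sels[] */
    Selectivity *sels;          /* per-base-rel selectivities */
    CachedPlan *plan;           /* plan shared by this cluster */
} PlanCluster;

static bool
selectivities_match(const PlanCluster *c, const Selectivity *cur, double tol)
{
    for (int i = 0; i < c->nrels; i++)
    {
        double      ref = Max(c->sels[i], 1.0e-10);

        if (fabs(cur[i] - c->sels[i]) / ref > tol)
            return false;
    }
    return true;
}

static CachedPlan *
choose_shared_plan(List *clusters, const Selectivity *cur, double tol)
{
    ListCell   *lc;

    foreach(lc, clusters)
    {
        PlanCluster *c = (PlanCluster *) lfirst(lc);

        if (selectivities_match(c, cur, tol))
            return c->plan;     /* reuse without replanning */
    }
    return NULL;    /* caller plans normally and records a new cluster */
}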

--
Best Regards
Andy Fan (https://www.aliyun.com/)
