From: | Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Hybrid Hash/Nested Loop joins and caching results from subplans |
Date: | 2020-06-12 04:10:26 |
Message-ID: | CAKU4AWpHBWWX5rpQP_ZOFMvb_eh5bN+MxKF2HzE17eUZXX5XZQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jun 3, 2020 at 10:36 AM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
>
>> Thanks for running those tests. I had a quick look at the results and
>> I think to say that all 4 are better is not quite right. One is
>> actually a tiny bit slower and one is only faster due to a plan
>> change.
>>
>>
> Yes.. Thanks for pointing it out.
>
>
>> Q18 uses a result cache for 2 x nested loop joins and has a 0% hit
>> ratio. The execution time is reduced to 91% of the original time only
>> because the planner uses a different plan, which just happens to be
>> faster by chance.
>>
>
This case should be caused by wrong rows estimations on condition
o_orderkey in (select l_orderkey from lineitem group by l_orderkey having
sum(l_quantity) > 312). The estimation is 123766 rows, but the fact is 10
rows.
This estimation is hard and I don't think we should address this issue on
this
patch.
Q20 uses a result cache for the subplan and has a 0% hit ratio. The
>> execution time is 100.27% of the original time. There are 8620 cache
>> misses.
>>
>
>
This is by design for current implementation.
> For subplans, since we plan subplans before we're done planning the
> outer plan, there's very little information to go on about the number
> of times that the cache will be looked up. For now, I've coded things
> so the cache is always used for EXPR_SUBLINK type subplans. "
I first tried to see if we can have a row estimation before the subplan
is created and it looks very complex. The subplan was created during
preprocess_qual_conditions, at that time, we even didn't create the base
RelOptInfo , to say nothing of join_rel which the rows estimation happens
much later.
Then I see if we can delay the cache decision until we have the rows
estimation,
ExecInitSubPlan may be a candidate. At this time we can't add a new
ResutCache node, but we can add a cache function to SubPlan node with costed
based. However the num_of_distinct values for parameterized variable can't
be
calculated which I still leave it as an open issue.
--
Best Regards
Andy Fan
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2020-06-12 04:23:59 | Re: Transactions involving multiple postgres foreign servers, take 2 |
Previous Message | Fujii Masao | 2020-06-12 03:56:16 | Re: Resetting spilled txn statistics in pg_stat_replication |