grouping_planner refactoring

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: grouping_planner refactoring
Date: 2019-02-14 01:30:31
Message-ID: b720a3ea-111d-861c-2264-2736d614fbe5@lab.ntt.co.jp
Lists: pgsql-hackers

Hi,

It would help the project to "speed up partition planning" [1] a bit if
grouping_planner didn't call query_planner directly. grouping_planner's
main role seems to be adding Path(s) for the "top-level" operations of the
query, such as grouping and aggregation, on top of the scan/join Path(s)
produced by query_planner(). ISTM, scan/join paths themselves could
very well be generated *before* we get into grouping_planner, that is, by
calling query_planner before calling grouping_planner. Some of the
top-level processing code in grouping_planner depends on information
computed earlier in the same function, before the point where
query_planner is called, but that information could instead be generated
in grouping_planner's caller and shared with grouping_planner.

Attached patch shows one possible way that could be done.
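
To make the proposed call order concrete, here is a toy sketch in C. It is
not taken from the attached patch and uses no PostgreSQL types: every name
(ToyQuery, ToySharedInfo, ToyRel, toy_query_planner, toy_grouping_planner,
toy_plan) and every cost number is a hypothetical stand-in, chosen only to
show the caller computing the shared information, running the scan/join
step first, and then handing both to the grouping step.

```c
#include <assert.h>
#include <stddef.h>

/* All types and functions here are hypothetical stand-ins, not PostgreSQL code. */

typedef struct ToyQuery {
    int nrels;          /* number of base relations in the query */
    int has_group_by;   /* nonzero if the query needs a grouping step */
} ToyQuery;

/* Information that the grouping step needs but the caller now computes. */
typedef struct ToySharedInfo {
    int has_grouping;
} ToySharedInfo;

typedef struct ToyRel {
    int scanjoin_cost;  /* made-up cost of the scan/join "path" */
    int final_cost;     /* made-up cost after top-level processing */
} ToyRel;

/* Stand-in for query_planner: produces only the scan/join result. */
ToyRel toy_query_planner(const ToyQuery *query)
{
    ToyRel rel;

    rel.scanjoin_cost = 10 * query->nrels;
    rel.final_cost = 0;
    return rel;
}

/*
 * Stand-in for grouping_planner: adds top-level steps on top of an
 * already-built scan/join result; it never calls toy_query_planner.
 */
void toy_grouping_planner(const ToySharedInfo *info, ToyRel *rel)
{
    assert(info != NULL && rel != NULL);
    rel->final_cost = rel->scanjoin_cost + (info->has_grouping ? 5 : 0);
}

/* Stand-in for the caller that now drives both steps. */
int toy_plan(const ToyQuery *query)
{
    ToySharedInfo info;
    ToyRel rel;

    info.has_grouping = query->has_group_by;   /* shared info, computed once */
    rel = toy_query_planner(query);            /* scan/join paths first */
    toy_grouping_planner(&info, &rel);         /* then top-level processing */
    return rel.final_cost;
}
```

In this sketch, calling toy_plan() on a two-relation query with a GROUP BY
runs the scan/join step exactly once and then the grouping step on its
result. The only point is the ordering and the explicit hand-off of the
shared information.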

Over in [1], the premise of one of the patches is that
inheritance_planner gets slow as the number of children increases, because
it invokes query_planner repeatedly (via grouping_planner) on the
translated query tree for *all* target child relations. For partitioned
tables, that also means that partition pruning cannot be used, making
UPDATE vastly slower compared to SELECT. Based on that premise, the
patch's solution is to invoke query_planner only once at the beginning by
passing it the original query tree. That will generate scan Paths for all
target and non-target base relations (partition pruning can be used to
quickly determine target partitions) and join paths per target child
relation. Per-target-child join paths are generated by repeatedly running
make_rel_from_joinlist on a translated joinlist in which the reference to
the top-parent target relation is replaced by a reference to an individual
child target relation. So, query_planner now effectively generates N top-level
scan/join RelOptInfos for N target child relations, which are tucked away
in the top PlannerInfo. Back in inheritance_planner, grouping_planner is
called to apply the final PathTarget to individual scan/join paths
collected above based on each target child relation's row type, but
query_planner is NOT called again during these grouping_planner
invocations. The way that's currently implemented by the patch seems a
bit hacky, but if we refactor grouping_planner like I described above,
then there's no need for grouping_planner to behave specially for
inheritance_planner (not minding the inherited_update argument).
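
The flow described above can be sketched as a toy model in C. This is not
the patch's code: ToyInhRoot, ToyChildRel, toy_plan_scanjoin_once,
toy_finalize_child, and toy_inheritance_plan are hypothetical names, the
"pruning" is just a bitmask, and the costs are invented. It only shows the
scan/join step running once and storing one result per surviving child,
followed by a per-child finalization loop that never re-enters the
scan/join step.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not PostgreSQL code. */

#define TOY_MAX_CHILDREN 8

typedef struct ToyChildRel {
    int child_id;        /* which target child this result belongs to */
    int scanjoin_cost;   /* made-up cost of the child's scan/join "path" */
    int final_cost;      /* made-up cost after the final target is applied */
} ToyChildRel;

/* Stand-in for the top PlannerInfo holding per-child scan/join results. */
typedef struct ToyInhRoot {
    int scanjoin_calls;  /* how many times the scan/join step ran */
    int nrels;
    ToyChildRel rels[TOY_MAX_CHILDREN];
} ToyInhRoot;

/*
 * Stand-in for the single query_planner call: "prunes" children using
 * keep_mask (bit i set means child i survives) and stores one scan/join
 * result per surviving child in the root.
 */
void toy_plan_scanjoin_once(ToyInhRoot *root, int nchildren, unsigned keep_mask)
{
    root->scanjoin_calls++;
    root->nrels = 0;
    for (int i = 0; i < nchildren && i < TOY_MAX_CHILDREN; i++) {
        if (!(keep_mask & (1u << i)))
            continue;                       /* pruned child: no paths built */
        ToyChildRel *rel = &root->rels[root->nrels++];
        rel->child_id = i;
        rel->scanjoin_cost = 10 + i;
        rel->final_cost = 0;
    }
}

/* Stand-in for grouping_planner applied to one child's scan/join result. */
void toy_finalize_child(ToyChildRel *rel)
{
    assert(rel != NULL);
    rel->final_cost = rel->scanjoin_cost + 1;
}

/* Stand-in for inheritance_planner: one scan/join pass, N finalizations. */
int toy_inheritance_plan(ToyInhRoot *root, int nchildren, unsigned keep_mask)
{
    int total = 0;

    root->scanjoin_calls = 0;
    toy_plan_scanjoin_once(root, nchildren, keep_mask);
    for (int i = 0; i < root->nrels; i++) {
        toy_finalize_child(&root->rels[i]); /* no scan/join planning here */
        total += root->rels[i].final_cost;
    }
    return total;
}
```

With four children and a mask that keeps children 0 and 2, the scan/join
counter stays at 1 while two per-child results are finalized. That is the
property the refactoring is after.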

Thoughts?

Thanks,
Amit

[1] https://commitfest.postgresql.org/22/1778/

Attachment Content-Type Size
0001-Refactor-planner.c-a-bit.patch text/plain 42.9 KB
