From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> |
Cc: | Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Support run-time partition pruning for hash join |
Date: | 2023-08-22 06:38:09 |
Message-ID: | CAApHDvp+O0PtZMWomyYg=RE68d1wBnc-yxjJsDK3hmYBnP=-qw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, 22 Aug 2023 at 00:34, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
>
> On Mon, Aug 21, 2023 at 11:48 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
>> 1. All the join partition prunning decisions are made in createplan.c
>> where the best path tree has been decided. This is not great. Maybe
>> it's better to make it happen when we build up the path tree, so that
>> we can take the partition prunning into consideration when estimating
>> the costs.
>
>
> fwiw, the current master totally ignores the cost reduction for run-time
> partition prune, even for init partition prune. So in some real cases,
> pg chooses a hash join just because the cost of nest loop join is
> highly over estimated.
This is true about the existing code. It's a very tricky thing to cost
given that the parameter values are always unknown to the planner.
The best we have for these today is the various hardcoded constants in
selfuncs.h. While I do agree that it's not great that the costing code
knows nothing about run-time pruning, I also think that run-time
pruning during execution with parameterised nested loops is much more
likely to be able to prune partitions and save actual work than the
equivalent with Hash Joins. It's more common for the planner to
choose to Nested Loop when there are fewer outer rows, so the pruning
code is likely to be called fewer times with Nested Loop than with
Hash Join.
With Hash Join, it seems to me that the pruning must take place for
every row that makes it into the hash table. There will be maybe
cases where the unioned set of partitions simply yields every
partition and all the work results in no savings. Pruning on a scalar
value seems much more likely to be able to prune away unneeded
Append/MergeAppend subnodes.
Perhaps there can be something adaptive in Hash Join which stops
trying to prune when all partitions must be visited. On a quick
glance of the patch, I don't see any code in ExecJoinPartitionPrune()
which gives up trying to prune when the number of members in
part_prune_result is equal to the prunable Append/MergeAppend
subnodes.
It would be good to see some performance comparisons of the worst case
to see how much overhead the pruning code adds to Hash Join. It may
well be that we need to consider two Hash Join paths, one with and one
without run-time pruning. It's pretty difficult to meaningfully cost,
as I already mentioned, however.
>> 4. Is it possible and worthwhile to extend the join partition prunning
>> mechanism to support nestloop and mergejoin also?
>
>
> In my current knowledge, we have to build the inner table first for this
> optimization? so hash join and sort merge should be OK, but nestloop should
> be impossible unless I missed something.
But run-time pruning already works for Nested Loops... I must be
missing something here.
I imagine for Merge Joins a more generic approach would be better by
implementing parameterised Merge Joins (a.k.a zigzag merge joins).
The Append/MergeAppend node could then select the correct partition(s)
based on the current parameter value at rescan. I don't think any code
changes would be needed in node[Merge]Append.c for that to work. This
could also speed up Merge Joins to non-partitioned tables when an
index is providing presorted input to the join.
David
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2023-08-22 06:48:15 | Re: [PoC] pg_upgrade: allow to upgrade publisher node |
Previous Message | Peter Smith | 2023-08-22 06:01:41 | Re: [PoC] pg_upgrade: allow to upgrade publisher node |