Re: Support run-time partition pruning for hash join

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support run-time partition pruning for hash join
Date: 2023-08-29 10:41:28
Message-ID: CAMbWs488_uMqmyQo0Cq-XrDSiEqiXCnfJtCyhXrWn_HdBen3Tw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 25, 2023 at 11:03 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> I'd suggest writing some cost which costs an execution of run-time
> pruning. With LIST and RANGE you probably want something like
> cpu_operator_cost * LOG2(nparts) once for each hashed tuple to account
> for the binary search over the sorted datum array. For HASH
> partitions, something like cpu_operator_cost * npartcols once for each
> hashed tuple.
>
> You'll need to then come up with some counter costs to subtract from
> the Append/MergeAppend. This is tricky, as discussed. Just come up
> with something crude for now.
>
> To start with, it could just be as crude as:
>
> total_costs *= (Min(expected_outer_rows, n_append_subnodes) /
> n_append_subnodes);
>
> i.e assume that every outer joined row will require exactly 1 new
> partition up to the total number of partitions. That's pretty much
> worst-case, but it'll at least allow the optimisation to work for
> cases like where the hash table is expected to contain just a tiny
> number of rows (fewer than the number of partitions)
>
> To make it better, you might want to look at join selectivity
> estimation and see if you can find something there to influence
> something better.

I had a go at writing some costing code along the lines of your
suggestion; it's compute_partprune_cost() in the v2 patch.

For the hash side, this function computes the pruning cost as
cpu_operator_cost * LOG2(nparts) * inner_rows for LIST and RANGE
partitioning, and cpu_operator_cost * nparts * inner_rows for HASH
partitioning.

For the Append/MergeAppend side, this function first estimates the size
of the outer side that matches, using the same idea we use to estimate
the joinrel size for JOIN_SEMI. It then assumes that each matched outer
row occupies one new partition (the worst case) and computes how much
cost can be saved by partition pruning.

If the cost saved from the Append/MergeAppend side is larger than the
pruning cost from the Hash side, then we say that partition pruning is a
win.
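To illustrate the decision described above, here is a minimal sketch of
the cost comparison in Python. All names and the exact formulas are
assumptions for illustration only, not the actual
compute_partprune_cost() from the patch; only the default value of
cpu_operator_cost matches PostgreSQL.

```python
import math

CPU_OPERATOR_COST = 0.0025  # PostgreSQL's default cpu_operator_cost


def pruning_cost(strategy, nparts, npartcols, inner_rows):
    """Cost of performing run-time pruning once per hashed (inner) tuple."""
    if strategy in ("list", "range"):
        # binary search over the sorted partition-bound datum array
        return CPU_OPERATOR_COST * math.log2(nparts) * inner_rows
    # hash partitioning: assumed proportional to the number of partitions
    return CPU_OPERATOR_COST * nparts * inner_rows


def append_cost_saved(append_total_cost, matched_outer_rows, nparts):
    """Crude worst-case estimate: each matched outer row occupies one new
    partition, so at most Min(matched_outer_rows, nparts) partitions
    survive pruning; the rest of the Append cost is saved."""
    surviving_frac = min(matched_outer_rows, nparts) / nparts
    return append_total_cost * (1.0 - surviving_frac)


def pruning_is_a_win(strategy, nparts, npartcols, inner_rows,
                     append_total_cost, matched_outer_rows):
    """Prune only if the Append-side savings exceed the hash-side cost."""
    saved = append_cost_saved(append_total_cost, matched_outer_rows, nparts)
    return saved > pruning_cost(strategy, nparts, npartcols, inner_rows)
```

Under this crude model, once the number of matched outer rows reaches
the number of partitions, the surviving fraction is 1, no cost is saved,
and pruning is rejected regardless of how cheap the pruning itself is.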

Note that this costing logic runs for each Append-Hash pair, so it copes
with the case where we have multiple join levels.

With this costing logic added, I repeated the same worst-case
performance comparisons as in [1], and here is what I got.

 tuples  unpatched  patched  relative change
  10000      44.66    44.37     -0.006493506
  20000      52.41    52.29     -0.002289639
  30000      61.11    61.12     +0.000163639
  40000      67.87    68.24     +0.005451599
  50000      74.51    74.75     +0.003221044
  60000      82.3     81.55     -0.009113001
  70000      87.16    86.98     -0.002065168
  80000      93.49    93.89     +0.004278532
  90000     101.52   100.83     -0.00679669
 100000     108.34   108.56     +0.002030644

So the costing logic successfully avoids performing the partition
pruning in the worst case.

I also tested the cases where partition pruning is possible with
different sizes of the hash side.

 tuples  unpatched  patched  relative change
    100      36.86     2.4      -0.934888768
    200      35.87     2.37     -0.933928074
    300      35.95     2.55     -0.92906815
    400      36.4      2.63     -0.927747253
    500      36.39     2.85     -0.921681781
    600      36.32     2.97     -0.918226872
    700      36.6      3.23     -0.911748634
    800      36.88     3.44     -0.906724512
    900      37.02     3.46     -0.906537007
   1000      37.25    37.21     -0.001073826

The first 9 rows show that the costing logic allows partition pruning to
be performed, and the pruning turns out to be a big win. The last row
shows that partition pruning is disallowed by the costing logic, because
it estimates that no partition can be pruned (we have 1000 partitions in
total, and 1000 tuples on the hash side).

So the new costing logic is quite crude and tends to be very
conservative, but it helps avoid the large overhead in the worst cases.
I think this might be a good starting point for pushing this patch
forward.

Any thoughts or comments?

[1]
https://www.postgresql.org/message-id/CAMbWs49%2Bp6hBxXJHFiSwOtPCSkAHwhJj3hTpCR_pmMiUUVLZ1Q%40mail.gmail.com

Thanks
Richard

Attachment Content-Type Size
v2-0001-Support-run-time-partition-pruning-for-hash-join.patch application/octet-stream 57.8 KB
