From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: A problem about partitionwise join |
Date: | 2024-03-25 03:30:55 |
Message-ID: | CAMbWs48AOLfx+weJ6=2U1DuxBsWDnC3LBrQr9QXoDq6TBq+H6w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Mar 19, 2024 at 3:40 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
wrote:
> On Tue, Mar 19, 2024 at 8:18 AM Richard Guo <guofenglinux(at)gmail(dot)com>
> wrote:
>
>> On Thu, Mar 7, 2024 at 7:13 PM Ashutosh Bapat <
>> ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>>
>>> Approach
>>> --------
>>> The equijoin condition between partition keys doesn't appear in the
>>> join's restrictilist because of 'best_score' strategy as you explained well
>>> in [2]. What if we add an extra score for clauses between partition keys
>>> and give preference to equijoin between partition keys? Have you given it a
>>> thought? I feel that having an equijoin clause involving partition keys has
>>> more usages compared to a clause with any random column. E.g. nextloop may
>>> be able to prune partitions from inner relation if the clause contains a
>>> partition key.
>>>
>>
>> Hmm, I think this approach won't work in cases where one certain pair of
>> partition keys has formed an EC that contains pseudoconstants. In such
>> cases, the EC machinery will generate restriction clauses like 'pk =
>> const' rather than any join clauses.
>>
>
> That should be ok and more desirable. Clauses like pk = const will leave
> only one partition around in each of the joining relations thus PWJ won't
> be required OR it will be automatic - whichever way you see it.
>
No, that's not true. There could be multiple partition keys, and the
particular key involved in the pushed-down restriction 'pk = const' may
not be able to prune away any partitions. To be concrete, consider the
query:
create table p (k1 int, k2 int, val int) partition by range(k1, k2);
create table p_1 partition of p for values from (1,1) to (10,100);
create table p_2 partition of p for values from (10,100) to (20,200);
set enable_partitionwise_join to on;
explain (costs off)
select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 = bar.k2
and foo.k2 = 5;
QUERY PLAN
-----------------------------------------
Hash Join
Hash Cond: (foo.k1 = bar.k1)
-> Append
-> Seq Scan on p_1 foo_1
Filter: (k2 = 5)
-> Seq Scan on p_2 foo_2
Filter: (k2 = 5)
-> Hash
-> Append
-> Seq Scan on p_1 bar_1
Filter: (k2 = 5)
-> Seq Scan on p_2 bar_2
Filter: (k2 = 5)
(13 rows)
Thanks
Richard
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2024-03-25 04:18:23 | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Previous Message | Masahiko Sawada | 2024-03-25 03:13:19 | Re: [PoC] Improve dead tuple storage for lazy vacuum |