Re: Asymmetric partition-wise JOIN

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, KaiGai Kohei <kaigai(at)heterodb(dot)com>, sulamul(at)gmail(dot)com, Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Aleksander Alekseev <afiskon(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
Subject: Re: Asymmetric partition-wise JOIN
Date: 2024-08-19 08:43:35
Message-ID: 944ed18c-3e7d-42ef-816e-0afc41610e93@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/8/2024 20:56, Alexander Korotkov wrote:
> On Tue, Apr 2, 2024 at 6:07 AM Andrei Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> Actually, the idea I tried to express is the combination of #1 and #2:
> to build individual plan for every partition, but consider the 'Common
> Resources'. Let me explain this a bit more.
Thanks for keeping your eye on it!
> My idea is to introduce a new property for paths selection.
> 3) Usage of common resources. The common resource can be: hash
> representation of relation, memoize over relation scan, etc. We can
> exclude the cost of common resource generation from the path cost, but
> keep the reference for the common resource with its generation cost.
> If one path uses more common resources than another path, it could
> cost-dominate another one only if its cheaper together with its extra
> common resources cost. If one path uses less or equal common
> resources than another, it could normally cost-dominate another one.
The most challenging part for me is the cost calculation, which is
bonded with estimations of other paths. To correctly estimate the
effect, we need to remember at least the whole number of paths sharing
resources.
Also, I wonder if it can cause some corner cases where prediction error
on a shared resource will cause an even worse situation upstream.
I think we could push off here from an example and a counter-example,
but I still can't find them.

> However, I understand this is huge amount of work given we have to
> introduce new basic optimizer concepts. I get that the main
> application of this patch is sharding. If we have global tables
> residing each shard, we can push down any joins with them. Given this
> patch gives some optimization for non-sharded case, I think we
> *probably* can accept its concept even that it this optimization is
> obviously not perfect.
Yes, right now sharding is the most profitable case. We can push down
parts of the plan which references only some common resources:
FunctionScan, ValueScan, tables which can be proved are existed
everywhere and provide the same output. But for now it is too far from
the core code, IMO. - So, I search for cases that can be helpful for a
single instance.

--
regards,
Andrei Lepikhov
Postgres Professional

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo Nagata 2024-08-19 09:01:54 Re: Drop database command will raise "wrong tuple length" if pg_database tuple contains toast attribute.
Previous Message px shi 2024-08-19 08:43:09 [Bug Fix]standby may crash when switching-over in certain special cases