Re: Asymmetric partition-wise JOIN

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Aleksander Alekseev <afiskon(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, KaiGai Kohei <kaigai(at)heterodb(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
Subject: Re: Asymmetric partition-wise JOIN
Date: 2024-08-01 22:51:11
Message-ID: CAPpHfdu3NZsym5mCsy-RAjenNodnenySdwSbs4pJojN1bq1oig@mail.gmail.com
Lists: pgsql-hackers

On Sun, May 5, 2024 at 5:55 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> On 18/10/2023 16:59, Ashutosh Bapat wrote:
> > On Wed, Oct 18, 2023 at 10:55 AM Andrei Lepikhov
> >>> The relid is also used to track the scans at executor level. Since we
> >>> have so many scans on A, each may be using different plan, we will
> >>> need different ids for those.
> >>
> >> I don't understand this sentence. Which way executor uses this index of
> >> RelOptInfo ?
> >
> > See Scan::scanrelid
> >
> Hi,
>
> In the attachment, you will find a fresh version of the patch.
> I've analysed the danger of the same RelOptInfo index for the executor.
> In the examples I found (scared), it is still not a problem because
> ExecQual() does all the jobs at one operation and doesn't intersect with
> other operations. Of course, it is not a good design, and we will work on
> this issue. But at least this code can be used in experiments.
> Furthermore, I've shared some reflections on this feature. To avoid
> cluttering the thread, I've published them in [1]. These thoughts
> provide additional context and considerations for our ongoing work.
>
> [1]
> https://danolivo.substack.com/p/postgresql-asymmetric-join-technique?r=34q1yy

I've rebased the patch onto the current master. Also, I didn't like the
needFlatCopy argument to reparameterize_path_by_child(); it looks
quite awkward. Instead, since we need to copy paths anyway, I've
enabled native copying of paths. Now the caller can simply call
copyObject() on the path. That looks much cleaner to me. What do you think?
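
To illustrate the design choice only, here is a toy sketch. It is not
code from the patch: ToyPath, toy_copy_path(), toy_reparameterize() and
toy_path_for_child() are hypothetical stand-ins for Path, copyObject(),
reparameterize_path_by_child() and its caller. The point is that the
caller makes the copy, so the adjusting function needs no copy flag.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a planner path node; not PostgreSQL's Path. */
typedef struct ToyPath
{
    int     parent_relid;   /* relation the path belongs to */
    double  rows;           /* estimated row count */
} ToyPath;

/* Stand-in for copyObject(): returns a freshly allocated copy, or NULL. */
static ToyPath *
toy_copy_path(const ToyPath *src)
{
    ToyPath    *dst;

    assert(src != NULL);
    dst = malloc(sizeof(ToyPath));
    if (dst != NULL)
        memcpy(dst, src, sizeof(ToyPath));
    return dst;
}

/* Stand-in for the reparameterization step: adjusts the path in place
 * and takes no "need copy" flag. */
static void
toy_reparameterize(ToyPath *path, int child_relid)
{
    path->parent_relid = child_relid;
}

/* The caller decides that a copy is needed and makes it itself, so the
 * shared path is left untouched. The result must be freed by the caller. */
static ToyPath *
toy_path_for_child(const ToyPath *shared, int child_relid)
{
    ToyPath    *copy = toy_copy_path(shared);

    if (copy != NULL)
        toy_reparameterize(copy, child_relid);
    return copy;
}
```

A caller that can afford to modify the path in place would call
toy_reparameterize() directly and skip the copy.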

Other notes:

1) I think we need regression tests covering the cases that
is_inner_rel_safe_for_asymmetric_join() filters out.
2) is_asymmetric_join() looks awkward to me. Should we instead add
a flag to JoinPath?
3) I understand that you have to re-use a RelOptInfo multiple times. It's
too late a stage of query processing to add a simple relation to the
planner structs. I tried rescans issued by cursors and EvalPlanQual()
rechecks caused by concurrent updates, but didn't manage to break this.
It seems that even if the same relation index is used multiple times in
different places of a query, it never gets used simultaneously. But
even if this is somehow OK, it is a significant change to the assumptions
in the planner/executor data structures. Perhaps we need at least Tom's
opinion on this.

------
Regards,
Alexander Korotkov
Supabase
