Re: apply_scanjoin_target_to_paths and partitionwise join

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrei Lepikhov <lepihov(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, arne(dot)roland(at)malkut(dot)net, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>
Subject: Re: apply_scanjoin_target_to_paths and partitionwise join
Date: 2025-01-06 06:07:43
Message-ID: CAExHW5uw6KJ0RZ4DqdCi=iJPFU90wSLr3Qa-AtH-5MshfM5C4A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 3, 2025 at 3:02 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> I think it's unhelpful that you keep calling this a "bug" when the
> behavior is clearly deliberate. Whether it's the *right* behavior is
> debatable, but it's not *accidental* behavior.
>

Ok, let's call it "not right" behaviour :). Let me further expand on
the explanation in my first email [1].
After the planner has added all possible paths,
apply_scanjoin_target_to_paths(), which should be just adjusting their
targets, zaps them all. That looks weird. But the comment explains why
its doing so.

-- quote comment
* This function would still be correct if we kept the
* existing paths: we'd modify them to generate the correct target above
* the partitioning Append, and then they'd compete on cost with paths
* generating the target below the Append. However, in our current cost
* model the latter way is always the same or cheaper cost, so modifying
* the existing paths would just be useless work. Moreover, when the cost
* is the same, varying roundoff errors might sometimes allow an existing
* path to be picked, resulting in undesirable cross-platform plan
* variations.
--
The comment mentions only "partitioning Append" and not "Join" paths.
A simple partitioned relation's pathlist contains only append paths
but a partitioned join relation's pathlist contains join paths as well
as append (of join) paths. What the comment says is true for both
Append of scans as well as Append of joins, but not for join paths
joining append paths - non-partition wise paths. In such paths we
should be adjusting only the target of the join path and not that of
the paths under the Append. The comment does not mention Join over
append at all. The code discards both join as well as append paths but
rebuilds only append paths. There is no comment in the function
explaining why we omit join of append paths. So the code does not seem
intentional to me as far as partitionwise joins are concerned. Losing
a costwise optimal path doesn't seem to be something that should
happen while adjusting path targets (unless while adjusting path
targets we find a lower cost path, but we aren't keeping the old paths
around for comparison.)

> On Thu, Jan 2, 2025 at 3:58 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I am wondering if the problem is not that the plan is slower, it's
> > that for some reason the planner took a lot longer to create it.
> > It's very plausible that partitionwise planning takes longer, and
> > maybe we have some corner cases where the time is O(N^2) or worse.
>
> That doesn't seem like a totally unreasonable speculation, but it
> seems a little surprising that retaining the non-partitionwise paths
> would fix it. True, that might let us discard a bunch of partitionwise
> paths more quickly than would otherwise be possible, but I wouldn't
> expect that to have an impact as dramatic as what Jakub alleged. The
> thing I thought about was whether there might be some weird effects
> with lots of empty partitions; or maybe with some other property of
> the path like say sort keys or parallelism. For example if we couldn't
> generate a partitionwise path with sort keys as good as the
> non-partitionwise path had, or if we couldn't generate a parallel
> partitionwise path but we could generate a parallel non-partitionwise
> path. As far as I knew neither of those things are real problems, but
> if they were then I believe they could pretty easily explain a large
> regression.
>
> > However, this is pure speculation without a test case, and any
> > proposed fix would be even more speculative. I concur with your
> > bottom line: we should insist on a public test case before deciding
> > what to do about it.
>

That's a valid ask. AFAIR, it's a quite tricky scenario. Both Jakub
and myself have tried but could not reproduce the issue. Let me mark
this CF entry as returned with feedback and resurrect it when we have
a reproduction.

[1] https://www.postgresql.org/message-id/CAExHW5toze58+jL-454J3ty11sqJyU13Sz5rJPQZDmASwZgWiA@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2025-01-06 06:40:39 Re: Fix crash when non-creator being an iteration on shared radix tree
Previous Message Nisha Moond 2025-01-06 05:26:09 Re: Conflict detection for update_deleted in logical replication