Re: should we have a fast-path planning for OLTP starjoins?

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Richard Guo <guofenglinux(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: should we have a fast-path planning for OLTP starjoins?
Date: 2025-02-05 12:22:55
Message-ID: bcc47569-f3a5-48af-8d57-446ed045584d@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/5/25 09:27, Richard Guo wrote:
> On Wed, Feb 5, 2025 at 5:55 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Right now, if we have four tables to join, we have a joinlist
>> (A B C D). (Really they're integer relids, but let's use names here.)
>> If we decide to force C to be joined last, it should be sufficient to
>> convert this to ((A B D) C). If C and D both look like candidates for
>> this treatment, we can make it be (((A B) C) D) or (((A B) D) C).
>> This is pretty much the same thing that happens if you set
>> join_collapse_limit to 1 and use JOIN syntax to force a join order.
>> In fact, IIRC we start out with nested joinlists and there is some
>> code that normally flattens them until it decides it'd be creating
>> too large a sub-problem. I'm suggesting selectively reversing the
>> flattening.
>
> This should not be too difficult to implement. Outer joins seem to
> add some complexity, though. We need to ensure that the resulting
> joins in each sub-list are legal given the query's join order
> constraints. For example, if we make the joinlist be (((A B) C) D),
> we need to ensure that the A/B join and the (A/B)/C join are legal.
>

If the requirement is that all "dimensions" only join to the fact table
(which in this example would be "A" I think) through a FK, then why
would these joins be illegal?

We'd also need to require either an outer (left) join, or "NOT NULL" on
the fact table side, right? IIRC we already do that when using the FKeys
for join estimates.

Essentially, this defines a "dimension" as a relation that is joined
through a PK, without any other restrictions, both of which seems fairly
simple to check, and it's a "local" feature. And we'd simply mark those
as "join at the very end, in arbitrary order". Easy enough, I guess.

I'm thinking about some more complex cases:

(a) Query with multiple starjoins (a special case of that is snowflake
schema) - but I guess this is not too difficult, it just needs to
consider the FKs as "transitive" (a bit like Dijkstra's algorithm). In
the worst case we might need to "split" the whole query into multiple
smaller subproblems.

(b) Joining additional stuff to the dimensions (not through a FK,
possibly to multiple dimensions, ...). Imagine a "diamond join" with
some summary table, etc. IMHO this is a fairly rare case / expensive
enough to make the planning part less important.

I'm also wondering how this should interact with join_collapse_limit. It
would seems ideal to do this optimization before splitting the join list
into subproblems (so that the "dimensions" are do not even count against
the limit, pretty much). But that would mean join_collapse_limit can no
longer be used to enforce a join order like today ...

regards

--
Tomas Vondra

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Japin Li 2025-02-05 12:30:44 Remove unnecessary static specifier
Previous Message Alena Rybakina 2025-02-05 12:03:24 Re: Vacuum statistics