Re: should we have a fast-path planning for OLTP starjoins?

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: should we have a fast-path planning for OLTP starjoins?
Date: 2025-02-04 21:42:24
Message-ID: 46782de2-4950-48ce-bf5d-60688d680d75@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/4/25 21:23, Tom Lane wrote:
> Tomas Vondra <tomas(at)vondra(dot)me> writes:
>> On 2/4/25 20:43, Jeff Davis wrote:
>>> If you base it on the join conditions rather than the size of the
>>> table, then detection of the star join would be based purely on the
>>> query structure (not stats), which would be nice for predictability.
>
>> Right, there may be other (possibly better) ways to detect the star join
>> shape. I was thinking about also requiring for foreign keys on the join
>> clauses - in DWH systems FKeys are sometimes omitted, which would break
>> the heuristics, but in OLTP it's common to still have them.
>
> I think you need to insist on foreign keys. Otherwise you don't know
> whether the joins will eliminate fact-table rows. If that's a
> possibility then it's no longer sensible to ignore different join
> orders.
>

Hmmm, yeah. But that's only for the INNER JOIN case. But I've seen many
of these star join queries with LEFT JOIN too, and then the FKs are not
needed. All you need is a PK / unique index on the other side.

Perhaps requiring (INNER JOIN + FK) or (LEFT JOIN + PK) would be enough
to make this work for most cases, and then the rest would simply use the
regular join order algorithm.

I was thinking that if we allow the dimensions to eliminate rows in the
fact table, we'd simply join them starting from the most selective ones.
But that doesn't work if the joins might have different per-row costs
(e.g. because some dimensions are much larger etc). Doing something
smarter would likely end up fairly close to the regular join order
algorithm ...

> I'm kind of imagining a planner rule like "if table X is joined to
> using a match of a foreign-key column to its PK (so that the join
> removes no rows from the other table) and there are not other
> restriction conditions on table X, then force X to be joined last.
> And if there are multiple such tables X, it doesn't matter what
> order they are joined in as long as they're last."
>

I think it'd need to be a bit smarter, to handle (a) snowflake schemas
and (b) additional joins referencing the starjoin result.

The (a) shouldn't be too hard, except that it needs to check the
'secondary dimension' is also joined by FK and has no restrictions, and
then do that join later.

For (b), I don't have numbers but I've seen queries that first do a
starjoin and then add more data to that, e.g. by joining to a
combination of attributes from multiple dimensions (think region +
payment type). Or by joining to some "summary" table that does not have
an explicit FK. Still, we could leave at least some of the joins until
the very end, I guess. But even for the dimensions joined earlier the
order does not really matter.

I think (a) is something we should definitely handle. (b) is more a
DWH/BI thing, not really an OLTP query (which is what this thread is about).

> The interesting thing about this is we pretty much have all the
> infrastructure for detecting such FK-related join conditions
> already. Possibly the join order forcing could be done with
> existing infrastructure too (by manipulating the joinlist).
>

Maybe, interesting. I've ruled out relying on the FKeys early in the
coding, but I'm sure there's infrastructure the patch could use. It'd
still need to check the transitive FK relationships for snowflake joins
to work, ofc. Which is not something we need to consider right now.

What kind of "manipulation" of the joinlist you have in mind?

regards

--
Tomas Vondra

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2025-02-04 21:43:57 Re: vacuumdb changes for stats import/export
Previous Message Frédéric Yhuel 2025-02-04 21:41:37 Re: doc: explain pgstatindex fragmentation