Re: should we have a fast-path planning for OLTP starjoins?

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: should we have a fast-path planning for OLTP starjoins?
Date: 2025-02-11 15:14:04
Message-ID: 7c26b2e7-90db-42e4-84e9-debbbfcc3c92@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/10/25 22:36, Robert Haas wrote:
> On Fri, Feb 7, 2025 at 3:09 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> I don't think that's quite true. The order of dimension joins does not
>> matter because the joins do not affect the join size at all. The size of
>> |F| has nothing to do with that, I think. We'll do the same number of
>> lookups against the dimensions no matter in what order we join them. And
>> we know it's best to join them as late as possible, after all the joins
>> that reduce the size (and before joins that "add" rows, I think).
>
> This is often not quite true, because there are often restriction
> clauses on the fact tables that result in some rows being eliminated.
>
> e.g. SELECT * FROM hackers h JOIN languages l ON h.language_id = l.id
> JOIN countries c ON h.country_id = c.id WHERE c.name = 'Czechia';
>

True. I think this was discussed earlier in this thread - dimensions
with additional restrictions may affect the row count, and thus would be
exempt from this (and would instead go through the "regular" join order
search algorithm). So I assumed the "dimensions" don't have any such
restrictions in my message, I should have mentioned that.

> However, I think that trying to somehow leverage the existence of
> either FK or LJ+UNIQUE is still a pretty good idea. In a lot of cases,
> many of the joins don't change the row count, so we don't really need
> to explore all possible orderings of those joins. We might be able to
> define some concept of "join that does't change the row count at all"
> or maybe better "join that doesn't change the row count very much".
> And then if we have a lot of such joins, we can consider them as a
> group. Say we have 2 joins that do change the row count significantly,
> and then 10 more than don't. The 10 that don't can be done before,
> between, or after the two that do, but it doesn't seem necessary to
> consider doing some of them at one point and some at another.
>
> Maybe that's not the right way to think about this problem; I haven't
> read the academic literature on star-join optimization. But it has
> always felt stupid to me that we spend a bunch of time considering
> join orders that are not meaningfully different, and I think what
> makes two join orders not meaningfully different is that we're
> commuting joins that are not changing the row count.
>
> (Also worth noting: even joins of this general form change the row
> count, they can only reduce it.)
>

I searched for papers on star-joins, but pretty much everything I found
focuses on the OLAP case. Which is interesting, I think the OLTP
star-join I described seems quite different, and I'm not sure the trade
offs are necessarily the same.

My gut feeling is that in the first "phase" we should focus on the case
with no restrictions - that makes it obvious what the optimal order is,
and it will help a significant fraction of cases.

And then in the next step we can try doing something for cases with
restrictions - be it some sort of greedy algorithm, something that
leverages knowledge of the selectivity to prune join orders early
(instead of actually exploring all N! join orders like today). Or
something else, not sure.

regards

--
Tomas Vondra

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2025-02-11 15:29:16 Re: should we have a fast-path planning for OLTP starjoins?
Previous Message jian he 2025-02-11 15:09:40 Re: Non-text mode for pg_dumpall