From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(at)vondra(dot)me> |
Cc: | James Hunter <james(dot)hunter(dot)pg(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: should we have a fast-path planning for OLTP starjoins? |
Date: | 2025-02-10 21:36:13 |
Message-ID: | CA+TgmoYGU9amJt_P_H28rcZoaehJVN8ttaD5xL1p5Zb1HXX04w@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Feb 7, 2025 at 3:09 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> I don't think that's quite true. The order of dimension joins does not
> matter because the joins do not affect the join size at all. The size of
> |F| has nothing to do with that, I think. We'll do the same number of
> lookups against the dimensions no matter in what order we join them. And
> we know it's best to join them as late as possible, after all the joins
> that reduce the size (and before joins that "add" rows, I think).
This is often not quite true, because there are often restriction
clauses on the fact tables that result in some rows being eliminated.
e.g. SELECT * FROM hackers h JOIN languages l ON h.language_id = l.id
JOIN countries c ON h.country_id = c.id WHERE c.name = 'Czechia';
However, I think that trying to somehow leverage the existence of
either FK or LJ+UNIQUE is still a pretty good idea. In a lot of cases,
many of the joins don't change the row count, so we don't really need
to explore all possible orderings of those joins. We might be able to
define some concept of "join that doesn't change the row count at all"
or maybe better "join that doesn't change the row count very much".
And then if we have a lot of such joins, we can consider them as a
group. Say we have 2 joins that do change the row count significantly,
and then 10 more that don't. The 10 that don't can be done before,
between, or after the two that do, but it doesn't seem necessary to
consider doing some of them at one point and some at another.
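To put rough numbers on the 2-plus-10 example, here is a minimal Python sketch. It is not planner code: the function names are invented for illustration, and it assumes only left-deep orders that start from the fact table, ignoring bushy plans and any join-order constraints. It compares treating all 12 joins as distinct with treating the 10 row-count-neutral joins as a single block placed before, between, or after the other two.

```python
from math import factorial


def orders_all_distinct(changing: int, neutral: int) -> int:
    """Count every permutation of all joins (left-deep, fact table first)."""
    return factorial(changing + neutral)


def orders_neutral_as_group(changing: int, neutral: int) -> int:
    """Count orders when the neutral joins move together as one block.

    Only the row-count-changing joins are permuted; the block then goes into
    one of the (changing + 1) slots around them. The order inside the block
    is fixed arbitrarily, on the assumption that it does not matter.
    """
    if neutral == 0:
        return factorial(changing)
    return factorial(changing) * (changing + 1)


print(orders_all_distinct(2, 10))      # 479001600
print(orders_neutral_as_group(2, 10))  # 6
```

Under those assumptions the search space drops from 12! orderings to 2! * 3 = 6, which is the kind of saving the grouping idea is after.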
Maybe that's not the right way to think about this problem; I haven't
read the academic literature on star-join optimization. But it has
always felt stupid to me that we spend a bunch of time considering
join orders that are not meaningfully different, and I think what
makes two join orders not meaningfully different is that we're
commuting joins that are not changing the row count.
(Also worth noting: even if joins of this general form change the row
count, they can only reduce it.)
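As a small illustration of that last point, here is a Python sketch with made-up rows, loosely modeled on the hackers/countries query earlier in this message. The helper name and the data are hypothetical. A dict stands in for a dimension table with a unique key, so each fact row can match at most one dimension row and the join output can never be larger than the fact input.

```python
# Made-up sample data; dict keys model the unique id of the dimension table.
hackers = [
    {"name": "a", "country_id": 1},
    {"name": "b", "country_id": 2},
    {"name": "c", "country_id": 1},
]
countries = {1: "Czechia", 2: "USA"}


def join_on_unique_key(fact_rows, dim_by_id, fk, keep=lambda value: True):
    """Inner-join fact rows to a dimension keyed by a unique id.

    Each fact row finds at most one dimension row, so the result has at most
    len(fact_rows) rows. The optional `keep` predicate models a restriction
    clause on the dimension table.
    """
    return [
        {**row, "dim": dim_by_id[row[fk]]}
        for row in fact_rows
        if row[fk] in dim_by_id and keep(dim_by_id[row[fk]])
    ]


unfiltered = join_on_unique_key(hackers, countries, "country_id")
filtered = join_on_unique_key(
    hackers, countries, "country_id", keep=lambda name: name == "Czechia"
)
print(len(hackers), len(unfiltered), len(filtered))  # 3 3 2
```

Without the restriction the join keeps all three rows; with it, the count drops to two, and in neither case can it exceed the number of fact rows.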
--
Robert Haas
EDB: http://www.enterprisedb.com