Re: should we have a fast-path planning for OLTP starjoins?

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: should we have a fast-path planning for OLTP starjoins?
Date: 2025-02-04 20:06:53
Message-ID: 9edd7fa2-843d-4381-9836-5771c2c72b86@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/4/25 20:43, Jeff Davis wrote:
> On Tue, 2025-02-04 at 15:00 +0100, Tomas Vondra wrote:
>> This is a surprisingly common query pattern in OLTP applications,
>> thanks
>> to normalization.
>
> +1. Creating a small lookup table should be encouraged rather than
> penalized.
>
> Your test data includes a fact table with 10k rows and no index on the
> filter condition. In OLTP applications the fact table might often fit
> in memory, but I'd still expect it to have an index on the filter
> condition. That might not change your overall point, but I'm curious
> why you constructed the test that way?
>

No particular reason. I think I intended to make it a lookup by PK
(which would match the use case examples), and I forgot about that. But
yeah, I would expect an index too.

>
>> There's a lot of stuff that could / should be improved on the current
>> patch. For (1) we might add support for more complex cases with
>> snowflake schemas [3] or with multiple fact tables. At the same time
>> (1)
>> needs to be very cheap, so that it does not regress every non-
>> starjoin
>> query.
>
> The patch only considers the largest table as the fact table, which is
> a good heuristic of course. However, I'm curious if other approaches
> might work. For instance, could we consider the table involved in the
> most join conditions to be the fact table?
>
> If you base it on the join conditions rather than the size of the
> table, then detection of the star join would be based purely on the
> query structure (not stats), which would be nice for predictability.
>

Right, there may be other (possibly better) ways to detect the star join
shape. I was thinking about also requiring for foreign keys on the join
clauses - in DWH systems FKeys are sometimes omitted, which would break
the heuristics, but in OLTP it's common to still have them.

I think the cost of the heuristic will be an important metric - I don't
know if the number of join conditions is more expensive to determine
than what the patch does now, though.

>> But the bigger question is whether it makes sense to have such fast-
>> path
>> modes for certain query shapes.
>
> We should explore what kinds of surprising cases it might create, or
> what maintenance headaches might come up with future planner changes.
> But the performance numbers you posted suggest that we should do
> something here.
>

Yes, it seems like an interesting opportunity for starjoin queries. It's
a pretty common query pattern, but it also happens to be very expensive
to plan because the dimensions can be reordered almost arbitrarily.

regards

--
Tomas Vondra

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2025-02-04 20:11:29 Re: new commitfest transition guidance
Previous Message Robert Treat 2025-02-04 19:57:40 Re: Eagerly scan all-visible pages to amortize aggressive vacuum