Re: allowing extensions to control planner behavior

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: allowing extensions to control planner behavior
Date: 2024-08-28 20:29:43
Message-ID: 95e423ea042a8d2f85c76ee28feb7a5b10265f4d.camel@j-davis.com
Lists: pgsql-hackers

On Mon, 2024-08-26 at 12:32 -0400, Robert Haas wrote:
> I think there are two basic approaches that are possible here. If
> someone sees a third option, let me know. First, we could allow users
> to hook add_path() and add_partial_path().

...

> The other possible approach is to allow extensions to feed some
> information into the planner before path generation and let that
> influence which paths are generated.

Preserving a path for the right amount of time seems like the primary
challenge for most of the use cases you raised (removing paths is
easier than resurrecting one that was pruned too early). If we try to
keep a path around, that implies that we need to keep parent paths
around too, which leads to an explosion if we aren't careful.

But we already solved all of that for pathkeys. We keep the paths
around if there's a reason to (a useful pathkey) and there's not some
other cheaper path that also satisfies the same reason.

Idea: generalize the concept of "pathkeys" to cover other reasons to
preserve a path.

Mechanically, a hint to use an index could work very similarly: come up
with a custom reason to keep a path around, such as "a hint suggests we
use index foo_idx for table foo", and assign it a unique number. If
there's another hint that says we should also use index bar_idx for
table bar, then that reason would get a different unique reason number.
(In other words, the number of reasons would not be fixed; there could
be one reason for each hint specified in the query, kind of like there
could be many interesting pathkeys for a query.)
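
To make the numbering concrete, here is a minimal standalone sketch of
handing out one reason number per hint. ReasonRegistry, register_reason()
and MAX_REASONS are hypothetical names made up for illustration; nothing
like them exists in PostgreSQL today, and the fixed-size array is only
there to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REASONS 64			/* arbitrary cap, for this sketch only */

/* Hypothetical per-query registry of non-cost reasons to keep a path. */
typedef struct ReasonRegistry
{
	int			nreasons;
	const char *descriptions[MAX_REASONS];
} ReasonRegistry;

/*
 * Record a new reason and return its number.  Numbers are handed out in
 * order starting at 0, so each hint in the query gets a distinct one.
 */
int
register_reason(ReasonRegistry *reg, const char *description)
{
	assert(reg->nreasons < MAX_REASONS);
	reg->descriptions[reg->nreasons] = description;
	return reg->nreasons++;
}
```

Registering "use foo_idx for foo" and then "use bar_idx for bar" against
an empty registry yields reason numbers 0 and 1.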

Each Path would have a "preserve_for_these_reasons" bitmapset holding
all of the non-cost reasons for preserving that path. If two paths
have exactly the same set of reasons, then add_path() would only keep
the cheaper one.
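
A rough standalone sketch of that rule, under some loud assumptions: a
plain uint64_t stands in for a Bitmapset, SketchPath and
sketch_add_path() are invented names, and cost is the only other thing
compared. The real add_path() also weighs pathkeys, parameterization,
row counts and parallel safety, none of which is modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PATHS 32			/* arbitrary cap, for this sketch only */

/* Hypothetical cut-down Path: just a cost and a reason set. */
typedef struct SketchPath
{
	double		total_cost;
	uint64_t	preserve_for_these_reasons; /* bit N set = reason number N */
} SketchPath;

typedef struct SketchPathList
{
	int			npaths;
	SketchPath	paths[MAX_PATHS];
} SketchPathList;

/*
 * Add new_path unless an existing path with exactly the same reason set
 * is at least as cheap.  Returns true if new_path ended up in the list.
 * Because of this rule the list holds at most one path per reason set,
 * so the first match is the only match.
 */
bool
sketch_add_path(SketchPathList *list, SketchPath new_path)
{
	for (int i = 0; i < list->npaths; i++)
	{
		SketchPath *old = &list->paths[i];

		if (old->preserve_for_these_reasons !=
			new_path.preserve_for_these_reasons)
			continue;			/* different reasons: never competes */

		if (old->total_cost <= new_path.total_cost)
			return false;		/* same reasons, old one is no worse */

		*old = new_path;		/* same reasons, new one is cheaper */
		return true;
	}

	assert(list->npaths < MAX_PATHS);
	list->paths[list->npaths++] = new_path;
	return true;
}
```

With this rule, a path carrying the foo_idx reason survives next to a
cheaper path with no reasons at all, but a second, more expensive path
with only the foo_idx reason is rejected.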

We could get fancy and have a compare_reasons_hook that would allow you
to take two paths with the same reason and see if there are other
factors to consider that would cause both to still be preserved
(similar to pathkey length).
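
One possible shape for such a hook, again as a standalone sketch rather
than a proposal for the actual signature. The hook name comes from the
paragraph above; the typedef, the extension_detail field, and both
functions are hypothetical. Here the hook returns true when both paths
should be kept despite having identical reason sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down Path for this sketch. */
typedef struct SketchPath
{
	double		total_cost;
	uint64_t	preserve_for_these_reasons; /* bit N set = reason number N */
	int			extension_detail;	/* made-up extra factor an extension tracks */
} SketchPath;

/* Assumed hook signature: return true to keep both paths. */
typedef bool (*compare_reasons_hook_type) (const SketchPath *a,
										   const SketchPath *b);

compare_reasons_hook_type compare_reasons_hook = NULL;

/*
 * True if the two paths compete purely on cost, meaning only the cheaper
 * one needs to be kept.
 */
bool
only_keep_cheaper(const SketchPath *a, const SketchPath *b)
{
	if (a->preserve_for_these_reasons != b->preserve_for_these_reasons)
		return false;			/* different reasons: keep both */

	if (compare_reasons_hook != NULL && compare_reasons_hook(a, b))
		return false;			/* extension sees another factor: keep both */

	return true;
}

/* Example hook: paths differing in the extension's own detail both survive. */
bool
keep_both_if_detail_differs(const SketchPath *a, const SketchPath *b)
{
	return a->extension_detail != b->extension_detail;
}
```

Without a hook installed, two paths with the same reason set compete on
cost alone; with keep_both_if_detail_differs installed, they are both
preserved whenever extension_detail differs.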

I suspect that we might see interesting applications of this mechanism
in core as well: for instance, track partition keys or other properties
relevant to parallelism. That could allow us to keep parallel-friendly
paths around and then decide later in the planning process whether to
actually parallelize them or not.

Once we've generalized the "reasons" mechanism, it would be easy enough
to have a hook to add reasons to a path as it's being generated to be
sure it's not lost. These hooks should probably be called in the
individual create_*_path() functions where there's enough context to
know what's happening. There could be many such hooks, but I suspect
only a handful of important ones.
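
As an illustration of what one of those hooks might look like for index
paths, here is a standalone sketch. Everything in it is hypothetical:
SketchIndexPath, index_path_reasons_hook, sketch_create_index_path() and
the example hook are invented for this email, the real
create_index_path() takes very different arguments, and reason number 0
is assumed to have been assigned to the foo_idx hint beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-down index path. */
typedef struct SketchIndexPath
{
	const char *relname;
	const char *indexname;
	double		total_cost;
	uint64_t	preserve_for_these_reasons; /* bit N set = reason number N */
} SketchIndexPath;

/* Assumed hook signature: may add reasons to a freshly built path. */
typedef void (*index_path_reasons_hook_type) (SketchIndexPath *path);

index_path_reasons_hook_type index_path_reasons_hook = NULL;

/*
 * Stand-in for a create_*_path() function: build the path, then give the
 * extension a chance to tag it before add_path() could discard it.
 */
SketchIndexPath
sketch_create_index_path(const char *relname, const char *indexname,
						 double total_cost)
{
	SketchIndexPath path;

	path.relname = relname;
	path.indexname = indexname;
	path.total_cost = total_cost;
	path.preserve_for_these_reasons = 0;

	if (index_path_reasons_hook != NULL)
		index_path_reasons_hook(&path);

	return path;
}

/* Example hook for the hint "use index foo_idx for table foo". */
void
hint_foo_idx_hook(SketchIndexPath *path)
{
	const int	foo_idx_reason = 0; /* assumed reason number for this hint */

	if (strcmp(path->relname, "foo") == 0 &&
		strcmp(path->indexname, "foo_idx") == 0)
		path->preserve_for_these_reasons |= UINT64_C(1) << foo_idx_reason;
}
```

With the hook installed, only the path that scans foo through foo_idx
gets the reason bit; paths over other indexes or other tables are left
untagged and compete on cost as usual.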

This idea allows the extension author to preserve the right paths long
enough to use set_rel_pathlist_hook/set_join_pathlist_hook, which can
then editorialize on costs or do their own pruning.

Regards,
Jeff Davis
