Re: allowing extensions to control planner behavior

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: allowing extensions to control planner behavior
Date: 2024-10-10 16:51:51
Message-ID: CA+TgmoZ=6jJi9TGyZCm33vads46HFkyz6Aju_saLT6GFS-iFug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 18, 2024 at 11:48 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Still, I think it's a pretty useful starting point. It is mostly
> enough to give you control over join planning, and if combined with
> similar work for scan planning, I think it would be enough for
> pg_hint_plan. If we also got control over appendrel and agg planning,
> then you could do a bunch more cool things.

Here's a new set of patches where I added a similar mechanism for scan
type control. See the commit message for some notes on limitations of
this approach. In the absence of better ideas, I'd like to proceed
with something along the lines of 0001 and 0002.

I upgraded the hint_via_alias contrib module (which is not intended
for commit, it's just a demo) so that it can hint either scan type or
join type. I think this is sufficient to demonstrate that it's quite
easy to use hooks to leverage this new infrastructure. In fact, the
biggest thing I'm unhappy about right now is the difficulty of
providing the hooks with any sort of useful information. I don't think
it should be the job of this patch set to solve that problem, but I do
think we should do something about it. The problem exists on two
levels:

1. If you want to specify in-query hints using comments, how does your
extension get access to the comments? I realize that a lot of people
hate in-query hints and hope they die in a fire, but pg_hint_plan does
implement them, among other things, and the way it does that is to
have a complete copy of the backend lexer so that it can re-lex the
query text and pull out the comments, which it can then parse. The
fact that someone was willing to do that is impressive, but it's
pretty ugly. Still, it's not clear what other approach you could
adopt. We could make the core system able to extract and pass through
comments to extensions. We could add new syntax so that instead of
saying SELECT ... FROM foo AS bar you can say SELECT ... FROM foo AS
bar ADVICE 'anything you want goes here' and arrange to pass that
string through to extensions. We could also spend a lot of time
ranting about how this is a terrible idea on principle and therefore
we shouldn't care about supporting it, but the horse is already out of
the barn, so I'm not very impressed by that approach.

2. If you want a certain base relation or join relation to be treated
in a certain way, how do you identify it? You might think that this is
easy because, even when a query contains multiple references to a
relation with the same name, or identical aliases in different parts
of the query, EXPLAIN renames them so they have disjoint names. What
would be nice is if you could feed those names back into your
extension and use them as a way of specifying what behavior you want
where. But that doesn't work, because what actually happens is that
the plan can contain duplicated aliases, and then when EXPLAIN
deparses it using ruleutils.c, that's when we rename things so that
they're distinct. This means that, at the time we're planning, we
don't yet know what name EXPLAIN will end up giving to any relation
yet, which means we can't use the names that EXPLAIN produced for an
earlier plan for the same query to associate behaviors with relations.
I wonder if we could think about reversing the order of operations
here and making it so that we do the distinct-ification during parse
analysis or maybe early in planning, so that the name you see EXPLAIN
print out is the real name of that thing and not just a value invented
for display purposes.

This second problem affects practically any use of the mechanism added
by this patch, as well as things like pg_hint_plan and EDB's own
internal implementation of planner hints. As far as I know, nobody has
a good solution, and everybody just punts. Hints get specified by
table name or table alias and then you hope that things match in the
right places. This is sort of workable if hints are what you're trying
to implement, but AFAICS it's a complete disaster if what you want to
do is recreate automatically a plan you saw before. If you're hinting
your queries and the hints aren't applying in quite the right places
because of some name collisions, you can maybe adjust the query to
avoid the collisions and still win. But if you are trying to recreate
a previous plan, you really need to look at what happened last time
and then make the same things happen in the same places this time, and
how are you supposed to do that if there's no unique key that you can
use to reliably identify the rels involved in the query? Before
somebody says it, I do realize that these patches as proposed aren't
enough to ensure reliably recreating a plan even if we had a perfect
solution to this problem, but you have to start someplace. It seems
fundamentally reasonable to me to say "hey, if I want to modify the
planner behavior, I need a way to say which part of the query plan
should get modified," and right now it appears to me that we don't
have that.

So again, I am definitely not saying that these patches get us all the
way to where we should be -- not in terms of the ability to control
the plan, and definitely not in terms of giving extensions all the
information they need to be effective. But if we insist on a perfect
solution before doing anything, we will never get anywhere, and I
personally believe these are going in a useful direction.

Comments, preferably constructive ones, welcome.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v4-0003-New-contrib-module-alphabet_join.patch application/octet-stream 4.7 KB
v4-0004-New-contrib-module-hint_via_alias.patch application/octet-stream 8.4 KB
v4-0002-Allow-extensions-to-control-scan-strategy.patch application/octet-stream 12.1 KB
v4-0001-Allow-extensions-to-control-join-strategy.patch application/octet-stream 50.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2024-10-10 17:41:13 Re: Avoiding superfluous buffer locking during nbtree backwards scans
Previous Message Alvaro Herrera 2024-10-10 16:38:36 Re: overflow bug for inhcounts