Re: allowing extensions to control planner behavior

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: allowing extensions to control planner behavior
Date: 2024-08-26 18:00:54
Message-ID: aaf4250a-77db-43e3-9720-f60e2e4f5eba@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 26/8/2024 18:32, Robert Haas wrote:
> I'm somewhat expecting to be flamed to a well-done crisp for saying
> this, but I think we need better ways for extensions to control the
> behavior of PostgreSQL's query planner. I know of two major reasons
It is the change I have been waiting for a long time. Remember how many
kludge codes in pg_hint_plan, aqo, citus, timescale, etc., are written
for only the reason of a small number of hooks - I guess many other
people could cheer such work.

> why somebody might want to do this. First, you might want to do
> something like what pg_hint_plan does, where it essentially implements
> Oracle-style hints that can be either inline or stored in a side table
> and automatically applied to queries.[1] In addition to supporting
> Oracle-style hints, it also supports some other kinds of hints so that
> you can, for example, try to fix broken cardinality estimates. Second,
My personal most wanted list:
- Selectivity list estimation hook
- Groups number estimation hook
- hooks on memory estimations, involving work_mem
- add_path() hook
- Hook on final RelOptInfo pathlist
- a custom list of nodes in RelOptinfo, PlannerStmt, Plan and Query
structures
- Extensibility of extended and plain statistics
- Hook on portal error processing
- Canonicalise expressions hook

> you might want to convince the planner to keep producing the same kind
> of plan that it produced previously. I believe this is what Amazon's
> query plan management feature[2] does, although since it is closed
> source and I don't work at Amazon maybe it's actually implemented
> completely differently. Regardless of what Amazon did in this case,
> plan stability is a feature people want. Just trying to keep using the
> same plan data structure forever doesn't seem like a good strategy,
> because for example it would be fragile in the case of any DDL
> changes, like dropping and recreating an index, or dropping or adding
As a designer of plan freezing feature [1] I can say it utilises
plancache and, being under its invalidation callbacks it doesn't afraid
DDL or any other stuff altering database objects.

> Unfortunately, the part about the hook having the freedom to delete
> paths isn't really true. Perhaps technically you can delete a path
> that you don't want to be chosen, but any paths that were dominated by
> the path you deleted have already been thrown away and it's too late
> to get them back. You can modify paths if you don't want to change
> their costs, but if you change their costs then you have the same
> problem: the contents of the pathlist at the time that you see it are
> determined by the costs that each path had when it was initially
> added, and it's really too late to editorialize on that. So all you
> can really do here in practice is add new paths.
From my standpoint, it is enough to export routines creating paths and
calculating costs.

> set_join_pathlist_hook, which applies to joinrels, is similarly
> limited. appendrels don't even have an equivalent of this hook.
>
> So, how could we do better?
>
> I think there are two basic approaches that are possible here. If
> someone sees a third option, let me know. First, we could allow users
> to hook add_path() and add_partial_path(). That certainly provides the
> flexibility on paper to accept or reject whatever paths you do or do
+1

> The attached patch, briefly mentioned above, essentially converts the
> enable_* GUCs into RelOptInfo properties where the defaults are set by
> the corresponding GUCs. The idea is that a hook could then change this
> on a per-RelOptInfo basis before path generation happens. For
IMO, it is better not to switch on/off algorithms, but allow extensions
to change their cost multipliers, modifying costs balance. 10E9 looks
like a disable, but multiplier == 10 for a cost node just provide more
freedom for hashing strategies.

[1] https://postgrespro.com/docs/enterprise/16/sr-plan

--
regards, Andrei Lepikhov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ayush Vatsa 2024-08-26 18:09:35 Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
Previous Message Heikki Linnakangas 2024-08-26 17:54:13 Re: thread-safety: getpwuid_r()