Re: allowing extensions to control planner behavior

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: allowing extensions to control planner behavior
Date: 2024-08-26 19:44:11
Message-ID: CA+TgmoYXgBVCnFhrW3X1NxpdjWtJCYRKP38PQ-AdR-RJziTBUQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 26, 2024 at 2:00 PM Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
> It is the change I have long been waiting for. Remember how much
> kludgy code in pg_hint_plan, aqo, citus, timescale, etc., was written
> only because of the small number of hooks; I guess many other
> people would cheer such work.

I think so, too. I know there are going to be people who hate this,
but I think the cat is already out of the bag. It's not a question any
more of whether it will happen, it's just a question of whether we
want to collaborate with extension developers or try to make their
life difficult.

> My personal most wanted list:
> - Selectivity list estimation hook
> - Groups number estimation hook
> - Hooks on memory estimation, involving work_mem
> - add_path() hook
> - Hook on final RelOptInfo pathlist
> - A custom list of nodes in RelOptInfo, PlannerStmt, Plan and Query
> structures
> - Extensibility of extended and plain statistics
> - Hook on portal error processing
> - Canonicalise expressions hook

One of my chronic complaints about hooks is that people propose hooks
that are just in any random spot in the code where they happen to want
to change something. If we accept that, we end up with a lot of hooks
where nobody can say how the hook can be used usefully and maybe it
can't actually be used usefully even by the original author, or only
them and nobody else. So these kinds of proposals need detailed,
case-by-case scrutiny. It's unacceptable for the planner to get filled
up with a bunch of poorly designed hooks, just as it is for any other
part of the system, but well-designed hooks whose usefulness can
clearly be seen should be just as welcome here as anywhere else.

> IMO, it is better not to switch algorithms on/off, but to allow extensions
> to change their cost multipliers, modifying the cost balance. 10E9 looks
> like a disable, but a multiplier of 10 on a node's cost just provides more
> freedom for hashing strategies.

That may be a valid use case, but I do not think it is a typical use
case. In my experience, when people want to force the planner to do
something, they really mean it. They don't mean "please do it this way
unless you really, really don't feel like it." They mean "please do it
this way, period." And that is also what other systems provide. Oracle
could provide a hint MERGE_COST(foo,10) meaning "make merge joins look
ten times as expensive," but in fact it only provides USE_MERGE and
NO_USE_MERGE. And a "reproduce this previous plan" feature really demands
infrastructure that truly forces the planner to do what it's told,
rather than just nicely suggesting that it might want to do as it's
told. I wouldn't be sad at all if we happen to end up with a system
that's powerful enough for an extension to implement "make merge joins
ten times as expensive"; in fact, I think that would be pretty cool.
But I don't think it should be the design center for what we
implement, because it looks nothing like what existing PG or non-PG
systems do, at least in my experience.

--
Robert Haas
EDB: http://www.enterprisedb.com
