Re: allowing extensions to control planner behavior

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "chungui(dot)wcg" <wcg2008zl(at)126(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: allowing extensions to control planner behavior
Date: 2024-08-27 16:11:18
Message-ID: CA+TgmobN6-dY2e4UJuH7wvo4zGTpZxy_frbrJEN5PzubC2sUGg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 27, 2024 at 2:44 AM chungui.wcg <wcg2008zl(at)126(dot)com> wrote:
> I really admire this idea.

Thanks.

> here is my confusion: Isn't the core of this idea whether to turn the planner into a framework? Personally, I think that under PostgreSQL's heap table storage, the optimizer might be better off focusing on optimizing the generation of execution plans. It’s possible that in some specific scenarios, developers might want to intervene in the generation of execution plans by extensions. I'm not sure if these scenarios usually occur when the storage structure is also extended by developers. If so, could existing solutions like "planner_hook" potentially solve the problem?

You could use planner_hook if you wanted to replace the entire planner
with your own planner. However, that doesn't seem practical, as the
planner code is very large. The real use of the hook
is to allow running some extra code when the planner is invoked, as
demonstrated by the pg_stat_statements contrib module. To get some
meaningful control over the planner, you need something more
fine-grained. You need to be able to run code at specific points in
the planner, as we already allow with, for example,
get_relation_info_hook or set_rel_pathlist_hook.
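
To make the fine-grained hook idea concrete, here is a minimal,
self-contained sketch of the save-and-chain pattern that such hooks
rely on. It is a toy model and an assumption throughout: ToyRel,
toy_rel_pathlist_hook, toy_set_rel_pathlist, ext_pathlist, and
ext_init are hypothetical names invented for illustration, and none
of them are PostgreSQL's real types or signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for per-relation planner state (hypothetical, not RelOptInfo). */
typedef struct ToyRel {
    int npaths;      /* candidate paths generated so far */
    int seqscan_ok;  /* nonzero if a sequential scan path is still allowed */
} ToyRel;

/* Hook type: called at one specific point in the toy planner. */
typedef void (*toy_rel_pathlist_hook_type)(ToyRel *rel);

/* Hook variable owned by the "core"; NULL means no extension is loaded. */
toy_rel_pathlist_hook_type toy_rel_pathlist_hook = NULL;

/* Core planner step: build the default paths, then let extensions adjust. */
void toy_set_rel_pathlist(ToyRel *rel)
{
    assert(rel != NULL);
    rel->npaths = 2;      /* pretend: one seq scan path, one index path */
    rel->seqscan_ok = 1;
    if (toy_rel_pathlist_hook != NULL)
        toy_rel_pathlist_hook(rel);
}

/* ---- extension side ---- */

/* Whatever hook was installed before ours, so that we can chain to it. */
static toy_rel_pathlist_hook_type prev_hook = NULL;

/* Keep the core behavior, but veto the sequential scan path. */
static void ext_pathlist(ToyRel *rel)
{
    if (prev_hook != NULL)
        prev_hook(rel);   /* run earlier extensions first */
    if (rel->seqscan_ok) {
        rel->seqscan_ok = 0;
        rel->npaths -= 1;
    }
}

/* Called once at load time: remember the old hook and install ours. */
void ext_init(void)
{
    prev_hook = toy_rel_pathlist_hook;
    toy_rel_pathlist_hook = ext_pathlist;
}
```

In this sketch the extension reuses everything the core step does and
changes only one decision, which is the kind of override that a
whole-planner replacement cannot offer cheaply.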

Whether or not that constitutes "turning the planner into a framework"
is, I suppose, a question of opinion. Perhaps a more positive way to
phrase it would be "allowing for some code reuse". Right now, if you
mostly like the behavior of the planner but want a few things to be
different, you've got to duplicate a lot of code and then hack it up.
That's not very nice. I think it's better to set things up so that you
can keep most of the planner behavior but override it in a few
specific cases without a lot of difficulty.

Cases where the data is stored in some different way are really a
separate issue from what I'm talking about here. In that case, you
don't want to override the planner behavior for all tables everywhere,
so planner_hook still isn't a good solution. You only want to change
the behavior for the specific table AM that implements the new
storage. You would probably want there to be an option where
cost_seqscan() calls a tableam-specific function instead of just doing
the same thing for every AM; and maybe something similar for indexes,
although that is less certain. The details aren't quite clear, which is
probably part of why we haven't done anything yet.
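
As a rough sketch of what such an option might look like, the toy
model below dispatches the sequential scan cost through an optional
per-AM callback and falls back to a generic estimate when the AM does
not supply one. Everything here is an assumption made for
illustration: ToyTableAm, estimate_seqscan_cost, toy_cost_seqscan,
and the "columnar" AM with its 0.25 page factor are hypothetical, and
the generic formula is a heavy simplification of what cost_seqscan()
really computes. The constants 1.0 and 0.01 mirror the default values
of seq_page_cost and cpu_tuple_cost.

```c
#include <assert.h>
#include <stddef.h>

/* Toy table AM descriptor (hypothetical; not the real TableAmRoutine). */
typedef struct ToyTableAm {
    const char *name;
    /* Optional callback; NULL means "use the generic estimate". */
    double (*estimate_seqscan_cost)(double pages, double tuples);
} ToyTableAm;

/* Generic estimate: one page fetch per page plus CPU cost per tuple. */
static double generic_seqscan_cost(double pages, double tuples)
{
    return pages * 1.0 + tuples * 0.01;
}

/* Toy cost_seqscan(): defer to the AM when it supplies a callback. */
double toy_cost_seqscan(const ToyTableAm *am, double pages, double tuples)
{
    assert(pages >= 0.0 && tuples >= 0.0);
    if (am != NULL && am->estimate_seqscan_cost != NULL)
        return am->estimate_seqscan_cost(pages, tuples);
    return generic_seqscan_cost(pages, tuples);
}

/* Hypothetical columnar AM: assume a scan reads a quarter of the pages. */
static double columnar_seqscan_cost(double pages, double tuples)
{
    return pages * 0.25 + tuples * 0.01;
}

/* The heap-like AM leaves the callback unset and gets the generic cost. */
const ToyTableAm toy_heap_am = { "heap", NULL };
const ToyTableAm toy_columnar_am = { "columnar", columnar_seqscan_cost };
```

With this shape, only tables that use the new storage see different
costs; every other table keeps the generic estimate, which is the
per-AM scoping that planner_hook cannot provide.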

But this patch set is really more about enabling use cases where the
user wants an extension to take control of the plan more explicitly,
say to avoid some bad plan that they got before and that they don't
want to get again.

--
Robert Haas
EDB: http://www.enterprisedb.com
