Re: allowing extensions to control planner behavior

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: allowing extensions to control planner behavior
Date: 2024-10-14 10:02:40
Message-ID: CAKZiRmy2CCZCdoOBwEav5n0E3aUKy93vbOwuFBWs8G4_MDdgwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Robert,

On Thu, Oct 10, 2024 at 6:52 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Sep 18, 2024 at 11:48 AM Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
> > Still, I think it's a pretty useful starting point. It is mostly
> > enough to give you control over join planning, and if combined with
> > similar work for scan planning, I think it would be enough for
> > pg_hint_plan. If we also got control over appendrel and agg planning,
> > then you could do a bunch more cool things.
>
> Here's a new set of patches where I added a similar mechanism for scan
> type control. See the commit message for some notes on limitations of
> this approach. In the absence of better ideas, I'd like to proceed
> with something along the lines of 0001 and 0002.
>
> I upgraded the hint_via_alias contrib module (which is not intended
> for commit, it's just a demo) so that it can hint either scan type or
> join type. I think this is sufficient to demonstrate that it's quite
> easy to use hooks to leverage this new infrastructure.

Thank You! I've played a little and IMHO this is a step in a good direction
after playing a tiny bit with 'hint_via_alias'.

> In fact, the
> biggest thing I'm unhappy about right now is the difficulty of
> providing the hooks with any sort of useful information. I don't think
> it should be the job of this patch set to solve that problem, but I do
> think we should do something about it. The problem exists on two
> levels:
>
> 1. If you want to specify in-query hints using comments, how does your
> extension get access to the comments? [..]Still, it's not clear what other
> approach you could
> adopt.
>

No, I don't think the ability to influence optimizers should be tied to SQL
comments as a "vehicle" to transfer some information, so -1 for going into
any discussion about it and wasting Your time on this. Rationale: as you
note such "interface" is quirky, and troublesome even for users. IMHO they
are mostly just used as a way to run experiments, but noone with sense of
touch with reality would ever embed query with hints in the comment section
inside the production application if had other choices, and that's for
multiple reasons (e.g. it's better to have control about it in the DB as
performance is function of time [and pgversion], ORM might be even unable
to do it in the 1st place, people do not have access to the source code).
You could just assume we have "SET
extension.influence_optimizer='SeqScan(t)'" and be done with it as far as
the production-troubleshooting goes. It's better because it's easier to use
(one does not need to even launch an editor to modify the query) during
those experiments. E.g. I would find it much faster to iterate in psql with
a loop of: SET + `\i query.sql` rather than often having dozens of KBs to
re-edit. And there's even a logon trigger today, so it could be (ab)used to
SET that GUC with just some specific conditions (e.g. only for specific
application_name and that could be even forced by e.g. pgjdbc driver --
jdbcurl?ApplicationName=enable_this_workaround_just_here).

Well the issue is however how do you bind such influence to just one
specific query without changing the app in the long run. My take is that we
should utilize compute_query_id (hash) and then extension should allow
doing something along of the lines of mapping (queryId <->
influence_optimizer), or even it could provide `ALTER SQL <queryId> SET
influence_optimizer='SeqScan(t)'`. Users could take that hash from the %Q
in the log_line_prefix.

2. If you want a certain base relation or join relation to be treated
> in a certain way, how do you identify it? You might think that this is
> easy because, even when a query contains multiple references to a
> relation with the same name, or identical aliases in different parts
> of the query, EXPLAIN renames them so they have disjoint names. What
> would be nice is if you could feed those names back into your
> extension and use them as a way of specifying what behavior you want
> where. But that doesn't work, because what actually happens is that
> the plan can contain duplicated aliases, and then when EXPLAIN
> deparses it using ruleutils.c, that's when we rename things so that
> they're distinct. This means that, at the time we're planning, we
> don't yet know what name EXPLAIN will end up giving to any relation
> yet, which means we can't use the names that EXPLAIN produced for an
> earlier plan for the same query to associate behaviors with relations.
> I wonder if we could think about reversing the order of operations
> here and making it so that we do the distinct-ification during parse
> analysis or maybe early in planning, so that the name you see EXPLAIN
> print out is the real name of that thing and not just a value invented
> for display purposes.
>

This if that's possible?, or simply some counter and numbering the plan
operation? or Andrei's response/idea of using hashing??

So again, I am definitely not saying that these patches get us all the
> way to where we should be -- not in terms of the ability to control
> the plan, and definitely not in terms of giving extensions all the
> information they need to be effective. But if we insist on a perfect
> solution before doing anything, we will never get anywhere, and I
> personally believe these are going in a useful direction.
>

Even as it stands today, the v4-0002 would be better to have than nothing
(well other than pg_hint_plan), as the it looks to me that the most
frequent workaround for optimizer issues is to just throw 'enable_nestloop
= no' into the mix quite often (so having the ability to just throw
fixproblem.so into session_preload_libraries with just strstr()/regex() -
to match on specific query - and disable it just there seems to be
completely achievable and has much better granularity when targeting whole
sessions with SET issued there).

-Jakub Wartak.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-10-14 10:05:18 Re: Missing deconstruct_array_builtin usage
Previous Message Nitin Motiani 2024-10-14 09:45:43 Re: Inval reliability, especially for inplace updates