Re: allowing extensions to control planner behavior

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: allowing extensions to control planner behavior
Date: 2024-08-28 14:57:38
Message-ID: CA+Tgmoa5TiS2pNKXp2VOTuAWCMabPdpQxRr-wudqjwX1s9RBTQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 28, 2024 at 9:46 AM Jakub Wartak
<jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> As a somewhat tiny culprit of the self-flaming done by Robert (due to
> nagging him about this in the past on various occasions), I'm of
> course obligated to +1 to any work related to giving end-users/DBA the
> ability to cage the plans generated by the optimizer.

Thanks.

> When dealing with issues like those, I have a feeling we have 2
> classes of most frequent issues being reported (that's my subjective
> experience):
> a. cardinality misestimate leading usually to nest loop plans (e.g.
> JOIN estimates thread [1] could also somehow help and it also has nice
> reproducers)
> b. issues after upgrades
>
> So the "selectivity estimation hook(s)" mentioned by Andrei seems to
> be a must, but also the ability not to just guess & tweak (shape) the
> plan, but a way to export all SQL plans before upgrade with capability
> to import and override(lock) specific SQL query to specific plan from
> before upgrade.

I'm not against some kind of selectivity estimation hook in principle,
but I don't know what the proposal is specifically, and I think it's
separate from the topic of this thread. On the other hand, being able
to force the same plans after an upgrade that you were getting before
an upgrade is the kind of thing that I'd like to enable with the
infrastructure proposed here. I do not propose to put something like
that in core, at least not any time soon, but I'd like to have the
infrastructure be good enough that people can try to do it in an
extension module and learn from how it turns out.

Ever since I read
https://15721.courses.cs.cmu.edu/spring2020/papers/22-costmodels/p204-leis.pdf
I have believed that the cardinality misestimate leading to nested
loop plans is just because we're doing something dumb. They write:

"When looking at the queries that did not finish in a reasonable time
using the estimates, we found that most have one thing in common:
PostgreSQL’s optimizer decides to introduce a nestedloop join (without
an index lookup) because of a very low cardinality estimate, whereas
in reality the true cardinality is larger. As we saw in the previous
section, systematic underestimation happens very frequently, which
occasionally results in the introduction of nested-loop joins. [...]
if the cost estimate is 1,000,000 with the nested-loop join algorithm
and 1,000,001 with a hash join, PostgreSQL will always prefer the
nested-loop algorithm even if there is a equality join predicate,
which allows one to use hashing. [...] given the fact that
underestimates are quite frequent, this decision is extremely risky.
And even if the estimates happen to be correct, any potential
performance advantage of a nested-loop join in comparison with a hash
join is very small, so taking this high risk can only result in a very
small payoff. Therefore, we disabled nested-loop joins (but not
index-nestedloop joins) in all following experiments."

We don't even have an option to turn off that kind of join, and we
could probably avoid a lot of pain if we did. This, too, is mostly
separate from the topic of this thread, but I just can't believe we've
chosen to do literally nothing about this given that we all know this
specific thing hoses everybody, everywhere, all the time.

> I'm not into the internals of optimizer at all, but here are other
> random thoughts/questions:
> - I do think that "hints" words have bad connotations and should not
> be used. It might be because of embedding them in SQL query text of
> the application itself. On one front they are localized to the SQL
> (good), but the PG operator has no realistic way of altering that once
> it's embedded in binary (bad), as the application team is usually
> separate if not from an external company (very bad situation, but
> happens almost always).

I haven't quite figured out whether the problem is that hints are
actually bad or whether it's more that we just hate saying the word
hints. The reason I'm talking about hints here is mostly because
that's how other systems let users control the query planner. If we
want to let extensions control the query planner, we need to know in
what ways it needs to be controllable, and looking to hints in other
systems is one way to understand what would be useful. As far as
having hints in PostgreSQL, which admittedly is not really the topic
of this thread either, one objection is that we should just make the
query planner instead, and I used to believe that, but I no longer do,
because I've been doing this PostgreSQL thing for 15+ years and we're
not much closer to a perfect query planner that never makes any
mistakes than we were when I started. It's not really clear that
perfection is possible, but it's extremely clear that we're not
getting there any time soon. Another objection to hints is that they
require modifying the query text, which does indeed suck but it
doesn't mean they're useless either. There are also schemes that put
them out of line, including pg_hint_plan's optional use of a hint
table. Yet another objection is that you should fix cardinalities
instead of controlling the plan manually, and I agree that's often a
better solution, but it again does not mean that using a hint is never
defensible in any situation. I think we've become so negative about
hints that we rarely have a rational discussion about them. I'm no
more keen to see every PostgreSQL query in the universe decorated with
a bunch of hints than anyone else here, but I also don't enjoy telling
a customer "hey, I know this query started misbehaving in the middle
of the night on Christmas, but hints are bad and we shouldn't ever
have them so you'd better get started on redesigning your schema or
alternatively you can just have your web site be down for the next 20
years while we try to improve the optimizer." I don't know what the
right solution(s) are exactly but it's insane not to have some kind of
pressure relief valve that can be used in case of emergency.

> - Would stacking of such extensions, each overriding the same planner
> hooks, be allowed or not in the long run ? Technically there's nothing
> preventing it and I think I could imagine someone attempting to run
> multiple planner hook hotfixes for multiple issues, all at once?

I suspect this would tend to work poorly in practice, but there might
be specific cases where it works OK. It's usually best if only one
person is steering a given vehicle at a time, and so here. But there's
no intrinsic reason you couldn't use multiple extensions at once if
you happen to have multiple extensions that use the hooks in mutually
compatible ways.

> - Shouldn't EXPLAIN contain additional information that for that
> specific plan the optimizer hooks changed at least 1 thing ? (e.g.
> "Plan was tainted" or something like that). Maybe extension could mark
> it politely that it did by setting a certain flag, or maybe there
> should be routines exposed by the core to do that ?

This could be really useful when things go wrong and someone is trying
to figure out from an EXPLAIN ANALYZE output what in the world
happened. I'm not sure exactly what makes sense to do here but I think
we should come back to this topic after we've settled some of the
basics.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Karlsson 2024-08-28 14:58:16 Minor refactor: Use more consistent names for the labels of PG_Locale_Strategy
Previous Message Andrei Lepikhov 2024-08-28 14:26:22 Remove unnecessary check on set-returning functions in values_lists