From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Adam Brusselback <adambrusselback(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Discussion on missing optimizations
Date: 2017-10-11 15:50:21
Message-ID: CA+TgmoZfUUuTk+AdfD-cwW-b8dOLRPsvbg6Kqj4QGGT9qBS4iQ@mail.gmail.com
Lists: pgsql-hackers
On Fri, Oct 6, 2017 at 10:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> 9. Unneeded Self JOIN
>
>> Can't remember discussions of this.
>
> I can't get very excited about that one either.
My memories of being a PostgreSQL user rather than a developer are
getting a bit distant by now, but I definitely remember hitting this
problem especially in cases involving DELETE FROM and UPDATE USING.
You can't specify a left join between the result relation and some
other table directly, so you end up having to do DELETE FROM x USING x
LEFT JOIN y ... which then has lousy performance.
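To make the pattern concrete, here is a hypothetical sketch (the table
and column names x, y, id, and x_id are made up for illustration):

```sql
-- Delete rows of x that have no match in y. DELETE has no direct
-- LEFT JOIN syntax, so the result relation gets joined to itself:
DELETE FROM x
USING x AS x2
LEFT JOIN y ON y.x_id = x2.id
WHERE x.id = x2.id
  AND y.x_id IS NULL;
-- The x-to-x2 self-join is redundant (assuming id is unique), but
-- the planner executes it anyway instead of removing it.
```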
I think you could hit it in other cases, too, e.g. a join between two
views which share a common base table. Of course maybe you wouldn't
have two views in the first place if join removal worked better than
it currently does, but without, at least, inner join removal you can't
really rely on the query planner to prune things down to what's really
needed.
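A hypothetical sketch of the two-views case (all names invented for
illustration):

```sql
-- Two views that both read the same base table.
CREATE VIEW active_orders AS
    SELECT id, customer_id FROM orders WHERE status = 'active';
CREATE VIEW order_totals AS
    SELECT id, total FROM orders;

-- Joining the views on the base table's primary key scans orders
-- twice; with self-join removal the planner could, in principle,
-- collapse this into a single scan of orders.
SELECT a.customer_id, t.total
FROM active_orders a
JOIN order_totals t ON t.id = a.id;
```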
> In the end, what the article fails to consider is that all of these are
> tradeoffs, not unalloyed goods. If you spend planner cycles on every
> query to look for cases that only the most unabashedly brain-dead ORMs
> ever generate, you're not really doing your users a favor on balance.
I think what you're failing to consider is that the author of the
article is a user. When he says these optimizations are cool, he
means that they would benefit his use cases (and presumably that he's
willing to pay some number of additional cycles to get them).
We haven't really done a very good job figuring out what to do about
optimizations that some people (mostly you) think are marginal. It's
obviously true that we can't just burn infinite planner cycles on
things that don't usually help, but at the same time, we can't just
keep ignoring requests by users for the same features and saying
"you'll be sad if we give this to you". Those people don't give up on
wanting the optimization; they just don't use PostgreSQL. I think we
need to stop just saying "no" and start thinking about what we could
do that would let us say "yes".
One trick that some systems use is to avoid replanning as much as we
do by, for example, saving plans in a shared cache and reusing them
even in other sessions. That's hard to do in our architecture because the
controlling GUCs can be different in every session and there's not
even any explicit labeling of which GUCs control planner behavior. But
if you had it, then extra planning cycles would be, perhaps, more
tolerable.
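For comparison, what we do have today is session-local plan reuse via
prepared statements; a sketch (statement and table names hypothetical):

```sql
-- Plans for a prepared statement are cached only within the current
-- session, not shared across sessions as described above.
PREPARE get_user (int) AS
    SELECT * FROM users WHERE id = $1;

-- After repeated executions, the planner may switch to a cached
-- generic plan and skip replanning -- but only for this session.
EXECUTE get_user(42);
```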
Another trick is to plan harder when the cost or complexity of the
query exceeds some threshold, but that's hard to do because we don't
really know what the cost or complexity is until after we get done
planning, by which time it's too late to (cheaply) go back and apply
those optimizations. This problem of not really knowing whether we've
got an expensive query is also a problem in terms of being smart about
how many parallel workers to pick or whether to consider parallelism
at all; we'd like to consider more options for expensive queries than
cheap ones. But there's not really any good way to do this as things
stand today.
Maybe one way to attack this problem is to make it more general:
what's the next big thing for the planner? It wasn't altogether clear
to me that the path->plan conversion was going to be a huge
improvement, but it definitely has been. Nothing really got much
faster as a direct result of that work, but it made the code simpler
and more understandable and thus easier to enhance, unblocking
improvements like postgres_fdw aggregate pushdown. I think we're a
long way from reaching the end of the benefits of that commit - there
is a lot more that can usefully be done. But, what's next? You told
me a few years back, when I was thinking about what to do next, that
there were still a lot of gains to be had from improving the planner,
and I wasn't sure I quite believed it. But it's becoming more clear
to me that this is true, and that the kinds of things this article is
talking about are one approach. I don't really have the answers here,
but I don't think our planner is anywhere close to as good as it can
be, even if in certain respects it is stuck at some kind of local
maximum.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company