From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: disfavoring unparameterized nested loops |
Date: | 2021-06-21 15:39:24 |
Message-ID: | CAH2-Wzkb+9B=F4PQyGXEf91QVT_=EcW9ZvdqKGcbeskFj4ra+g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jun 21, 2021 at 7:45 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jun 21, 2021 at 6:41 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > For example, in an unparameterized Nested Loop that estimates the
> > outer Path to have 1 row will cost an entire additional inner cost if
> > there are 2 rows. With Hash Join the cost is just an additional
> > hashtable lookup, which is dead cheap. I don't know exactly how
> > add_path() would weigh all that up, but it seems to me that I wouldn't
> > take the risk unless I was 100% certain that the Nested Loop's outer
> > Path would only return 1 row exactly, if there was any chance at all
> > it could return more, I'd be picking some other join method.
>
> It seems like everyone agrees that it would be good to do something
> about this problem, but the question is whether it's best to do
> something that tries to be general, or whether we should instead do
> something about this specific case. I favor the latter approach.
I agree with your conclusion, but FWIW I am sympathetic to David's
view too. I certainly understand why he'd naturally want to define the
class of problems that are like this one, to understand what the
boundaries are.
The heuristic that has the optimizer flat out avoids an
unparameterized nested loop join is justified by the belief that
that's fundamentally reckless. Even though we all agree on that much,
I don't know when it stops being reckless and starts being "too risky
for me, but not fundamentally reckless". I think that that's worth
living with, but it isn't very satisfying.
> Risk
> and uncertainty exist all over the place, but dealing with that in a
> general way seems difficult, and maybe unnecessary. Addressing the
> case of unparameterized nest loops specifically seems simpler, because
> it's easier to reason about what the alternatives are. Your last
> sentence here seems right on point to me.
Right. Part of why this is a good idea is that the user is exposed to
so many individual risks and uncertainties. We cannot see any one risk
as existing in a vacuum. It is not the only risk the user will ever
take in the planner -- if it was then it might actually be okay to
allow unparameterized nested loop joins.
A bad unparameterized nested loop join plan has, in a sense, unknown
and unbounded cost/downside. But it is only very slightly faster than
a hash join, by a fixed well understood amount. With enough "trials"
and on a long enough timeline, it will inevitably blow up and cause
the application to grind to a halt. It seems like no amount of fixed,
bounded benefit from "fast unparameterized nested loop joins" could
possibly make up for that. The life of Postgres users would be a lot
better if bad plans were at least "survivable events".
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2021-06-21 15:39:42 | Re: Add version macro to libpq-fe.h |
Previous Message | Tom Lane | 2021-06-21 15:34:05 | Re: Maintaining a list of pgindent commits for "git blame" to ignore |