From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: disfavoring unparameterized nested loops
Date: 2021-06-21 15:55:48
Message-ID: 1648206.1624290948@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Geoghegan <pg(at)bowt(dot)ie> writes:
> The heuristic that has the optimizer flat out avoid an
> unparameterized nested loop join is justified by the belief that
> that's fundamentally reckless. Even though we all agree on that much,
> I don't know when it stops being reckless and starts being "too risky
> for me, but not fundamentally reckless". I think that that's worth
> living with, but it isn't very satisfying.

There are certainly cases where the optimizer can prove (in principle;
it doesn't do so today) that a plan node will produce at most one row.
They're hardly uncommon either: an equality comparison on a unique
key, or a subquery with a simple aggregate function, come to mind.
In such cases, not only is this choice not reckless, but it's provably
superior to a hash join. So in the end this gets back to the planning
risk factor that we keep circling around but nobody quite wants to
tackle.
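
To make that concrete, here is a sketch of the sort of query I mean
(table and column names are made up, not taken from anything in this
thread):

    -- The scalar-aggregate subquery yields exactly one row, so the
    -- outer side of the join below is provably single-row; an equality
    -- test on a unique key gives the same guarantee.
    SELECT b.*
    FROM (SELECT max(created_at) AS latest FROM events) e
    JOIN big_table b ON b.created_at = e.latest;

With a one-row outer side, an unparameterized nested loop scans
big_table exactly once, which is the same work a hash join would have
to do anyway, minus building and probing the hash table.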
I'd be a lot happier if this proposal were couched around some sort
of estimate of the risk of the outer side producing more than the
expected number of rows. The arguments so far seem like fairly lame
rationalizations for not putting forth the effort to do that.
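
As a strawman of what such a risk estimate would need to capture
(again with made-up tables):

    -- The planner multiplies the selectivities of these two predicates,
    -- and if city and zip are strongly correlated it can estimate one
    -- row where thousands actually qualify.  Choosing an unparameterized
    -- nested loop on that estimate means rescanning the inner side once
    -- per unexpected row.
    SELECT *
    FROM customers c
    JOIN big_table b ON b.region = c.region
    WHERE c.city = 'Pittsburgh'
      AND c.zip LIKE '152%';

A risk estimate would distinguish this scan of customers, whose row
count could be off by orders of magnitude, from one backed by a
unique-key equality, whose row count cannot exceed one.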
regards, tom lane