Re: disfavoring unparameterized nested loops

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: disfavoring unparameterized nested loops
Date: 2021-06-22 09:53:26
Message-ID: d0bbfcff-78f0-4381-cadf-dcc5293a05e6@enterprisedb.com

On 6/22/21 2:25 AM, Peter Geoghegan wrote:
> On Mon, Jun 21, 2021 at 4:51 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> There were a lot of interesting ideas in this thread and I want to
>> analyze some of them. First, there is the common assumption (not
>> stated) that over-estimating by 5% and underestimating by 5% cause the
>> same harm, which is clearly false. If I go to a restaurant and estimate
>> the bill to be 5% higher or 5% lower, assuming I have sufficient funds,
>> under or over estimating is probably fine. If I am driving and
>> under-estimate the traction of my tires, I am probably fine, but if I
>> over-estimate their traction by 5%, I might crash.
>
> My favorite analogy is the home insurance one:
>
> It might make sense to buy home insurance because losing one's home
> (say through fire) is a loss that usually just cannot be tolerated --
> you are literally ruined. We can debate how likely it is to happen,
> but in the end it's not so unlikely that it can't be ruled out. At the
> same time I may be completely unwilling to buy insurance for personal
> electronic devices. I can afford to replace all of them if I truly
> have to. And the chances of all of them breaking or being stolen on
> the same day is remote (unless my home burns down!). If I drop my cell
> phone and crack the screen, I'll be annoyed, but it's certainly not
> the end of the world.
>
> This behavior will make perfect sense to most people. But it doesn't
> scale up or down. I have quite a few electronic devices, but only a
> single home, so technically I'm taking risks way more often than I am
> playing it safe here. Am I risk tolerant when it comes to insurance?
> Conservative?
>
> I myself don't think that it is sensible to apply either term here.
> It's easier to just look at the specifics. A home is a pretty
> important thing to almost everybody; we can afford to treat it as a
> special case.
>
>> If that is accurate, I think the big question is how common are cases
>> where the outer side can't be proven to have zero or one row and nested
>> loops are enough of a win to risk their greater sensitivity to
>> misestimation. If it is uncommon, seems we could just code the
>> optimizer to use hash joins in those cases without a user-visible knob,
>> beyond the knob that already turns off nested loop joins.
>
> I think it's possible that Robert's proposal will lead to very
> slightly slower plans in the vast majority of cases that are affected,
> while still being a very good idea. Why should insurance be 100% free,
> though? Maybe it can be in some cases where we get lucky, but why
> should that be the starting point? It just has to be very cheap
> relative to what we do today for us to come out ahead, certainly, but
> that seems quite possible in at least this case.
>
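
To make the idea in Bruce's paragraph above concrete, here is a minimal
standalone sketch - in C, with made-up struct and function names rather
than actual planner code - of the rule "keep the unparameterized nested
loop only when the outer side is provably at most one row, otherwise
fall back to a hash join":

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct RelEstimate
    {
        double  rows;               /* estimated outer row count */
        bool    provably_unique;    /* proven to return at most one row */
    } RelEstimate;

    typedef enum JoinChoice
    {
        JOIN_NESTLOOP,
        JOIN_HASHJOIN
    } JoinChoice;

    /*
     * Keep the unparameterized nested loop only when the outer side is
     * provably at most one row; otherwise pay the small "insurance
     * premium" of a hash join, whose runtime degrades far more
     * gracefully when the outer estimate turns out to be wrong.
     */
    static JoinChoice
    choose_unparameterized_join(const RelEstimate *outer)
    {
        if (outer->provably_unique || outer->rows <= 1.0)
            return JOIN_NESTLOOP;
        return JOIN_HASHJOIN;
    }

    int
    main(void)
    {
        RelEstimate outer = {250.0, false};

        printf("%s\n",
               choose_unparameterized_join(&outer) == JOIN_NESTLOOP
               ? "nested loop" : "hash join");
        return 0;
    }

In the real planner the decision would of course be made among competing
paths and their costs, not as a hard rule; the sketch only shows the
shape of the heuristic.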

Yeah, I like the insurance analogy - it gets to the crux of the problem,
because insurance is pretty much exactly about managing risk. But making
everything slower will be a hard sell, because the vast majority of
workloads already running on Postgres don't have this issue at all, so
for them it's not worth the expense. Following the insurance analogy,
selling tornado insurance in Europe is mostly pointless.

Insurance is also about personal preference / risk tolerance. Maybe I'm
fine with accepting the risk that my house burns down, or whatever ...

And the lack of data also plays a role - the insurance company will ask
for higher rates when it does not have enough accurate data about the
phenomenon, or when there are a lot of unknowns. Maybe this would allow
some basic measure of uncertainty, based on the number and type of
restrictions, joins, etc. The more restrictions we have, the less
certain the estimates are. Some conditions are estimated less accurately
than others, and falling back to default estimates makes the result much
less accurate still.

So maybe some fairly rough measure of uncertainty might work, and the
user might specify how much risk they're willing to tolerate.
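
To illustrate, here is a minimal standalone sketch - in C, with entirely
made-up names and weights, nothing here is existing planner code or an
actual GUC - of how such a crude uncertainty score and a user-settable
risk tolerance might fit together:

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct EstimateInfo
    {
        int     num_restrictions;       /* quals applied to the relation */
        int     num_joins;              /* joins below this point in the plan */
        int     num_default_estimates;  /* clauses using default selectivity */
    } EstimateInfo;

    /* GUC-like knob: how much estimated risk the user will tolerate */
    static double risk_tolerance = 3.0;

    /*
     * Very rough uncertainty measure: every restriction and join adds a
     * bit of uncertainty, and clauses estimated with default
     * selectivities add a lot.
     */
    static double
    estimate_uncertainty(const EstimateInfo *info)
    {
        return 0.2 * info->num_restrictions +
               0.5 * info->num_joins +
               1.5 * info->num_default_estimates;
    }

    /*
     * Only consider a risk-sensitive plan (e.g. an unparameterized
     * nested loop) when the accumulated uncertainty stays within the
     * user's tolerance.
     */
    static bool
    risky_plan_allowed(const EstimateInfo *info)
    {
        return estimate_uncertainty(info) <= risk_tolerance;
    }

    int
    main(void)
    {
        EstimateInfo info = {4, 2, 1};

        printf("uncertainty = %.2f, risky plan %s\n",
               estimate_uncertainty(&info),
               risky_plan_allowed(&info) ? "allowed" : "rejected");
        return 0;
    }

The weights are arbitrary; the point is only that the score grows with
the number of guesses stacked on top of each other, and that a single
knob could decide when a risk-sensitive plan is still acceptable.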

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
