Re: nested loop semijoin estimates

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Mark Wong <markwkm(at)gmail(dot)com>
Subject: Re: nested loop semijoin estimates
Date: 2015-06-01 21:47:47
Message-ID: 556CD303.5080507@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 06/01/2015 02:18 PM, Tom Lane wrote:
> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>> On 06/01/15 00:08, Tom Lane wrote:
>>> Attached is an incremental patch (on top of the previous one) to
>>> allow startup cost of parameterized paths to be considered when the
>>> relation is the RHS of a semi or anti join. It seems reasonably clean
>>> except for one thing: logically, we perhaps should remove the checks
>>> on path->param_info from the last half of
>>> compare_path_costs_fuzzily(), so as to increase the symmetry between
>>> parameterized paths and unparameterized ones. However, when I did
>>> that, I got changes in some of the regression test plans, and they
>>> didn't seem to be for the better. So I left that alone. As-is, this
>>> patch doesn't seem to affect the results for any existing regression
>>> tests.
>
>> Regarding the remaining checks in compare_path_costs_fuzzily(), isn't
>> that simply caused by very small data sets?
>
> No, I don't think the size of the data set is the issue. The point is
> that previously, if we had two parameterized paths of fuzzily the same
> total cost, we'd choose which to keep based on factors other than startup
> cost. If we change that part of compare_path_costs_fuzzily(), we'll be
> changing the behavior in cases that are unrelated to the problem we are
> trying to solve, because same-total-cost isn't going to apply to the
> bitmap indexscan vs. plain indexscan cases we're worried about here.
> (The places where it tends to happen are things like hash left join
> vs hash right join with the inputs swapped.) I don't want to change such
> cases in a patch that's meant to be back-patched; people don't like plan
> instability in stable branches.
>
> It's also not quite clear what the new rule ought to be. Rather than
> treating parameterized paths exactly like regular ones, maybe we should
> consider their startup cost only if consider_param_startup is true.
> It remains the case that in most scenarios for parameterized paths,
> startup cost really isn't that interesting and shouldn't drive choices.
>
> Another practical problem is that the regression tests that are being
> affected are specifically meant to test particular scenarios in
> construction of nested-loop join plans. If the planner stops selecting
> a nestloop plan there, the test is effectively broken. We could probably
> rejigger the test queries to restore the intended test scenario, but that
> wasn't an exercise I was in a hurry to undertake.
>
> Anyway, bottom line is that if we want to change that, it ought to be
> a separate HEAD-only patch.
>
>>>> Do you plan to push that into 9.5, or 9.6? I assume it's a
>>>> behavior change so that no back-patching, right?
>
>>> Mumble. It's definitely a planner bug fix, and I believe that the
>>> effects are narrowly constrained to cases where significantly better
>>> plans are possible. So personally I'd be willing to back-patch it.
>>> But others might see that differently, especially since it's been
>>> like this for a long time and we only have one field complaint.
>
>> +1 to back-patching from me. It's true we only have one field complaint,
>> but I believe there are more users impacted by this. They just didn't
>> notice - we only really got this complaint because of the plan
>> discrepancy between queries that are almost exactly the same.
>
> Perhaps. It seems like in many cases the planner would accidentally
> choose the "right" plan anyway, because even though it was overestimating
> the cost, the other alternatives would usually look even worse. So it's
> hard to tell how many users will be benefited. But I'm fairly sure the
> patch as posted is narrow enough that very few people would be negatively
> affected.
>
> Anybody else want to speak for or against back-patching the patch as
> posted? I intentionally didn't push it in before today's releases,
> but I will push it later this week if there are not objections.

I would like Mark Wong to test this on DBT3, if that's possible for him.
I'm very worried about an unanticipated regression.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2015-06-01 21:57:27 Re: pg_xlog -> pg_xjournal?
Previous Message Alvaro Herrera 2015-06-01 21:30:14 Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1