Re: nested loop semijoin estimates

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mark Wong <markwkm(at)gmail(dot)com>
Subject: Re: nested loop semijoin estimates
Date: 2015-06-02 19:54:58
Message-ID: 3847.1433274898@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> So, if required_outer=false and both p->_startup=false, we get
> consider_startup=false irrespectedly of the required_outer value, so
> (!consider_startupe) != required_outer
> so that the conditions are not equivalent. And indeed, by reverting the
> if condition to the previous form, we get the same plans as on master.

Ah, I see we arrived at the same conclusions independently.

> I don't know whether this is correct behavior or not, but in all three
> cases I've observed on TPC-DS, the new plans performed better

Hm. In principle, add_path_precheck shouldn't be doing anything except
rejecting paths that would get rejected anyway later on. However it
strikes me that there's a logic error in it; specifically, this
simplifying assumption is wrong:

* For speed, we make exact rather than fuzzy cost comparisons. If an
* old path dominates the new path exactly on both costs, it will
* surely do so fuzzily.

The old path's cost might be fuzzily the same, not fuzzily better. This
is less likely to be an issue for total cost (since even if they're
fuzzily the same at this point, they probably won't be after we get done
adding more components to the new path's cost estimate). But it could
well be an issue for startup cost since that's often exactly zero. So
I guess this must be rejecting or letting through a slightly different set
of non-parameterized paths than it did before. I'm not prepared to assume
the results are always superior, because we haven't fixed the actual logic
oversight.

> 1) Shouldn't the CONSIDER_PATH_STARTUP_COST(p) be defined rather like
> this? I mean, if it's a parametric query, use any of the flags?

No. The paths are parametric, not the query as a whole.

> 2) Is it really possible to get different values for the startup
> flags on path1 and path2 in compare_path_costs_fuzzily()?

If we were comparing a parameterized and an unparameterized path,
it might be that one's startup cost is of interest while the other
one's isn't. This is written to let a path win on startup cost
as long as its startup cost is actually of interest.

> If not
> (if I get it right, path1->parent == path2->parent anyway), then
> maybe passing that as a function parameter (just like before)
> would be better. The macro probably won't be as nice, but I find
> it easier to understand.

Well, we could pass the parent RelOptInfo explicitly instead of passing
the two flags, but I don't find that to be an improvement particularly.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-06-02 20:19:25 Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Previous Message Tomas Vondra 2015-06-02 19:44:15 Re: nested loop semijoin estimates