Re: Perfomance bug in v10

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Teodor Sigaev <teodor(at)postgrespro(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Perfomance bug in v10
Date: 2017-06-02 01:13:09
Message-ID: 14902.1496365989@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> On 1 June 2017 at 04:16, Teodor Sigaev <teodor(at)postgrespro(dot)ru> wrote:
>> I found an example where v10 chooses extremely non-optimal plan:
>> ...

> This is all caused by get_variable_numdistinct() deciding that all
> values are distinct because ntuples < DEFAULT_NUM_DISTINCT.

Uh, no. I traced through this and found that with your hack in place,
final_cost_nestloop was costing the desired nestloop paths at less
than they were costed in HEAD. That makes no sense: surely, accounting
for the fact that the executor might stop early should never result in
a higher cost estimate than ignoring that possibility does. After
some navel-gazing I realized that there is an ancient oversight in
final_cost_nestloop's cost estimate for semi/anti-join cases. To wit,
in its attempt to ensure that it always charges inner_run_cost at least
once, it may end up charging that twice. Specifically, what we get in
this case is outer_path_rows = 1, outer_matched_rows = 0 (implying one
unmatched outer row) which will cause the existing logic to add both
inner_run_cost and inner_rescan_run_cost to the cost estimate, as if
we needed to run the inner plan twice not once. Correcting that, as in
the attached draft patch, fixes Teodor's example.

Now, this patch also causes some changes in the regression test outputs
that are a bit like your patch's side-effects, but on close inspection
I am not at all convinced that these changes are wrong. As an example,
looking at the first change without "costs off":

regression=# explain (verbose)
select * from j1
inner join (select distinct id from j3) j3 on j1.id = j3.id;
QUERY PLAN
---------------------------------------------------------------------------
Nested Loop (cost=1.03..2.12 rows=1 width=8)
Output: j1.id, j3.id
Inner Unique: true
Join Filter: (j1.id = j3.id)
-> Unique (cost=1.03..1.04 rows=1 width=4)
Output: j3.id
-> Sort (cost=1.03..1.03 rows=2 width=4)
Output: j3.id
Sort Key: j3.id
-> Seq Scan on public.j3 (cost=0.00..1.02 rows=2 width=4)
Output: j3.id
-> Seq Scan on public.j1 (cost=0.00..1.03 rows=3 width=4)
Output: j1.id
(13 rows)

Note that both sides of this join are known unique, so that we'd produce
an inner-unique-true join from either direction of joining. I don't think
it's so insane to put the larger rel on the inside, because with a size-1
rel on the inside, there is nothing to be gained from stop-early behavior.
Moreover, this way we don't need a Materialize node (since we're not
predicting the inner to be scanned more than once anyway). So the fact
that this way is estimated to be cheaper than the other way is not that
surprising after all. Yeah, it's a bit brittle in the face of the outer
rel producing more rows than we expect, but that's true of every nestloop
join plan we ever make. Fixing that is not a task for v10.

Teodor, could you check if this patch fixes your real-world problem?

regards, tom lane

Attachment Content-Type Size
bogus-nestloop-semi-join-cost.patch text/x-diff 6.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-06-02 01:13:54 Re: [JDBC] Channel binding support for SCRAM-SHA-256
Previous Message Stephen Frost 2017-06-02 01:08:26 Re: [HACKERS] Channel binding support for SCRAM-SHA-256