Re: Reference to parent query from ANY sublink

From: Antonin Houska <antonin(dot)houska(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reference to parent query from ANY sublink
Date: 2013-12-11 23:31:37
Message-ID: 52A8F5D9.4090801@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/11/2013 10:15 PM, Tom Lane wrote:
>
> FWIW, that plan isn't obviously wrong; if it is broken, most likely the
> reason is that the HashAggregate is incorrectly unique-ifying the lower
> table. (Unfortunately, EXPLAIN doesn't show enough about the HashAgg
> to know what it's doing exactly.) The given query is, I think, in
> principle equivalent to
>
> SELECT ...
> FROM SUBSELECT_TBL upper
> WHERE (f1, f2::float) IN
> (SELECT f2, f3 FROM SUBSELECT_TBL);
>
> and if you ask unmodified HEAD to plan that you get
>
> Hash Join (cost=41.55..84.83 rows=442 width=16)
> Hash Cond: ((upper.f1 = subselect_tbl.f2) AND ((upper.f2)::double precision = subselect_tbl.f3))
> -> Seq Scan on subselect_tbl upper (cost=0.00..27.70 rows=1770 width=16)
> -> Hash (cost=38.55..38.55 rows=200 width=12)
> -> HashAggregate (cost=36.55..38.55 rows=200 width=12)
> -> Seq Scan on subselect_tbl (cost=0.00..27.70 rows=1770 width=12)

Before I opened your mail, I also recalled the technique that I noticed
in the planner code, to evaluate SEMI JOIN as INNER JOIN with the RHS
uniquified, so also thought it could be about the uniquification.

> which is the same thing at the visible level of detail ... but this
> version computes the correct result. The cost of the HashAggregate is
> estimated higher, though, which suggests that maybe it's distinct'ing on
> two columns where the bogus plan only does one.

debug_print_plan output contains
:grpColIdx 2
in the AGG node. I think this corresponds to the join condition, which
IMO should be
(upper.f1 = subselect_tbl.f2)
while the other condition was not in the list of join clauses and
therefore ignored for the uniquification's purpose.

And gdb tells me that create_unique_path() never gets more than 1
clause. I can't tell whether it should do just for this special purpose.

> Not sure about where Antonin's patch is going off the rails. I suspect
> it's too simple somehow, but it's also possible that it's OK and the
> real issue is some previously undetected bug in LATERAL processing.

So far I have no idea how to achieve such conditions without this patch.
Thanks for your comments.

// Antonin Houska (Tony)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-12-11 23:49:18 Re: Reference to parent query from ANY sublink
Previous Message Tom Lane 2013-12-11 23:13:23 Re: Reference to parent query from ANY sublink