From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Performance improvement for joins where outer side is unique |
Date: | 2017-04-07 01:17:52 |
Message-ID: | CAKJS1f9O+gFk-uNuvY6-kvu-40iHwYc_zH-1sNs3XouxUhX-CA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 7 April 2017 at 11:47, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>> On 7 April 2017 at 07:26, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I'm looking through this, and I'm failing to see where it deals with
>>> the problem we discussed last time, namely that you can't apply the
>>> optimization unless all clauses that were used in the uniqueness
>>> proof are included in the join's merge/hash conditions + joinquals.
>
>> The code in question is:
>> mergestate->mj_SkipMarkRestore = !mergestate->js.joinqual &&
>> mergestate->js.first_inner_tuple_only;
>
> Uh, AFAICS that only protects the skip-mark-and-restore logic.
> What I'm on about is that you can't do the early advance to the
> next outer tuple unless you're sure that all the quals that were
> relevant to the uniqueness proof have been checked for the current
> inner tuple. That affects all three join types not only merge.
Well, look at the join code and you'll see this only happens after the
joinqual is evaulated. I didn't make a special effort here. I just
borrowed the location that JOIN_SEMI was already using.
For example, from hash join:
if (joinqual == NULL || ExecQual(joinqual, econtext))
{
node->hj_MatchedOuter = true;
HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
/* In an antijoin, we never return a matched tuple */
if (node->js.jointype == JOIN_ANTI)
{
node->hj_JoinState = HJ_NEED_NEW_OUTER;
continue;
}
/*
* Skip to the next outer tuple if we only need to join to
* the first inner matching tuple.
*/
if (node->js.first_inner_tuple_only)
node->hj_JoinState = HJ_NEED_NEW_OUTER;
Note the first line and the final two lines.
Here's the one from nested loop:
if (ExecQual(joinqual, econtext))
{
node->nl_MatchedOuter = true;
/* In an antijoin, we never return a matched tuple */
if (node->js.jointype == JOIN_ANTI)
{
node->nl_NeedNewOuter = true;
continue; /* return to top of loop */
}
/*
* Skip to the next outer tuple if we only need to join to the
* first inner matching tuple.
*/
if (node->js.first_inner_tuple_only)
node->nl_NeedNewOuter = true;
Again, note the first line and final 2 lines.
> The case that would be relevant to this is, eg,
>
> create table t1 (f1 int, f2 int, primary key(f1,f2));
>
> select * from t_outer left join t1 on (t_outer.f1 = t1.f1)
> where t_outer.f2 = t2.f2;
hmm, that query is not valid, unless you have created some table named
t_outer. I don't know how you've defined that. So I guess you must
have meant:
postgres=# explain verbose select * from t1 t_outer left join t1 on
(t_outer.f1 = t1.f1) where t_outer.f2 = t1.f2;
QUERY PLAN
---------------------------------------------------------------------------
Hash Join (cost=66.50..133.57 rows=128 width=16)
Output: t_outer.f1, t_outer.f2, t1.f1, t1.f2
Inner Unique: Yes
Hash Cond: ((t_outer.f1 = t1.f1) AND (t_outer.f2 = t1.f2))
-> Seq Scan on public.t1 t_outer (cost=0.00..32.60 rows=2260 width=8)
Output: t_outer.f1, t_outer.f2
-> Hash (cost=32.60..32.60 rows=2260 width=8)
Output: t1.f1, t1.f2
-> Seq Scan on public.t1 (cost=0.00..32.60 rows=2260 width=8)
Output: t1.f1, t1.f2
(10 rows)
Which did become an INNER JOIN due to the strict W
If you'd had done:
postgres=# explain verbose select * from t1 t_outer left join t1 on
(t_outer.f1 = t1.f1) where t_outer.f2 = t1.f2 or t1.f1 is null;
QUERY PLAN
------------------------------------------------------------------------------------------------
Merge Left Join (cost=0.31..608.67 rows=255 width=16)
Output: t_outer.f1, t_outer.f2, t1.f1, t1.f2
Inner Unique: No
Merge Cond: (t_outer.f1 = t1.f1)
Filter: ((t_outer.f2 = t1.f2) OR (t1.f1 IS NULL))
-> Index Only Scan using t1_pkey on public.t1 t_outer
(cost=0.16..78.06 rows=2260 width=8)
Output: t_outer.f1, t_outer.f2
-> Materialize (cost=0.16..83.71 rows=2260 width=8)
Output: t1.f1, t1.f2
-> Index Only Scan using t1_pkey on public.t1
(cost=0.16..78.06 rows=2260 width=8)
Output: t1.f1, t1.f2
(11 rows)
You'll notice that "Inner Unique: No"
> Your existing patch would think t1 is unique-inner, but the qual pushed
> down from WHERE would not be a joinqual so the wrong thing would happen
> at runtime.
>
> (Hm ... actually, this example wouldn't fail as written because
> the WHERE qual is probably strict, so the left join would get
> reduced to an inner join and then pushed-down-ness no longer
> matters. But hopefully you get my drift.)
Oh yeah. I get it, but that's why we ignore !can_join clauses
/* Ignore if it's not a mergejoinable clause */
if (!restrictinfo->can_join ||
restrictinfo->mergeopfamilies == NIL)
continue; /* not mergejoinable */
no?
>> I don't really think the List idea would be nearly as efficient.
>
> No, what I'm saying is that each RelOptInfo would contain a single List of
> Relids of proven-unique-for outer rels (and another one for the negative
> cache). No array, no more searching than you have now, just removal of an
> uncertainly-safe array fetch.
That's way cleaner. Thanks. I've changed it that way. Feeling silly
now for having done it the original way.
Updated patch is attached.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
unique_joins_2017-04-07a.patch | application/octet-stream | 95.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Vaishnavi Prabakaran | 2017-04-07 01:25:55 | Question about one of the old Autonomous Transaction approach |
Previous Message | Peter Eisentraut | 2017-04-07 01:17:32 | Re: Time to change pg_regress diffs to unified by default? |