From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance improvement for joins where outer side is unique
Date: 2017-01-31 00:10:02
Message-ID: CAKJS1f8VF4aAASK51qmNjDXnahoJTUHqPes6EfJiA+KthmcqJQ@mail.gmail.com
Lists: pgsql-hackers

On 28 January 2017 at 05:44, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>>> hmm. I'm having trouble understanding why this is a problem for Unique
>>> joins, but not for join removal?
>
>> Ah, you know what, that's just mistaken. I was thinking that we
>> short-circuited the join on the strength of the hash (or merge) quals
>> only, but actually we check all the joinquals first. As long as the
>> uniqueness proof uses only joinquals and not conditions that will end up
>> as otherquals, it's fine.
>
> Actually, after thinking about that some more, it seems to me that there
> is a performance (not correctness) issue here: suppose that we have
> something like
>
> select ... from t1 left join t2 on t1.x = t2.x and t1.y < t2.y
>
> If there's a unique index on t2.x, we'll be able to mark the join
> inner-unique. However, short-circuiting would only occur after
> finding a row that passes both joinquals. If the y condition is
> true for only a few rows, this would pretty nearly disable the
> optimization. Ideally we would short-circuit after testing the x
> condition only, but there's no provision for that.
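(For concreteness, here is a minimal setup matching the shape of that example; the t1/t2 names and the quals come from the quoted query, while the unique index and the EXPLAIN invocation are purely illustrative.)

    -- Illustrative only: the unique index on t2.x is what lets the planner
    -- mark the join inner-unique.
    CREATE TABLE t1 (x int, y int);
    CREATE TABLE t2 (x int, y int);
    CREATE UNIQUE INDEX t2_x_uniq ON t2 (x);

    -- Ideally the executor could stop scanning t2 for a given t1 row as soon
    -- as the x condition matches, without waiting for t1.y < t2.y to pass.
    EXPLAIN
    SELECT * FROM t1 LEFT JOIN t2 ON t1.x = t2.x AND t1.y < t2.y;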
I've attached a patch which implements this, though only for
MergeJoin; otherwise I'd imagine we'd also need to ensure that all
proofs used for testing the uniqueness were hashable too. I added some
XXX comments in analyzejoins.c around the mergeopfamilies == NIL tests
to mention that Merge Join depends on all the unique proof quals
having mergeopfamilies. This also assumes we'll never use some subset
of mergejoinable quals for a merge join, which could be an interesting
area in the future, as we might have some btree index on a subset of
those columns to provide pre-sorted input. In short, it's a great
optimisation, but I'm a little scared we might break it one day.
Implementing this meant removing the code that set
match_first_row_only for JOIN_SEMI, as this optimisation only applies
to unique_inner and not to JOIN_SEMI alone. This also means we should
now be checking for unique properties on JOIN_SEMI too, which I've
enabled.
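As a rough illustration of the JOIN_SEMI case (the table names here are made up, not from the patch): a semi join whose inner side has a unique index on the join column can now also be recognised as inner-unique.

    -- Illustrative only: customers.cust_id is unique on the join qual,
    -- so the semi join's inner side is provably unique as well.
    CREATE TABLE orders (order_id int PRIMARY KEY, cust_id int);
    CREATE TABLE customers (cust_id int PRIMARY KEY, name text);

    EXPLAIN
    SELECT * FROM orders o
    WHERE EXISTS (SELECT 1 FROM customers c WHERE c.cust_id = o.cust_id);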
Also, I've removed all jointype checks from innerrel_is_unique(). I
took a bit of time to think about JOIN_UNIQUE_OUTER and
JOIN_UNIQUE_INNER, and I think these are fine too; in fact, one of the
regression test plans moved away from using a Semi Join because the
inner side was proven unique. That could perhaps allow a better plan,
since the join order can then be swapped. I'd really like someone else
to have a think about that too, just to make sure I've not blundered
it.
I've put the join removal code back to the way it was before. As I
mentioned, we can't use the caches here since we're also using
additional quals for proofs in this case.
I wasn't sure if I should add some regression tests which exercise
MergeJoin a bit to test the new optimisation. Any thoughts on that?
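Something along these lines is roughly what I had in mind (the table
names, data sizes and enable_* settings are illustrative only, not
taken from the patch):

    -- Force a merge join and check the plan when the inner side is
    -- unique on the merge clause but carries an extra joinqual.
    SET enable_hashjoin = off;
    SET enable_nestloop = off;

    CREATE TABLE mj_outer (id int, val int);
    CREATE TABLE mj_inner (id int PRIMARY KEY, val int);
    INSERT INTO mj_outer SELECT i, i FROM generate_series(1, 1000) i;
    INSERT INTO mj_inner SELECT i, i FROM generate_series(1, 1000) i;
    ANALYZE mj_outer;
    ANALYZE mj_inner;

    EXPLAIN (COSTS OFF)
    SELECT count(*)
    FROM mj_outer o LEFT JOIN mj_inner i ON o.id = i.id AND o.val < i.val;

    RESET enable_hashjoin;
    RESET enable_nestloop;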
David
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment: unique_joins_2017-01-31.patch (application/octet-stream, 86.8 KB)