From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Performance improvement for joins where outer side is unique |
Date: | 2015-01-30 11:37:19 |
Message-ID: | CAApHDvod_uCMoUPovdpXbNkw50O14x3wwKoJmZLxkbBn71zdEg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 1 January 2015 at 02:47, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> Hi,
>
> I've been hacking a bit at the join code again... This time I've been
> putting some effort into optimising the case where the inner side of the
> join is known to be unique.
> For example, given the tables:
>
> create table t1 (id int primary key);
> create table t2 (id int primary key);
>
> And query such as:
>
> select * from t1 left outer join t2 on t1.id=t2.id;
>
> It is possible to deduce at planning time that "for each row in the outer
> relation, only 0 or 1 rows can exist in the inner relation", (inner being
> t2)
>
I've been hacking at this unique join idea again and I've now got it
working for all join types -- Patch attached.
Here's how the performance is looking:
postgres=# create table t1 (id int primary key);
CREATE TABLE
postgres=# create table t2 (id int primary key);
CREATE TABLE
postgres=# insert into t1 select x.x from generate_series(1,1000000) x(x);
INSERT 0 1000000
postgres=# insert into t2 select x.x from generate_series(1,1000000) x(x);
INSERT 0 1000000
postgres=# vacuum analyze;
VACUUM
postgres=# \q
Query: select count(t1.id) from t1 inner join t2 on t1.id=t2.id;
With Patch on master as of 32bf6ee
D:\Postgres\install\bin>pgbench -f d:\unijoin3.sql -T 60 -n postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 78
latency average: 769.231 ms
tps = 1.288260 (including connections establishing)
tps = 1.288635 (excluding connections establishing)
Master as of 32bf6ee
D:\Postgres\install\bin>pgbench -f d:\unijoin3.sql -T 60 -n postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 70
latency average: 857.143 ms
tps = 1.158905 (including connections establishing)
tps = 1.159264 (excluding connections establishing)
That's a 10% performance increase.
I still need to perform more thorough benchmarking with different data
types.
One weird thing that I noticed before is that in an earlier revision of the
patch in the executor's join Initialise node code, I had set the
unique_inner to true for semi joins and replaced the SEMI_JOIN check for a
unique_join check in the execute node for each join method. With this the
performance results barely changed from standard... I've yet to find out
why.
The patch also has added a property to the EXPLAIN (VERBOSE) output which
states if the join was found to be unique or not.
The patch also still requires a final pass of comment fix-ups. I've just
plain run out of time for now.
I'll pick this up in 2 weeks time.
Regards
David Rowley
Attachment | Content-Type | Size |
---|---|---|
unijoin_2015-01-31_9af9cfa.patch | application/octet-stream | 55.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2015-01-30 12:06:45 | NULL checks of deferenced pointers in picksplit method of intarray |
Previous Message | Etsuro Fujita | 2015-01-30 10:20:52 | Odd behavior of updatable security barrier views on foreign tables |