From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Performance improvement for joins where outer side is unique |
Date: | 2015-10-13 21:28:24 |
Message-ID: | CAKJS1f-t3s5NN-PxNCWoMviOwOFFeNQMefHcQP28iDEqnSGV1g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 4 September 2015 at 04:50, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> Also: very nice performance results.
>
>
Thanks.
On following a thread in [General] [1] it occurred to me that this patch
can give a massive improvement on Merge joins where the mark and restore
causes an index scan to have to skip over many filtered rows again and
again.
I mocked up some tables and some data from the scenario on the [General]
thread:
create table customers (id bigint, group_id bigint not null);
insert into customers select x.x,x.x%27724+1 from generate_series(1,473733)
x(x);
alter table customers add constraint customer_pkey primary key (id);
create table balances (id bigint, balance int not null, tracking_number int
not null, customer_id bigint not null);
insert into balances select x.x, 100, 12345, x.x % 45 + 1 from
generate_Series(1,16876) x(x);
create index balance_customer_id_index on balances (customer_id);
create index balances_customer_tracking_number_index on balances
(customer_id,tracking_number);
analyze;
Unpatched I get:
test=# explain analyze SELECT ac.* FROM balances ac join customers o ON o.id
= ac.customer_id WHERE o.group_id = 45;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=164.87..1868.70 rows=1 width=24) (actual
time=6.110..1491.408 rows=375 loops=1)
Merge Cond: (ac.customer_id = o.id)
-> Index Scan using balance_customer_id_index on balances ac
(cost=0.29..881.24 rows=16876 width=24) (actual time=0.009..5.206
rows=16876 loops=1)
-> Index Scan using customer_pkey on customers o (cost=0.42..16062.75
rows=17 width=8) (actual time=0.014..1484.382 rows=376 loops=1)
Filter: (group_id = 45)
Rows Removed by Filter: 10396168
Planning time: 0.207 ms
Execution time: 1491.469 ms
(8 rows)
Patched:
test=# explain analyze SELECT ac.* FROM balances ac join customers o ON o.id
= ac.customer_id WHERE o.group_id = 45;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=164.87..1868.70 rows=1 width=24) (actual
time=6.037..11.528 rows=375 loops=1)
Merge Cond: (ac.customer_id = o.id)
-> Index Scan using balance_customer_id_index on balances ac
(cost=0.29..881.24 rows=16876 width=24) (actual time=0.009..4.978
rows=16876 loops=1)
-> Index Scan using customer_pkey on customers o (cost=0.42..16062.75
rows=17 width=8) (actual time=0.015..5.141 rows=2 loops=1)
Filter: (group_id = 45)
Rows Removed by Filter: 27766
Planning time: 0.204 ms
Execution time: 11.575 ms
(8 rows)
Now it could well be that the merge join costs need a bit more work to
avoid a merge join in this case, but as it stands as of today, this is your
performance gain.
Regards
David Rowley
--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2015-10-13 21:59:56 | Re: Parallel Seq Scan |
Previous Message | Alvaro Herrera | 2015-10-13 21:14:19 | Re: Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files |