From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | tgl(at)sss(dot)pgh(dot)pa(dot)us |
Cc: | fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org, robertmhaas(at)gmail(dot)com |
Subject: | Re: Get more from indices. |
Date: | 2014-01-08 04:22:57 |
Message-ID: | 20140108.132257.201466486.horiguchi.kyotaro@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
tgl> I'm pretty sure that IsA test prevents the optimization from ever firing.
Thank you.
tgl> But aside from hasty typos,
Oops! I've picked up wrong node. It always denies pathkeys extension.
| !IsA(member, Var))
is a mistake of the following.
| !IsA(member->em_expr, Var))
tgl> is there enough left of this optimization to
tgl> be worth the complication?
Certainly.
This patch is intended to be with my another UNION-ALL
optimization pathce at first. It does very much with
UNION-ORDERBY-LIMIT and also with UNION_ALL(Partitioned
tables)-DISTINCT-ORDERBY-LIMIT.
===== test tables
create table pu1 (a int not null, b int not null, c int, d text);
create unique index i_pu1_ab on pu1 (a, b);
create table cu11 (like pu1 including all) inherits (pu1);
create table cu12 (like pu1 including all) inherits (pu1);
create table pu2 (like pu1 including all);
create table cu21 (like pu2 including all) inherits (pu2);
create table cu22 (like pu2 including all) inherits (pu2);
insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11' from generate_series(000000, 099999) a);
insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12' from generate_series(100000, 199999) a);
insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21' from generate_series(200000, 299999) a);
insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22' from generate_series(300000, 399999) a);
====
Following example is an implicit union derived from partitioned
tables created as above. Limit is added to highlight the effect.
9.4dev with no patch,
| =# explain (analyze on, costs off) select distinct * from pu1 order by a, b limit 100;
| QUERY PLAN
| ----------------------------------------------------------------------------
| Limit (actual time=246.653..246.730 rows=100 loops=1)
| -> Unique (actual time=246.646..246.713 rows=100 loops=1)
| -> Sort (actual time=246.645..246.666 rows=100 loops=1)
| Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
| Sort Method: external sort Disk: 5280kB
| -> Append (actual time=0.017..52.608 rows=200000 loops=1)
| -> Seq Scan on pu1 (actual time=0.001..0.001 rows=0 loops=1)
| -> Seq Scan on cu11 (actual time=0.015..17.047 rows=100000 loops=1)
| -> Seq Scan on cu12 (actual time=0.007..15.474 rows=100000 loops=1)
| Total runtime: 247.854 ms
Fairly common query. It will get following plan with this patch.
| =# explain (analyze on, costs off) select distinct * from pu1 order by a, b limit 100;
| QUERY PLAN
| -----------------------------------------------------------------------------
| Limit (actual time=0.108..0.350 rows=100 loops=1)
| -> Unique (actual time=0.104..0.329 rows=100 loops=1)
| -> Merge Append (actual time=0.103..0.214 rows=100 loops=1)
| Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
| -> Index Scan .. on pu1 (actual time=0.003..0.003 rows=0 loops=1)
| -> Index Scan .. on cu11 (actual time=0.049..0.110 rows=100 loops=1)
| -> Index Scan .. on cu12 (actual time=0.044..0.044 rows=1 loops=1)
| Total runtime: 0.666 ms
The same could be said on union with my another union-pullup
patch. We will get the following result with only the
union-pullup patch.
|=# explain (analyze on, costs off) select * from cu11 union select * from cu12 union select * from cu21 union select * from cu22 order by a limit 10000;
| QUERY PLAN
|---------------------------------------------------------------------------
| Limit (actual time=474.825..482.326 rows=10000 loops=1)
| -> Unique (actual time=474.824..481.357 rows=10000 loops=1)
| -> Sort (actual time=474.823..477.101 rows=10000 loops=1)
| Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
| Sort Method: external sort Disk: 10568kB
| -> Append (actual time=0.018..99.310 rows=400000 loops=1)
| -> Seq Scan on cu11 (actual time=0.017..16.263 rows=100000 loops=1)
| -> Seq Scan on cu12 (actual time=0.006..14.831 rows=100000 loops=1)
| -> Seq Scan on cu21 (actual time=0.006..14.766 rows=100000 loops=1)
| -> Seq Scan on cu22 (actual time=0.006..14.721 rows=100000 loops=1)
| Total runtime: 484.555 ms
This is also a quite common operation but implicit DISTINCT
prevents planner from selecting index scans. (ORDER BY is not
essential but needed because planner does not consult
distinct_pathkeys on planning scans and I've left it as it is.)
The planner gives the following plan with this patch.
| =# explain (analyze on, costs off) select * from cu11 union select * from cu12 union select * from cu21 union select * from cu22 order by a limit 10000;
| QUERY PLAN
| -------------------------------------------------------------------------
| Limit (actual time=0.197..14.739 rows=10000 loops=1)
| -> Unique (actual time=0.191..13.527 rows=10000 loops=1)
| -> Merge Append (actual time=0.191..7.894 rows=10000 loops=1)
| Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
| -> Index Scan .. on cu11 (actual time=0.054..4.676 rows=10000 loops=1)
| -> Index Scan .. on cu12 (actual time=0.045..0.045 rows=1 loops=1)
| -> Index Scan .. on cu21 (actual time=0.044..0.044 rows=1 loops=1)
| -> Index Scan .. on cu22 (actual time=0.043..0.043 rows=1 loops=1)
| Total runtime: 15.872 ms
This seems for me worth enough to do.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2014-01-08 04:46:26 | Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE |
Previous Message | Dilip kumar | 2014-01-08 04:02:34 | TODO: machine-readable pg_controldata |