Quick Links

Re: Get more from indices.

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc:	fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org, robertmhaas(at)gmail(dot)com
Subject:	Re: Get more from indices.
Date:	2014-01-08 04:22:57
Message-ID:	20140108.132257.201466486.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hello,

tgl> I'm pretty sure that IsA test prevents the optimization from ever firing.

Thank you.

tgl> But aside from hasty typos,

Oops! I've picked up wrong node. It always denies pathkeys extension.

| !IsA(member, Var))

is a mistake of the following.

| !IsA(member->em_expr, Var))

tgl> is there enough left of this optimization to
tgl> be worth the complication?

Certainly.

This patch is intended to be with my another UNION-ALL
optimization pathce at first. It does very much with
UNION-ORDERBY-LIMIT and also with UNION_ALL(Partitioned
tables)-DISTINCT-ORDERBY-LIMIT.

===== test tables
create table pu1 (a int not null, b int not null, c int, d text);
create unique index i_pu1_ab on pu1 (a, b);
create table cu11 (like pu1 including all) inherits (pu1);
create table cu12 (like pu1 including all) inherits (pu1);
create table pu2 (like pu1 including all);
create table cu21 (like pu2 including all) inherits (pu2);
create table cu22 (like pu2 including all) inherits (pu2);
insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11' from generate_series(000000, 099999) a);
insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12' from generate_series(100000, 199999) a);
insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21' from generate_series(200000, 299999) a);
insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22' from generate_series(300000, 399999) a);
====

Following example is an implicit union derived from partitioned
tables created as above. Limit is added to highlight the effect.

9.4dev with no patch,

| =# explain (analyze on, costs off) select distinct * from pu1 order by a, b limit 100;
| QUERY PLAN
| ----------------------------------------------------------------------------
| Limit (actual time=246.653..246.730 rows=100 loops=1)
| -> Unique (actual time=246.646..246.713 rows=100 loops=1)
| -> Sort (actual time=246.645..246.666 rows=100 loops=1)
| Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
| Sort Method: external sort Disk: 5280kB
| -> Append (actual time=0.017..52.608 rows=200000 loops=1)
| -> Seq Scan on pu1 (actual time=0.001..0.001 rows=0 loops=1)
| -> Seq Scan on cu11 (actual time=0.015..17.047 rows=100000 loops=1)
| -> Seq Scan on cu12 (actual time=0.007..15.474 rows=100000 loops=1)
| Total runtime: 247.854 ms

Fairly common query. It will get following plan with this patch.

| =# explain (analyze on, costs off) select distinct * from pu1 order by a, b limit 100;
| QUERY PLAN
| -----------------------------------------------------------------------------
| Limit (actual time=0.108..0.350 rows=100 loops=1)
| -> Unique (actual time=0.104..0.329 rows=100 loops=1)
| -> Merge Append (actual time=0.103..0.214 rows=100 loops=1)
| Sort Key: pu1.a, pu1.b, pu1.c, pu1.d
| -> Index Scan .. on pu1 (actual time=0.003..0.003 rows=0 loops=1)
| -> Index Scan .. on cu11 (actual time=0.049..0.110 rows=100 loops=1)
| -> Index Scan .. on cu12 (actual time=0.044..0.044 rows=1 loops=1)
| Total runtime: 0.666 ms

The same could be said on union with my another union-pullup
patch. We will get the following result with only the
union-pullup patch.

|=# explain (analyze on, costs off) select * from cu11 union select * from cu12 union select * from cu21 union select * from cu22 order by a limit 10000;
| QUERY PLAN
|---------------------------------------------------------------------------
| Limit (actual time=474.825..482.326 rows=10000 loops=1)
| -> Unique (actual time=474.824..481.357 rows=10000 loops=1)
| -> Sort (actual time=474.823..477.101 rows=10000 loops=1)
| Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
| Sort Method: external sort Disk: 10568kB
| -> Append (actual time=0.018..99.310 rows=400000 loops=1)
| -> Seq Scan on cu11 (actual time=0.017..16.263 rows=100000 loops=1)
| -> Seq Scan on cu12 (actual time=0.006..14.831 rows=100000 loops=1)
| -> Seq Scan on cu21 (actual time=0.006..14.766 rows=100000 loops=1)
| -> Seq Scan on cu22 (actual time=0.006..14.721 rows=100000 loops=1)
| Total runtime: 484.555 ms

This is also a quite common operation but implicit DISTINCT
prevents planner from selecting index scans. (ORDER BY is not
essential but needed because planner does not consult
distinct_pathkeys on planning scans and I've left it as it is.)

The planner gives the following plan with this patch.

| =# explain (analyze on, costs off) select * from cu11 union select * from cu12 union select * from cu21 union select * from cu22 order by a limit 10000;
| QUERY PLAN
| -------------------------------------------------------------------------
| Limit (actual time=0.197..14.739 rows=10000 loops=1)
| -> Unique (actual time=0.191..13.527 rows=10000 loops=1)
| -> Merge Append (actual time=0.191..7.894 rows=10000 loops=1)
| Sort Key: cu11.a, cu11.b, cu11.c, cu11.d
| -> Index Scan .. on cu11 (actual time=0.054..4.676 rows=10000 loops=1)
| -> Index Scan .. on cu12 (actual time=0.045..0.045 rows=1 loops=1)
| -> Index Scan .. on cu21 (actual time=0.044..0.044 rows=1 loops=1)
| -> Index Scan .. on cu22 (actual time=0.043..0.043 rows=1 loops=1)
| Total runtime: 15.872 ms

This seems for me worth enough to do.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Re: Get more from indices. at 2014-01-07 21:00:30 from Tom Lane

Responses

Re: Get more from indices. at 2014-01-14 09:08:10 from Kyotaro HORIGUCHI

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Geoghegan	2014-01-08 04:46:26	Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Previous Message	Dilip kumar	2014-01-08 04:02:34	TODO: machine-readable pg_controldata