Quick Links

Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc:	Michael Kolomeitsev <mkolomeitsev(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.
Date:	2013-12-28 21:56:42
Message-ID:	19001.1388267802@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
> Michael Kolomeitsev <mkolomeitsev(at)gmail(dot)com> wrote:
>> it is clear for me why t1_b_a_idx is better. The question is: Is
>> postgresql able to see that?

> For a number of reasons I never consider a bulk load complete until
> I run VACUUM FREEZE ANALYZE on the table(s) involved. When I try
> your test case without that, I get the bad index choice. When I
> then run VACUUM FREEZE ANALYZE on the database I get the good index
> choice.

I think that's just chance, because AFAICS the cost estimates are exactly
the same for both indexes, once you've done the vacuum to make all the
heap pages all-visible. What's more, I'm not sure that that's wrong,
because according to EXPLAIN (ANALYZE, BUFFERS) the exact same number of
index pages are touched for either index. So I think Michael's claim that
the one index is better is at best unproven.

regression=# explain (analyze, buffers) select count(*) from t1 where a in (1, 9, 17, 26, 35, 41, 50) and b = 333333;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=32.12..32.13 rows=1 width=0) (actual time=0.097..0.098 rows=1 loops=1)
Buffers: shared hit=30
-> Index Only Scan using t1_b_a_idx on t1 (cost=0.57..32.10 rows=7 width=0) (actual time=0.044..0.085 rows=7 loops=1)
Index Cond: ((b = 333333) AND (a = ANY ('{1,9,17,26,35,41,50}'::integer[])))
Heap Fetches: 0
Buffers: shared hit=30
Total runtime: 0.174 ms
(7 rows)

regression=# begin; drop index t1_b_a_idx;
BEGIN
DROP INDEX
regression=# explain (analyze, buffers) select count(*) from t1 where a in (1, 9, 17, 26, 35, 41, 50) and b = 333333;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=32.12..32.13 rows=1 width=0) (actual time=0.110..0.110 rows=1 loops=1)
Buffers: shared hit=30
-> Index Only Scan using t1_a_b_idx on t1 (cost=0.57..32.10 rows=7 width=0) (actual time=0.039..0.101 rows=7 loops=1)
Index Cond: ((a = ANY ('{1,9,17,26,35,41,50}'::integer[])) AND (b = 333333))
Heap Fetches: 0
Buffers: shared hit=30
Total runtime: 0.199 ms
(7 rows)

regression=# abort;
ROLLBACK

I grant the theory that the repeated index probes in t1_b_a_idx should be
more localized than those in t1_a_b_idx, but PG's optimizer doesn't
attempt to estimate such effects, and this example isn't doing much to
convince me that it'd be worth the trouble.

regards, tom lane

In response to

Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order. at 2013-12-28 21:03:28 from Kevin Grittner

Responses

Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order. at 2013-12-29 00:52:07 from Michael Kolomeitsev

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Michael Kolomeitsev	2013-12-29 00:52:07	Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.
Previous Message	Gavin Flower	2013-12-28 21:19:20	Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.