Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Michael Kolomeitsev <mkolomeitsev(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.
Date: 2013-12-28 21:56:42
Message-ID: 19001.1388267802@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
> Michael Kolomeitsev <mkolomeitsev(at)gmail(dot)com> wrote:
>> it is clear for me why t1_b_a_idx is better. The question is: Is
>> postgresql able to see that?

> For a number of reasons I never consider a bulk load complete until
> I run VACUUM FREEZE ANALYZE on the table(s) involved. When I try
> your test case without that, I get the bad index choice. When I
> then run VACUUM FREEZE ANALYZE on the database I get the good index
> choice.

I think that's just chance, because AFAICS the cost estimates are exactly
the same for both indexes, once you've done the vacuum to make all the
heap pages all-visible. What's more, I'm not sure that that's wrong,
because according to EXPLAIN (ANALYZE, BUFFERS) the exact same number of
index pages are touched for either index. So I think Michael's claim that
the one index is better is at best unproven.

regression=# explain (analyze, buffers) select count(*) from t1 where a in (1, 9, 17, 26, 35, 41, 50) and b = 333333;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=32.12..32.13 rows=1 width=0) (actual time=0.097..0.098 rows=1 loops=1)
Buffers: shared hit=30
-> Index Only Scan using t1_b_a_idx on t1 (cost=0.57..32.10 rows=7 width=0) (actual time=0.044..0.085 rows=7 loops=1)
Index Cond: ((b = 333333) AND (a = ANY ('{1,9,17,26,35,41,50}'::integer[])))
Heap Fetches: 0
Buffers: shared hit=30
Total runtime: 0.174 ms
(7 rows)

regression=# begin; drop index t1_b_a_idx;
BEGIN
DROP INDEX
regression=# explain (analyze, buffers) select count(*) from t1 where a in (1, 9, 17, 26, 35, 41, 50) and b = 333333;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=32.12..32.13 rows=1 width=0) (actual time=0.110..0.110 rows=1 loops=1)
Buffers: shared hit=30
-> Index Only Scan using t1_a_b_idx on t1 (cost=0.57..32.10 rows=7 width=0) (actual time=0.039..0.101 rows=7 loops=1)
Index Cond: ((a = ANY ('{1,9,17,26,35,41,50}'::integer[])) AND (b = 333333))
Heap Fetches: 0
Buffers: shared hit=30
Total runtime: 0.199 ms
(7 rows)

regression=# abort;
ROLLBACK

I grant the theory that the repeated index probes in t1_b_a_idx should be
more localized than those in t1_a_b_idx, but PG's optimizer doesn't
attempt to estimate such effects, and this example isn't doing much to
convince me that it'd be worth the trouble.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Michael Kolomeitsev 2013-12-29 00:52:07 Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.
Previous Message Gavin Flower 2013-12-28 21:19:20 Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.