From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
Cc: | Michael Kolomeitsev <mkolomeitsev(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order. |
Date: | 2013-12-28 21:56:42 |
Message-ID: | 19001.1388267802@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
> Michael Kolomeitsev <mkolomeitsev(at)gmail(dot)com> wrote:
>> it is clear for me why t1_b_a_idx is better. The question is: Is
>> postgresql able to see that?
> For a number of reasons I never consider a bulk load complete until
> I run VACUUM FREEZE ANALYZE on the table(s) involved. When I try
> your test case without that, I get the bad index choice. When I
> then run VACUUM FREEZE ANALYZE on the database I get the good index
> choice.
I think that's just chance, because AFAICS the cost estimates are exactly
the same for both indexes, once you've done the vacuum to make all the
heap pages all-visible. What's more, I'm not sure that that's wrong,
because according to EXPLAIN (ANALYZE, BUFFERS) the exact same number of
index pages are touched for either index. So I think Michael's claim that
the one index is better is at best unproven.
regression=# explain (analyze, buffers) select count(*) from t1 where a in (1, 9, 17, 26, 35, 41, 50) and b = 333333;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=32.12..32.13 rows=1 width=0) (actual time=0.097..0.098 rows=1 loops=1)
Buffers: shared hit=30
-> Index Only Scan using t1_b_a_idx on t1 (cost=0.57..32.10 rows=7 width=0) (actual time=0.044..0.085 rows=7 loops=1)
Index Cond: ((b = 333333) AND (a = ANY ('{1,9,17,26,35,41,50}'::integer[])))
Heap Fetches: 0
Buffers: shared hit=30
Total runtime: 0.174 ms
(7 rows)
regression=# begin; drop index t1_b_a_idx;
BEGIN
DROP INDEX
regression=# explain (analyze, buffers) select count(*) from t1 where a in (1, 9, 17, 26, 35, 41, 50) and b = 333333;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=32.12..32.13 rows=1 width=0) (actual time=0.110..0.110 rows=1 loops=1)
Buffers: shared hit=30
-> Index Only Scan using t1_a_b_idx on t1 (cost=0.57..32.10 rows=7 width=0) (actual time=0.039..0.101 rows=7 loops=1)
Index Cond: ((a = ANY ('{1,9,17,26,35,41,50}'::integer[])) AND (b = 333333))
Heap Fetches: 0
Buffers: shared hit=30
Total runtime: 0.199 ms
(7 rows)
regression=# abort;
ROLLBACK
I grant the theory that the repeated index probes in t1_b_a_idx should be
more localized than those in t1_a_b_idx, but PG's optimizer doesn't
attempt to estimate such effects, and this example isn't doing much to
convince me that it'd be worth the trouble.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Kolomeitsev | 2013-12-29 00:52:07 | Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order. |
Previous Message | Gavin Flower | 2013-12-28 21:19:20 | Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order. |