Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.

From: Michael Kolomeitsev <mkolomeitsev(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.
Date: 2013-12-27 06:35:06
Message-ID: CAABbzO01zK4Ab=vd3UD1Jd++-Hju9bLerzS1+=JnpMsXCLtg8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

It seems postgresql is unable to choose correct index in such cases.
(my pg version is 9.3.2)

Let's see example:
create table t1 as select a.a, b.b from generate_series(1, 100) a(a),
generate_series(1,500000) b(b);
create index t1_a_idx on t1(a);
create index t1_b_idx on t1(b);
create index t1_a_b_idx on t1(a,b);
create index t1_b_a_idx on t1(b,a);
alter table t1 alter a set statistics 10000;
alter table t1 alter b set statistics 10000;
analyze t1;

test=> explain select count(*) from t1 where a in (1, 9, 17, 26, 35, 41,
50) and b = 333333;
QUERY PLAN

--------------------------------------------------------------------------------------
Aggregate (cost=46.62..46.63 rows=1 width=0)
-> Index Only Scan using t1_a_b_idx on t1 (cost=0.57..46.60 rows=7
width=0)
Index Cond: ((a = ANY ('{1,9,17,26,35,41,50}'::integer[])) AND (b
= 333333))
(3 rows)

Rows estimation is exact.
But I think using t1_a_b_idx index is not the best choice.
Let's check:
# drop pg and disc buffers/caches
systemctl stop postgresql.service ; echo 3 >/proc/sys/vm/drop_caches ;
systemctl start postgresql.service ; sleep 2
# warm up pg and check the plan
{ echo '\\timing' && echo "explain select count(*) from t1 where a in (1,
9, 17, 26, 35, 41, 50) and b = 333333;" ; } | psql test
# do the benchmark
{ echo '\\timing' && echo "select count(*) from t1 where a in (1, 9, 17,
26, 35, 41, 50) and b = 333333;" ; } | psql test

I have 200-210ms timing for the last query and t1_a_b_idx is used always. I
checked several times.

Ok. Now 'drop index t1_a_b_idx;' and check again.
Pg now uses t1_b_a_idx and I have 90-100ms for that control query. This is
much better.

I took pageinspect contrib module, learnt btree structure and it is clear
for me
why t1_b_a_idx is better. The question is: Is postgresql able to see that?

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message GR Vishwanath 2013-12-27 15:55:44 Does fsync on/off for wal AND Checkpoint?
Previous Message Ronaldo Maia 2013-12-26 19:49:29 Possible regression (slow query on 9.2/9.3 when compared to 9.1)