Re: Wrong index selection

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Piotr Gasidło <quaker(at)barbara(dot)eu(dot)org>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Wrong index selection
Date: 2014-01-18 00:33:44
Message-ID: 29113.1390005224@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

=?UTF-8?Q?Piotr_Gasid=C5=82o?= <quaker(at)barbara(dot)eu(dot)org> writes:
> [ planner prefers this: ]

> -> Index Scan using cookies2tags_key3 on cookies2tags
> co2ta (cost=0.57..49.34 rows=1 width=57) (actual time=38.339..38.982
> rows=8 loops=1)
> Index Cond: ((co2ta_cl_id = 97) AND (co2ta_ta_id =
> ANY ('{142}'::integer[])))
> Filter: (co2ta_co_id = ANY
> ('{1,123567429,123872617,123929118,123930244,123935996,123937156,123944495,123944999,123945469}'::integer[]))
> Rows Removed by Filter: 32120

> [ over this: ]

> -> Index Scan using cookies2tags_co_id_key2 on
> cookies2tags co2ta (cost=0.58..45.24 rows=1 width=57) (actual
> time=0.031..0.215 rows=8 loops=1)
> Index Cond: ((co2ta_co_id = ANY
> ('{123567429,123872617,123929118,123930244,123935996,123937156,123944495,123944999,123945469}'::integer[]))
> AND (co2ta_cl_id = 97))
> Filter: (co2ta_ta_id = ANY ('{142}'::integer[]))
> Rows Removed by Filter: 187

Well, as you can see the planner thinks these are going to cost about the
same, but actually the first one fetches a lot of rows that end up getting
rejected by the filter condition. That means the co2ta_ta_id condition
is somewhat redundant given the other two, much more so than the
co2ta_co_id condition is given the other two. If the individual
conditions are estimated about right (have you checked?), then that means
that this is an artifact of cross-column correlation statistics, which
unfortunately Postgres doesn't know anything about. Is there any way of
normalizing the data to reduce the cross-column correlations?

My other advice would be to simplify and reduce the set of indexes ---
IMO someone's gone way overboard with index creation here. It's unlikely
that those indexes are all pulling their weight for their maintenance
costs, and you can reduce problems with choosing the "wrong" index if
that index simply isn't there.

On the other hand, if this is a near-read-only table such that having lots
of indexes is basically free, you could fix the problem by creating an
index on all three columns, which should dominate both of these choices.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message ARCEnergo 2014-01-20 13:55:50 Time of query result delivery
Previous Message Piotr Gasidło 2014-01-17 22:57:54 Wrong index selection