Re: Bad query plan with high-cardinality column

From: Alexander Staubo <alex(at)bengler(dot)no>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad query plan with high-cardinality column
Date: 2013-02-22 21:31:48
Message-ID: F8FDDA2A2FCB47E8A13F0411BA37311E@gmail.com
Lists: pgsql-performance

On Friday, February 22, 2013 at 21:33, Tom Lane wrote:
> The reason is that the LIMIT may stop the query before it's scanned all
> of the index. The planner estimates on the assumption that the desired
> rows are roughly uniformly distributed within the created_at index, and
> on that assumption, it looks like this query will stop fairly soon ...
> but evidently, that's wrong. On the other hand, it knows quite well
> that the other plan will require pulling out 5000-some rows and then
> sorting them before it can return anything, so that's not going to be
> exactly instantaneous either.
>
> In this example, I'll bet that conversation_id and created_at are pretty
> strongly correlated, and that most or all of the rows with that specific
> conversation_id are quite far down the created_at ordering, so that the
> search through the index takes a long time to run. OTOH, with another
> conversation_id the same plan might run almost instantaneously.

That's right. So I created a composite index, and not only does this fix the plan, but the planner now chooses a much more efficient plan than it did with the previous index, which covered only "conversation_id":

Limit  (cost=0.00..30.80 rows=13 width=12) (actual time=0.042..0.058 rows=13 loops=1)
  Buffers: shared hit=8
  ->  Index Scan using index_comments_on_conversation_id_and_created_at on comments  (cost=0.00..14127.83 rows=5964 width=12) (actual time=0.039..0.054 rows=13 loops=1)
        Index Cond: (conversation_id = 3975979)
        Buffers: shared hit=8
Total runtime: 0.094 ms
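
For reference, a minimal sketch of the kind of index and query that would produce a plan like the one above. The table, index, and indexed column names come from the plan; the column order is inferred from the index name, and the statement itself is an assumption, since the original query is not quoted in this message ("id" in particular is a hypothetical column name).

```sql
-- Assumed definition: column order inferred from the index name in the plan.
CREATE INDEX index_comments_on_conversation_id_and_created_at
    ON comments (conversation_id, created_at);

-- Assumed query shape: equality filter on conversation_id, ordered by
-- created_at, with LIMIT 13 to match rows=13 in the plan.
-- "id" is a hypothetical column; substitute the real select list.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
  FROM comments
 WHERE conversation_id = 3975979
 ORDER BY created_at
 LIMIT 13;
```

In a b-tree on (conversation_id, created_at), the entries for a single conversation_id are stored next to each other in created_at order, so a scan restricted to one conversation_id can return rows already sorted and stop after the first 13 without a separate sort step. Note that the plan shows a plain Index Scan rather than an Index Only Scan.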

Is this because it can get the value of "created_at" from the index, or is it because it can know that the index is pre-sorted, or both?

Very impressed that Postgres can use a multi-column index for this. I just assumed, wrongly, that it couldn't. I will have to go review my other tables now and see if they can benefit from multi-column indexes.

Thanks!
