Re: plan question - query with order by and limit not choosing index depends on size of limit, table

From: Mike Broers <mbroers(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: plan question - query with order by and limit not choosing index depends on size of limit, table
Date: 2011-01-14 16:36:43
Message-ID: AANLkTikQ8p4V=g5heXGK1vg-7NCWAnkydYtybmqjraT+@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks Robert, this is what I was looking for. I will try these suggestions
and follow up if any of them are the silver bullet.

On Fri, Jan 14, 2011 at 7:11 AM, Robert Haas wrote:

> On Thu, Jan 6, 2011 at 4:36 PM, Mike Broers <mbroers(at)gmail(dot)com> wrote:
> > Thanks for the assistance.
> > Here is an explain analyze of the query with the problem limit:
> > production=# explain analyze select * from landing_page.messages where
> > ((messages.topic = E'x') AND (messages.processed = 'f')) ORDER BY
> > messages.created_at ASC limit 10;
> >
> > QUERY PLAN
> >
> >
> --------------------------------------------------------------------------------------------------------------
> > ------------------------------------------------------------
> > Limit (cost=0.00..2891.06 rows=10 width=1340) (actual
> > time=207922.586..207922.586 rows=0 loops=1)
> > -> Index Scan using idx_landing_page_messages_created_at on messages
> > (cost=0.00..449560.48 rows=1555 widt
> > h=1340) (actual time=207922.581..207922.581 rows=0 loops=1)
> > Filter: ((NOT processed) AND ((topic)::text = 'x'::text))
> > Total runtime: 207949.413 ms
> > (4 rows)
>
> You're not the first person to have been bitten by this. The
> optimizer thinks that rows WHERE NOT processed and topic = 'x' are
> reasonably common, so it figures that it can just index scan until it
> finds 10 of them. But when it turns out that there are none at all,
> it ends up having to scan the entire index, which stinks big-time.
>
> The alternative plan is to use a different index to find ALL the
> relevant rows, sort them, and then take the top 10. That would suck
> if there actually were tons of rows like this, but there aren't.
>
> So the root of the problem, in some sense, is that the planner's
> estimate of the selectivity of "NOT processed and topic = 'x'" is not
> very good. Some things to try:
>
> - increase the statistics target for the "processed" and "topic"
> columns even higher
> - put the processed rows in one table and the not processed rows in
> another table
> - do something like SELECT * FROM (SELECT .. LIMIT 200 OFFSET 0) LIMIT
> 10 to try to fool the planner into planning based on the higher, inner
> limit
> - create a partial index on messages (topic) WHERE NOT processed and
> see if the planner will use it
>
> ...Robert
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Mladen Gogala 2011-01-14 16:54:51 Re: The good, old times
Previous Message Vitalii Tymchyshyn 2011-01-14 11:39:04 Re: queries with lots of UNIONed relations