From: | Mike Broers <mbroers(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: plan question - query with order by and limit not choosing index depends on size of limit, table |
Date: | 2011-01-14 16:36:43 |
Message-ID: | AANLkTikQ8p4V=g5heXGK1vg-7NCWAnkydYtybmqjraT+@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Thanks Robert, this is what I was looking for. I will try these suggestions
and follow up if any of them are the silver bullet.
On Fri, Jan 14, 2011 at 7:11 AM, Robert Haas wrote:
> On Thu, Jan 6, 2011 at 4:36 PM, Mike Broers <mbroers(at)gmail(dot)com> wrote:
> > Thanks for the assistance.
> > Here is an explain analyze of the query with the problem limit:
> > production=# explain analyze select * from landing_page.messages where
> > ((messages.topic = E'x') AND (messages.processed = 'f')) ORDER BY
> > messages.created_at ASC limit 10;
> >
> > QUERY PLAN
> >
> >
> --------------------------------------------------------------------------------------------------------------
> > ------------------------------------------------------------
> > Limit (cost=0.00..2891.06 rows=10 width=1340) (actual
> > time=207922.586..207922.586 rows=0 loops=1)
> > -> Index Scan using idx_landing_page_messages_created_at on messages
> > (cost=0.00..449560.48 rows=1555 widt
> > h=1340) (actual time=207922.581..207922.581 rows=0 loops=1)
> > Filter: ((NOT processed) AND ((topic)::text = 'x'::text))
> > Total runtime: 207949.413 ms
> > (4 rows)
>
> You're not the first person to have been bitten by this. The
> optimizer thinks that rows WHERE NOT processed and topic = 'x' are
> reasonably common, so it figures that it can just index scan until it
> finds 10 of them. But when it turns out that there are none at all,
> it ends up having to scan the entire index, which stinks big-time.
>
> The alternative plan is to use a different index to find ALL the
> relevant rows, sort them, and then take the top 10. That would suck
> if there actually were tons of rows like this, but there aren't.
>
> So the root of the problem, in some sense, is that the planner's
> estimate of the selectivity of "NOT processed and topic = 'x'" is not
> very good. Some things to try:
>
> - increase the statistics target for the "processed" and "topic"
> columns even higher
> - put the processed rows in one table and the not processed rows in
> another table
> - do something like SELECT * FROM (SELECT .. LIMIT 200 OFFSET 0) LIMIT
> 10 to try to fool the planner into planning based on the higher, inner
> limit
> - create a partial index on messages (topic) WHERE NOT processed and
> see if the planner will use it
>
> ...Robert
>
From | Date | Subject | |
---|---|---|---|
Next Message | Mladen Gogala | 2011-01-14 16:54:51 | Re: The good, old times |
Previous Message | Vitalii Tymchyshyn | 2011-01-14 11:39:04 | Re: queries with lots of UNIONed relations |