Re: Bad Query Plan with Range Query

From: Mark Williams <mark(dot)williams(at)jivesoftware(dot)com>
To:
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Bad Query Plan with Range Query
Date: 2011-04-15 18:06:51
Message-ID: 4DA8893B.8050403@jivesoftware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks for the response guys. There is something else which confuses me.
If I re-write the query like this:

explain SELECT messageID FROM Message WHERE modificationDate >=
1302627793988 ORDER BY modificationDate LIMIT 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Limit (cost=0.00..2.97 rows=1 width=16)
-> Index Scan using jvmssg_mdate_idx on message
(cost=0.00..3705.59 rows=1249 width=16)
Index Cond: (modificationdate >= 1302627793988::bigint)
(3 rows)

I also get a better plan. However, this is not always the case. On some
other instances we still get a sequential scan on the primary key.

On 04/15/2011 10:54 AM, Kevin Grittner wrote:
> Mark Williams<mark(dot)williams(at)jivesoftware(dot)com> wrote:
>
>> explain SELECT min(messageID) FROM Message
>> WHERE modificationDate>= 1302627793988;
>
>> For some reason it is deciding to scan the primary key column of
>> the table. This results in scanning the entire table
>
> No, it scans until it finds the first row where modificationDate>=
> 1302627793988, at which point the scan is done because it's doing an
> ascending scan on what you want the min() of. You might have a clue
> that the first such row will be ten million rows into the scan, but
> the optimizer doesn't know that. It's assuming that rows which meet
> that condition are scattered randomly through the primary key range.
> It thinks that it will, on average, need to scan 1249 rows to find a
> match.
>
> The patch Ken referenced causes the alternative to be assigned a
> more accurate (and lower) cost, which tips the scales in favor of
> that plan -- at least for the case you've tried; but this seems to
> me to be another case related to the correlation of values. It's a
> new and different form of it, but it seems at least somewhat
> related. It might be a good example for those working on
> multi-column statistics to keep in mind.
>
> -Kevin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2011-04-15 18:13:26 Re: Bad Query Plan with Range Query
Previous Message Kevin Grittner 2011-04-15 17:54:18 Re: Bad Query Plan with Range Query