Re: Bad Query Plan with Range Query

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Mark Williams" <mark(dot)williams(at)jivesoftware(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Bad Query Plan with Range Query
Date: 2011-04-15 17:54:18
Message-ID: 4DA83FFA020000250003C89D@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Mark Williams <mark(dot)williams(at)jivesoftware(dot)com> wrote:

> explain SELECT min(messageID) FROM Message
> WHERE modificationDate >= 1302627793988;

> For some reason it is deciding to scan the primary key column of
> the table. This results in scanning the entire table

No, it scans until it finds the first row where modificationDate >=
1302627793988, at which point the scan is done because it's doing an
ascending scan on what you want the min() of. You might have a clue
that the first such row will be ten million rows into the scan, but
the optimizer doesn't know that. It's assuming that rows which meet
that condition are scattered randomly through the primary key range.
It thinks that it will, on average, need to scan 1249 rows to find a
match.

The patch Ken referenced causes the alternative to be assigned a
more accurate (and lower) cost, which tips the scales in favor of
that plan -- at least for the case you've tried; but this seems to
me to be another case related to the correlation of values. It's a
new and different form of it, but it seems at least somewhat
related. It might be a good example for those working on
multi-column statistics to keep in mind.

-Kevin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Mark Williams 2011-04-15 18:06:51 Re: Bad Query Plan with Range Query
Previous Message Kenneth Marshall 2011-04-15 17:38:05 Re: Bad Query Plan with Range Query