Re: Index not being used in sorting of simple table

From: Robins <tharakan(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: "Paul Smith" <paullocal(at)pscs(dot)co(dot)uk>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Index not being used in sorting of simple table
Date: 2007-05-06 12:37:45
Message-ID: 36af4bed0705060537n38996591u1cab44fa067a4fea@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

Paul:
Quite like Tom, I too think that its the first query that is more intriguing
than the second one. (The expected cost for the indexscan (A) query is 4x
the expected time for the 'Sequential Scan' (B) query !!)

Could you provide with the (complete output of) EXPLAIN ANALYZE times for
both of these queries ? That would tell how much time it actually took as
compared to the expected times.

Tom:
There is one thing though, that I couldn't really understand. Considering
that A's correlation in pg_stats being very high compared to B, isn't it 'a
better candidate' for a sequential scan as compared to B in this scenario ?
Or is it the other way around ?

Regards,
Robins Tharakan

On 5/4/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Paul Smith <paullocal(at)pscs(dot)co(dot)uk> writes:
> > If I do
> > EXPLAIN SELECT * FROM x ORDER BY a;
> > it says
> > Index Scan using y on x (cost=0.00..2903824.15 rows=1508057
> width=152)
>
> > That's what I'd expect
>
> > However, if I do
> > EXPLAIN SELECT * FROM x ORDER BY b;
> > it says
> > Sort (cost=711557.34..715327.48 rows=1508057
> > width=152)
> > Sort Key:
> > b
> > -> Seq Scan on x (cost=0.00..53203.57 rows=1508057 width=152)
>
> > Why doesn't it use the other index?
>
> You have the question backwards: given those cost estimates, I'd wonder
> why it doesn't do a sort in both cases. Offhand I think the sort cost
> estimate is pretty much independent of the data itself, so it should
> have come out with a cost near 715327 for sorting on A, so why's it
> using an indexscan that costs more than 4x as much?
>
> The indexscan cost estimate varies quite a bit depending on the
> estimated correlation (physical ordering) of the column, so seeing
> it do different things in the two cases isn't surprising in itself.
> But I think there's some relevant factor you've left out of the example.
>
> As for getting the estimates more in line with reality, you probably
> need to play with random_page_cost and/or effective_cache_size.
>
> > Any ideas? To me it looks like a bug in the planner. I can't think of
> > any logical reason not to use an existing index to retrieve a sorted
> > listing of the data.
>
> Sorry, but using a forced sort frequently *is* faster than a full-table
> indexscan. It all depends on how much locality of reference there is,
> ie how well the index order and physical table order match up. The
> planner's statistical correlation estimate and cost parameters may be
> far enough off to make it pick the wrong choice, but it's not a bug that
> it considers the options.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>

--
Robins

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2007-05-07 00:35:56 Re: Index not being used in sorting of simple table
Previous Message Yudhvir Singh Sidhu 2007-05-06 09:35:30 Re: How to Find Cause of Long Vacuum Times - NOOB Question